Today we’re releasing WARC-GPT: an open-source, highly-customizable Retrieval Augmented Generation tool the web archiving community can use to explore the intersection between web archiving and AI. WARC-GPT allows for creating custom chatbots that use a set of web archive files as their knowledge base, letting users explore collections through conversation.
Using WARC-GPT, you can ask specific questions in natural language against a collection of WARC files. Rather than relying on keyword searches and metadata filters to sort through search results, WARC-GPT provides a new starting point for search using multi-document full-text search with summarization to explore the contents of web archives. WARC-GPT lists the sources used to generate the response and relevant text excerpts, which you can use to verify the information provided and identify points of interest within a collection of web archives.
This project is part of our ongoing series exploring how artificial intelligence changes our relationship to knowledge. The release of this experimental software will help us understand whether and how AI can help access and uncover web archives’ contents. Of course, this is simply a prototype – see our disclaimer in the repo.
If you want to run WARC-GPT yourself, head over to Github for installation instructions. If you are a library professional, researcher or tinkerer who wants to understand Retrieval Augmented Generation and how it can apply to specialized digital archives like web archives, read on.
Read more of "WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI"