Arquivo.pt in a nutshell: overview of services and activities
Arquivo.pt is a public and free service that enables anyone to search and access historical information preserved from the Web since the 1990s. Arquivo.pt contains billions of files collected from websites in several languages (about half of its users come from outside of Portugal).
The Arquivo.pt system periodically and automatically collects and stores information published on the web. The Arquivo.pt hardware infrastructure is hosted in its own data center and managed by dedicated full-time staff. The preservation workflow runs on a large-scale information system distributed over about 100 servers.
The search services provided by Arquivo.pt include full-text search, image search, version history listing, advanced search and application programming interfaces (APIs) that facilitate the development of added-value applications by third parties. The preserved web information can also be processed automatically for Big Data research through a distributed processing platform for unstructured data (Hadoop), for instance to detect web spam or to assess web accessibility for people with disabilities. Arquivo.pt has also been used to support research and development in areas such as the Humanities and Social Sciences.
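As an illustration of how a third-party application might use the APIs mentioned above, here is a minimal Python sketch that builds a full-text search request and summarizes a JSON response. The endpoint `https://arquivo.pt/textsearch`, the parameters `q`, `maxItems`, `from` and `to`, and the response fields `response_items`, `tstamp`, `title` and `linkToArchive` are assumptions based on the public API documentation and should be checked against it. The helper names `build_search_url` and `summarize_items` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the full-text search API (verify against the official docs).
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, max_items=5, from_ts=None, to_ts=None):
    """Build a full-text search URL (hypothetical helper).

    from_ts and to_ts are optional timestamps such as "19960101000000"
    that restrict results to a time span (assumed parameter names).
    """
    params = {"q": query, "maxItems": max_items}
    if from_ts:
        params["from"] = from_ts
    if to_ts:
        params["to"] = to_ts
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def summarize_items(payload):
    """Extract (timestamp, title, archive link) from a decoded JSON response.

    Assumes the response holds its results under "response_items".
    Missing fields come back as None instead of raising an error.
    """
    return [
        (item.get("tstamp"), item.get("title"), item.get("linkToArchive"))
        for item in payload.get("response_items", [])
    ]
```

The URL returned by `build_search_url` could be fetched with any HTTP client, and the decoded JSON body passed to `summarize_items`. Keeping the network call out of these helpers makes them easy to test offline.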
Arquivo.pt has been raising awareness about the importance of web preservation. It has issued a set of recommendations for developing preservable sites and offers free training sessions. Professionals such as web developers who acquire web preservation skills can deliver higher-quality solutions to their clients, for example websites that are robust to link rot because they implement the arquivo404 mechanism, which redirects broken links to web-archived versions of pages.
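The idea behind redirecting broken links to archived versions can be sketched in a few lines of Python. This is not the arquivo404 software or its API, only an illustration of the principle. The URL pattern `https://arquivo.pt/wayback/<timestamp>/<url>`, and the behaviour of a URL without a timestamp, are assumptions. The function names `archived_url` and `redirect_target` are hypothetical.

```python
# Assumed prefix of the replay service for archived pages.
ARCHIVE_PREFIX = "https://arquivo.pt/wayback"


def archived_url(original_url, timestamp=""):
    """Build the address of an archived version of original_url.

    timestamp is an optional string such as "20100101000000"; when it is
    omitted, the archive is assumed to pick a capture on its own.
    """
    parts = [ARCHIVE_PREFIX]
    if timestamp:
        parts.append(timestamp)
    parts.append(original_url)
    return "/".join(parts)


def redirect_target(requested_url, status_code):
    """Return where to send the visitor, or None if no redirect is needed.

    Only a 404 (page not found) triggers the fallback to the web archive.
    """
    if status_code == 404:
        return archived_url(requested_url)
    return None
```

A site's "page not found" handler could call `redirect_target` with the requested address and offer the returned link to the visitor.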
The Arquivo.pt Memorial preserves historical websites. In addition, anyone can suggest a website to be preserved or immediately archive a given web page.
Thematic exhibitions and collaborative collections have been developed to illustrate the utility of web archives as a source of historical documentation. A list of all the collections preserved by Arquivo.pt is publicly available. The data sets generated to create these exhibitions or derived from the operation of the service are openly available.
The annual Arquivo.pt Award aims to promote innovative works based on the historical information preserved by Arquivo.pt.
Developing a web archive has raised significant challenges in areas such as Web Archive Information Retrieval, User Experience and Quality Assurance. Since 2008, the members of the Arquivo.pt team have been publishing open-access technical and scientific articles related to web archiving, including the book The Past Web: Exploring Web Archives (Green Open Access). All the developed software is available as free open-source projects.
If you want to receive the most important news about Arquivo.pt, subscribe to our international mailing list.