Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to support retrospective, quantitative web research.
- Archive-agnostic access: Works with web archives that use the Memento Protocol and expose the unmodified archived page via the id_ endpoint.
- Page composition analysis: Analyses the full structure of an archived page, including HTML, stylesheets, scripts, images, fonts, and more.
- Resource inventory: Produces an optional structured list of all discovered resources with their URLs, types, and byte sizes.
- Byte-accurate measurement: Precisely measures the size of each resource, cleans stylesheets and scripts to remove archive-injected content, and excludes any resources that are not part of the original page.
- Completeness scoring: Calculates how completely an archived page and its resources were retrieved.
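To illustrate the `id_` endpoint mentioned in the first feature: on the Wayback Machine, the unmodified capture of a page is served from a URL that places `id_` directly after the 14-digit timestamp. The sketch below only builds such a URL. `rawMementoUrl` is a hypothetical helper, not part of the Wasteback Machine API, and the default base path is an assumption that holds for the Wayback Machine; other archives use their own base paths.

```javascript
// Hypothetical helper (not part of the library): builds the URL of the
// unmodified ("id_") capture for a given timestamp and target URL.
// The default base is the Wayback Machine; other archives differ.
function rawMementoUrl(timestamp, targetUrl, base = "https://web.archive.org/web") {
  // Memento datetimes are 14 digits: YYYYMMDDhhmmss
  if (!/^\d{14}$/.test(timestamp)) {
    throw new Error(`Expected a 14-digit timestamp, got "${timestamp}"`);
  }
  return `${base}/${timestamp}id_/${targetUrl}`;
}

console.log(rawMementoUrl("19961112181513", "https://nytimes.com"));
```

Validating the timestamp up front keeps a malformed value from silently producing a URL that the archive would redirect or reject.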
| Web Archive | Organisation | Web Archive ID ⭐️ |
|---|---|---|
| Arquivo.pt | 🇵🇹 FCCN/FCT | arq |
| Australia Web Archive (Trove) | 🇦🇺 National Library of Australia | awa |
| Webarchiv | 🇨🇿 National Library of the Czech Republic | cz |
| Government of Canada Web Archive | 🇨🇦 Library and Archives Canada | gcwa |
| Wayback Machine | 🇺🇸 Internet Archive | ia |
| Icelandic Web Archive (Vefsafn.is) | 🇮🇸 National and University Library of Iceland | iwa |
| Library of Congress Web Archive | 🇺🇸 Library of Congress | loc |
| National Library of Ireland Web Archive | 🇮🇪 National Library of Ireland | nliwa |
| New Zealand Web Archive | 🇳🇿 National Library of New Zealand | nzwa |
| PRONI Web Archive | 🇬🇧 The Public Record Office of Northern Ireland | pwa |
| Spletni Arhiv | 🇸🇮 National and University Library of Slovenia | slo |
| UK Government Web Archive (UKGWA) | 🇬🇧 The National Archives | ukgwa |
| UK Web Archive (UKWA) | 🇬🇧 British Library | ukwa |
⭐️ This ID is used to select the web archive you want to query.
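Because the ID is a plain string, a typo is easy to miss until a request fails. A minimal sketch of a guard that checks an ID against the table above before querying follows. `ARCHIVE_IDS` and `isSupportedArchive` are hypothetical names for illustration, not part of the library, and the list simply mirrors the table as it stands here.

```javascript
// Hypothetical guard (not part of the library): the IDs are copied from
// the table of supported web archives above.
const ARCHIVE_IDS = new Set([
  "arq", "awa", "cz", "gcwa", "ia", "iwa", "loc",
  "nliwa", "nzwa", "pwa", "slo", "ukgwa", "ukwa",
]);

// Returns true when the given ID appears in the table.
function isSupportedArchive(id) {
  return ARCHIVE_IDS.has(id);
}

console.log(isSupportedArchive("ia"));      // Wayback Machine
console.log(isSupportedArchive("wayback")); // not an ID from the table
```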
If you maintain a web archive not currently supported, please contact us at overbrowsing@ed.ac.uk.
To install the Wasteback Machine as a dependency for your projects using NPM:
npm i @overbrowsing/wasteback-machine
To install the Wasteback Machine as a dependency for your projects using Yarn:
yarn add @overbrowsing/wasteback-machine
The Wasteback Machine provides two primary functions:
- Fetch available memento-datetimes within a specific web archive for a given URL and time range.
- Analyse a specific memento from a specific web archive to measure its page size and composition.
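The memento-datetimes exchanged between these two functions are 14-digit strings in `YYYYMMDDhhmmss` form, for example `19961112181513`. A small sketch of how such a list might be de-duplicated and converted to `Date` objects before choosing a memento to analyse is given below. `uniqueMementos` and `mementoToDate` are hypothetical helpers, not part of the library, and treating the timestamps as UTC is an assumption that holds for the Wayback Machine but should be checked for other archives.

```javascript
// Hypothetical helpers (not part of the library).

// Remove repeated timestamps while keeping the original order.
function uniqueMementos(mementos) {
  return [...new Set(mementos)];
}

// Convert a 14-digit memento datetime (YYYYMMDDhhmmss) to a Date.
// Assumption: the timestamp is expressed in UTC.
function mementoToDate(timestamp) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(timestamp);
  if (!match) {
    throw new Error(`Not a 14-digit memento datetime: "${timestamp}"`);
  }
  const [, year, month, day, hour, minute, second] = match.map(Number);
  // Date.UTC expects a zero-based month
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

const sample = ["19961112181513", "19961112181513", "19961219002950"];
for (const ts of uniqueMementos(sample)) {
  console.log(ts, "->", mementoToDate(ts).toISOString());
}
```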
Get all mementos for https://nytimes.com between 1996 and 2025 from the Wayback Machine (ia)
import { getMementos } from "@overbrowsing/wasteback-machine";
const mementos = await getMementos(
"ia", // Web archive ID (ia = Wayback Machine)
"https://nytimes.com", // Target URL
1996, // Start year
2025 // End year
);
console.log(mementos);
[
'19961112181513',
'19961112181513',
'19961112181513',
'19961219002950'...
]
Analyse https://nytimes.com from November 12, 1996 from the Wayback Machine (ia)
import { getMementoSizes } from "@overbrowsing/wasteback-machine";
const mementoData = await getMementoSizes(
"ia", // Web Archive ID (ia = Wayback Machine)
"https://nytimes.com", // Target URL
"19961112181513", // Memento datetime
{ includeResources: true } // Resource list (true/false)
);
console.log(mementoData);
{
url: 'https://nytimes.com',
requestedMemento: '19961112181513',
memento: '19961112181513',
mementoUrl: 'https://web.archive.org/web/19961112181513if_/https://nytimes.com',
archive: 'Wayback Machine',
archiveOrg: 'Internet Archive',
archiveUrl: 'https://web.archive.org',
sizes: {
html: { bytes: 1653, count: 1 },
stylesheet: { bytes: 0, count: 0 },
script: { bytes: 0, count: 0 },
image: { bytes: 46226, count: 2 },
video: { bytes: 0, count: 0 },
audio: { bytes: 0, count: 0 },
font: { bytes: 0, count: 0 },
flash: { bytes: 0, count: 0 },
plugin: { bytes: 0, count: 0 },
data: { bytes: 0, count: 0 },
document: { bytes: 0, count: 0 },
other: { bytes: 0, count: 0 },
total: { bytes: 47879, count: 3 }
},
completeness: '100%',
resources: [
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
type: 'image',
size: 45259
},
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
type: 'image',
size: 967
}
]
}
The Wasteback Machine CLI lets you query web archives, fetch mementos for a given URL and date, and view page size, composition, and estimated emissions calculated with CO2.js.
To start the Wasteback Machine CLI using NPM:
npm run cli
1. Enter web archive ID ('help' to list archives or [Enter ↵] = Wayback Machine):
2. Enter URL to analyse:
3. Enter target year (YYYY):
4. Enter target month (MM or [Enter ↵] = 01):
5. Enter target day (DD or [Enter ↵] = 01):
________________________________________________________
MEMENTO INFO
Memento URL: https://web.archive.org/web/19961112181513if_/https://nytimes.com
Web Archive: Wayback Machine
Organisation: Internet Archive
Website: https://web.archive.org
________________________________________________________
PAGE SIZE
Data: 46.76 KB
Emissions: 0.014 g CO₂e
Completeness: 100%
________________________________________________________
PAGE COMPOSITION
HTML
Count: 1
Data: 1653 bytes (3.5%)
Emissions: 0.000 g CO₂e
IMAGE
Count: 2
Data: 46226 bytes (96.5%)
Emissions: 0.013 g CO₂e
________________________________________________________
For details of the underlying methodology, assumptions, and limitations, please refer to our paper (DOI: 10.1371/journal.pclm.0000767).
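As a worked check of the CLI figures shown above, the page size and percentage shares can be reproduced from a `sizes` object of the shape returned by `getMementoSizes`: 47879 bytes / 1024 ≈ 46.76 KB, and 1653 / 47879 ≈ 3.5% for HTML. The sketch below is not the CLI's own code. `summariseSizes` is a hypothetical helper, and the 1024-byte kilobyte and one-decimal rounding are assumptions inferred from the sample output.

```javascript
// Hypothetical helper (not part of the library): derives the total in KB
// and each resource type's share of the total from a `sizes` object.
function summariseSizes(sizes) {
  const totalBytes = sizes.total.bytes;
  const rows = Object.entries(sizes)
    // Skip the aggregate entry and any resource type with no resources
    .filter(([type, entry]) => type !== "total" && entry.count > 0)
    .map(([type, { bytes, count }]) => ({
      type,
      count,
      bytes,
      // Guard against division by zero for empty pages
      share: totalBytes > 0 ? `${((bytes / totalBytes) * 100).toFixed(1)}%` : "0.0%",
    }));
  // Assumption: 1 KB = 1024 bytes, as the sample output suggests
  return { totalKB: (totalBytes / 1024).toFixed(2), rows };
}

// Values taken from the nytimes.com example (unused types omitted)
const sizes = {
  html: { bytes: 1653, count: 1 },
  script: { bytes: 0, count: 0 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

console.log(summariseSizes(sizes));
```

Emissions are left out here because they depend on the CO2.js model and settings, which are described in the paper rather than in this document.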
Wasteback Machine was developed as part of doctoral research at The University of Edinburgh’s Institute for Design Informatics.
Important
Wasteback Machine is provided for informational and research purposes only. The authors make no guarantees about the accuracy of the results and disclaim any liability for their use. Use of Wasteback Machine is subject to the terms of service of each respective web archive.
Contributions are welcome! Please submit an issue or a pull request.
The Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.