Wasteback Machine

What is Wasteback Machine?

Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to support retrospective, quantitative web research.

Features

Archive-agnostic access: Works with web archives that use the Memento Protocol and expose the unmodified archived page via the id_ endpoint.
Page composition analysis: Analyses the full structure of an archived page, including HTML, stylesheets, scripts, images, fonts, and more.
Resource inventory: Produces an optional structured list of all discovered resources with their URLs, types, and byte sizes.
Byte-accurate measurement: Precisely measures the size of each resource, cleans stylesheets and scripts to remove archive-injected content, and excludes any resources that are not part of the original page.
Completeness scoring: Calculates how completely an archived page and its resources were retrieved.
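As a rough illustration of the id_ endpoint mentioned above, the sketch below builds a URL that asks an archive for the unmodified capture. It assumes the Wayback Machine's URL layout (base, 14-digit datetime, `id_` modifier, target URL); other archives may lay out their URLs differently. `buildRawMementoUrl` is a hypothetical helper and not part of the Wasteback Machine API.

```javascript
// Hypothetical helper (not part of the Wasteback Machine API): builds a
// Wayback-style URL that requests the unmodified capture via the id_ modifier.
function buildRawMementoUrl(archiveBase, datetime, targetUrl) {
  if (!/^\d{14}$/.test(datetime)) {
    throw new Error(`Expected a 14-digit datetime, got "${datetime}"`);
  }
  const base = archiveBase.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/${datetime}id_/${targetUrl}`;
}

const rawUrl = buildRawMementoUrl(
  "https://web.archive.org/web", // assumed Wayback Machine replay base
  "19961112181513",
  "https://nytimes.com"
);
console.log(rawUrl);
```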

Supported Web Archives

Web Archive	Organisation	Web Archive ID ⭐️
Arquivo.pt	🇵🇹 FCCN/FCT	arq
Australia Web Archive (Trove)	🇦🇺 National Library of Australia	awa
Webarchiv	🇨🇿 National Library of the Czech Republic	cz
Government of Canada Web Archive	🇨🇦 Library and Archives Canada	gcwa
Wayback Machine	🇺🇸 Internet Archive	ia
Icelandic Web Archive (Vefsafn.is)	🇮🇸 National and University Library of Iceland	iwa
Library of Congress Web Archive	🇺🇸 Library of Congress	loc
National Library of Ireland Web Archive	🇮🇪 National Library of Ireland	nliwa
New Zealand Web Archive	🇳🇿 National Library of New Zealand	nzwa
PRONI Web Archive	🇬🇧 The Public Record Office of Northern Ireland	pwa
Spletni Arhiv	🇸🇮 National and University Library of Slovenia	slo
UK Government Web Archive (UKGWA)	🇬🇧 The National Archives	ukgwa
~~UK Web Archive~~ (Offline)	🇬🇧 British Library	ukwa

⭐️ This ID is used to select the web archive you want to query.

Adding a New Web Archive

If you maintain a web archive not currently supported, please contact us at overbrowsing@ed.ac.uk.

Installation

Using NPM

To install the Wasteback Machine as a dependency for your projects using NPM:

npm i @overbrowsing/wasteback-machine

Using Yarn

To install the Wasteback Machine as a dependency for your projects using Yarn:

yarn add @overbrowsing/wasteback-machine

Usage

The Wasteback Machine provides two primary functions:

Fetch available memento-datetimes within a specific web archive for a given URL and time range.
Analyse a specific memento from a specific web archive to measure its page size and composition.

1. Fetch Available Memento-datetimes

Get all mementos for https://nytimes.com between 1996 and 2025 from the Wayback Machine (ia)

import { getMementos } from "@overbrowsing/wasteback-machine";

const mementos = await getMementos(
  "ia", // Web archive ID (ia = Wayback Machine)
  "https://nytimes.com", // Target URL
  1996, // Start year
  2025 // End year
);

console.log(mementos);

Example Output

[
  '19961112181513',
  '19961112181513',
  '19961112181513',
  '19961219002950'...
]
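The returned values are 14-digit datetime strings in the form YYYYMMDDhhmmss. A minimal sketch of turning them into JavaScript Date objects follows; it assumes the datetimes are in UTC, and `parseMementoDatetime` is a hypothetical helper rather than something the library exports.

```javascript
// Hypothetical helper: converts a 14-digit memento datetime (YYYYMMDDhhmmss)
// into a Date. Assumes the datetime is expressed in UTC.
function parseMementoDatetime(datetime) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(datetime);
  if (!match) {
    throw new Error(`Not a 14-digit memento datetime: "${datetime}"`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC months are zero-based, so subtract 1 from the month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Sample values taken from the example output above.
const sampleMementos = ["19961112181513", "19961219002950"];
const mementoDates = sampleMementos.map(parseMementoDatetime);
console.log(mementoDates.map((d) => d.toISOString()));
```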

2. Analyse a Specific Memento

Analyse the November 12, 1996 memento of https://nytimes.com from the Wayback Machine (ia)

import { getMementoSizes } from "@overbrowsing/wasteback-machine";

const mementoData = await getMementoSizes(
  "ia", // Web Archive ID (ia = Wayback Machine)
  "https://nytimes.com", // Target URL
  "19961112181513", // Memento datetime
  { includeResources: true } // Resource list (true/false)
);

console.log(mementoData);

Example Output

{
  url: 'https://nytimes.com',
  requestedMemento: '19961112181513',
  memento: '19961112181513',
  mementoUrl: 'https://web.archive.org/web/19961112181513if_/https://nytimes.com',
  archive: 'Wayback Machine',
  archiveOrg: 'Internet Archive',
  archiveUrl: 'https://web.archive.org',
  sizes: {
    html: { bytes: 1653, count: 1 },
    stylesheet: { bytes: 0, count: 0 },
    script: { bytes: 0, count: 0 },
    image: { bytes: 46226, count: 2 },
    video: { bytes: 0, count: 0 },
    audio: { bytes: 0, count: 0 },
    font: { bytes: 0, count: 0 },
    flash: { bytes: 0, count: 0 },
    plugin: { bytes: 0, count: 0 },
    data: { bytes: 0, count: 0 },
    document: { bytes: 0, count: 0 },
    other: { bytes: 0, count: 0 },
    total: { bytes: 47879, count: 3 }
  },
  completeness: '100%',
  resources: [
    {
      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
      type: 'image',
      size: 45259
    },
    {
      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
      type: 'image',
      size: 967
    }
  ]
}
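One way to work with the `sizes` object is to derive each resource type's share of the total page weight. The sketch below copies the non-zero entries from the example output into a literal so it runs without network access; `compositionShares` is a hypothetical helper, not a library function.

```javascript
// Non-zero entries copied from the example output above.
const exampleSizes = {
  html: { bytes: 1653, count: 1 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

// Hypothetical helper: returns each type's share of total bytes as a
// percentage string, skipping the "total" entry and empty types.
function compositionShares(sizes) {
  const totalBytes = sizes.total.bytes;
  const shares = {};
  for (const [type, { bytes }] of Object.entries(sizes)) {
    if (type === "total" || bytes === 0) continue;
    shares[type] = ((bytes / totalBytes) * 100).toFixed(1) + "%";
  }
  return shares;
}

const exampleShares = compositionShares(exampleSizes);
console.log(exampleShares);
```

For the sample data this gives 3.5% for HTML and 96.5% for images, matching the percentages shown in the CLI example output.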

Wasteback Machine CLI

The Wasteback Machine CLI lets you easily query web archives, fetch mementos for a given URL and date, and see page size, composition, and estimated emissions using CO2.js.

Quick Start

To start the Wasteback Machine CLI using NPM:

npm run cli

CLI Prompts

1. Enter web archive ID ('help' to list archives or [Enter ↵] = Wayback Machine):
2. Enter URL to analyse:
3. Enter target year (YYYY):
4. Enter target month (MM or [Enter ↵] = 01):
5. Enter target day (DD or [Enter ↵] = 01):
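The prompts collect a target date, with month and day defaulting to 01. The sketch below shows one plausible way to turn those answers into a 14-digit datetime and pick the nearest memento from a list. It is an assumption about how such a lookup could work, not the CLI's actual selection logic; `buildTargetDatetime`, `toEpochMs`, and `closestMemento` are hypothetical names.

```javascript
// Hypothetical: combine prompt answers into YYYYMMDDhhmmss, using 01 for a
// blank month or day as the prompts describe, and midnight for the time.
function buildTargetDatetime(year, month = "", day = "") {
  const mm = (month || "01").padStart(2, "0");
  const dd = (day || "01").padStart(2, "0");
  return `${year}${mm}${dd}000000`;
}

// Hypothetical: 14-digit datetime -> milliseconds since epoch (UTC assumed).
function toEpochMs(datetime) {
  const parts = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(datetime);
  if (!parts) throw new Error(`Invalid datetime: "${datetime}"`);
  const [y, mo, d, h, mi, s] = parts.slice(1).map(Number);
  return Date.UTC(y, mo - 1, d, h, mi, s);
}

// Hypothetical: return the memento whose datetime is nearest the target.
// Expects a non-empty array of 14-digit datetime strings.
function closestMemento(mementos, target) {
  const targetMs = toEpochMs(target);
  return mementos.reduce((best, current) =>
    Math.abs(toEpochMs(current) - targetMs) <
    Math.abs(toEpochMs(best) - targetMs)
      ? current
      : best
  );
}

const targetDatetime = buildTargetDatetime("1996", "11"); // day left blank
const nearest = closestMemento(
  ["19961112181513", "19961219002950"],
  targetDatetime
);
console.log(targetDatetime, nearest);
```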

Example Output

________________________________________________________

MEMENTO INFO

  Memento URL:    https://web.archive.org/web/19961112181513if_/https://nytimes.com
  Web Archive:    Wayback Machine
  Organisation:   Internet Archive
  Website:        https://web.archive.org

________________________________________________________

PAGE SIZE

  Data:           46.76 KB
  Emissions:      0.014 g CO₂e
  Completeness:   100%

________________________________________________________

PAGE COMPOSITION

  HTML
      Count:      1
      Data:       1653 bytes (3.5%)
      Emissions:  0.000 g CO₂e

  IMAGE
      Count:      2
      Data:       46226 bytes (96.5%)
      Emissions:  0.013 g CO₂e

________________________________________________________
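The page size figure in this sample is consistent with a conversion of 1 KB = 1024 bytes applied to the 47879-byte total from the library example. That convention is inferred from the sample numbers rather than confirmed by the source, and `formatKB` below is a hypothetical helper.

```javascript
// Hypothetical helper: formats a byte count as kilobytes with two decimals,
// assuming 1 KB = 1024 bytes (inferred from the sample output).
function formatKB(bytes) {
  return (bytes / 1024).toFixed(2) + " KB";
}

// HTML bytes + image bytes from the sample composition.
const sampleTotalBytes = 1653 + 46226;
const formattedSize = formatKB(sampleTotalBytes);
console.log(formattedSize);
```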

Methodology

For details of the underlying methodology, assumptions, and limitations, please refer to our paper (DOI: 10.1371/journal.pclm.0000767).

Wasteback Machine was developed as part of doctoral research at The University of Edinburgh’s Institute for Design Informatics.

Disclaimer

Important

Wasteback Machine is provided for informational and research purposes only. The authors make no guarantees about the accuracy of the results and disclaim any liability for their use. Use of Wasteback Machine is subject to the terms of service of each respective web archive.

Contributing

Contributions are welcome! Please submit an issue or a pull request.

Licenses

The Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.
