Toby Steiner edited this page Jun 28, 2025 · 415 revisions
Thoth

The Thoth User Manual, for publishers and other creators of metadata records in Thoth, can be found here.

The wiki below provides an overview of Thoth's approach to Data and Metadata and its interactions with the Open Access Book Supply Chain. It also provides an overview of the Thoth Open Archiving Network.

Data and Metadata

In the digital realm, a Work usually consists of two constituent parts: data and metadata. The data comprise the contents of the publication: the information it contains, targeted at human readers, machine readers, or both. The metadata comprise all the data about the publication, such as its author, title, and subject classification.

Metadata are frequently also part of the Work itself. For example, the title and author are often given on the opening pages, and the ISBNs are usually listed in the colophon. Despite this partial overlap, it is useful to distinguish between data and metadata, as they are handled differently in the Open Access book supply chain. Several international Metadata Standards set baseline quality criteria for metadata.

There are specific digital Data Formats and Metadata Formats that are supported by Thoth. An important subset of metadata is formed by Persistent Identifiers.
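Because identifiers are copied between many systems, it is worth checking them before they are stored. The sketch below is a minimal illustration, not Thoth's own validation code: it tests the ISBN-13 check digit (digits weighted alternately 1 and 3, with the total divisible by 10) and normalizes a DOI by stripping common resolver prefixes and lowercasing it, since DOIs are case-insensitive. The function names are hypothetical.

```python
# Hypothetical helpers for checking identifiers before they enter a record.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the ISBN-13 check digit is correct (hyphens/spaces ignored)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def normalize_doi(doi: str) -> str:
    """Strip a resolver prefix and lowercase the DOI (DOIs are case-insensitive)."""
    value = doi.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if not value.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return value.lower()
```

For example, `is_valid_isbn13("978-0-306-40615-7")` passes, while changing the last digit makes it fail.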

Open Access Book Supply Chain

Thoth operates at several levels in the Open Access book supply chain. Here we use the categorization of key stakeholders and intermediaries proposed in Michael Clarke and Laura Ricci's 2021 report OA Books Supply Chain Mapping (Clarke & Ricci 2021).

Content Funders

Funders

Work records in Thoth allow Content Creators to add Funding information, identifying the funding Institution by means of Persistent Identifiers and adding further grant, program, and project details. Content Funders can harvest these data through one of the Thoth Metadata Formats or our Open API.
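As an illustration of harvesting funding data through the Open API, the sketch below sends a GraphQL query for a single work and flattens its funding records. The endpoint URL and the field names (`fundings`, `institution`, `institutionName`, `ror`, `grantNumber` and so on) are assumptions based on Thoth's public GraphQL schema and should be checked against the schema documentation before use; the helper functions are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint of Thoth's public GraphQL API.
THOTH_GRAPHQL_URL = "https://api.thoth.pub/graphql"

# Funding query for one work. Field names are assumptions; verify them
# against the live schema before relying on this.
FUNDING_QUERY = """
query WorkFundings($workId: Uuid!) {
  work(workId: $workId) {
    fullTitle
    doi
    fundings {
      program
      projectName
      grantNumber
      institution { institutionName ror }
    }
  }
}
"""


def build_payload(work_id: str) -> bytes:
    """Encode the JSON body of a GraphQL POST request."""
    body = {"query": FUNDING_QUERY, "variables": {"workId": work_id}}
    return json.dumps(body).encode("utf-8")


def fetch_fundings(work_id: str) -> dict:
    """POST the query and return the decoded JSON response (network call)."""
    request = urllib.request.Request(
        THOTH_GRAPHQL_URL,
        data=build_payload(work_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def funders(response: dict) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Flatten a response into (funder name, ROR ID, grant number) tuples."""
    work = response["data"]["work"]
    return [
        (
            funding["institution"]["institutionName"],
            funding["institution"].get("ror"),
            funding.get("grantNumber"),
        )
        for funding in work["fundings"]
    ]
```

A harvester would call `funders(fetch_fundings(work_id))` for each work ID it tracks; keeping parsing separate from the network call makes the parsing easy to test offline.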

Libraries

Libraries, both University Libraries and National Libraries, have become increasingly important Content Funders in the OA Book Supply Chain. Thoth is partially funded by library subscriptions through the Open Book Collective, and in return Thoth provides high-quality metadata in a range of Metadata Formats, including MARC 21, that libraries can ingest into their Library Management Systems.
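Libraries that ingest MARC 21 records usually fetch them over HTTP and then load them in bulk. The sketch below assumes an Export API URL pattern of the form `https://export.thoth.pub/specifications/{specification}/work/{id}` (or `/publisher/{id}`) and a specification identifier such as `marc21record::thoth`; both are assumptions to verify against the Export API documentation, and the helper names are hypothetical. The record splitter relies only on the MARC 21 transmission format itself: each record ends with the byte 0x1D, and the first five characters of the leader give the total record length.

```python
import urllib.request
from urllib.parse import quote

# Assumed base URL of Thoth's Export API; verify before use.
EXPORT_BASE = "https://export.thoth.pub"
# MARC 21 transmission format: every record ends with this byte.
RECORD_TERMINATOR = b"\x1d"


def export_url(specification_id: str, scope: str, identifier: str) -> str:
    """Build an export URL for one work or a whole publisher (assumed URL scheme)."""
    if scope not in ("work", "publisher"):
        raise ValueError("scope must be 'work' or 'publisher'")
    spec = quote(specification_id, safe=":")  # keep the '::' separator readable
    return f"{EXPORT_BASE}/specifications/{spec}/{scope}/{quote(identifier, safe='')}"


def download(url: str) -> bytes:
    """Fetch an export over HTTP and return the raw bytes (network call)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def split_marc_records(blob: bytes) -> list[bytes]:
    """Split a MARC 21 file into records, checking each against its leader length."""
    records = []
    for chunk in blob.split(RECORD_TERMINATOR):
        if not chunk.strip():
            continue  # nothing (or only whitespace) after the final terminator
        record = chunk + RECORD_TERMINATOR
        declared = int(record[:5])  # leader positions 00-04: record length in bytes
        if declared != len(record):
            raise ValueError(f"leader declares {declared} bytes, found {len(record)}")
        records.append(record)
    return records
```

A typical run would be `split_marc_records(download(export_url("marc21record::thoth", "publisher", publisher_id)))`, after which each record can be handed to the library system's import routine.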

Additionally, Thoth is working with University Libraries in the context of the Thoth Open Archiving Network.

Content Creators

Publishers

Thoth is primarily designed as a platform for Content Creators, in particular Open Access and Hybrid-Model Publishers. Thoth provides integrated services for the maintenance, management, and dissemination of metadata records in a wide variety of Metadata Formats to a large selection of Content Platforms and Catalogs and Indices.

Publishers may use one of the available commercial Title Management Platforms or Publishing Platforms, which allow authors, editors, and publishers to collaborate in a digital, in-browser environment. Thoth is currently collaborating with Open Monograph Press and Janeway to improve integration with their in-platform metadata management functionality.

OA ebook publishers may also find here a list of Useful Tools for Publishers developed by Thoth users.

Whereas this wiki focuses mainly on the digital OA book supply chain, many OA publishers also publish print books via one of the commercial Print Book Distributors.

Authors

Individual authors are not a target user group of Thoth. They may manage their private bibliographic metadata on one of the available commercial or open source Bibliographic Reference Management Platforms and upload their research directly to one of the Green OA Repositories. Thoth currently supports the export of metadata to all available Bibliographic Reference Management Platforms via BibTeX.
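For readers unfamiliar with BibTeX, the sketch below shows what a minimal `@book` entry looks like and how one might render it from a few metadata fields. It is a hypothetical helper, not Thoth's exporter: it escapes a handful of LaTeX special characters in free-text fields, double-braces the title to preserve capitalization, and leaves the DOI and ISBN untouched.

```python
from typing import Optional, Sequence

# LaTeX special characters that commonly break BibTeX free-text fields.
_LATEX_ESCAPES = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def _escape(text: str) -> str:
    """Backslash-escape LaTeX special characters in free text."""
    return "".join(_LATEX_ESCAPES.get(char, char) for char in text)


def bibtex_book(key: str, authors: Sequence[str], title: str, publisher: str,
                year: int, doi: Optional[str] = None,
                isbn: Optional[str] = None) -> str:
    """Render a minimal BibTeX @book entry (hypothetical helper)."""
    fields = [
        # BibTeX separates multiple names with the keyword "and".
        ("author", " and ".join(_escape(a) for a in authors)),
        # Inner braces stop BibTeX styles from lowercasing the title.
        ("title", "{" + _escape(title) + "}"),
        ("publisher", _escape(publisher)),
        ("year", str(year)),
    ]
    # Identifiers are left unescaped: reference managers treat them verbatim.
    if doi:
        fields.append(("doi", doi))
    if isbn:
        fields.append(("isbn", isbn))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@book{{{key},\n{body}\n}}"
```

Names are passed in "Surname, Given" form, which BibTeX parses unambiguously even for multi-part surnames.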

Authors are also end users of the metadata provided by Thoth when they access Knowledge Graphs and Web-Scale Search Engines or use any of the Content Platforms to find publications during their research.

Content Platforms

Content Platforms usually host both metadata and data (ebook files, and often also cover files). Thoth clients who opt for either of the Thoth Plus packages can distribute their data, in addition to their metadata, to selected Content Platforms.

OA Platforms and Repositories

OA Platforms and Repositories "have no underlying infrastructure for the buying and selling of books, and are intended to host exclusively free or OA content" (Clarke & Ricci 2021). Thoth currently supports the export of metadata to OAPEN.

Ebook Aggregators

Ebook Aggregators "license and consolidate titles from many publishers into one combined database, [… and] often combine OA and paid-access titles for greater discoverability and convenience" (Clarke & Ricci 2021). Thoth currently supports the export of metadata to Baobab ebooks, EBSCO eBooks, JSTOR, Project MUSE, and ProQuest Ebook Central.
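Metadata feeds to platforms of this kind are commonly expressed in ONIX for Books. The sketch below builds a deliberately minimal ONIX 3.0 `<Product>` element with Python's standard `xml.etree.ElementTree`; it is not a complete or schema-valid record (a real feed also needs a message `<Header>`, publishing details, contributors, and often platform-specific fields), and the function names are hypothetical. The code values come from the ONIX code lists: 15 = ISBN-13 and 06 = DOI (List 5), 03 = confirmed record (List 1), EB = digital download and online (List 150).

```python
import xml.etree.ElementTree as ET
from typing import Optional


def add_text(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append a child element with text content and return it."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def onix_product(record_reference: str, isbn13: str, title: str,
                 doi: Optional[str] = None) -> ET.Element:
    """Build a minimal, incomplete ONIX 3.0 Product for an ebook."""
    product = ET.Element("Product")
    add_text(product, "RecordReference", record_reference)
    add_text(product, "NotificationType", "03")  # List 1: confirmed on publication
    # List 5 product identifiers: 15 = ISBN-13, 06 = DOI.
    for id_type, value in (("15", isbn13), ("06", doi)):
        if value:
            identifier = ET.SubElement(product, "ProductIdentifier")
            add_text(identifier, "ProductIDType", id_type)
            add_text(identifier, "IDValue", value)
    detail = ET.SubElement(product, "DescriptiveDetail")
    add_text(detail, "ProductComposition", "00")  # List 2: single-component product
    add_text(detail, "ProductForm", "EB")  # List 150: digital download and online
    title_detail = ET.SubElement(detail, "TitleDetail")
    add_text(title_detail, "TitleType", "01")  # List 15: distinctive title
    element = ET.SubElement(title_detail, "TitleElement")
    add_text(element, "TitleElementLevel", "01")  # List 149: product level
    add_text(element, "TitleText", title)
    return product
```

`ET.tostring(onix_product(...), encoding="unicode")` serializes the element; a full message would wrap one or more products in an `<ONIXMessage release="3.0">` with the ONIX 3.0 reference namespace.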

Shadow Libraries

Shadow Libraries are "online databases of readily available content that is normally obscured or otherwise not readily accessible. Such content may be inaccessible for a number of reasons, including the use of paywalls, copyright controls, or other barriers to accessibility placed upon the content by its original owners" (Wikipedia). Thoth currently does not support the export of metadata to any shadow libraries, as they do not support automated ingest.

Consumer Ebook Platforms

Consumer Ebook Platforms "offer titles for an individual’s use and access, and do not actively support institutional or library integration" (Clarke & Ricci 2021). Thoth currently supports the export of metadata to Google Play Books.

Ebook Distributors

Ebook Distributors do not claim to offer a scholarly function, be that to research institutions or to the general public. Distributors repackage and normalize ebook metadata. Most ebook distributors operate some form of monetization scheme, which may not be hospitable to OA books. Thoth currently supports the export of metadata to OverDrive and RNIB Bookshare.

Catalogs and Indices

Catalogs and Indices usually only host metadata.

Third-party Content Indices

Third-party Content Indices are more specialized types of products that promote metadata curation and discovery. Thoth currently supports the export of metadata to DOAB.

Knowledge Bases

Knowledge Bases are library-agnostic global content indices. Thoth currently supports the export of metadata to BDSLive, OCLC Knowledge Base and EBSCO Knowledge Base.
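Knowledge bases commonly ingest holdings as KBART files, the NISO recommended practice for tab-separated title lists. The sketch below assumes KBART Phase II's 25 columns and fills in the ones relevant to an open access monograph; the helper names, and the choice of DOI as `title_id`, are assumptions for illustration rather than a description of Thoth's actual KBART output.

```python
from typing import Mapping, Optional, Sequence

# KBART Phase II column order.
KBART_FIELDS = [
    "publication_title", "print_identifier", "online_identifier",
    "date_first_issue_online", "num_first_vol_online", "num_first_issue_online",
    "date_last_issue_online", "num_last_vol_online", "num_last_issue_online",
    "title_url", "first_author", "title_id", "embargo_info", "coverage_depth",
    "notes", "publisher_name", "publication_type",
    "date_monograph_published_print", "date_monograph_published_online",
    "monograph_volume", "monograph_edition", "first_editor",
    "parent_publication_title_id", "preceding_publication_title_id",
    "access_type",
]


def _clean(value: Optional[object]) -> str:
    """Collapse whitespace so tabs or newlines cannot break the TSV layout."""
    return "" if value is None else " ".join(str(value).split())


def monograph_row(title: str, online_isbn: str, title_url: str,
                  first_author: str, publisher: str, doi: str,
                  published_online: str, print_isbn: str = "") -> dict:
    """Build one KBART row for an OA monograph (hypothetical mapping)."""
    return {
        "publication_title": title,
        "print_identifier": print_isbn,
        "online_identifier": online_isbn,
        "title_url": title_url,
        "first_author": first_author,  # KBART expects the first author's surname
        "title_id": doi,  # assumption: DOI used as the title identifier
        "coverage_depth": "fulltext",
        "publisher_name": publisher,
        "publication_type": "monograph",
        "date_monograph_published_online": published_online,
        "access_type": "F",  # F = free to read
    }


def kbart_tsv(rows: Sequence[Mapping[str, str]]) -> str:
    """Serialize rows as a KBART file: header line plus one line per title."""
    lines = ["\t".join(KBART_FIELDS)]
    for row in rows:
        unknown = set(row) - set(KBART_FIELDS)
        if unknown:
            raise ValueError(f"not KBART columns: {sorted(unknown)}")
        lines.append("\t".join(_clean(row.get(field)) for field in KBART_FIELDS))
    return "\n".join(lines) + "\n"
```

Columns that do not apply to monographs (issue and volume ranges, embargo) are left empty rather than omitted, because KBART consumers rely on the fixed column order.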

Topic-specific Bibliographies

Topic-specific Bibliographies are managed by scholarly organizations related to a specific field of inquiry.

Citation Indices

Citation Indices, such as OpenCitations, provide specific indexing for citations and references.

Preservation Repositories

Thoth Plus members additionally have their metadata and data exported to a set of Preservation Repositories under the Thoth Open Archiving Network initiative for long-term archiving purposes. These data and metadata are often preserved in specific Data Formats and Metadata Formats. Institutional repositories often operate through one of the available Repository Systems such as DSpace or Figshare. See also our blog posts here and here.

The following Preservation Repositories are currently connected to Thoth as part of the Thoth Open Archiving Network:

Content Consumers

Human Readers

Thoth is not directly oriented toward human readers. However, the data and much of the metadata stored in it are, as are the Content Platforms and Catalogs and Indices listed above. We are constantly working with our clients to improve the quality of both, including with regard to evolving Accessibility standards for digital objects.

Nonhuman Readers

With the development of Large Language Models and associated AI technologies, an increasing share of internet traffic to ebooks comes from nonhuman readers, also referred to as "bots." As of April 2024, nearly 50% of internet traffic was generated by bots. XML-based data formats such as EPUB are friendlier to nonhuman readers than print-legacy formats such as PDF. Because Thoth's database is publicly exposed and contains well-structured data in a native JSON format, it most likely has many nonhuman readers.
