WebNovelScraper

WebNovelScraper is a Node.js application that automatically downloads chapters from web novels as they become available and compiles them into an EPUB file.

Features

Fetches chapters from a given starting URL.
Catches 429 errors and waits to try again.
Downloads cover image if available.
Compiles fetched chapters into an EPUB.
Real-time updates via WebSocket.
Simple Web UI for starting and monitoring download progress.
Fetches new chapters automatically through the /cron endpoint.
Checks Royal Road (RR) for 'STUB' status.
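
The 429 handling listed above can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: `requestWithRetry`, `retryDelayMs`, and the `doRequest` callback are hypothetical names, and the response shape `{ status, headers }` is an assumption. With axios's default settings a 429 arrives as a rejected promise carrying `error.response.status`, so a real version would inspect that in a `catch` block instead.

```javascript
// Sketch only: retry a request when the server answers 429 (Too Many Requests).
// `doRequest` is a hypothetical async function resolving to { status, headers, ... }.

// Delay before the next attempt: use a numeric Retry-After header (seconds)
// when one is present, otherwise back off exponentially from baseDelayMs.
function retryDelayMs(attempt, retryAfter, baseDelayMs = 1000) {
  const hasHeader = retryAfter !== undefined && retryAfter !== null && retryAfter !== '';
  const seconds = Number(retryAfter);
  if (hasHeader && Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000;
  }
  return baseDelayMs * 2 ** attempt;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls doRequest until it returns a non-429 response or maxRetries is exhausted.
// `wait` is injectable so the delay can be replaced in tests.
async function requestWithRetry(doRequest, { maxRetries = 5, baseDelayMs = 1000, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    const response = await doRequest();
    if (response.status !== 429) return response;
    if (attempt >= maxRetries) {
      throw new Error(`Still rate limited after ${maxRetries} retries`);
    }
    const retryAfter = response.headers ? response.headers['retry-after'] : undefined;
    await wait(retryDelayMs(attempt, retryAfter, baseDelayMs));
  }
}
```

Capping the number of retries keeps a scheduled run from hammering a site that has already started rate limiting.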

Prerequisites

Node.js (the latest version should work)
npm (Node package manager)
Cron (optional, only needed to fetch new chapters automatically). Keep an eye on your logs in case you get blacklisted!

Installation

Clone the repository:

git clone https://github.com/arshem/webnovelscraper.git
cd webnovelscraper

Install the necessary dependencies:

npm install axios express cheerio ejs epub-gen ws

Usage

Start the server:
```
node novel.js
```
Open your browser and navigate to http://localhost:3000.
Input the starting URL of the web novel in the provided UI and click "Start Download".
Monitor the real-time updates on the download progress.

How It Works

File Structure

public/: Contains static files, including downloaded chapters and cover images.
views/: Contains EJS templates for rendering the web interface.
novel.js: Main Node.js script.
books.json: Created the first time a web novel is scraped; used by later cron runs.

Main Components

Express Server: Serves the web interface and handles API requests.
Axios: Used for HTTP requests to fetch web novel content.
Cheerio: Parses and extracts content from HTML.
WebSocket: Provides real-time updates on download progress.
epub-gen: Compiles downloaded chapters into an EPUB format.

Endpoints

GET /: Serves the main web interface.
POST /download: Initiates the download process using the provided start URL.
GET /books: Generates a JSON file of the books in the public directory.
GET /cron: Looks for ongoing web novels and downloads the latest chapters of each.
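
As an illustration of how the /cron endpoint could be called on a schedule, here is a small sketch using only Node's built-in `http` module. The helper names `cronUrl` and `triggerCron` are hypothetical, and the default base URL simply reuses the `http://localhost:3000` address from the Usage section.

```javascript
// Sketch only: ask a running WebNovelScraper instance to check for new chapters.
const http = require('node:http');

// Builds the URL of the /cron endpoint for a given server address.
function cronUrl(baseUrl = 'http://localhost:3000') {
  return new URL('/cron', baseUrl);
}

// Sends GET /cron and resolves with the HTTP status code and response body.
function triggerCron(baseUrl) {
  return new Promise((resolve, reject) => {
    http
      .get(cronUrl(baseUrl), (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve({ status: res.statusCode, body }));
      })
      .on('error', reject);
  });
}
```

A script that calls `triggerCron().then(console.log, console.error)` could then be run from a crontab entry at whatever interval suits the sites being scraped.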

Functions

fetchChapter(url, sendUpdate): Fetches a single chapter and returns its content and the URL for the next chapter.
saveChapter(content, chapterNumber, directory, sendUpdate): Saves the fetched chapter as an HTML file.
downloadCoverImage(coverUrl, directory, sendUpdate): Downloads the cover image.
createEpub(title, author, directory, sendUpdate): Compiles downloaded chapters into an EPUB.
downloadChapters(title, author, startUrl, chapterRange, coverUrl, sendUpdate): Manages the download process for the chapters.
getTitlePage(url, sendUpdate): Fetches the title page to extract novel metadata.
extractChapterNumber(chapterText): Grabs the chapter number to write the files correctly.
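
To show how these functions might fit together, the sketch below pairs a simple chapter-number parser with a loop that follows "next chapter" links. It is an assumption-laden illustration rather than the code in novel.js: `crawlChapters` is a hypothetical name, the regular expression is a guess at the chapter title format, and the `fetchChapter` callback is assumed to resolve to `{ title, content, nextUrl }`.

```javascript
// Sketch only: not the project's actual implementation.

// Pulls the first number that follows the word "Chapter" (case-insensitive).
// Returns null when no such number is found.
function extractChapterNumber(chapterText) {
  const match = /chapter\s+(\d+)/i.exec(chapterText);
  return match ? Number.parseInt(match[1], 10) : null;
}

// Follows "next chapter" links from startUrl until there is no next URL,
// a URL repeats, or maxChapters is reached.
// `fetchChapter` is assumed to resolve to { title, content, nextUrl }.
async function crawlChapters(startUrl, fetchChapter, { maxChapters = Infinity } = {}) {
  const chapters = [];
  const seen = new Set();
  let url = startUrl;
  while (url && !seen.has(url) && chapters.length < maxChapters) {
    seen.add(url);
    const { title, content, nextUrl } = await fetchChapter(url);
    chapters.push({ number: extractChapterNumber(title), title, content, url });
    url = nextUrl;
  }
  return chapters;
}
```

Tracking visited URLs guards against a site whose last chapter links back to itself or to an earlier page.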

Example

To download a novel, use its start URL in the web interface, for example:

Title Page: https://example.com/novel/123/

Contributing

Contributions are welcome! Please follow these steps to contribute:

Fork the repository.
Create your feature branch (git checkout -b feature/new-feature).
Commit your changes (git commit -am 'Add a new feature').
Push to the branch (git push origin feature/new-feature).
Create a new Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Feel free to provide feedback or report issues in the project's issue tracker.

Happy scraping! 📚
