WebNovelScraper is a Node.js application that automatically downloads chapters from web novels as they become available and compiles them into an EPUB. It supports the following sites:
- webnovelpub.pro
- webnovelpub.co
- lightnovelworld.co
- lightnovelworld.com
- lightnovelhub.org
- lightnovelpub.com
- royalroad.com
- findnovel.net
- novelworm.net
Features:
- Fetches chapters from a given starting URL.
- Catches HTTP 429 (Too Many Requests) errors and waits before retrying.
- Downloads the cover image if available.
- Compiles fetched chapters into an EPUB.
- Real-time updates via WebSocket.
- Simple web UI for starting and monitoring download progress.
- Fetches new chapters automatically through the /cron endpoint.
- Checks Royal Road for 'STUB' status.
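The 429 handling above can be sketched as a simple retry wrapper. This is a hedged illustration, not the project's actual code; the helper name, retry count, and backoff delay are all assumptions:

```javascript
// Hypothetical sketch: retry an async fetch when the server answers HTTP 429.
// fetchFn is any async function (e.g. an axios call) that rejects with an
// error carrying a status code; retries and delayMs are illustrative defaults.
async function fetchWithRetry(fetchFn, retries = 3, delayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fetchFn();
    } catch (err) {
      // axios puts the status on err.response.status; fall back to err.status.
      const status = err.response ? err.response.status : err.status;
      if (status === 429 && attempt < retries) {
        // Wait before retrying, doubling the delay each attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
      } else {
        throw err; // not rate-limited, or out of retries
      }
    }
  }
}
```

A wrapper like this keeps the rate-limit handling in one place instead of scattering sleep calls through the chapter loop.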
Prerequisites:
- Node.js (compatible with the latest version)
- npm (Node package manager)
- Cron (optional, for automatically fetching new chapters; keep an eye on your logs in case you get blacklisted!)
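If you do set up cron, an entry along these lines could poll the /cron endpoint; the 30-minute interval and port 3000 are only examples, so pick a frequency conservative enough to avoid getting blacklisted:

```
# Example crontab entry: hit the /cron endpoint every 30 minutes.
*/30 * * * * curl -s http://localhost:3000/cron > /dev/null
```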
Getting started:
- Clone the repository:
  ```
  git clone https://github.com/arshem/webnovelscraper.git
  cd webnovelscraper
  ```
- Install the necessary dependencies:
  ```
  npm install axios express cheerio ejs epub-gen ws
  ```
- Start the server:
  ```
  node novel.js
  ```
- Open your browser and navigate to http://localhost:3000.
- Enter the starting URL of the web novel in the UI and click "Start Download".
- Monitor the real-time updates on the download progress.
Project structure:
- public/: Contains static files, including downloaded chapters and cover images.
- views/: Contains EJS templates for rendering the web interface.
- index.js: Main Node.js script.
- books.json: Created on the first web novel scrape; used by cron runs to find ongoing novels.
Key components:
- Express Server: Serves the web interface and handles API requests.
- Axios: Used for HTTP requests to fetch web novel content.
- Cheerio: Parses and extracts content from HTML.
- WebSocket: Provides real-time updates on download progress.
- epub-gen: Compiles downloaded chapters into an EPUB format.
API endpoints:
- GET /: Serves the main web interface.
- POST /download: Initiates the download process using the provided start URL.
- GET /books: Generates a JSON file of the books in the public directory.
- GET /cron: Looks for ongoing web novels and downloads the latest chapters of each.
Key functions:
- fetchChapter(url, sendUpdate): Fetches a single chapter and returns its content and the URL for the next chapter.
- saveChapter(content, chapterNumber, directory, sendUpdate): Saves the fetched chapter as an HTML file.
- downloadCoverImage(coverUrl, directory, sendUpdate): Downloads the cover image.
- createEpub(title, author, directory, sendUpdate): Compiles downloaded chapters into an EPUB.
- downloadChapters(title, author, startUrl, chapterRange, coverUrl, sendUpdate): Manages the download process for the chapters.
- getTitlePage(url, sendUpdate): Fetches the title page to extract novel metadata.
- extractChapterNumber(chapterText): Grabs the chapter number so the files are written correctly.
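As an illustration of what extractChapterNumber might look like, here is a hedged regex-based sketch (not the project's actual implementation):

```javascript
// Hypothetical sketch: pull the chapter number out of a chapter title,
// e.g. "Chapter 12: The Journey Begins" -> 12. Not the project's actual code.
function extractChapterNumber(chapterText) {
  const match = /chapter\s*(\d+)/i.exec(chapterText);
  return match ? parseInt(match[1], 10) : null;
}
```

A number parsed this way can then be zero-padded in the filename so the saved HTML chapters sort in the correct order.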
To download a novel, use its start URL in the web interface, for example:
Title Page: https://example.com/novel/123/
Contributions are welcome! Please follow these steps to contribute:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/new-feature`).
- Commit your changes (`git commit -am 'Add a new feature'`).
- Push to the branch (`git push origin feature/new-feature`).
- Create a new Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to provide feedback or report issues in the project's issue tracker.
Happy scraping! 📚