This is a search engine written entirely in Go. Crawl a website domain concurrently at the speed of light, then search for whatever you want!
The crawler retrieves information from the website in chunks and uploads it into a SQLite database.
Crawling is deliberately chunked so it scales easily: 100-1000 workers can be assigned to crawl together concurrently.
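To make the "chunk" idea concrete, here is a minimal sketch of how one batch of crawled documents could be written to SQLite in a single transaction. The `pages` table, the `document` struct, and the `insertBatch` name are assumptions for illustration only, not the project's actual schema or API.

```go
package crawler

import "database/sql"

// document is a hypothetical stand-in for one crawled page.
type document struct {
	URL     string
	Content string
}

// insertBatch writes one chunk of documents in a single transaction,
// so the database sees whole batches instead of row-by-row inserts.
func insertBatch(db *sql.DB, batch []document) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare(`INSERT INTO pages (url, content) VALUES (?, ?)`)
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()
	for _, d := range batch {
		if _, err := stmt.Exec(d.URL, d.Content); err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}
```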
We have 2 types of workers (sketched below):

Extract Worker
- processes a link and downloads it
- returns the contents and the links it found

DB Worker
- receives the information from extraction
- uploads it into the database
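The split between the two worker types might look roughly like the sketch below. The channel layout, the type names, the `pages` table, and the `fetch` stub are assumptions made for illustration; the real implementation lives in this repo.

```go
package crawler

import (
	"database/sql"
	"sync"
)

// pageResult is a hypothetical container for what an extract worker hands off.
type pageResult struct {
	URL     string
	Content string
	Links   []string
}

// fetch is a stand-in for the real download + link-extraction step.
func fetch(url string) (content string, links []string) {
	return "", nil
}

// extractWorker processes a link, downloads it, and passes the contents
// plus any links it found on to the DB workers.
func extractWorker(jobs <-chan string, results chan<- pageResult, wg *sync.WaitGroup) {
	defer wg.Done()
	for url := range jobs {
		content, links := fetch(url)
		results <- pageResult{URL: url, Content: content, Links: links}
	}
}

// dbWorker takes the extraction results and uploads them into the database.
// (The project groups BATCH_SIZE documents per write; a single INSERT per
// result is shown here only to keep the sketch short.)
func dbWorker(db *sql.DB, results <-chan pageResult, wg *sync.WaitGroup) {
	defer wg.Done()
	for r := range results {
		db.Exec(`INSERT INTO pages (url, content) VALUES (?, ?)`, r.URL, r.Content)
	}
}
```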
I then define these parameters:
```go
const (
	MAX_WORKERS    = 300  // maximum number of crawler workers
	MAX_DB_WORKERS = 300  // maximum number of database workers
	BATCH_SIZE     = 150  // documents per batch
	QUEUE_SIZE     = 9000 // buffer size of the job queue: how many jobs may be queued at once
)
```

This function then uses these global parameters to assign tasks to the workers so they run concurrently:
```go
func crawlFullyConcurrent(db *sql.DB, seedURL string) (int, error)
```

With this system of concurrent crawling the app can scale very well; it is just a matter of tuning the parameters for the website.
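For a sense of what that wiring could look like, here is a self-contained sketch of a crawlFullyConcurrent-style function: it sizes the job queue with QUEUE_SIZE, starts MAX_WORKERS extract goroutines and MAX_DB_WORKERS database goroutines, and shuts everything down once no work is left. The visited set, the `fetchPage` stub, the `pages` table, and the drop-links-when-the-queue-is-full behaviour are assumptions made so the sketch stays small and deadlock-free; the actual function in this repo may be organised differently (and, as noted below, it crashes rather than drops links when the queue overflows).

```go
package crawler

import (
	"database/sql"
	"sync"
	"sync/atomic"
)

// Repeated from the const block above so this sketch compiles on its own.
const (
	MAX_WORKERS    = 300
	MAX_DB_WORKERS = 300
	QUEUE_SIZE     = 9000
)

// page is a hypothetical container for one crawled document.
type page struct {
	URL, Content string
	Links        []string
}

// fetchPage is a stand-in for the real download + link-extraction step.
func fetchPage(url string) (string, []string) { return "", nil }

func crawlFullyConcurrent(db *sql.DB, seedURL string) (int, error) {
	jobs := make(chan string, QUEUE_SIZE)  // bounded frontier of URLs to crawl
	results := make(chan page, QUEUE_SIZE) // extracted pages waiting for the DB

	var (
		mu      sync.Mutex
		visited = make(map[string]bool)
		pending sync.WaitGroup // URLs queued but not yet fully processed
		stored  int64          // pages written to the database
	)

	// enqueue adds a URL once; when the buffer is full the link is dropped,
	// which is where the QUEUE_SIZE limit described below shows up.
	enqueue := func(url string) {
		mu.Lock()
		seen := visited[url]
		visited[url] = true
		mu.Unlock()
		if seen {
			return
		}
		pending.Add(1)
		select {
		case jobs <- url:
		default:
			pending.Done() // queue full: drop the link
		}
	}

	// Extract workers: download pages, feed results to the DB workers,
	// and push newly discovered links back onto the job queue.
	var extractWG sync.WaitGroup
	for i := 0; i < MAX_WORKERS; i++ {
		extractWG.Add(1)
		go func() {
			defer extractWG.Done()
			for url := range jobs {
				content, links := fetchPage(url)
				results <- page{URL: url, Content: content, Links: links}
				for _, l := range links {
					enqueue(l)
				}
				pending.Done()
			}
		}()
	}

	// DB workers: upload extracted pages into SQLite.
	var dbWG sync.WaitGroup
	for i := 0; i < MAX_DB_WORKERS; i++ {
		dbWG.Add(1)
		go func() {
			defer dbWG.Done()
			for p := range results {
				if _, err := db.Exec(`INSERT INTO pages (url, content) VALUES (?, ?)`, p.URL, p.Content); err == nil {
					atomic.AddInt64(&stored, 1)
				}
			}
		}()
	}

	enqueue(seedURL)
	pending.Wait()   // every queued URL has been processed
	close(jobs)      // lets the extract workers exit
	extractWG.Wait()
	close(results)   // lets the DB workers exit
	dbWG.Wait()
	return int(atomic.LoadInt64(&stored)), nil
}
```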
QUEUE_SIZE is a major bottleneck: if the website being crawled has more than 9000 links, the program crashes.
Init: install Go, then build and run the project:

```
go build
go run .
```
Once it is done crawling, visit:
http://localhost:8080

