GitHub - cecilyal/Project_WikipediaAPI: Collecting text from Wikipedia articles to find related articles.

The purpose of this project was to get all the page text from Machine Learning Category in Wikipedia and as many pages in the Business Software category. The text was then analyized through Latent semantic analysis in order to find the top related articles.

This project looked at two categories within the Wikipedia API, Machine Learning and Business Software. The first category, Machine Learning contained many pages and also subcategories that also had pages.

I used for loops and functions to go into all the subcategories to get all the Page IDs in order the pull then pull text from all the pages to analyze. In order to be able to have the for loops and functions work, I created a url that accessed the API to get the page IDs for the subcategoreis.

I was then able to use a function that retrieved the text from all the articles based on the page IDs I had gotten previously. I used this same method for both Wikipedia Categories.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Business Software Part 2.ipynb		Business Software Part 2.ipynb
Business Software.ipynb		Business Software.ipynb
Machine Learning.ipynb		Machine Learning.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

cecilyal/Project_WikipediaAPI

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages