This repository mainly contains projects I completed for the Udemy course Learning Python for Data Analysis and Visualization. The Udemy online data analyst program prepares me for a career as a data analyst by helping me learn to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python, its data analysis libraries (NumPy, pandas, Matplotlib), and SQL as I build a portfolio of projects.
Tips: For data science projects with Python, I recommend installing these basic libraries: numpy, pandas, scipy, scikit-learn, matplotlib, and seaborn.
Subjects Covered:
-Anaconda: Learn to use Anaconda to manage packages and environments for use with Python
-Jupyter Notebook: Learn to use this open-source web application
-Data Analysis Process
-NumPy for 1D and 2D Data
-Pandas Series and DataFrames
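As a small illustration of the NumPy and pandas topics above, here is a minimal sketch. All values and names (`ages`, `grid`, `fares`, `table`) are made up for the example and do not come from the course material.

```python
import numpy as np
import pandas as pd

# 1D array: a flat sequence of values (made-up numbers).
ages = np.array([22, 38, 26, 35])
mean_age = ages.mean()

# 2D array: rows and columns; axis=0 aggregates down each column.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
column_sums = grid.sum(axis=0)

# A Series is a labelled 1D array; a DataFrame is a table of columns.
fares = pd.Series([7.25, 71.28, 7.92], index=["a", "b", "c"], name="fare")
table = pd.DataFrame({"age": [22, 38, 26], "fare": fares.to_numpy()})

print(mean_age, column_sums, table.shape)
```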
The sinking of the Titanic is one of the most infamous shipwrecks in history. Check out the Kaggle Titanic Challenge at the following link: https://www.kaggle.com/c/titanic-gettingStarted
In this project, I completed the entire analysis process and built a predictive model that answers the question: “What sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).
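A first look at that question can be taken with a simple group-by, before any model is built. The sketch below uses a handful of made-up rows that only borrow the Kaggle column names `Survived`, `Pclass`, and `Sex`; it is not the real dataset and not necessarily how the project itself was done.

```python
import pandas as pd

# Made-up rows for illustration; only the column names mirror the Kaggle file.
passengers = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass":   [1, 3, 2, 3, 1, 2],
    "Sex":      ["female", "male", "female", "male", "female", "male"],
})

# The mean of a 0/1 column is the share of survivors within each group.
rate_by_sex = passengers.groupby("Sex")["Survived"].mean()
rate_by_class = passengers.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

With the real data loaded in place of the toy frame, the same two lines give survival rates per group.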
In this portfolio project I looked at data from the stock market, particularly some technology stocks. I used pandas to get stock information, visualized different aspects of it using seaborn and matplotlib, and finally looked at a few ways of analyzing the risk of a stock based on its previous performance history. I also predicted future stock prices through a Monte Carlo method!
In this project, I analyzed two datasets. The first was the results of political polls and the second was the Donor Data Set. I analyzed these aggregated datasets and answered some questions.
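One common way to set up such a Monte Carlo simulation is sketched below. It assumes daily log returns are independent and normally distributed (geometric Brownian motion); the project itself may have used a different formulation. The function name `simulate_prices` and every parameter value are hypothetical.

```python
import numpy as np

def simulate_prices(start_price, days, mu, sigma, n_paths, seed=0):
    """Simulate daily prices under geometric Brownian motion.

    mu and sigma are the assumed mean and standard deviation of daily
    log returns. Returns an array of shape (n_paths, days + 1) whose
    first column is the start price.
    """
    rng = np.random.default_rng(seed)
    # One random log return per path per day.
    log_returns = rng.normal(mu, sigma, size=(n_paths, days))
    # Cumulative log return along each path, with 0 prepended for day 0.
    cumulative = np.cumsum(log_returns, axis=1)
    cumulative = np.hstack([np.zeros((n_paths, 1)), cumulative])
    return start_price * np.exp(cumulative)

# Hypothetical inputs: 100.0 start price, one trading year, 10,000 paths.
paths = simulate_prices(start_price=100.0, days=252, mu=0.0003,
                        sigma=0.02, n_paths=10_000)
final_prices = paths[:, -1]

# 99% value at risk: the loss exceeded in only 1% of simulated paths.
var_99 = 100.0 - np.quantile(final_prices, 0.01)
print(f"Mean final price: {final_prices.mean():.2f}, 99% VaR: {var_99:.2f}")
```

In practice `mu` and `sigma` would be estimated from the stock's historical returns rather than picked by hand.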
Subjects Covered:
-Regression
-Multiple Linear Regression
-Logistic Regression
-Support Vector Machines
-Naive Bayes
-Decision Trees and Random Forests
-Natural Language Processing
Subjects Covered:
-Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs
-Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files
-Store gathered data in a PostgreSQL database
-Assess data visually and programmatically using pandas
-Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
-Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
-Identify each step of the data cleaning process (defining, coding, and testing)
-Clean data using Python and pandas
-Test cleaning code visually and programmatically using Python
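The assess step and the define, code, and test steps listed above can be sketched on a tiny table. The frame `raw` and its contents are hypothetical and exist only to show the pattern.

```python
import pandas as pd

# Hypothetical raw data with typical quality issues:
# a duplicated row, ages stored as text, and stray whitespace.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "age": ["34", "n/a", "n/a", "29"],
    "city": [" Paris", "Oslo", "Oslo", "Rome "],
})

# Assess programmatically: count fully duplicated rows.
n_duplicates = int(raw.duplicated().sum())

# Define: drop duplicate rows, convert age to numbers, trim city names.
# Code:
clean = raw.drop_duplicates().copy()
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")  # "n/a" becomes NaN
clean["city"] = clean["city"].str.strip()

# Test: check programmatically that each issue is gone.
assert not clean.duplicated().any()
assert pd.api.types.is_numeric_dtype(clean["age"])
assert clean["city"].tolist() == ["Paris", "Oslo", "Rome"]
```

Writing the test as assertions right after the cleaning code makes a failed fix show up immediately instead of later in the analysis.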
Subjects Covered:
-Univariate exploration of data (histograms, bar charts, axis limits and different scales)
-Bivariate exploration of data (scatter plots, clustered bar charts, violin and bar charts, faceting)
-Multivariate exploration of data (encodings, plot matrices, feature engineering)
-Explanatory visualizations (storytelling with data, polishing plots, creating a slide deck)
After completing this course, I worked on some more data projects, which are available on Kaggle.
I used the data analysis process to answer questions about the data and presented my conclusions and visualizations in a report. The dataset spotify_songs contains key attributes of Spotify music, such as name, artist, album, genre, danceability, etc.
Here, I use the Walmart sales dataset to forecast future sales using machine learning in Python. Linear regression is used to forecast sales, with the NumPy, pandas, scikit-learn, SciPy, and seaborn libraries. I implement it in three steps: first import the libraries, second prepare the data using those libraries, and third forecast.
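The three steps can be sketched in a few lines. To keep the example dependency-light, it fits the line with NumPy's `polyfit` (ordinary least squares) instead of scikit-learn, and the weekly figures are made up (and deliberately perfectly linear) rather than taken from the Walmart dataset.

```python
# Step 1: import libraries.
import numpy as np

# Step 2: prepare data. Hypothetical weekly sales, with the week number
# as the only feature.
weeks = np.arange(1, 9)
sales = np.array([20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0])

# Step 3: fit a straight line by least squares and forecast ahead.
slope, intercept = np.polyfit(weeks, sales, deg=1)
future_weeks = np.array([9, 10])
forecast = slope * future_weeks + intercept

print(forecast)
```

Real sales data would have noise and seasonality, so the fitted line would be an approximation rather than an exact match.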