
This repository contains projects related to data collection, assessment, cleaning, visualization, and analysis.


chetan11-stack/Data-Project


Data-Analytics-Projects

This repository is mainly for projects I completed in the Udemy course Learning Python for Data Analysis and Visualization. This Udemy program prepares me for a career as a data analyst by teaching me to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python and its data analysis libraries (NumPy, pandas, Matplotlib) and SQL as I build a portfolio of projects.

Tip: For data science projects with Python, I recommend installing these basic libraries: NumPy, pandas, SciPy, scikit-learn, Matplotlib, and seaborn.

Part 1 - Intro to Data Analysis

Subjects Covered:

-Anaconda: Learn to use Anaconda to manage packages and environments for use with Python

-Jupyter Notebook: Learn to use this open-source web application

-Data Analysis Process

-NumPy for 1D and 2D Data

-Pandas Series and DataFrames

Project 1: Titanic - Machine Learning from Disaster

The sinking of the Titanic is one of the most infamous shipwrecks in history. Check out the Kaggle Titanic Challenge at the following link: https://www.kaggle.com/c/titanic-gettingStarted

In this project, I completed the entire analysis process and built a predictive model that answers the question "What sorts of people were more likely to survive?" using passenger data (e.g., name, age, gender, and socio-economic class).
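A minimal sketch of the two halves of that question, assuming the Kaggle column names `Survived`, `Sex`, `Pclass`, and `Age`. The eight rows below are invented for illustration, not real passengers, and logistic regression is just one possible model choice; the project's actual model is not specified here.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Invented sample rows using the Kaggle Titanic column names.
passengers = pd.DataFrame({
    "Survived": [1, 0, 1, 1, 0, 0, 1, 0],
    "Sex": ["female", "male", "female", "female", "male", "male", "male", "female"],
    "Pclass": [1, 3, 2, 3, 1, 3, 1, 3],
    "Age": [38, 22, 27, 4, 54, 20, 35, 18],
})

# Exploratory step: the mean of a 0/1 column is the survival rate.
by_sex = passengers.groupby("Sex")["Survived"].mean()
by_sex_class = passengers.pivot_table(
    values="Survived", index="Sex", columns="Pclass", aggfunc="mean"
)

# Modelling step: encode sex as a number so the model can use it.
features = pd.DataFrame({
    "is_female": (passengers["Sex"] == "female").astype(int),
    "Pclass": passengers["Pclass"],
    "Age": passengers["Age"],
})
model = LogisticRegression(max_iter=1000).fit(features, passengers["Survived"])

# Predicted probability of survival for each passenger.
survival_prob = model.predict_proba(features)[:, 1]

print(by_sex)
print(by_sex_class)
print(survival_prob.round(2))
```

On the real data you would fit on `train.csv` and predict on the held-out `test.csv` rather than scoring the training rows.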

Project 2: Stock Market Analysis

In this portfolio project, I looked at data from the stock market, particularly some technology stocks. I used pandas to get stock information, visualized different aspects of it using seaborn and Matplotlib, and looked at a few ways of analyzing the risk of a stock based on its previous performance history. I also predicted future stock prices with a Monte Carlo method.
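The risk analysis and the Monte Carlo step can be sketched as follows. The closing prices are invented, and the simulation uses a geometric-Brownian-motion style random walk, which is one common Monte Carlo formulation; the project's exact update rule may differ. `monte_carlo_paths` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Invented daily closing prices; in the project these came from a data source via pandas.
close = pd.Series([100.0, 101.5, 100.8, 102.3, 103.0, 102.1, 104.2, 105.0, 104.1, 106.3])

# Daily percentage returns (the first value is NaN, so drop it).
returns = close.pct_change().dropna()

# Empirical 95% value at risk: the 5th percentile of daily returns.
var_95 = returns.quantile(0.05)


def monte_carlo_paths(start_price, mu, sigma, days, n_sims, seed=0):
    """Hypothetical helper: simulate price paths with a GBM-style random walk."""
    rng = np.random.default_rng(seed)
    dt = 1.0  # one step per trading day
    # Each daily log-return is drift plus a random shock scaled by volatility.
    log_returns = rng.normal(
        loc=(mu - 0.5 * sigma**2) * dt,
        scale=sigma * np.sqrt(dt),
        size=(n_sims, days),
    )
    # Cumulative log-returns turn into price levels relative to the start price.
    return start_price * np.exp(np.cumsum(log_returns, axis=1))


start = close.iloc[-1]
paths = monte_carlo_paths(start, returns.mean(), returns.std(), days=30, n_sims=1000)
final_prices = paths[:, -1]

# Simulated 99% value at risk in price terms: start price minus the 1st percentile outcome.
mc_var_99 = start - np.percentile(final_prices, 1)

print(f"Historical 95% daily VaR: {var_95:.4f}")
print(f"Monte Carlo 99% VaR over 30 days: {mc_var_99:.2f}")
```

Using log-returns keeps every simulated price positive, which a naive additive random walk does not guarantee.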

Project 3: Election Data Project - Polls and Donors

In this project, I analyzed two datasets: the first contains the results of political polls, and the second is a donor dataset. I analyzed these aggregated datasets to answer several questions.
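A typical question for the donor data is how much each candidate raised. This sketch assumes the columns `cand_nm` and `contb_receipt_amt` (the FEC contribution file layout); the rows and candidate names are invented placeholders.

```python
import pandas as pd

# Invented donor records with assumed FEC-style column names.
donors = pd.DataFrame({
    "cand_nm": ["Candidate A", "Candidate B", "Candidate A", "Candidate B", "Candidate A"],
    "contb_receipt_amt": [250.0, 500.0, -50.0, 1000.0, 100.0],
})

# Refunds appear as negative amounts; keep only positive contributions.
positive = donors[donors["contb_receipt_amt"] > 0]

# Total and number of donations per candidate, largest total first.
summary = (
    positive.groupby("cand_nm")["contb_receipt_amt"]
    .agg(["sum", "count"])
    .sort_values("sum", ascending=False)
)
print(summary)
```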

Part 2 - Machine Learning

Subjects Covered:

-Regression

-Multiple Linear Regression

-Logistic Regression

-Support Vector Machines

-Naive Bayes

-Decision Tree and Random Forests

-Natural Language Processing

Part 3 - Data Extraction and Wrangling

Subjects Covered:

GATHERING DATA

-Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs

-Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files

-Store gathered data in a PostgreSQL database
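A minimal sketch of importing two file formats into pandas and storing the result in a database. The TSV and JSON contents are invented stand-ins for downloaded files, and SQLite replaces PostgreSQL so the example runs without a server; for PostgreSQL you would pass a SQLAlchemy engine to `to_sql` instead.

```python
import io
import json
import sqlite3

import pandas as pd

# Invented stand-ins for a downloaded TSV flat file and a JSON API response.
tsv_text = "title\tyear\nMovie A\t1999\nMovie B\t2005\n"
json_text = json.dumps([{"title": "Movie A", "score": 91}, {"title": "Movie B", "score": 78}])

# Flat files: read_csv with a tab separator handles TSV.
movies = pd.read_csv(io.StringIO(tsv_text), sep="\t")
# JSON records load directly into a DataFrame.
scores = pd.read_json(io.StringIO(json_text))

# Combine the two sources on their shared key.
combined = movies.merge(scores, on="title")

# Store the gathered data (SQLite in memory here, as a stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
combined.to_sql("movies", conn, index=False)
stored = pd.read_sql("SELECT title, year, score FROM movies ORDER BY title", conn)
print(stored)
```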

ASSESSING DATA

-Assess data visually and programmatically using pandas

-Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)

-Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
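Programmatic assessment mostly comes down to a handful of pandas checks. The table below is invented with deliberate problems, and the plausible height range (50 to 250 cm) is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample with deliberate quality issues.
df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "height_cm": [170.0, np.nan, np.nan, 1650.0, 180.0],
    "state": ["CA", "California", "California", "NY", "ny"],
})

# Completeness: missing values per column.
missing = df.isnull().sum()

# Duplicates: fully repeated rows (pandas treats NaN as equal here).
dup_rows = df.duplicated().sum()

# Accuracy: values outside an assumed plausible human height range.
implausible = df[(df["height_cm"] < 50) | (df["height_cm"] > 250)]

# Consistency: the same state written in several different ways.
state_counts = df["state"].value_counts()

print(missing, dup_rows, implausible, state_counts, sep="\n\n")
```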

CLEANING DATA

-Identify each step of the data cleaning process (defining, coding, and testing)

-Clean data using Python and pandas

-Test cleaning code visually and programmatically using Python
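The define, code, and test cycle can be shown on a small invented table. The specific issues (mixed state spellings, zip codes stored with a trailing `.0`) are illustrative assumptions.

```python
import pandas as pd

# Invented raw data with two quality issues.
df = pd.DataFrame({
    "state": ["CA", "California", "NY", "ny"],
    "zip": ["94103", "94105.0", "10001", "10002"],
})

# Define: convert state names to upper-case two-letter codes,
#         and strip the trailing ".0" from zip codes.

# Code: work on a copy so the original data stays untouched.
clean = df.copy()
clean["state"] = clean["state"].str.upper().replace({"CALIFORNIA": "CA"})
clean["zip"] = clean["zip"].str.replace(r"\.0$", "", regex=True)

# Test: programmatic checks that each defined issue is resolved.
assert set(clean["state"]) == {"CA", "NY"}
assert clean["zip"].str.fullmatch(r"\d{5}").all()
print(clean)
```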

Part 4 - Data Visualisation

Subjects Covered:

-Univariate exploration of data (histograms, bar charts, axis limits and different scales)

-Bivariate exploration of data (scatter plots, clustered bar charts, violin plots and bar charts, faceting)

-Multivariate exploration of data (encodings, plot matrices, feature engineering)

-Explanatory visualizations (storytelling with data, polishing plots, creating a slide deck)
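A minimal Matplotlib sketch of a univariate and a bivariate view side by side. The data is randomly generated, and the bin width and axis limits are arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated data: y depends linearly on x plus noise.
rng = np.random.default_rng(42)
x = rng.normal(50, 10, 500)
y = 2 * x + rng.normal(0, 15, 500)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: histogram with explicit bin edges and fixed axis limits.
bins = np.arange(0, 101, 5)
ax_hist.hist(x, bins=bins)
ax_hist.set_xlim(0, 100)
ax_hist.set(title="Distribution of x", xlabel="x", ylabel="count")

# Bivariate: scatter plot; transparency reduces overplotting.
ax_scatter.scatter(x, y, alpha=0.3)
ax_scatter.set(title="y vs. x", xlabel="x", ylabel="y")

fig.tight_layout()
```

Call `fig.savefig("exploration.png")` to write the figure to disk.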

After completing this course, I worked on more data projects, which are available on Kaggle.

Project: Spotify Song Attribute EDA

I used the data analysis process to answer questions about the data and presented my conclusions and visualizations in a report. The dataset spotify_songs contains key attributes of Spotify music, such as name, artist, album, genre, and danceability.
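One question this kind of EDA asks is which genres are the most danceable. The column names `playlist_genre`, `danceability`, and `energy` are assumed from the spotify_songs dataset, and the rows are invented.

```python
import pandas as pd

# Invented rows with assumed spotify_songs column names.
songs = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C", "Song D", "Song E", "Song F"],
    "playlist_genre": ["pop", "rock", "pop", "edm", "rock", "edm"],
    "danceability": [0.75, 0.45, 0.65, 0.80, 0.55, 0.70],
    "energy": [0.70, 0.85, 0.60, 0.90, 0.80, 0.95],
})

# Average audio features per genre, most danceable first.
genre_profile = (
    songs.groupby("playlist_genre")[["danceability", "energy"]]
    .mean()
    .sort_values("danceability", ascending=False)
)

# Pearson correlation between two attributes across all songs.
corr = songs["danceability"].corr(songs["energy"])

print(genre_profile)
print(f"danceability vs. energy correlation: {corr:.2f}")
```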

Project: Sales forecasting using Machine learning

Here, I use a Walmart sales dataset to forecast future sales with machine learning in Python, using linear regression. The program uses the NumPy, pandas, scikit-learn, SciPy, and seaborn libraries. I implemented it in three steps: first importing the libraries, second preparing the data with those libraries, and third forecasting sales.
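The three steps can be sketched with scikit-learn's `LinearRegression`. The weekly data is randomly generated, and the feature names (`week`, `temperature`, `holiday_flag`) are hypothetical, loosely modelled on typical Walmart sales columns rather than taken from the project.

```python
# Step 1: import the libraries.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Step 2: prepare the data (randomly generated, hypothetical features).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "week": np.arange(n),
    "temperature": rng.uniform(20, 90, n),
    "holiday_flag": rng.integers(0, 2, n),
})
data["weekly_sales"] = (
    1_000_000
    + 500 * data["week"]
    + 80_000 * data["holiday_flag"]
    - 300 * data["temperature"]
    + rng.normal(0, 20_000, n)
)

features = ["week", "temperature", "holiday_flag"]
# shuffle=False keeps time order, so the test set is the most recent 20% of weeks.
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["weekly_sales"], test_size=0.2, shuffle=False
)

# Step 3: fit the model and forecast the held-out weeks.
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)

print(dict(zip(features, model.coef_.round(1))))
print(f"Mean absolute error on the last {len(y_test)} weeks: {mae:,.0f}")
```

Holding out the most recent weeks instead of a random sample mimics a real forecast, where the model only sees the past.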
