This repository is a complete collection of all five projects required for the Data Analysis with Python Developer Certification from freeCodeCamp.org.
This 300-hour certification program provided hands-on experience in data analysis using foundational Python libraries. Together, the projects in this collection demonstrate proficiency in data manipulation, cleaning, visualization, time-series analysis, and statistical calculations.
Here is a summary of the projects included in this repository, each in its own directory.
- Description: A function that uses NumPy to calculate the mean, variance, standard deviation, max, min, and sum of a 3x3 matrix across both axes and for the flattened array.
- Key Libraries: NumPy
- Concepts: Array manipulation, statistical calculations, matrix operations.
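The per-axis idea can be sketched in a few lines of NumPy. This is a minimal illustration, not this repository's code: the function name `calculate`, the nine-element input, the `ValueError` message, and the dictionary layout (`[axis 0, axis 1, flattened]` for each statistic) are assumptions.

```python
import numpy as np


def calculate(values):
    """Return per-axis and flattened statistics for nine numbers arranged as a 3x3 matrix."""
    if len(values) != 9:
        raise ValueError("List must contain nine numbers.")
    matrix = np.array(values).reshape(3, 3)

    def stats(func):
        # axis=0 -> down each column, axis=1 -> across each row, no axis -> whole (flattened) array
        return [func(matrix, axis=0).tolist(), func(matrix, axis=1).tolist(), func(matrix).item()]

    return {
        "mean": stats(np.mean),
        "variance": stats(np.var),  # population variance (ddof=0, NumPy's default)
        "standard deviation": stats(np.std),
        "max": stats(np.max),
        "min": stats(np.min),
        "sum": stats(np.sum),
    }


result = calculate([0, 1, 2, 3, 4, 5, 6, 7, 8])
print(result["mean"])  # [[3.0, 4.0, 5.0], [1.0, 4.0, 7.0], 4.0]
```

Passing a reusable `stats` helper each NumPy reduction keeps the three-way (column, row, flattened) pattern in one place instead of repeating it for all six statistics.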
- Description: An analysis of 1994 US Census data using Pandas. This project involved answering several descriptive questions about the dataset by filtering and grouping the data.
- Key Libraries: Pandas
- Concepts: Data cleaning, data filtering, `.groupby()`, `.value_counts()`, boolean masking.
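The three Pandas techniques listed above can be illustrated on a handful of rows. This is a hedged sketch: the rows are made up (not census records), the column names (`race`, `sex`, `age`, `education`, `salary`) and the `">50K"` label are assumed to mirror the census CSV, and the specific questions are only examples of the kind the project answers.

```python
import pandas as pd

# Hypothetical toy rows; column names are assumed to mirror the census CSV.
# These are NOT real census records.
df = pd.DataFrame({
    "race": ["White", "Black", "White", "Asian-Pac-Islander", "White"],
    "sex": ["Male", "Female", "Male", "Male", "Female"],
    "age": [39, 50, 38, 53, 28],
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "Doctorate"],
    "salary": ["<=50K", "<=50K", ">50K", ">50K", ">50K"],
})

# .value_counts(): how many people belong to each race
race_count = df["race"].value_counts()

# Boolean mask + .loc filtering: average age of men, rounded to one decimal
average_age_men = round(df.loc[df["sex"] == "Male", "age"].mean(), 1)

# Combined mask with .isin(): percentage of advanced-degree holders earning >50K
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich_pct_advanced = round((df.loc[advanced, "salary"] == ">50K").mean() * 100, 1)

# .groupby(): percentage of >50K earners within each sex
rich_share_by_sex = df.assign(rich=df["salary"] == ">50K").groupby("sex")["rich"].mean() * 100

print(race_count)
print(average_age_men, rich_pct_advanced)
print(rich_share_by_sex)
```

Taking the `.mean()` of a boolean Series gives the fraction of `True` values, which is why the percentage questions need no explicit counting.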
- Description: A visualization project using Matplotlib and Seaborn to analyze medical examination data. Tasks included creating an 'overweight' column, normalizing data, and generating a categorical plot and a correlation heatmap.
- Key Libraries: Pandas, Matplotlib, Seaborn
- Concepts: Data visualization, data cleaning, correlation matrix (heatmap), categorical plots (catplot).
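The data-preparation steps behind those plots can be sketched with Pandas alone; the Seaborn calls are only mentioned in comments. Everything here is an assumption for illustration: the sample rows are invented, the columns (`height` in cm, `weight` in kg, `cholesterol`, `gluc`, `smoke`, `cardio`) are modeled on the exercise's dataset, and the rules (BMI above 25 means overweight; a value of 1 is normal and anything higher is elevated) may not match this repository's implementation exactly.

```python
import pandas as pd

# Hypothetical sample rows; column names and units (height in cm, weight in kg)
# are assumptions, not real patient data.
df = pd.DataFrame({
    "height": [168, 156, 165, 169],
    "weight": [62.0, 85.0, 64.0, 82.0],
    "cholesterol": [1, 3, 3, 1],
    "gluc": [1, 2, 1, 1],
    "smoke": [0, 0, 1, 0],
    "cardio": [0, 1, 1, 1],
})

# BMI = weight (kg) / height (m)^2; flag BMI > 25 as overweight (1), else 0
bmi = df["weight"] / (df["height"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)

# Normalize: 1 (normal) -> 0 (good), any value above 1 -> 1 (bad)
for col in ["cholesterol", "gluc"]:
    df[col] = (df[col] > 1).astype(int)

# Long ("tidy") format suitable for a categorical count plot,
# e.g. seaborn.catplot(data=df_cat, x="variable", hue="value", col="cardio", kind="count")
df_cat = pd.melt(df, id_vars=["cardio"], value_vars=["cholesterol", "gluc", "smoke", "overweight"])

# Correlation matrix, which a heatmap (e.g. seaborn.heatmap) would then render
corr = df.corr()
print(df)
print(corr.round(2))
```

Melting to long format is what lets a single categorical plot show counts for several binary features side by side.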
- Description: A time-series analysis and prediction project. This involved plotting global sea level change since 1880, calculating two separate lines of best fit (one for all data, one for recent data), and predicting sea level rise through 2050.
- Key Libraries: Pandas, Matplotlib, SciPy
- Concepts: Linear regression (`scipy.stats.linregress`), time-series analysis, data prediction.
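The two-line-of-best-fit approach above can be sketched with `scipy.stats.linregress`. The data below is synthetic and noise-free (not real sea-level measurements), the column names `Year` and `CSIRO Adjusted Sea Level` are assumed, and treating "recent data" as the year 2000 onward is also an assumption; the plotting step is left out.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Synthetic, noise-free data standing in for the sea-level CSV (NOT real measurements).
# The column names are assumptions.
years = np.arange(1880, 2014)
levels = np.where(years < 2000, 0.06 * (years - 1880), 7.2 + 0.15 * (years - 2000))
df = pd.DataFrame({"Year": years, "CSIRO Adjusted Sea Level": levels})


def fit_and_predict(data, end_year=2050):
    """Fit a least-squares line and extend it from the first year in `data` through `end_year`."""
    fit = linregress(data["Year"], data["CSIRO Adjusted Sea Level"])
    x_pred = np.arange(data["Year"].min(), end_year + 1)
    return fit, x_pred, fit.intercept + fit.slope * x_pred


# Line 1: every year in the dataset
fit_all, x_all, y_all = fit_and_predict(df)

# Line 2: recent data only (the year-2000 cutoff is an assumption)
recent = df[df["Year"] >= 2000]
fit_recent, x_recent, y_recent = fit_and_predict(recent)

print(f"All data:    slope={fit_all.slope:.3f}, 2050 prediction={y_all[-1]:.2f}")
print(f"Recent data: slope={fit_recent.slope:.3f}, 2050 prediction={y_recent[-1]:.2f}")
```

Fitting the recent subset separately shows how a steeper recent trend produces a higher 2050 projection than the full-history line.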
- Description: A project to visualize and analyze time-series data from freeCodeCamp.org forum page views. This involved cleaning the data and creating line plots and box plots to identify yearly and monthly trends.
- Key Libraries: Pandas, Matplotlib, Seaborn
- Concepts: Time-series data, date parsing, box plots, line plots, data visualization.
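The cleaning and grouping steps that feed those plots can be sketched in Pandas. This is a hedged illustration: the daily counts are synthetic, the `date` index and `value` column names are assumed, and the cleaning rule (dropping days in the top and bottom 2.5% of page views) is an assumption that may differ from the repository's code. Plotting calls are only referenced in comments.

```python
import numpy as np
import pandas as pd

# Synthetic daily page-view counts (NOT real forum data); "date" and "value" are assumed names.
dates = pd.date_range("2016-05-09", periods=400, freq="D")
df = pd.DataFrame({"value": 1000 + 10 * np.arange(len(dates))}, index=dates)
df.index.name = "date"

# Cleaning (assumed rule): drop days in the top or bottom 2.5% of page views
low, high = df["value"].quantile(0.025), df["value"].quantile(0.975)
df_clean = df[(df["value"] >= low) & (df["value"] <= high)]

# Year and month columns are the grouping keys for yearly and monthly box plots,
# e.g. seaborn.boxplot(data=df_box, x="year", y="value")
df_box = df_clean.reset_index()
df_box["year"] = df_box["date"].dt.year
df_box["month"] = df_box["date"].dt.strftime("%b")

# Average daily views per calendar month, a smoothed series for a line plot
monthly_avg = df_clean["value"].resample("MS").mean()
print(monthly_avg.head())
```

Keeping the date as a `DatetimeIndex` is what makes the `.dt` accessors and `resample` available for the yearly and monthly breakdowns.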
The primary technologies used across these projects include:
- Python: The core programming language.
- Pandas: For data manipulation, cleaning, and analysis.
- NumPy: For high-performance scientific computing and array operations.
- Matplotlib: For creating static, animated, and interactive visualizations.
- Seaborn: A high-level interface for drawing attractive and informative statistical graphics.
- SciPy: For scientific and technical computing, specifically linear regression.
To run any of these projects locally, please follow the steps below.
- Python 3.8 or higher
- Git
- Clone the repository:
  ```bash
  git clone https://github.com/[YOUR_USERNAME]/fcc-data-analysis-with-python-Collection.git
  cd fcc-data-analysis-with-python-Collection
  ```
- Navigate to a project directory:
  ```bash
  # Example:
  cd 02-demographic-data-analyzer
  ```
- Install dependencies. Using a virtual environment is highly recommended:
  ```bash
  # Create and activate a virtual environment
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  # Install the required packages
  pip install -r requirements.txt
  ```
- Run the project. Each project's main logic is run from `main.py`, which also imports the unit tests from `test_module.py`:
  ```bash
  python main.py
  ```
This project portfolio is licensed under the MIT License. See the LICENSE file for more details.
A special thank you to Quincy Larson and the entire freeCodeCamp.org team for creating this comprehensive and challenging curriculum, which provides invaluable, accessible education for all.