Skip to content

jwitcher3/data_cleaning_tools

Repository files navigation

🧹 Data Cleaning & EDA Tools

This repository contains reusable tools for quick and effective exploratory data analysis (EDA) and data cleaning, built with Python and Pandas. Ideal for analysts, data scientists, or anyone who works with messy tabular data regularly.


πŸ“‚ Structure

  • notebooks/: Interactive .ipynb version with full markdown explanations
  • scripts/: .py script version of the cleaning functions for reuse in projects

πŸ›  Features

  • Load and preview tabular data
  • Summarize data shape, types, and nulls
  • Count unique values and detect cardinality issues
  • Fill or drop missing values using strategies (mean, median, value)
  • Clean and standardize text columns
  • Rename and normalize column names
  • Export cleaned datasets

πŸ§ͺ Usage

In Notebook:

# Load and explore
df = pd.read_csv('your_file.csv')
df.shape
df.isnull().sum()

# Clean text
df = clean_text_cols(df, ['description', 'notes'])

# Fill missing values
df = fill_nulls(df, strategy='mean')

As Script:

Import the .py module and reuse functions.

from scripts.data_cleaning_tools import fill_nulls, clean_text_cols

πŸ“¦ Requirements

  • pandas
  • numpy
  • re (standard lib)

πŸ‘€ Author

James Witcher
LinkedIn

About

Tools built for cleaning data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published