Cloud Data Engineering & Analytics Pipeline

An end-to-end data engineering and analytics project built on Google Cloud Platform and Databricks. It covers the full pipeline: data ingestion, querying, distributed analysis, machine learning, and visualization.

Objective

Design and implement a scalable cloud-based data pipeline to process, analyze, and visualize data using modern data engineering and analytics tools.

Tools & Technologies

Google Cloud Platform (GCS, BigQuery, Cloud Shell)
Databricks
Apache Spark (Spark SQL, DataFrames)
Spark MLlib
Looker Studio
SQL, Python (Jupyter Notebooks)

Workflow

Cloud Setup
Created a Google Cloud Storage (GCS) bucket and configured project resources.
Data Ingestion
Downloaded the dataset, uploaded it to GCS, and verified data integrity using Cloud Shell.
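
The README does not say how integrity was verified, so the following is only a minimal sketch of one common approach: compute the base64-encoded MD5 of the local file and compare it with the MD5 hash that GCS reports for the uploaded object. The function names `gcs_style_md5` and `matches_remote` are hypothetical and not part of this repository.

```python
import base64
import hashlib
from pathlib import Path


def gcs_style_md5(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a local file.

    GCS reports an object's MD5 in this base64 form (the `md5Hash`
    metadata field), so the two strings can be compared directly.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in chunks so a large CSV does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_remote(path, remote_md5):
    """Compare the local digest with the hash string copied from GCS."""
    return gcs_style_md5(path) == remote_md5.strip()
```

In Cloud Shell, `gsutil ls -L gs://<bucket>/social_media_vs_productivity.csv` prints a `Hash (md5)` line whose value can be passed as `remote_md5`; the bucket name is left as a placeholder because the README does not state it.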
Data Manipulation & Querying
Imported data into BigQuery and executed analytical queries using:
- BigQuery Web Console
- Jupyter notebooks
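
As an illustration of the notebook route, here is a hedged sketch of running an aggregate query with the `google-cloud-bigquery` client. The table ID and the column names (`job_type`, `actual_productivity_score`) are assumptions made for the example, not taken from this repository; `build_avg_query` and `run_query` are hypothetical helpers.

```python
def build_avg_query(table, group_col, value_col):
    """Build a GROUP BY / AVG query for a fully qualified BigQuery table.

    Identifiers are interpolated as-is, so only pass trusted names.
    """
    return (
        f"SELECT {group_col}, AVG({value_col}) AS avg_value, COUNT(*) AS n_rows\n"
        f"FROM `{table}`\n"
        f"GROUP BY {group_col}\n"
        f"ORDER BY avg_value DESC"
    )


def run_query(sql):
    """Run the query with the BigQuery client and return rows as dicts."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up the environment's default project and credentials
    return [dict(row.items()) for row in client.query(sql).result()]


# Hypothetical table ID and column names, for illustration only.
sql = build_avg_query(
    "my-project.my_dataset.social_media_vs_productivity",
    "job_type",
    "actual_productivity_score",
)
```

The same SQL text can be pasted into the BigQuery Web Console, which makes it easy to confirm that both routes return the same result.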
Distributed Data Analysis
Loaded data into Spark DataFrames on Databricks and replicated analytical queries using:
- Spark SQL
- DataFrame operations
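
To show how one query can be expressed both ways, the sketch below runs the same aggregation through Spark SQL and through DataFrame operations. The view name `productivity` and the columns `job_type` and `actual_productivity_score` are assumptions for illustration; the actual queries are in the notebooks.

```python
AVG_BY_GROUP_SQL = """
SELECT job_type, AVG(actual_productivity_score) AS avg_score
FROM productivity
GROUP BY job_type
ORDER BY avg_score DESC
"""


def avg_by_group_sql(spark, df):
    """Spark SQL version: register a temporary view, then query it."""
    df.createOrReplaceTempView("productivity")
    return spark.sql(AVG_BY_GROUP_SQL)


def avg_by_group_df(df):
    """DataFrame version of the same aggregation."""
    # Imported lazily; requires a Spark runtime such as a Databricks cluster.
    from pyspark.sql import functions as F

    return (
        df.groupBy("job_type")
        .agg(F.avg("actual_productivity_score").alias("avg_score"))
        .orderBy(F.desc("avg_score"))
    )
```

On Databricks the `spark` session is predefined in notebooks, and a CSV can be loaded with `spark.read.csv(path, header=True, inferSchema=True)` before calling either function. Both versions describe the same aggregation, so they should return matching results.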
Data Enrichment
Applied a machine learning model using Spark MLlib to enhance the analysis.
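
The README does not name the model that was used. Purely as an assumption, the sketch below shows what a Spark MLlib workflow could look like with a linear regression: assemble feature columns into a vector, fit on a training split, and report RMSE on a held-out split. The function `train_regression` and its arguments are hypothetical.

```python
def train_regression(df, feature_cols, label_col, seed=42):
    """Fit a linear regression with Spark MLlib and return (model, rmse)."""
    # Lazy imports: these need a Spark runtime such as a Databricks cluster.
    from pyspark.ml import Pipeline
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    # Drop rows with a missing label; the assembler skips rows with missing features.
    clean = df.dropna(subset=[label_col])
    train, test = clean.randomSplit([0.8, 0.2], seed=seed)

    assembler = VectorAssembler(
        inputCols=feature_cols, outputCol="features", handleInvalid="skip"
    )
    regressor = LinearRegression(featuresCol="features", labelCol=label_col)
    model = Pipeline(stages=[assembler, regressor]).fit(train)

    evaluator = RegressionEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="rmse"
    )
    rmse = evaluator.evaluate(model.transform(test))
    return model, rmse
```

Wrapping the assembler and the estimator in a `Pipeline` means the same feature preparation is applied to both training and scoring data.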
Data Visualization
Built an interactive dashboard in Looker Studio to present insights (with optional visualization in Databricks).

Outcome

The project demonstrates how cloud storage, distributed computing, machine learning, and visualization tools can be integrated into a unified data pipeline for real-world analytics use cases.

Key Takeaway

This repository highlights practical experience in building and managing cloud-based data pipelines, combining data engineering and data analysis skills in a scalable environment.

Repository Files

- README.md
- bigQuery.sql
- bigquery_nb.ipynb
- query_databricks.ipynb
- social_media_vs_productivity.csv

Repository: giuleo129/cloudComputing
