MateoVR13/README.md

👋 Mateo Vergara Roa


🌐 English Version

👨‍💻 About Me

  • 🎓 Software Engineer (Program completed – degree pending)
  • 🛠️ Junior Data Engineer with hands-on experience building data pipelines for real-world datasets
  • 🔄 Focused on ETL/ELT workflows, data modeling, and analytics-ready data layers
  • 📊 Background in applied Machine Learning and data-driven research (3 published articles)
  • 💻 Strong proficiency in Python and SQL
  • 🌍 Fluent in English (technical and professional)

🧩 What I Actually Do

  • Build and maintain ETL pipelines from raw data to structured datasets
  • Clean, validate, and transform real-world, noisy data
  • Design relational data models optimized for analytics and ML
  • Work with large tabular datasets using Python, SQL, and Spark
  • Prepare data layers consumed by ML models and analytical tools
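
As an illustration of the workflow listed above, here is a minimal sketch of a batch ETL step using only the Python standard library. The sample data, the `readings` table, and the function names are all hypothetical and chosen for this example; they are not taken from any of the author's projects.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: stray whitespace, a missing value,
# a non-numeric reading, and inconsistent unit casing.
RAW_CSV = """sensor_id,reading,unit
 s1 , 21.5 ,C
s2,,C
s3,not_a_number,C
s4,19.0,c
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, normalize units, and drop rows with invalid readings."""
    clean = []
    for row in rows:
        sensor_id = row["sensor_id"].strip()
        unit = row["unit"].strip().upper()
        try:
            reading = float(row["reading"].strip())
        except ValueError:
            continue  # validation: skip missing or malformed readings
        clean.append((sensor_id, reading, unit))
    return clean


def load(conn, records):
    """Write cleaned records into a small relational table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "sensor_id TEXT PRIMARY KEY, reading REAL NOT NULL, unit TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load and return the number of rows loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` loads only the rows that pass validation. Using `INSERT OR REPLACE` keeps the load step idempotent, so re-running the same batch does not duplicate rows.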

🚀 Technologies

  • Languages
  • Data Engineering & Processing
  • Data Transformation & Analytics Layer
  • Databases & Storage
  • Cloud & Platforms (Foundational)
  • Machine Learning (Data Consumer Perspective)
  • Version Control & Dev Tools

💼 Core Skills (Data Engineer Oriented)

  • Programming: Python, SQL
  • Data Engineering:
    • ETL / ELT pipeline development
    • Batch-oriented data processing
    • Data cleaning, validation, and transformation
    • Relational data modeling
  • Big Data: Apache Spark, PySpark, Spark SQL
  • Orchestration: Apache Airflow (DAGs, scheduling, dependencies)
  • Transformations: dbt
  • Databases: PostgreSQL, MySQL, SQLite
  • Cloud Fundamentals: AWS / GCP
  • ML Support: feature engineering, dataset preparation, reproducible pipelines
  • Version Control: Git, GitHub

📄 Research & Applied Experience

  • Co-author of 3 applied Machine Learning articles using real experimental datasets
  • Worked with structured and semi-structured data extracted from scientific sources
  • Designed reproducible data preprocessing pipelines for modeling and evaluation
  • Strong focus on data quality, consistency, and interpretability

🇪🇸 Spanish Version

👨‍💻 About Me

  • 🎓 Software Engineer (program completed – degree pending)
  • 🛠️ Junior Data Engineer with hands-on experience in data pipelines
  • 🔄 Focused on ETL/ELT processes, data modeling, and analytical layers
  • 📊 Experience in applied Machine Learning and real-world data (3 published articles)
  • 💻 Strong command of Python and SQL
  • 🌍 Advanced English (technical and professional)

🧩 What I Do in Practice

  • Building data pipelines from raw sources to structured datasets
  • Cleaning, validating, and transforming real-world, noisy data
  • Designing relational models for analytics and ML
  • Processing data with Python, SQL, and Spark
  • Preparing datasets for analytical and scientific use

🚀 Technologies

  • Languages
  • Data Engineering
  • Transformation & Analytics Layer

💼 Key Skills

  • Programming: Python, SQL
  • Data Engineering: ETL/ELT pipelines, batch processing, relational modeling
  • Big Data: Apache Spark, PySpark
  • Orchestration: Apache Airflow
  • Transformations: dbt
  • Databases: PostgreSQL, MySQL, SQLite
  • Cloud (foundational): AWS / GCP
  • ML Support: dataset and feature preparation
  • Version Control: Git, GitHub

Thanks for visiting! / ¡Gracias por pasar!

Pinned

  1. Image_Clustering_ResNet50_KMeans (Public)

    Image Clustering with ResNet50 & K-Means: A Python tool that uses deep learning feature extraction and unsupervised learning to automatically organize image collections into meaningful groups. Buil…

    Jupyter Notebook 4

  2. Image-Clustering-VGG16-KMeans (Public)

    Image Clustering with VGG16 and K-Means: A Python tool that uses deep learning and unsupervised learning to automatically organize image collections into meaningful clusters. Developed with TensorF…

    Jupyter Notebook 1

  3. IAgroscan (Public)

    IAgroscan: An AI-powered desktop application for detecting and diagnosing plant diseases in agricultural crops using computer vision. Features single and batch detection, comprehensive reporting, t…

    Python 2

  4. File-Organizer (Public)

    File Organizer is a PyQt6-based desktop application that efficiently moves all files from subfolders to a main folder and deletes empty directories. It ensures smooth file management while maintain…

    Python 1
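
The core behavior described for File-Organizer (moving files from subfolders into a main folder, then deleting empty directories) can be sketched roughly as follows. This is not the project's actual code: the `flatten` function name and the numeric-suffix rule for name collisions are assumptions made for this example, and the PyQt6 interface is omitted.

```python
import shutil
from pathlib import Path


def flatten(root):
    """Move every file found in subfolders of `root` into `root` itself,
    then remove the directories left empty. Returns the number of files moved."""
    root = Path(root)
    moved = 0

    # Snapshot the file list first so moving files does not disturb traversal.
    files = [p for p in root.rglob("*") if p.is_file() and p.parent != root]
    for src in files:
        dest = root / src.name
        counter = 1
        # Assumed collision policy: append _1, _2, ... instead of overwriting.
        while dest.exists():
            dest = root / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(dest))
        moved += 1

    # Delete empty directories, deepest first, so parents empty out in turn.
    dirs = sorted(
        (p for p in root.rglob("*") if p.is_dir()),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    for directory in dirs:
        if not any(directory.iterdir()):
            directory.rmdir()

    return moved
```

Renaming on collision rather than overwriting is a conservative choice for a tool that restructures a user's files; a real application might instead ask the user how to resolve duplicates.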