Projects

Educational Projects

Sparkanos

From Zero to Know What

Public Projects

Here you can find some of my main public projects.

Warning

All projects were built with public or fake data.


See my technical videos on YouTube.


Deploy Google Kubernetes Engine

Technologies Used: GCP, GKE

  • Cluster deployment via gcloud
  • Creation of the ingress controller
  • Creation of the application

Repo


Deploy GCP With Terraform

Technologies Used: Terraform, GCP, Compute Engine

  • Creating the declarative script
  • Machine deployment

Video | Repo


Deploy Cloud Run (Google Cloud Platform)

Technologies Used: GCP, GitHub Actions, Cloud Run

  • Automated deployment with GitHub Actions
  • Image versioning in the Registry
  • Application deployment

Video | Repo


Composer (Airflow on Google Cloud Platform)

Technologies Used: GCP, Composer, GCS

  • Tutorial on resource creation
  • Building a DAG
  • Running on Composer

Video | Repo


Run Container With Docker Compose

Technologies Used: Docker, Compose, Postgres, MinIO

  • Creating a declarative Docker Compose file
  • Hiding sensitive values
  • Running 2 containers (Postgres and MinIO)

Video | Repo


Spark Docker vs Fabric vs Databricks

Technologies Used: Docker, Jupyter, Databricks, Fabric

  • Creating a Spark application in Docker that generates fake data every 10 seconds
  • Creating a Spark application in Docker that reads files from MinIO
  • Creating a Spark application in Docker that generates a bar chart
  • Running the same applications on Databricks
  • Running the same applications on Fabric

Video | Repo


Databricks Connect API

Technologies Used: Databricks, API, Delta Table

  • Import necessary libraries
  • Connect to the API
  • Display results
  • Create database (via Spark SQL script)
  • Materialize data in delta format

Video


Spark Structured Streaming

Technologies Used: Spark, Postgres, JSON files

  • Scan a folder in near real time (every 5 seconds)
  • Add checkpoint
  • Process data and write to Postgres

Video


Data Lake vs Lakehouse

Technologies Used: MinIO, Trino

  • Difference between a Data Lake and a Lakehouse
  • File virtualization with Trino
  • SQL querying on CSV files

Video


Delta Table - Time Travel

Technologies Used: Delta Table

  • Use of Jupyter Notebook
  • PySpark generating and creating CSV in the landing layer
  • PySpark writing a Delta table in the bronze layer
  • Data alteration in the Delta table
  • Navigating between versions of the Delta table

Video


Delta Table - Schema Evolution

Technologies Used: Delta Table

  • Use of Jupyter Notebook
  • PySpark generating and creating CSV in the landing layer
  • PySpark writing a Delta table in the bronze layer
  • Schema change in the Delta table
  • Practical demonstration of schema evolution

Video


Process Near Real Time Google Cloud

Technologies Used: Pub/Sub, Cloud Functions, BigQuery

  • Simulation of sending messages every 1 minute
  • Topic creation in Pub/Sub
  • Function creation in Cloud Functions
  • Writing data to BigQuery

Video | Repo
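
The Cloud Functions step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes a Pub/Sub-triggered background function whose event carries the message body base64-encoded under `event["data"]`, and the field names (`sensor_id`, `value`) and the helper `to_bigquery_row` are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone


def to_bigquery_row(event: dict) -> dict:
    """Decode a Pub/Sub event payload into a flat, JSON-serializable row."""
    # Pub/Sub delivers the message body base64-encoded under "data".
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return {
        "sensor_id": payload["sensor_id"],  # hypothetical field
        "value": float(payload["value"]),   # hypothetical field
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def handle_message(event: dict, context=None) -> dict:
    """Entry-point shape for a Pub/Sub-triggered background function."""
    row = to_bigquery_row(event)
    # A real deployment would write the row to BigQuery at this point, for
    # example with google-cloud-bigquery's Client.insert_rows_json(table, [row]).
    return row
```

Keeping the decoding in a pure function makes it possible to test the transformation locally, without a Pub/Sub topic or a BigQuery table.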


Analytics Near Real Time

Technologies Used: Python, Kafka, Streamlit

  • Fake data generator with Python
  • Near real time processing with Kafka
  • Data analysis with Streamlit

Video
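
As an illustration of the fake data generator step, here is a minimal sketch using only the standard library. The event fields, the `PRODUCTS` list, and the function names are hypothetical; the project may generate different data and use a different serialization.

```python
import json
import random
import time

PRODUCTS = ["notebook", "mouse", "keyboard", "monitor"]  # hypothetical catalog


def generate_event(rng: random.Random) -> dict:
    """Build one fake order event from the given random generator."""
    return {
        "order_id": rng.randint(1, 1_000_000),
        "product": rng.choice(PRODUCTS),
        "quantity": rng.randint(1, 5),
        "unit_price": round(rng.uniform(10.0, 500.0), 2),
        "created_at": time.time(),
    }


def encode_event(event: dict) -> bytes:
    """Serialize an event as UTF-8 JSON bytes, a common format for Kafka message values."""
    return json.dumps(event).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs pick the same values
    for _ in range(3):
        print(encode_event(generate_event(rng)))
```

The bytes returned by `encode_event` are what a Kafka producer client would send as the message value; the producer itself is left out because it depends on the client library used.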


Incremental Update with Python

Technologies Used: Python

  • Creating a fake table with 1 million rows in SQL Server
  • Building the Python notebook
  • Full load to the destination (Postgres)
  • Table comparison (Source vs. Destination)
  • Change analysis
  • Applying Upsert (Insert + Update)
  • Validating inserted and updated data

Video | Repo
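
The comparison and upsert steps above can be sketched in plain Python. This is an assumption-laden illustration rather than the notebook's code: rows are modeled as dictionaries held in memory instead of SQL Server and Postgres tables, and `diff_tables` and `apply_upsert` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_tables(source: List[Row], destination: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Compare source and destination on a key column.

    Returns (rows to insert, rows to update).
    """
    dest_by_key = {row[key]: row for row in destination}
    to_insert: List[Row] = []
    to_update: List[Row] = []
    for row in source:
        existing = dest_by_key.get(row[key])
        if existing is None:
            to_insert.append(row)   # key missing in destination
        elif existing != row:
            to_update.append(row)   # key present, but values changed
    return to_insert, to_update


def apply_upsert(destination: List[Row], to_insert: List[Row], to_update: List[Row], key: str) -> List[Row]:
    """Return a new destination with updates applied and inserts appended."""
    updates = {row[key]: row for row in to_update}
    merged = [updates.get(row[key], row) for row in destination]
    return merged + to_insert


source = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bruno"}, {"id": 3, "name": "Carla"}]
destination = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bruna"}]

inserts, updates = diff_tables(source, destination, key="id")
result = apply_upsert(destination, inserts, updates, key="id")
```

Against a real Postgres destination, the same effect is usually achieved in the database itself, for example with `INSERT ... ON CONFLICT ... DO UPDATE`, rather than by merging rows in Python.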


Extract Data from the Web with Python

Technologies Used: Python, Selenium

  • Automate file extraction

Video


Data Quality with Soda and Slack Alerts

Technologies Used: SQL Server, BigQuery, Soda

  • Data ingestion into BigQuery
  • Creating ingestion quality control with Soda
  • Integration with Slack

Video | Repo


Testing with Pytest

Technologies Used: Python, Pytest

  • Tool architecture
  • Data catalog for Kafka topics
  • Data catalog for Postgres tables
  • Orchestration with Airflow
  • Unit testing

Video | Repo


Data Catalog

Technologies Used: OpenMetadata, Postgres, Kafka

  • Tool architecture
  • Data catalog for Kafka topics
  • Data catalog for Postgres tables
  • Orchestration with Airflow
  • Unit testing

Video


Data Catalog - Metadata Version

Technologies Used: OpenMetadata, Postgres, Kafka

  • Catalog with OpenMetadata
  • Execution of a DAG in Airflow
  • Visibility into table versions
  • Visibility into deleted tables

Video


Data Ingestion SQL Server to BigQuery with Python

Technologies Used: Python, SQL Server, BigQuery

  • Extract data from SQL Server
  • Load data to BigQuery

Video | Repo


Data Ingestion Postgres to BigQuery with Python

Technologies Used: Python, Postgres, BigQuery

  • Extract data from Postgres
  • Load data to BigQuery

Video | Repo


Data Ingestion MongoDB to BigQuery with Python

Technologies Used: Python, MongoDB, BigQuery

  • Extract data from MongoDB
  • Load data to BigQuery

Video | Repo


Data Ingestion Excel to BigQuery with Python

Technologies Used: Python, Excel, BigQuery

  • Extract data from Excel
  • Load data to BigQuery

Video | Repo


Data Ingestion CSV to BigQuery with Python

Technologies Used: Python, CSV, BigQuery

  • Extract data from CSV
  • Load data to BigQuery

Video | Repo
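
The extract step for a CSV source can be sketched with the standard library alone. This is a hedged illustration, not the repository's code: the function names are hypothetical, and the sample columns (`id`, `name`) are made up. It converts CSV text into newline-delimited JSON, which is one of the formats BigQuery load jobs accept.

```python
import csv
import io
import json
from typing import Iterable, List


def read_csv_rows(text: str) -> List[dict]:
    """Parse CSV text with a header row into a list of dicts.

    All values come back as strings, because CSV carries no type information.
    """
    return list(csv.DictReader(io.StringIO(text)))


def to_ndjson(rows: Iterable[dict]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row) for row in rows)


sample = "id,name\n1,Ana\n2,Bruno\n"  # made-up sample data
rows = read_csv_rows(sample)
ndjson = to_ndjson(rows)
```

The load step is omitted because it needs credentials and a target table; with the google-cloud-bigquery client it would typically be a load job configured for the newline-delimited JSON source format.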


Data Ingestion API to BigQuery with Python

Technologies Used: Python, API, BigQuery

  • Extract data from API
  • Load data to BigQuery

Video | Repo


Free SQL course

Technologies Used: T-SQL, SQL Server

  • Basic SQL course
  • Installing and using SQL Server
  • Best practices with T-SQL

Video | Repo


Federated query with Trino

Technologies Used: Trino, Postgres, MinIO, SQL Server

  • Creating tables in Postgres, SQL Server, and S3
  • Inserting data into the three sources
  • Performing a federated query with Trino

Video | Repo