Projects¶
Education Projects¶
Sparkanos¶
From Zero to Know What¶
Public Projects¶
Here you can find some of my main public projects.
Warning
All projects were built with public or fake data.
See my technical videos on YouTube.
Deploy Google Kubernetes Engine¶
Technologies Used: GCP, GKE
- Cluster deployment via gcloud
- Creation of the ingress controller
- Creation of the application
Deploy GCP With Terraform¶
Technologies Used: Terraform, GCP, Compute Engine
- Creating the declarative script
- Machine deployment
Deploy Cloud Run (Google Cloud Platform)¶
Technologies Used: GCP, Github Actions, Cloud Run
- Automated deployment with GitHub Actions
- Image versioning in the Registry
- Application deployment
Composer (Airflow on Google Cloud Platform)¶
Technologies Used: GCP, Composer, GCS
- Tutorial on resource creation
- Building a DAG
- Running on Composer
Run Container With Docker Compose¶
Technologies Used: Docker, Docker Compose, Postgres, MinIO
- Creating a declarative Docker Compose file
- Hiding sensitive values (e.g., passwords)
- Running 2 containers (Postgres and MinIO)
Spark Docker vs Fabric vs Databricks¶
Technologies Used: Docker, Jupyter, Databricks, Fabric
- Creating a Spark application in Docker that generates fake data every 10 seconds
- Creating a Spark application in Docker that reads files from MinIO
- Creating a Spark application in Docker that generates a bar chart
- Running the same applications on Databricks
- Running the same applications on Fabric
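The fake-data generator from the first step can be sketched in plain Python; field names and values below are illustrative, and in the video the loop runs inside a Spark container rather than printing to stdout:

```python
import json
import random
import time
from datetime import datetime, timezone

def make_fake_record() -> dict:
    """Build one fake sales-like record (field names are illustrative)."""
    return {
        "id": random.randint(1, 1_000_000),
        "product": random.choice(["keyboard", "mouse", "monitor"]),
        "price": round(random.uniform(10.0, 500.0), 2),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

def generate(interval_seconds: float = 10, batches: int = 3) -> None:
    """Emit one JSON record every `interval_seconds`, `batches` times."""
    for _ in range(batches):
        print(json.dumps(make_fake_record()))
        time.sleep(interval_seconds)
```

The same generator runs unchanged on Databricks or Fabric, which is what makes the three environments comparable.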
Databricks Connect API¶
Technologies Used: Databricks, API, Delta Table
- Import necessary libraries
- Connect to the API
- Display results
- Create database (via Spark SQL script)
- Materialize data in delta format
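Connecting to the Databricks REST API comes down to sending authenticated HTTP requests. A minimal standard-library sketch, where the host and token are placeholders and `clusters/list` is just one example endpoint:

```python
import json
import urllib.request

DATABRICKS_HOST = "https://adb-1234567890.0.azuredatabricks.net"  # placeholder workspace URL
DATABRICKS_TOKEN = "dapiXXXX"  # placeholder personal access token

def build_request(path: str) -> urllib.request.Request:
    """Build an authenticated GET request against the Databricks REST API."""
    return urllib.request.Request(
        url=f"{DATABRICKS_HOST}{path}",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    )

def list_clusters() -> dict:
    """Call the clusters/list endpoint and parse the JSON response."""
    with urllib.request.urlopen(build_request("/api/2.0/clusters/list")) as resp:
        return json.loads(resp.read())
```

The parsed response can then be materialized with Spark SQL, as the last two bullets describe.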
Spark Structured Streaming¶
Technologies Used: Spark, Postgres, Json File
- Scan a folder in near real time (every 5 seconds)
- Add checkpoint
- Process data and write to Postgres
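The pattern behind these steps (scan a folder on a short interval, keep a checkpoint so each file is processed only once) can be sketched in plain Python; the project itself uses Spark Structured Streaming, and the Postgres sink is stubbed out here:

```python
import json
import time
from pathlib import Path

def scan_once(folder: Path, checkpoint: Path) -> list[dict]:
    """Process JSON files not yet listed in the checkpoint file."""
    seen = set(checkpoint.read_text().splitlines()) if checkpoint.exists() else set()
    new_rows = []
    for f in sorted(folder.glob("*.json")):
        if f.name in seen:
            continue
        new_rows.append(json.loads(f.read_text()))
        seen.add(f.name)
    checkpoint.write_text("\n".join(sorted(seen)))
    return new_rows

def watch(folder: Path, checkpoint: Path, interval: int = 5) -> None:
    """Re-scan the folder every `interval` seconds (5s, as in the project)."""
    while True:
        rows = scan_once(folder, checkpoint)
        if rows:
            print(f"would write {len(rows)} rows to Postgres")  # sink stubbed out
        time.sleep(interval)
```

In Spark the checkpoint directory plays the same role: a restart resumes from the last processed file instead of reprocessing everything.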
Data Lake vs Lakehouse¶
Technologies Used: MinIO, Trino
- Differences between a Data Lake and a Lakehouse
- File virtualization with Trino
- SQL querying on CSV files
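Once Trino virtualizes the files, the CSV data can be queried with plain SQL. Catalog, schema, and column names below are illustrative, and the cast is needed because Trino's Hive connector exposes CSV columns as varchar:

```sql
-- Aggregate a CSV file exposed through a Trino catalog backed by MinIO
SELECT customer_id, sum(CAST(amount AS double)) AS total
FROM minio.landing.sales_csv
GROUP BY customer_id
ORDER BY total DESC;
```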
Delta Table - Time Travel¶
Technologies Used: Delta Table
- Use of Jupyter Notebook
- PySpark generating and creating CSV in the landing layer
- PySpark writing a Delta table in the bronze layer
- Data alteration in the Delta table
- Navigating between versions of the Delta table
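Navigating between versions, as in the last step, uses Delta's time travel syntax; the table name here is illustrative:

```sql
-- Current version of the bronze table
SELECT * FROM bronze_customers;

-- Earlier versions, by version number or by timestamp
SELECT * FROM bronze_customers VERSION AS OF 0;
SELECT * FROM bronze_customers TIMESTAMP AS OF '2024-01-01';
```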
Delta Table - Schema Evolution¶
Technologies Used: Delta Table
- Use of Jupyter Notebook
- PySpark generating and creating CSV in the landing layer
- PySpark writing a Delta table in the bronze layer
- Schema change in the Delta table
- Practical demonstration of schema evolution
Near Real Time Processing on Google Cloud¶
Technologies Used: Pub/Sub, Cloud Function, BigQuery
- Simulation of sending messages every 1 minute
- Topic creation in Pub/Sub
- Function creation in Cloud Functions
- Writing data to BigQuery
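The simulator from the first two steps boils down to serializing a small JSON event and publishing it on an interval. The payload builder below is plain Python; the publish loop, which needs `google-cloud-pubsub`, is sketched in comments with placeholder project and topic names:

```python
import json
import random
from datetime import datetime, timezone

def build_event() -> bytes:
    """Serialize one fake event as UTF-8 JSON, ready for Pub/Sub publish()."""
    event = {
        "sensor_id": random.randint(1, 10),
        "value": round(random.uniform(0, 100), 2),
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event).encode("utf-8")

# Publishing every minute would look roughly like this (requires
# google-cloud-pubsub; project and topic names are placeholders):
#
# from google.cloud import pubsub_v1
# publisher = pubsub_v1.PublisherClient()
# topic = publisher.topic_path("my-project", "sensor-events")
# while True:
#     publisher.publish(topic, build_event())
#     time.sleep(60)
```

On the other side, a Cloud Function subscribed to the topic decodes the same JSON and streams it into BigQuery.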
Near Real Time Analytics¶
Technologies Used: Python, Kafka, Streamlit
- Fake data generator with Python
- Near real time processing with Kafka
- Data analysis with Streamlit
Incremental Update with Python¶
Technologies Used: Python
- Creating a fake table with 1 million rows in SQL Server
- Building the Python notebook
- Full load to the destination (Postgres)
- Table comparison (Source vs. Destination)
- Change analysis
- Applying Upsert (Insert + Update)
- Validating inserted and updated data
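The full-load-then-upsert flow above can be sketched end to end. SQLite stands in here for both SQL Server (source) and Postgres (destination), since the point is the comparison and upsert logic rather than the drivers; table and column names are illustrative:

```python
import sqlite3

# SQLite stands in for SQL Server (source) and Postgres (destination)
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Full load: copy everything from source to destination once
src.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ana"), (2, "Bruno"), (3, "Carla")])
dst.executemany("INSERT INTO customers VALUES (?, ?)",
                src.execute("SELECT id, name FROM customers"))

# Source changes after the full load: one update, one insert
src.execute("UPDATE customers SET name = 'Ana Maria' WHERE id = 1")
src.execute("INSERT INTO customers VALUES (4, 'Diego')")

def upsert(src_conn, dst_conn) -> None:
    """Apply insert + update from source to destination (SQL UPSERT)."""
    rows = src_conn.execute("SELECT id, name FROM customers").fetchall()
    dst_conn.executemany(
        "INSERT INTO customers VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        rows,
    )

upsert(src, dst)
```

Postgres supports the same `ON CONFLICT ... DO UPDATE` syntax, so the destination-side statement carries over almost verbatim.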
Extracting Data from the Web with Python¶
Technologies Used: Python, Selenium
- Automate file extraction
Data Quality with Soda and Slack Alerts¶
Technologies Used: SQL Server, BigQuery, Soda
- Data ingestion into BigQuery
- Creating ingestion quality control with Soda
- Integration with Slack
Testing with Pytest¶
Technologies Used: Python, Pytest
- Tool architecture
- Data catalog for Kafka topics
- Data catalog for Postgres tables
- Orchestration with Airflow
- Unit testing
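A minimal unit test in the pytest style looks like this; the function under test is illustrative:

```python
# test_transform.py -- minimal pytest example (function under test is illustrative)

def normalize_name(raw: str) -> str:
    """Trim whitespace and title-case a customer name."""
    return raw.strip().title()

def test_normalize_name_strips_and_titles():
    assert normalize_name("  ana maria ") == "Ana Maria"

def test_normalize_name_keeps_clean_input():
    assert normalize_name("Bruno") == "Bruno"
```

Running `pytest` from the project root discovers and executes both tests automatically.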
Data Catalog¶
Technologies Used: Open Metadata, Postgres, Kafka
- Tool architecture
- Data catalog for Kafka topics
- Data catalog for Postgres tables
- Orchestration with Airflow
- Unit testing
Data Catalog - Metadata Version¶
Technologies Used: Open Metadata, Postgres, Kafka
- Catalog with Open Metadata
- Execution of a DAG in Airflow
- Visibility into table versions
- Visibility of deleted tables
Data Ingestion SQL Server to BigQuery with Python¶
Technologies Used: Python, SQL Server, BigQuery
- Extract data from SQL Server
- Load data to BigQuery
Data Ingestion Postgres to BigQuery with Python¶
Technologies Used: Python, Postgres, BigQuery
- Extract data from Postgres
- Load data to BigQuery
Data Ingestion MongoDB to BigQuery with Python¶
Technologies Used: Python, MongoDB, BigQuery
- Extract data from MongoDB
- Load data to BigQuery
Data Ingestion Excel to BigQuery with Python¶
Technologies Used: Python, Excel, BigQuery
- Extract data from Excel
- Load data to BigQuery
Data Ingestion CSV to BigQuery with Python¶
Technologies Used: Python, CSV, BigQuery
- Extract data from CSV
- Load data to BigQuery
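All six ingestion videos share the same extract-then-load shape. For the CSV case, extraction is a few lines of standard-library code; the BigQuery load, which needs `google-cloud-bigquery`, is sketched in comments with a placeholder table id:

```python
import csv
import io

def extract_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into dicts, one per row (header row expected)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Loading into BigQuery would then be roughly (requires
# google-cloud-bigquery; the table id is a placeholder):
#
# from google.cloud import bigquery
# client = bigquery.Client()
# client.load_table_from_json(rows, "my_project.my_dataset.customers").result()

sample = "id,name\n1,Ana\n2,Bruno\n"
rows = extract_rows(sample)
```

Swapping the extract step for a database cursor, an Excel reader, or an API call gives the other five variants.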
Data Ingestion API to BigQuery with Python¶
Technologies Used: Python, API, BigQuery
- Extract data from an API
- Load data to BigQuery
Free SQL Course¶
Technologies Used: T-SQL, SQL Server
- Basic SQL course
- Installing and using SQL Server
- Best practices with T-SQL
Federated query with Trino¶
Technologies Used: Trino, Postgres, MinIO, SQL Server
- Creating tables in Postgres, SQL Server, and S3
- Inserting data into the three sources
- Performing a federated query with Trino
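The federated query from the last step joins all three sources in a single statement; each prefix is a Trino catalog, and the catalog, schema, and column names below are illustrative:

```sql
-- One query spanning Postgres, SQL Server, and files in MinIO
SELECT o.order_id, c.customer_name, n.note
FROM postgresql.public.orders AS o
JOIN sqlserver.dbo.customers AS c ON c.customer_id = o.customer_id
JOIN minio.landing.order_notes AS n ON n.order_id = o.order_id;
```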