

Hello World! I'm Michael Shoemaker
Senior Data Analyst | Teacher | Content Creator | .5x Programmer
I build practical, production-like data engineering systems — orchestration, storage, transformations, serving, and observability — then explain the decisions behind them.
Core skills: Python · SQL · Linux · Airflow · Spark · BigQuery · Docker · GCP
🚀 Projects
All repos →

Airflow in Docker on GCP setup with Terraform
A mostly automated way to set up Airflow on a GCP VM with Terraform

Daily Weather Capture into Big Query using dlt
Getting historical weather data costs money. Getting recent weather data is free. So why not capture it yourself and build up a dataset?

YouTube Semantic Search
Ask questions across a YouTube playlist; returns time-coded hits with RAG.
🎥 Videos
Channel →
How to Install Tesseract OCR on Windows and use it with Python
Quick walkthrough for setting up Tesseract OCR on Windows

Removing Warning from Python 3.12+ for pip installs
Get rid of the rule against installing packages system-wide with pip in Python versions >=3.12

Mounting an External Hard Drive in a Raspberry Pi
This is fun and allows you to make a Raspberry Pi with HUGE storage

Create a Dataflow Diagram with Draw.io
People like shiny things. To make your project "pop", here is a quick walkthrough of how to create an animated dataflow diagram with draw.io

Capture Daily Weather with dlt and Big Query
Getting historical weather data costs money. Getting recent weather data is free. So why not capture it yourself and build up a dataset?

Pulling Chicago Crime Data into Duckdb with dlt
dlt is a great way to quickly pull data from a Socrata API. Here is a quick walkthrough of how to do just that using the Chicago Crime Dataset
✍️ Articles
Medium →
Check for BIOS Update with Python on Linux Mint
A custom PC build running Linux usually doesn't have a great utility to automatically check for BIOS updates. Let's make our own. :-)

Set up Remote PostgreSQL on Raspberry Pi
A great way to learn Linux and SSH AND have a remote Postgres instance to play with

Pulling Data from the Chicago Data Portal with Sodapy
The Chicago Data Portal has a bunch of great data, but pulling it from the API can be a bit of a pain.

Quick Setup of Flights Data to Learn SQL
This article is a quick tutorial on how to set up and get going with the Kaggle 2015 Flight Delays and Cancellations data.

Setup Docker on Linux Mint 20.X
Linux Mint is AWESOME. BUT it is also built on top of Ubuntu, which can make installing things like Docker a bit tricky using the typical commands. Use this article to get up and going quickly.