Available for opportunities

Eugene Wanjau

Data Engineer building scalable pipelines and ML systems — bridging the gap between raw data and production intelligence.

Apache Spark
dbt + Airflow
MLflow
Python / PySpark
Docker / K8s
PostgreSQL
About

Data at the core,
ML on the horizon.

I'm Eugene, a Data Engineer based in Nairobi, Kenya, currently working at Chambers Federation, where I build and maintain data infrastructure that supports agricultural commodity traceability across the DRC and Rwanda.

My day-to-day spans everything from designing ETL pipelines and backend APIs to maintaining production PostgreSQL databases. I'm actively growing toward ML Engineering and MLOps — focused on taking models out of notebooks and into reliable, observable production systems.

When I'm not engineering pipelines, I'm building internal tooling with Django and React, exploring PySpark for large-scale data processing, and studying the patterns that separate great ML systems from fragile ones.

3+
Years experience
10+
Data systems built
2
Countries covered
Coffee consumed
Expertise

Tools of the trade.

Data Engineering
Designing batch and streaming pipelines that move, transform, and serve data reliably at scale.
PySpark · Apache Airflow · dbt · Kafka · ETL/ELT
🗄️
Databases & Storage
Schema design, query optimization, and data warehouse patterns for both OLTP and OLAP workloads.
PostgreSQL · BigQuery · Redis · Parquet · Delta Lake
🤖
ML Engineering
Feature engineering, model training pipelines, and serving infrastructure using modern ML tooling.
scikit-learn · MLflow · Feature Stores · pandas · DVC
🚀
MLOps & DevOps
CI/CD for ML, model monitoring, experiment tracking, and containerized deployment workflows.
Docker · GitHub Actions · Kubernetes · Prometheus · Grafana
🛠️
Backend & APIs
Building robust Django REST APIs, Celery task queues, and full-stack applications that serve data products.
Django / DRF · FastAPI · Celery · React · REST / GraphQL
📊
Analytics & Viz
Transforming raw data into dashboards and reports that drive agricultural and business decisions.
Python / R · Metabase · Power BI · Plotly · ggplot2
Experience

Where I've built.

Data Engineer & Full-Stack Developer
2023 — Present
Chambers Federation · Nairobi, Kenya
Architected and built the CF Traceability Platform — a farm-to-export commodity traceability system covering coffee and cocoa supply chains across DRC and Rwanda. Designed the database schema, REST API, ETL pipelines, and two React portals (desktop clerk and mobile field agent). Implemented EUDR compliance data exports and Odoo ERP sync via XML-RPC. Managed IT infrastructure including Microsoft 365 provisioning, email security, and biometric attendance systems.
Django / DRF · PostgreSQL 15 · Celery / Redis · React · Python · JWT Auth · Docker
Data Analyst — Agricultural Data
2022 — 2023
COPROAD Cooperative · DRC (Remote)
Built a unified farmer data management dashboard tracking 506 cocoa farmers across three collection groups. Designed data collection workflows, cleaned and structured field survey data, and produced geospatial and production analytics reports for cooperative leadership and international buyers.
Python · R · PostgreSQL · HTML / JS · Data Viz
IT Systems & Infrastructure
2021 — 2022
Chambers Federation · Nairobi, Kenya
Managed organization-wide IT operations including domain email provisioning, Microsoft 365 administration, server-level email security (spam filtering, DNS records), and employee onboarding SOPs. Developed internal tooling and documentation to standardize operations across a multi-country team.
Microsoft 365 · cPanel / Namehero · DNS / SPF / DKIM · IT SOPs
Projects

Things I've shipped.

🔄
Data Engineering
Commodity Data Pipeline
Batch ELT pipeline that ingests cooperative-level cocoa and coffee production data from field agents, loads it into a PostgreSQL warehouse, and transforms it there with dbt. Orchestrated with Apache Airflow, with alerting and data quality checks via Great Expectations.
PySpark · Airflow · dbt · PostgreSQL · Great Expectations
🌱
ML Engineering
Crop Yield Prediction Model
End-to-end ML pipeline predicting cocoa yield per cooperative based on farmer plot data, weather signals, and historical production. Tracked with MLflow, containerized for inference, and served via a FastAPI endpoint consumed by the traceability platform.
Python · scikit-learn · MLflow · FastAPI · Docker
📊
Analytics
COPROAD Farmer Tracker
A unified dashboard tracking 506 cocoa farmers across three collection groups in the DRC. Includes color-coded group badges, group-adapted profile layouts, GPS plot data, and production analytics. Delivered as a standalone HTML/JS app for offline-capable field use.
JavaScript · HTML / CSS · R · Geospatial
🔬
MLOps
ML Experiment Tracker
A lightweight MLflow + DVC setup for versioning datasets, tracking experiments, and comparing model performance across runs. Integrated with GitHub Actions for automated retraining triggers on new data commits.
MLflow · DVC · GitHub Actions · Python · Dagster
Contact

Let's build together.

Open to data engineering roles, ML/MLOps opportunities, and interesting collaborations. Based in Nairobi — happy to work remotely.