Data Engineering & Platform Modernization

For teams that need to build, scale, or migrate their data operations: I design and implement the robust pipelines that make data usable, reliable, and timely.

  • ETL/ELT Pipeline Development: Building scalable batch and streaming workflows using Google Cloud Dataflow, Apache Spark, Python, Scala, and complex SQL.
  • Platform Migration & Modernization: Transitioning legacy on-premise environments (e.g., AS400, legacy Hadoop) or brittle data estates into modern cloud warehouse (BigQuery) or lakehouse architectures.
  • Data Debt Remediation: Fixing fragile pipelines, optimizing expensive SQL queries, and restructuring inefficient data models to improve reliability and reduce operational costs.
  • Event-Driven Data Systems: Designing reliable real-time ingestion and processing streams using Pub/Sub and scalable microservices.
  • Vendor-Agnostic Data Platforms: Designing and deploying fully open-source data stacks. If your goal is to avoid vendor lock-in or control infrastructure costs, I can build robust lakehouses using technologies like Apache Iceberg, Spark, Airflow, and dbt deployed on Kubernetes, independent of any specific cloud provider.
  • Custom Ingestion & Web Scraping: Developing resilient, automated web scraping and custom API extraction pipelines to securely ingest alternative or third-party data sources.
  • Data Modeling & Quality: Implementing well-structured modeling layers (e.g., dbt) and ensuring governance and data quality are built in from the start.
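To give a flavor of the built-in quality checks mentioned above, here is a minimal sketch of dbt-style column tests (not-null, unique) as plain Python. The `orders` rows, column names, and rules are illustrative assumptions, not a specific client implementation; in practice these checks would be driven by a schema or a dbt test configuration.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str

def check_not_null(rows, column):
    """Fail if any row is missing a value in `column`."""
    missing = [i for i, r in enumerate(rows) if r.get(column) is None]
    return CheckResult(f"not_null:{column}", not missing,
                       f"{len(missing)} null value(s)")

def check_unique(rows, column):
    """Fail if `column` contains duplicate values."""
    seen, dupes = set(), set()
    for r in rows:
        v = r.get(column)
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return CheckResult(f"unique:{column}", not dupes,
                       f"{len(dupes)} duplicate value(s)")

def run_checks(rows):
    # Hypothetical rules for an illustrative `orders` feed; a real
    # pipeline would generate these from schema/test definitions.
    return [
        check_not_null(rows, "order_id"),
        check_unique(rows, "order_id"),
        check_not_null(rows, "amount"),
    ]

if __name__ == "__main__":
    sample = [
        {"order_id": 1, "amount": 9.99},
        {"order_id": 2, "amount": None},   # fails not_null:amount
        {"order_id": 2, "amount": 4.50},   # fails unique:order_id
    ]
    for result in run_checks(sample):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name} ({result.detail})")
```

The same pattern scales up cleanly: in a production stack the checks run as dbt tests or Dataflow-side assertions, and failures route to alerting rather than stdout.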