Data Engineering & High-Performance Stack

The Modern Data Stack: separating storage from compute, prioritizing developer experience.

| Resource | Description |
| --- | --- |
| Apache Iceberg | Widely adopted open table format, supported by Snowflake, AWS, and Databricks for lakehouse architectures |
| Delta Lake | Mature table format with deep Databricks integration and ACID transactions on data lakes |
| Apache Hudi | Table format well suited to upserts and incremental processing of streaming data sources |
| Dagster | My orchestration platform of choice: treats data assets as software products |
| dbt | The transformation standard: enables testing, versioning, and documentation in SQL pipelines |
| Polars | Rust-powered DataFrame library, often much faster than pandas on large datasets |
| DuckDB | Embedded OLAP database for local analytics without infrastructure overhead |
| Trino | Distributed SQL query engine for interactive analytics across heterogeneous data sources |
| Great Expectations | Data quality framework for defining, monitoring, and validating data expectations |
| MinIO | High-performance S3-compatible object storage for on-premises or hybrid cloud setups |
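
To make the "data expectations" idea from the Great Expectations entry concrete, here is a minimal sketch using only the Python standard library. It is not the Great Expectations API: the `Expectation` class, the `validate` function, and the column names `order_id` and `amount` are all hypothetical and exist only for illustration. The sketch shows the core pattern of declaring named rules once and then reporting which rows violate each rule.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Expectation:
    """A named rule that every row is expected to satisfy (hypothetical type)."""

    name: str
    check: Callable[[Row], bool]


def validate(rows: list[Row], expectations: list[Expectation]) -> dict[str, list[int]]:
    """Return, for each expectation name, the indices of the rows that fail it."""
    failures: dict[str, list[int]] = {e.name: [] for e in expectations}
    for index, row in enumerate(rows):
        for expectation in expectations:
            if not expectation.check(row):
                failures[expectation.name].append(index)
    return failures


# Illustrative data and rules; the column names are made up for this sketch.
ROWS = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},
    {"order_id": None, "amount": 42.00},
]

EXPECTATIONS = [
    Expectation("order_id_not_null", lambda r: r["order_id"] is not None),
    Expectation("amount_non_negative", lambda r: r["amount"] >= 0),
]

if __name__ == "__main__":
    for name, bad_rows in validate(ROWS, EXPECTATIONS).items():
        status = "ok" if not bad_rows else f"failed on rows {bad_rows}"
        print(f"{name}: {status}")
```

A dedicated framework adds much more on top of this pattern (persisted rule suites, profiling, reporting), so treat the sketch as a way to reason about the concept rather than as a substitute for the tool.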