Data Engineering & High-Performance Stack
The Modern Data Stack: separating storage from compute, prioritizing developer experience.
| Resource | Description |
|---|---|
| Apache Iceberg | Widely adopted open table format, supported by Snowflake, AWS, and Databricks for lakehouse architectures |
| Delta Lake | Mature table format with deep Databricks integration and ACID transactions on data lakes |
| Apache Hudi | Table format built around upserts and incremental processing of streaming data sources |
| Dagster | My orchestration platform of choice—treats data assets as software products |
| dbt | The transformation standard—enables testing, versioning, and documentation in SQL pipelines |
| Polars | Rust-powered DataFrame library with multithreaded, lazy query execution; often much faster than pandas on large datasets, depending on workload |
| DuckDB | Embedded OLAP database—local analytics without infrastructure overhead |
| Trino | Distributed SQL query engine for interactive analytics across heterogeneous data sources |
| Great Expectations | Data quality framework for defining, monitoring, and validating data expectations |
| MinIO | High-performance S3-compatible object storage for on-premises or hybrid cloud setups |
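
To make the data-quality row in the table above more concrete, here is a minimal plain-Python sketch of what "defining and validating data expectations" can look like. It deliberately does not use the Great Expectations API; the `Expectation` class, the `validate` helper, and the sample rows are all hypothetical names invented for illustration, and a real framework adds profiling, reporting, and storage on top of this idea.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Expectation:
    """A named rule that a single row either satisfies or violates (hypothetical type)."""
    name: str
    check: Callable[[Row], bool]


def validate(rows: list[Row], expectations: list[Expectation]) -> dict[str, list[int]]:
    """Return, for each expectation name, the indices of the rows that fail it."""
    failures: dict[str, list[int]] = {e.name: [] for e in expectations}
    for index, row in enumerate(rows):
        for expectation in expectations:
            if not expectation.check(row):
                failures[expectation.name].append(index)
    return failures


# Illustrative rules and data, not taken from any real pipeline.
expectations = [
    Expectation("order_id is not null", lambda r: r.get("order_id") is not None),
    Expectation(
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5.00},
    {"order_id": 3, "amount": -2.50},
]

report = validate(rows, expectations)
for name, bad_rows in report.items():
    status = "PASS" if not bad_rows else f"FAIL at rows {bad_rows}"
    print(f"{name}: {status}")
```

Keeping each rule as a named, independent check means a failing pipeline run can report exactly which expectation broke and on which rows, which is the property that makes such checks usable as tests in an orchestrated pipeline.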