Modern Data Engineering
An evaluation of high-performance data transformation tools, orchestrators, and
storage formats for modern data lakehouses.

| Tool | Status | Value |
| --- | --- | --- |
| Apache Airflow | ADOPT | Industry standard for complex, enterprise-grade DAG orchestration. |
| dbt | ADOPT | The standard for modular, versioned, and tested SQL transformations. |
| Dagster | TRIAL | Asset-based orchestration focusing on software-defined data assets. |
| Airbyte | ADOPT | The open-source standard for EL (Extract-Load) pipelines. |
| Kappa Architecture | TRIAL | Stream-first architecture that treats all data as events, simplifying the stack by removing batch-layer redundancy. |

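
To make the Kappa row concrete, here is a minimal sketch of the idea, assuming an in-memory list as a stand-in for a durable, replayable log such as a Kafka topic. The `Event` schema, `EventLog`, `apply_event`, and `rebuild_view` are hypothetical names chosen for illustration and do not belong to any particular framework. The point it tries to show is that one processing function serves both live events and full reprocessing, so no separate batch layer is needed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One immutable fact in the log (hypothetical schema)."""
    offset: int
    user_id: str
    amount: float


class EventLog:
    """In-memory stand-in for a durable, replayable log (an assumption)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, user_id: str, amount: float) -> Event:
        # Events are only ever appended, never updated in place.
        event = Event(offset=len(self._events), user_id=user_id, amount=amount)
        self._events.append(event)
        return event

    def read_from(self, offset: int = 0) -> Iterable[Event]:
        # Replaying from offset 0 takes the place of a separate batch layer.
        return iter(self._events[offset:])


def apply_event(view: Dict[str, float], event: Event) -> None:
    """The single processing path, used for live events and for replays."""
    view[event.user_id] += event.amount


def rebuild_view(log: EventLog) -> Dict[str, float]:
    """Recompute the materialized view from scratch by replaying the log."""
    view: Dict[str, float] = defaultdict(float)
    for event in log.read_from(0):
        apply_event(view, event)
    return view


log = EventLog()
live_view: Dict[str, float] = defaultdict(float)

# Live path: each event is appended to the log, then applied to the view.
for user_id, amount in [("ana", 10.0), ("ben", 5.0), ("ana", 2.5)]:
    apply_event(live_view, log.append(user_id, amount))

# Reprocessing path: the same code, started from the beginning of the log.
replayed_view = rebuild_view(log)
```

Because both paths share `apply_event`, a logic change is rolled out by replaying the log into a new view rather than by maintaining a second batch implementation. A real deployment would also need log retention, checkpointing, and delivery guarantees, none of which are modeled here.
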
Processing & Dataframes

| Tool | Status | Value |
| --- | --- | --- |
| Polars | ADOPT | High-performance Rust-powered DataFrames for efficient local processing. |
| Pandas | ADOPT | The standard for data manipulation and analysis in Python. |

Compute & Database Engines

| Engine | Status | Why |
| --- | --- | --- |
| Google BigQuery | ADOPT | Fully managed, serverless enterprise data warehouse. My primary engine for analytics at scale. |
| Databricks | ASSESS | Unified analytics platform. Assessing for specific Spark-heavy workloads and Delta Lake integration. |
| DuckDB | ADOPT | In-process OLAP database for fast, local analytics and data profiling. |
| Apache Flink | ASSESS | Evaluating for complex, stateful stream processing with sub-second latency requirements. |
| Apache Spark (Dataproc) | ADOPT | The legacy powerhouse for massive batch processing and migration of on-prem Hadoop workloads to GCP. |
| ClickHouse | ASSESS | High-performance columnar database for real-time analytical queries and user-facing dashboards. |
| Neo4j | TRIAL | Specialized graph database for complex relationship analysis and network modeling. |

Operational & Cloud Databases

| Database | Status | Why |
| --- | --- | --- |
| PostgreSQL | ADOPT | The world's most advanced open-source relational database. My default for structured data. |
| Snowflake | ADOPT | Cloud-native data warehouse with excellent separation of compute and storage. |
| Redis | ADOPT | High-performance in-memory data store. Essential for caching and real-time features. |
| MongoDB | TRIAL | Document-oriented NoSQL database. Trialing for specific flexible-schema use cases. |
| SQLAlchemy | ADOPT | The definitive Python SQL toolkit and Object-Relational Mapper (ORM). |


Storage Formats

| Format | Status | Why |
| --- | --- | --- |
| Apache Iceberg | ADOPT | The dominant open table format for cloud-native data lakehouses. |
| Apache Parquet | ADOPT | The universal columnar storage format for high-performance analytics. |
| Apache Avro | ADOPT | Row-oriented format. The gold standard for streaming data serialization (Kafka/PubSub). |

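
The row-oriented versus columnar distinction behind the Avro and Parquet rows can be sketched with the standard library alone. This is an illustration of the two layouts, not of either format's actual encoding: JSON bytes stand in for Avro's binary records, plain Python lists stand in for Parquet's column chunks, and the sample `records` and helper names are made up for the example. It also assumes every record has the same fields.

```python
import json
from typing import Dict, List

# Hypothetical sample data; every record has the same fields.
records = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": "FR", "amount": 7.5},
    {"id": 3, "country": "DE", "amount": 2.5},
]


def to_row_messages(rows: List[dict]) -> List[bytes]:
    """Row-oriented: one self-contained message per record.

    Each message can be produced and consumed on its own, which suits
    streaming. JSON is used here only as a stand-in encoding.
    """
    return [json.dumps(row).encode("utf-8") for row in rows]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Column-oriented: values of the same field are stored together."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


row_messages = to_row_messages(records)
columns = to_columns(records)

# Analytical query: only the "amount" column is read.
total_amount = sum(columns["amount"])

# Streaming consumer: one whole record is decoded at a time.
first_record = json.loads(row_messages[0])
```

The aggregate touches a single list and never reads `id` or `country`, which is the access pattern columnar files favor for analytics. The row layout keeps each record intact, so a producer can emit it the moment it occurs.
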