know.2nth.ai โ€บ Data โ€บ data โ€บ warehousing
data/warehousing ยท Sub-domain hub

Where the data
actually lives.

The storage and modeling layer underneath every dashboard, every ML pipeline, every report. PostgreSQL is the default; ClickHouse, DuckDB, BigQuery, and Snowflake fill specific shapes.

5
Databases
1
Live explainer
01 ยท Databases

The warehousing toolkit.

Five databases that cover different points on the OLTP-to-OLAP spectrum. PostgreSQL is live and is the default for new builds; the rest land as content ships.

PostgreSQL
Live
data/warehousing/postgresql

The default relational database. ACID, extensions (PostGIS, pgvector, TimescaleDB), JSON, full-text search. The warehouse Frappe, Superset, and most 2nth builds run on.

ClickHouse
Stub
data/warehousing/clickhouse

Column-oriented OLAP for sub-second analytics on billions of rows. The engine of choice when PostgreSQL with materialized views isn't fast enough for the query shape.

DuckDB
Stub
data/warehousing/duckdb

In-process analytical SQL. The SQLite of OLAP โ€” no server, just a library. Runs inside Python scripts, dbt, notebooks, and Cloudflare Workers (WASM build).

BigQuery
Stub
data/warehousing/bigquery

Google's serverless warehouse. Pay-per-query, petabyte-scale, zero infrastructure to manage. The warehouse of choice when the team wants SQL and nothing else.

Snowflake
Stub
data/warehousing/snowflake

Multi-cloud warehouse with separation of storage and compute. Scale compute independently of storage; pay for what you run. The enterprise multi-cloud answer.

02 ยท The shared pattern

OLTP vs OLAP โ€” and the databases that blur the line.

Every database in this sub-hub sits somewhere on the OLTP-to-OLAP spectrum. Understanding where each one lives makes the choice obvious for most workloads.

OLTP โ†’ mixed โ†’ OLAP

OLTP (Online Transaction Processing) means many small, fast reads and writes โ€” an order being created, a user logging in, a row being updated. Postgres is native OLTP: row-oriented storage, ACID transactions, B-tree indexes, connection pooling. It's built for the operational workload.

OLAP (Online Analytical Processing) means fewer but larger queries โ€” "sum revenue by region for the last 90 days across 500 million rows." Column-oriented storage (ClickHouse, BigQuery, Snowflake) is dramatically faster for this shape: it reads only the columns the query touches, compresses tightly, and parallelises across many cores.

The interesting territory is the middle. PostgreSQL covers it surprisingly well with materialized views, partial indexes, and TimescaleDB hypertables. Most teams never need to leave Postgres. The others become relevant when specific shapes emerge: sub-second OLAP at massive scale (ClickHouse), zero-infra serverless (BigQuery), in-process analysis without a server (DuckDB), multi-cloud storage/compute separation (Snowflake).

# The spectrum

OLTP          Mixed                    OLAP
  โ”‚               โ”‚                         โ”‚
PostgreSQL โ”€โ”€โ”€โ”€ PostgreSQL + โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ ClickHouse
               extensions              BigQuery
               (pgvector,             Snowflake
               TimescaleDB,           DuckDB
               mat. views)

The practical guide: start on PostgreSQL. Add materialized views for dashboard queries. If you're still slow after that with real data volumes, look at ClickHouse. If you need zero infrastructure, look at BigQuery. If you need in-process analysis in a script or notebook, reach for DuckDB.

Database Best for Not for Infrastructure
PostgreSQL General purpose โ€” OLTP + light OLAP with extensions Sub-second queries on 100B+ rows Self-hosted / Cloud SQL / RDS / Supabase
ClickHouse High-cardinality analytics on huge event streams Transactional writes, small datasets Self-hosted / ClickHouse Cloud
DuckDB In-process analysis, local files (Parquet, CSV), notebooks Concurrent multi-user writes Embedded โ€” no server
BigQuery Petabyte-scale serverless SQL, GCP ecosystem Low-latency transactional queries, cost-sensitive small data Fully managed โ€” Google Cloud
Snowflake Multi-cloud enterprise warehouse, independent scaling Cost-sensitive workloads, simple OLTP Fully managed โ€” AWS / Azure / GCP
03 ยท Related branches

Where the warehousing branch connects.

Warehousing is the substrate that analytics, ML pipelines, and agent memory all sit on. It connects upward to the data hub, laterally to analytics tools, and outward to edge infrastructure.