Warehousing — know.2nth.ai

01 · Databases

The warehousing toolkit.

Five databases that cover different points on the OLTP-to-OLAP spectrum. PostgreSQL is live and is the default for new builds; the rest land as content ships.

PostgreSQL

Live

data/warehousing/postgresql

The default relational database. ACID, extensions (PostGIS, pgvector, TimescaleDB), JSON, full-text search. The warehouse Frappe, Superset, and most 2nth builds run on.

ClickHouse

Stub

data/warehousing/clickhouse

Column-oriented OLAP for sub-second analytics on billions of rows. The engine of choice when PostgreSQL with materialized views isn't fast enough for the query shape.

DuckDB

Stub

data/warehousing/duckdb

In-process analytical SQL. The SQLite of OLAP — no server, just a library. Runs inside Python scripts, dbt, notebooks, and Cloudflare Workers (WASM build).

BigQuery

Stub

data/warehousing/bigquery

Google's serverless warehouse. Pay-per-query, petabyte-scale, zero infrastructure to manage. The warehouse of choice when the team wants SQL and nothing else.

Snowflake

Stub

data/warehousing/snowflake

Multi-cloud warehouse with separation of storage and compute. Scale compute independently of storage; pay for what you run. The enterprise multi-cloud answer.

02 · Decision guides

When to migrate, when to offload, when to leave it alone.

Comparison and migration guides that sit alongside the database leaves. Each one is a concrete decision-path with cost math, week-by-week shape, and honest two-sided framing — not just "feature X vs feature Y."

Postgres vs Microsoft SQL

Live

data/warehousing/postgres-vs-mssql

The BI-offload play that drops your SQL Server licence by 35-70% without touching production. Two-paths framing (offload first, full migration later), function-by-function comparison, concrete licence math for a 16-core SMB, week-by-week migration shape.

03 · The shared pattern

OLTP vs OLAP — and the databases that blur the line.

Every database in this sub-hub sits somewhere on the OLTP-to-OLAP spectrum. Understanding where each one lives makes the choice obvious for most workloads.

OLTP → mixed → OLAP

OLTP (Online Transaction Processing) means many small, fast reads and writes — an order being created, a user logging in, a row being updated. Postgres is native OLTP: row-oriented storage, ACID transactions, B-tree indexes, connection pooling. It's built for the operational workload.

OLAP (Online Analytical Processing) means fewer but larger queries — "sum revenue by region for the last 90 days across 500 million rows." Column-oriented storage (ClickHouse, BigQuery, Snowflake) is dramatically faster for this shape: it reads only the columns the query touches, compresses tightly, and parallelises across many cores.

The interesting territory is the middle. PostgreSQL covers it surprisingly well with materialized views, partial indexes, and TimescaleDB hypertables. Most teams never need to leave Postgres. The others become relevant when specific shapes emerge: sub-second OLAP at massive scale (ClickHouse), zero-infra serverless (BigQuery), in-process analysis without a server (DuckDB), multi-cloud storage/compute separation (Snowflake).

# The spectrum

OLTP          Mixed                    OLAP
  │               │                         │
PostgreSQL ──── PostgreSQL + ──────── ClickHouse
               extensions              BigQuery
               (pgvector,             Snowflake
               TimescaleDB,           DuckDB
               mat. views)

The practical guide: start on PostgreSQL. Add materialized views for dashboard queries. If you're still slow after that with real data volumes, look at ClickHouse. If you need zero infrastructure, look at BigQuery. If you need in-process analysis in a script or notebook, reach for DuckDB.

Database	Best for	Not for	Infrastructure
PostgreSQL	General purpose — OLTP + light OLAP with extensions	Sub-second queries on 100B+ rows	Self-hosted / Cloud SQL / RDS / Supabase
ClickHouse	High-cardinality analytics on huge event streams	Transactional writes, small datasets	Self-hosted / ClickHouse Cloud
DuckDB	In-process analysis, local files (Parquet, CSV), notebooks	Concurrent multi-user writes	Embedded — no server
BigQuery	Petabyte-scale serverless SQL, GCP ecosystem	Low-latency transactional queries, cost-sensitive small data	Fully managed — Google Cloud
Snowflake	Multi-cloud enterprise warehouse, independent scaling	Cost-sensitive workloads, simple OLTP	Fully managed — AWS / Azure / GCP

04 · Related branches

Where the warehousing branch connects.

Warehousing is the substrate that analytics, ML pipelines, and agent memory all sit on. It connects upward to the data hub, laterally to analytics tools, and outward to edge infrastructure.

← Data data/analytics → tech/cloudflare/d1 · soon tech/cloudflare/hyperdrive · soon tech/frappe/framework →

Where the dataactually lives.

The warehousing toolkit.

When to migrate, when to offload, when to leave it alone.

OLTP vs OLAP — and the databases that blur the line.

OLTP → mixed → OLAP

Where the warehousing branch connects.

Where the data
actually lives.