▶What is ClickHouse and why is it so fast?
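ClickHouse = open-source columnar database for analytics (stores data by column, not by row). Example: 'SELECT sum(revenue) FROM events' → reads only the revenue column (about 1% of the data if the table has 100 similarly sized columns) instead of every column. Compression: values within one column are similar, so they often compress 10x or more. Vectorized execution: processes column values in batches (CPU cache and SIMD friendly). Result: often 100-1000x faster than row-oriented databases (PostgreSQL, MySQL) on large analytical scans. For single-row lookups and transactional updates, row stores remain the better fit.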
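To make the column-versus-row point concrete, here is a minimal sketch in plain Python. It is a toy model, not how ClickHouse stores data internally: the dataset, the field names, and the use of JSON plus zlib as a stand-in for a storage format are all assumptions chosen for illustration.

```python
import json
import zlib

# Hypothetical toy dataset: 10,000 events with four fields each.
rows = [
    {"user_id": i % 500, "country": "US" if i % 3 else "DE",
     "revenue": (i % 7) * 10, "url": f"/page/{i % 50}"}
    for i in range(10_000)
]

# Row-oriented layout: every field of a record is stored together,
# so summing one field still means reading whole records.
row_store = json.dumps(rows).encode()

# Column-oriented layout: each field lives in its own contiguous array.
column_store = {
    name: json.dumps([r[name] for r in rows]).encode()
    for name in rows[0]
}

def sum_revenue_columnar(store):
    """Answer sum(revenue) by decoding only the revenue column."""
    return sum(json.loads(store["revenue"]))

revenue_bytes = len(column_store["revenue"])
total_bytes = len(row_store)
# Similar values sit next to each other in a column, which helps compression.
compressed_revenue = zlib.compress(column_store["revenue"])

print("sum(revenue):", sum_revenue_columnar(column_store))
print(f"bytes scanned: {revenue_bytes} of {total_bytes} in the row layout")
print(f"revenue column compressed: {revenue_bytes} -> {len(compressed_revenue)} bytes")
```

The ratios printed here depend entirely on the toy data; real tables will differ.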
▶Is ClickHouse a replacement for Snowflake/BigQuery?
Overlapping use cases, different trade-offs. ClickHouse = usually lower latency for aggregations over very large event-style tables (ad analytics, logs). Snowflake = better for complex, join-heavy queries across many datasets (classic data warehouse). BigQuery = natural fit if the data already lives in Google Cloud. Pick based on: (1) query complexity, (2) cloud preference, (3) budget.
▶How do I get data into ClickHouse?
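Methods: (1) Kafka integration (real-time streaming), (2) bulk insert from files (CSV, Parquet), (3) native INSERT from a client library, (4) HTTP API. For upserts, use ReplacingMergeTree. It is a table engine, not an ingestion method: it keeps the latest version of each row, but deduplication happens during background merges, so duplicates can be visible until then (or query with FINAL). INSERT is fast when rows arrive in large batches (thousands of rows or more per INSERT). Many tiny inserts from many clients create too many small parts and slow the server down. Typical: stream from Kafka, land in ClickHouse, query billions of rows in about a second, depending on the query and hardware.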
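As a sketch of the HTTP route, the snippet below groups rows into large batches and builds one request per batch for the ClickHouse HTTP interface (statement in the `query` parameter, rows in the body as JSONEachRow). The server address, the `events` table, the row shape, and the batch size of 10,000 are assumptions for illustration, and `send` is defined but not called so the example runs without a server.

```python
import json
import urllib.parse
import urllib.request

def batches(rows, size):
    """Yield consecutive chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def build_insert(base_url, table, rows):
    """Return (url, body) for one INSERT over the ClickHouse HTTP interface.

    The statement goes in the `query` parameter and the rows go in the
    request body, one JSON object per line (JSONEachRow format).
    """
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    params = urllib.parse.urlencode({"query": query},
                                    quote_via=urllib.parse.quote)
    body = "\n".join(json.dumps(r) for r in rows).encode()
    return f"{base_url}/?{params}", body

def send(url, body):
    """POST one batch; needs a reachable server, so it is not called below."""
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status

# Assumed server address and hypothetical rows.
BASE_URL = "http://localhost:8123"
events = [{"user_id": i, "revenue": i * 1.5} for i in range(25_000)]

requests_to_send = [build_insert(BASE_URL, "events", chunk)
                    for chunk in batches(events, 10_000)]
print(len(requests_to_send), "requests for", len(events), "rows")
```

Fewer, larger requests are the point of the design: each INSERT creates at least one new data part on disk, so batch size matters more than client count.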
▶What's ReplicatedMergeTree and when do I need it?
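MergeTree = ClickHouse's main table engine family (data sorted by a key and stored in compressed parts that merge in the background; well suited to time-series and event data). ReplicatedMergeTree = the same engine with each table copied to multiple servers (high availability). Replication is not sharding: to split data across servers, add a Distributed table on top. Use: (1) plain MergeTree on a single server for dev/test, (2) replicated for production (failover if a server dies). Complexity: requires a coordination service (ClickHouse Keeper or ZooKeeper).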
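For reference, a replicated table is declared much like a plain MergeTree table, with two extra engine arguments: a path in the coordination service and a replica name. The sketch below renders such a statement from Python. The table name, the columns, and the Keeper path layout are hypothetical, and it assumes each server defines the `{shard}` and `{replica}` macros in its configuration.

```python
def replicated_events_ddl(database="default", table="events"):
    """Render a CREATE TABLE statement for a replicated table.

    {shard} and {replica} are ClickHouse macros that each server
    substitutes from its own configuration, so the same statement
    can be run unchanged on every replica.
    """
    keeper_path = f"/clickhouse/tables/{{shard}}/{database}/{table}"
    return (
        f"CREATE TABLE {database}.{table}\n"
        "(\n"
        "    event_time DateTime,\n"
        "    user_id    UInt64,\n"
        "    revenue    Float64\n"
        ")\n"
        f"ENGINE = ReplicatedMergeTree('{keeper_path}', '{{replica}}')\n"
        "PARTITION BY toYYYYMM(event_time)\n"
        "ORDER BY (user_id, event_time)"
    )

print(replicated_events_ddl())
```

Swapping the engine line for `ENGINE = MergeTree` gives the single-server dev/test variant with the same columns and sorting key.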
▶Can ClickHouse do real-time analytics?
Yes. With Kafka integration, new events typically become queryable within seconds of being produced (lower with tuning). Unlike a traditional DW (refresh every hour), ClickHouse enables real-time dashboards. Example: Uber has described building its log analytics platform on ClickHouse.
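The Kafka path is commonly wired up with three objects: a Kafka engine table that consumes the topic, a MergeTree table that stores the rows, and a materialized view that moves rows from the first to the second as they arrive. The sketch below lists the three statements in creation order. The broker address, topic, consumer group, table names, and columns are all placeholders, not values from a real deployment.

```python
# Assumed names: broker address, topic, consumer group, tables, and columns
# are placeholders for illustration.
PIPELINE = [
    ("events_queue", """
CREATE TABLE events_queue
(
    event_time DateTime,
    user_id    UInt64,
    revenue    Float64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'clickhouse_events',
         kafka_format      = 'JSONEachRow'
"""),
    ("events", """
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    revenue    Float64
)
ENGINE = MergeTree
ORDER BY (user_id, event_time)
"""),
    # The view must come last: it reads from the queue and writes to `events`.
    ("events_mv", """
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT event_time, user_id, revenue
FROM events_queue
"""),
]

for name, ddl in PIPELINE:
    print(f"-- {name}")
    print(ddl.strip())
```

Dashboards then query `events` directly; the queue table is only a consumer and is not meant to be queried by users.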
▶What's the learning curve for SQL in ClickHouse?
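ClickHouse SQL = close to standard SQL for SELECT queries (GROUP BY and HAVING work as in standard SQL). Differences: (1) CREATE TABLE for MergeTree tables takes an ENGINE clause and an ORDER BY sorting key, and the primary key does not enforce uniqueness, (2) UPDATE and DELETE traditionally run as asynchronous mutations (ALTER TABLE ... UPDATE/DELETE), meant for occasional bulk changes rather than row-by-row edits, (3) data types and functions have ClickHouse-specific names (DateTime, UInt64, case-sensitive functions such as toStartOfHour). If you know SQL, you can write queries right away; expect 2-3 weeks to get comfortable with engines, sorting keys, and schema design.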
▶What salary does ClickHouse expertise command?
Rough US estimates, which vary by region, seniority, and company: data engineer ($110-150k) + ClickHouse = $150-200k. Analytics engineer ($100-140k) + ClickHouse = $140-190k. Rare skill: as a rough guess, on the order of 1,000 ClickHouse specialists globally (vs 100k Snowflake engineers). High demand + scarce supply = premium salaries, with tech companies competing for ClickHouse talent.