▶What's the difference between Snowflake and BigQuery?
Snowflake is cloud-agnostic (AWS/Azure/GCP), uses virtual warehouses with independent compute scaling, offers time-travel (query historical snapshots), and dominates enterprise adoption ($50B valuation, 8000+ customers). Pricing: per-second compute + storage. BigQuery is Google-native, serverless (no warehouse management), uses slots for reserved compute, and excels in Google Cloud environments. Pricing: per-query with monthly commitments for high volume. Choose Snowflake for multi-cloud flexibility and team control over compute. Choose BigQuery for Google Cloud-first orgs wanting zero infrastructure management.
▶How do I optimize costs in Snowflake vs BigQuery?
Snowflake: (1) Right-size virtual warehouses, don't run X-Large for small queries; (2) Use clustering on high-cardinality columns to skip full table scans; (3) Understand query impact, smaller warehouse = slower but cheaper; (4) Monitor per-user spend with WAREHOUSE_METERING_HISTORY; (5) Suspend warehouses when idle. BigQuery: (1) Use partitioned tables to scan fewer bytes (costs are per-byte scanned, not compute time); (2) Use clustering for high-cardinality columns; (3) Run BI Engine for cached results; (4) Estimate query cost before running with `--dry_run` flag; (5) Use monthly commitments if volume >100TB/month. Snowflake: fixed per-second billing, easier to budget. BigQuery: usage-based, harder to predict but cheaper for ad-hoc queries.
▶Should I use ELT or ETL with these warehouses?
ELT (Extract-Load-Transform) is the modern default: load raw data into the warehouse first, then transform with SQL (dbt). Why: (1) Faster initial load (no pre-processing); (2) Flexible schema evolution (schema-on-read); (3) All transformations in SQL using warehouse compute; (4) Version-controlled dbt models; (5) Easy to backfill and test. ETL (Extract-Transform-Load) = transform before loading (old pattern, still used for legacy systems). Both Snowflake and BigQuery are optimized for ELT. Use dbt as your transformation framework, it generates SQL, runs on schedule, and integrates with both.
▶What's partitioning and clustering? Which should I use?
Partitioning: divide a table by a column (usually date) so queries only scan relevant partitions. Example: `PARTITION BY DATE(created_at)`, query '2026-01-01' only scans Jan 2026 data, not all years. Clustering: sort data within partitions by one or more columns (high-cardinality: user_id, product_id). Both reduce bytes scanned = lower cost + faster queries. Snowflake: clustering is automatic (Z-order) but you specify cluster key. BigQuery: partitioning is mandatory for large tables, clustering is optional. Use partitioning for temporal data (dates, timestamps). Use clustering for high-cardinality columns (user_id, account_id, product_id). Combine both: partition by date, cluster by user_id.
▶When should I use Snowflake/BigQuery vs Postgres?
Postgres: transactional OLTP (inserts/updates/deletes), relational consistency (ACID), small-to-medium datasets (1-10TB), low latency (ms), complex JOINs. Snowflake/BigQuery: analytical OLAP (massive SELECTs), denormalized schemas (fact tables), 100TB-EB scale, batch queries (seconds to minutes OK), compressed storage, cheap retention. Example: use Postgres as your operational database (customer orders, inventory), stream/batch to Snowflake for analytics (revenue by region, cohort analysis). Postgres = source of truth for transactional data. Snowflake/BigQuery = analytics engine for historical analysis + ML.
▶How do I implement time travel in Snowflake?
Time travel = query a table as it was at a specific time in the past. Snowflake: `SELECT * FROM table_name AT (TIMESTAMP => '2026-01-01 10:00:00'::timestamp_tz)` or `BEFORE (STATEMENT => query_id)`. Retention: 90 days by default (configurable). Use cases: (1) accidental DELETE recovery; (2) auditing data changes; (3) A/B testing by querying different points in time; (4) debugging data pipeline bugs. Cost: minimal, just READS, no extra storage. BigQuery: no native time travel, but snapshots exist (table decorators) with cheaper storage: `SELECT * FROM table@1234567890000` (milliseconds since epoch). Snowflake time travel is a major advantage for audit/recovery workflows.
▶How do dbt models fit into Snowflake/BigQuery?
dbt (data build tool) = your transformation framework. Write SQL SELECT statements, dbt compiles to INSERT/CREATE TABLE/CREATE VIEW. Workflow: (1) raw data lands in Snowflake/BigQuery (Fivetran/Airbyte does EL); (2) dbt models transform raw → mart tables; (3) dbt manages dependencies (model A depends on model B); (4) dbt runs on schedule (daily) or on-demand; (5) dbt tracks lineage and tests data quality (e.g., `not_null`, `unique` checks). Both Snowflake and BigQuery have first-class dbt support, dbt adapters handle SQL dialect differences. Use dbt for: schema evolution, version control (git), documentation (auto-generated), testing, deployment CI/CD.