▶ What's the difference between JOINs (INNER, LEFT, RIGHT), and when do I use each?
INNER JOIN returns only the rows where the join condition matches in both tables (the intersection). LEFT JOIN returns every row from the left table plus the matching rows from the right table, with NULLs in the right-hand columns when there is no match. RIGHT JOIN is the mirror image. Use INNER when you only want matched records (e.g., users with orders). Use LEFT when you want everything from one table and optional data from another (e.g., all users, even those without orders). Avoid RIGHT JOIN; rewrite it as a LEFT JOIN with the tables swapped for clarity. Example: SELECT users.name, orders.total FROM users LEFT JOIN orders ON users.id = orders.user_id returns one row per user-order pair, plus one row with a NULL total for each user who has no orders.
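The INNER vs LEFT difference above can be sketched with Python's built-in sqlite3 module. The users and orders tables follow the example query; the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only users that have at least one order.
inner_rows = conn.execute("""
    SELECT users.name, orders.total
    FROM users
    INNER JOIN orders ON users.id = orders.user_id
    ORDER BY users.id, orders.id
""").fetchall()

# LEFT JOIN: every user; total is NULL (None in Python) when no order exists.
left_rows = conn.execute("""
    SELECT users.name, orders.total
    FROM users
    LEFT JOIN orders ON users.id = orders.user_id
    ORDER BY users.id, orders.id
""").fetchall()

# A common LEFT JOIN idiom: keep only the unmatched rows (users with no orders).
no_orders = conn.execute("""
    SELECT users.name
    FROM users
    LEFT JOIN orders ON users.id = orders.user_id
    WHERE orders.id IS NULL
""").fetchall()

print(inner_rows)
print(left_rows)
print(no_orders)
```

Here 'Cy' has no orders, so the row is missing from the INNER JOIN result and appears with a None total in the LEFT JOIN result.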
▶ How do window functions (ROW_NUMBER, RANK, LAG) unlock complex analytics?
Window functions perform calculations over a set of related rows (a partition) without collapsing them the way GROUP BY does. ROW_NUMBER assigns a unique sequential number within each partition (useful for deduplication or picking the top N per group). RANK lets tied rows share a rank and then skips ahead (1, 2, 2, 4 if two rows tie for second place). LAG/LEAD access the previous/next row's values (useful for comparisons like month-over-month growth). Example: SUM(revenue) OVER (PARTITION BY user_id ORDER BY date) calculates cumulative revenue per user while keeping every row. These unlock analytics like cohort retention, funnel steps per user, and running totals, which are awkward or slow to express with self-joins and subqueries.
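A minimal sketch of the functions above, run through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the sales and scores tables and their rows are hypothetical.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window function support); data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales  (user_id INTEGER, month TEXT, revenue INTEGER);
    CREATE TABLE scores (player TEXT, score INTEGER);
    INSERT INTO sales VALUES
        (1, '2024-01', 100), (1, '2024-02', 150), (1, '2024-03', 120),
        (2, '2024-01', 50),  (2, '2024-02', 50);
    INSERT INTO scores VALUES ('a', 90), ('b', 85), ('c', 85), ('d', 70);
""")

# Running total and month-over-month change per user; no rows are collapsed.
running = conn.execute("""
    SELECT user_id, month, revenue,
           SUM(revenue) OVER (PARTITION BY user_id ORDER BY month) AS running_total,
           revenue - LAG(revenue) OVER (PARTITION BY user_id ORDER BY month) AS change
    FROM sales
    ORDER BY user_id, month
""").fetchall()

# RANK shares a rank between ties and skips the next one; ROW_NUMBER never repeats.
# The extra "player" sort key makes ROW_NUMBER deterministic among tied scores.
ranked = conn.execute("""
    SELECT player, score,
           RANK()       OVER (ORDER BY score DESC)         AS rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, player) AS rn
    FROM scores
    ORDER BY rn
""").fetchall()

for row in running:
    print(row)
for row in ranked:
    print(row)
```

LAG returns NULL for the first row of each partition, so the first month's change comes back as None rather than 0.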
▶ What are indexes and why do they matter for query performance?
Index = a separate sorted structure (usually a B-tree) built on one or more columns that speeds up lookups but slows writes. Without an index, a query scans every row (slow on 100M-row tables). With an index, lookup is logarithmic: finding one key among 1B rows takes about 30 comparisons (log2 of 1B), and a B-tree packs those into a handful of page reads, versus reading all 1B rows. Strategy: (1) Use EXPLAIN ANALYZE to see how many rows a query scans. (2) If rows scanned far exceed rows returned, you're probably missing an index. (3) Index the columns in your most common WHERE filters first. (4) Make compound indexes match the query shape; column order matters, because an index on (a, b) helps filters on a or on a and b, but generally not on b alone. (5) Avoid over-indexing, since every index slows inserts and updates. Example: CREATE INDEX idx_users_email ON users(email) speeds up WHERE email = 'x@y.com' dramatically. Monitor index bloat and rebuild when needed (e.g., REINDEX in PostgreSQL).
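The check-the-plan workflow above can be sketched with sqlite3. Note the substitution: SQLite has no EXPLAIN ANALYZE, so this uses its EXPLAIN QUERY PLAN, which reports the chosen strategy (full scan vs index search) rather than measured row counts. The table contents and the query_plan helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# Hypothetical data: 1,000 users with generated email addresses.
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

QUERY = "SELECT id, email FROM users WHERE email = 'user500@example.com'"


def query_plan(sql: str) -> str:
    """Return SQLite's plan description for a query (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); the detail text is what we want.
    return "; ".join(row[3] for row in rows)


# Without an index the planner has to scan the whole table.
plan_before = query_plan(QUERY)

conn.execute("CREATE INDEX idx_users_email ON users(email)")

# With the index the planner switches to an index search.
plan_after = query_plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions, so treat the printed text as indicative; on PostgreSQL the equivalent check is comparing a Seq Scan against an Index Scan in the EXPLAIN ANALYZE output.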
▶ SQL vs NoSQL: when do I choose each?
SQL (PostgreSQL, MySQL): structured data, ACID transactions (all-or-nothing writes), complex queries with JOINs, data integrity via foreign keys. Use for banking, e-commerce, CRM, anything relational. NoSQL (MongoDB, Redis): flexible schemas, horizontal scaling, nested data, and often eventual consistency rather than strict transactional guarantees. Use for logs, caches, documents, IoT sensor data. Hybrid approach: PostgreSQL for the transactional core (users, orders, payments), MongoDB for logs/profiles, Redis for sessions. Choose SQL by default unless you have a specific reason not to (scale beyond a single server, unstructured data, very high write throughput).
▶ What are CTEs (Common Table Expressions) and how do they improve query readability?
CTE = temporary named result set (defined in a WITH clause) that makes complex queries readable. Instead of nesting subqueries (hard to parse), you build the query step by step. Example (PostgreSQL syntax): WITH monthly_revenue AS (SELECT DATE_TRUNC('month', date) AS month, SUM(amount) AS revenue FROM sales GROUP BY month) SELECT * FROM monthly_revenue WHERE revenue > 100000. CTEs also support recursion via WITH RECURSIVE (hierarchical data like org charts). Benefits: (1) Reads top-to-bottom like pseudocode. (2) Easier debugging (run each CTE's SELECT on its own). (3) The same CTE can be referenced multiple times in one query. Learn CTEs to write queries others can understand.
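Both uses of CTEs mentioned above can be sketched in sqlite3. One substitution to note: SQLite has no DATE_TRUNC, so the month is derived with strftime('%Y-%m', date) instead. The sales and employees rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (date TEXT, amount INTEGER);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO sales VALUES
        ('2024-01-05', 60000), ('2024-01-20', 70000),
        ('2024-02-10', 40000), ('2024-03-01', 120000);
    INSERT INTO employees VALUES
        (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1), (4, 'Gus', 2);
""")

# Step 1 (the CTE): aggregate revenue per month.
# Step 2 (the outer query): keep only months above the threshold.
big_months = conn.execute("""
    WITH monthly_revenue AS (
        SELECT strftime('%Y-%m', date) AS month, SUM(amount) AS revenue
        FROM sales
        GROUP BY month
    )
    SELECT month, revenue
    FROM monthly_revenue
    WHERE revenue > 100000
    ORDER BY month
""").fetchall()

# Recursive CTE: start at the root (no manager), then repeatedly add direct reports.
org_chart = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees AS e
        JOIN chain AS c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, id
""").fetchall()

print(big_months)
print(org_chart)
```

To debug a step, the SELECT inside monthly_revenue can be run by itself before the outer filter is added.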
▶ What's the difference between normalization and denormalization, and when do I denormalize?
Normalization = breaking data into separate tables to eliminate redundancy (3NF: no transitive dependencies). Example: store a city_id foreign key in each user row instead of repeating the city name in every row. This prevents update anomalies (change the city name once, not 1M times). Denormalization = deliberately storing redundant data (e.g., the user name in the orders table) to speed up queries by avoiding JOINs. Trade-off: Normalized = slower reads (more JOINs), faster and safer writes. Denormalized = faster reads, slower writes, and a risk of copies drifting out of sync. Best practice: Normalize your operational (production) database, denormalize for analytics (data warehouse, BI tools). Most companies use both: PostgreSQL for transactional OLTP, Snowflake/BigQuery for analytical OLAP (denormalized fact tables).
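The update-anomaly trade-off above can be illustrated with a small sqlite3 sketch. All table names (cities, users, orders_denorm) and rows are hypothetical; the point is how many rows each kind of change has to touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the city name lives in exactly one place.
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT,
                         city_id INTEGER REFERENCES cities(id));
    -- Denormalized: each order carries its own copy of the user's name.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                                user_name TEXT, total INTEGER);
    INSERT INTO cities VALUES (1, 'Bombay');
    INSERT INTO users  VALUES (1, 'Ana', 1), (2, 'Ben', 1);
    INSERT INTO orders_denorm VALUES (10, 1, 'Ana', 25), (11, 1, 'Ana', 40);
""")

# Normalized rename: one row changes, and every user sees the new name via the JOIN.
city_rows_changed = conn.execute(
    "UPDATE cities SET name = 'Mumbai' WHERE id = 1"
).rowcount
user_cities = conn.execute("""
    SELECT users.name, cities.name
    FROM users JOIN cities ON users.city_id = cities.id
    ORDER BY users.id
""").fetchall()

# Denormalized rename: updating users alone leaves stale copies in orders_denorm.
conn.execute("UPDATE users SET name = 'Anna' WHERE id = 1")
stale_names = conn.execute(
    "SELECT DISTINCT user_name FROM orders_denorm WHERE user_id = 1"
).fetchall()

# Keeping the copies consistent costs an extra write per duplicated row.
order_rows_fixed = conn.execute(
    "UPDATE orders_denorm SET user_name = 'Anna' WHERE user_id = 1"
).rowcount

print(city_rows_changed, user_cities)
print(stale_names, order_rows_fixed)
```

In exchange for that extra write cost, reading orders_denorm never needs a JOIN to show the user name, which is the trade analytics tables usually accept.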