SQL for Data Analysis: Advanced Queries, Patterns & Real-World Analytics

Introduction: SQL is the Analyst’s Superpower

Most developers see SQL as a tool for fetching data. You write SELECT statements. You join tables. You move on.

They miss the real power of SQL: analytical thinking.

SQL is fundamentally an analytical language. It’s designed to ask questions of data and get answers.

Here’s the difference:

Application queries (what most developers write):

SELECT * FROM users WHERE id = 123
SELECT * FROM orders WHERE user_id = 123

Simple. Direct. Fast. Returns one thing.

Analytical queries (what analysts write):

SELECT 
  user_id,
  COUNT(*) as order_count,
  AVG(amount) as avg_order_value,
  MAX(created_at) as last_order_date,
  ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) as rank
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5
ORDER BY avg_order_value DESC;

Complex. Reveals patterns. Answers questions. Returns insights.

Most backend engineers can’t write the second query. But if you can, you become someone who understands their system at a deeper level.

This guide teaches you to write powerful analytical queries. Not CRUD operations. Not simple reports.

Real analytical SQL that reveals what’s actually happening in your system.

Chapter 1: SQL Fundamentals for Analysis

Before advanced patterns, master the basics.

Group By & Aggregation

The foundation of analysis.

-- Total orders per user
SELECT 
  user_id,
  COUNT(*) as order_count,
  SUM(amount) as total_spent,
  AVG(amount) as avg_order_value
FROM orders
GROUP BY user_id;

Key aggregates:

COUNT(*) - Number of rows
COUNT(DISTINCT column) - Unique values
SUM(column) - Total
AVG(column) - Average
MIN(column) - Minimum
MAX(column) - Maximum
STDDEV(column) - Standard deviation
PERCENTILE_CONT(0.5) - Median (varies by DB)

Having Clause (Filter After Grouping)

-- Users who spent more than $1000
SELECT 
  user_id,
  SUM(amount) as total_spent
FROM orders
GROUP BY user_id
HAVING SUM(amount) > 1000;

Key difference:

WHERE - Filters BEFORE grouping (rows)
HAVING - Filters AFTER grouping (groups)

Order By with Limits

-- Top 10 users by spending
SELECT 
  user_id,
  SUM(amount) as total_spent
FROM orders
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10;

Chapter 2: Window Functions - The Game Changer

Window functions let you do per-row calculations with aggregate context.

ROW_NUMBER - Ranking

-- Rank users by spending
SELECT 
  user_id,
  SUM(amount) as total_spent,
  ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) as rank
FROM orders
GROUP BY user_id;

Result:

user_id | total_spent | rank
--------|-------------|------
user_5  | 5000        | 1
user_3  | 3500        | 2
user_1  | 2000        | 3

RANK - With Ties

-- Rank with ties (multiple people can have rank 1)
SELECT 
  user_id,
  total_spent,
  RANK() OVER (ORDER BY total_spent DESC) as rank
FROM (
  SELECT user_id, SUM(amount) as total_spent
  FROM orders
  GROUP BY user_id
) sub;

Difference:

ROW_NUMBER() - Always unique (1, 2, 3, 4, 5)
RANK() - With ties (1, 1, 3, 4, 5)
DENSE_RANK() - Compressed (1, 1, 2, 3, 4)

LAG & LEAD - Compare Rows

-- Month-over-month comparison
SELECT 
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) as prev_month_revenue,
  revenue - LAG(revenue) OVER (ORDER BY month) as change
FROM monthly_revenue
ORDER BY month;

Result:

month   | revenue | prev_month | change
--------|---------|------------|-------
2026-01 | 10000   | NULL       | NULL
2026-02 | 12000   | 10000      | 2000
2026-03 | 11500   | 12000      | -500

SUM() OVER - Running Total

-- Running total of revenue
SELECT 
  date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY date) as cumulative_revenue
FROM daily_sales
ORDER BY date;

Result:

date       | daily | cumulative
-----------|-------|----------
2026-01-01 | 1000  | 1000
2026-01-02 | 1500  | 2500
2026-01-03 | 800   | 3300

PARTITION BY - Group Within Groups

-- Rank products within each category
SELECT 
  category,
  product_name,
  sales,
  RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;

Result:

category | product | sales | rank
---------|---------|-------|-----
Electronics | Phone | 5000 | 1
Electronics | Laptop | 3000 | 2
Electronics | Tablet | 2000 | 3
Clothing | Shirt | 800 | 1
Clothing | Pants | 600 | 2

Percentile Calculations

-- Percentile rank (what % are below this person?)
SELECT 
  user_id,
  total_spent,
  PERCENT_RANK() OVER (ORDER BY total_spent) as percentile_rank,
  NTILE(4) OVER (ORDER BY total_spent) as quartile
FROM (
  SELECT user_id, SUM(amount) as total_spent
  FROM orders
  GROUP BY user_id
) sub;

Chapter 3: Common Table Expressions (CTEs)

Make complex queries readable.

Basic CTE

-- Find high-value customers and their order patterns
WITH high_value_customers AS (
  SELECT 
    user_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent,
    AVG(amount) as avg_order_value
  FROM orders
  GROUP BY user_id
  HAVING SUM(amount) > 5000
)
SELECT 
  user_id,
  order_count,
  total_spent,
  avg_order_value,
  total_spent / order_count as cost_per_order
FROM high_value_customers
ORDER BY total_spent DESC;

Multiple CTEs

-- Analyze customer lifecycle
WITH monthly_customers AS (
  SELECT 
    DATE_TRUNC('month', created_at) as month,
    COUNT(DISTINCT user_id) as new_customers
  FROM users
  GROUP BY DATE_TRUNC('month', created_at)
),
monthly_orders AS (
  SELECT 
    DATE_TRUNC('month', created_at) as month,
    COUNT(*) as order_count,
    SUM(amount) as revenue
  FROM orders
  GROUP BY DATE_TRUNC('month', created_at)
)
SELECT 
  c.month,
  c.new_customers,
  o.order_count,
  o.revenue,
  CASE 
    WHEN c.new_customers > 0 
    THEN o.revenue / c.new_customers 
    ELSE 0 
  END as revenue_per_new_customer
FROM monthly_customers c
LEFT JOIN monthly_orders o ON c.month = o.month
ORDER BY c.month;

Recursive CTE

-- Generate date range without data table
WITH RECURSIVE date_range AS (
  SELECT '2026-01-01'::date as date
  UNION ALL
  SELECT date + INTERVAL '1 day'
  FROM date_range
  WHERE date < '2026-12-31'::date
)
SELECT date FROM date_range;

Chapter 4: Advanced Aggregation Patterns

Multiple Aggregations

-- Comprehensive user analysis
SELECT 
  user_id,
  COUNT(*) as order_count,
  SUM(amount) as total_spent,
  AVG(amount) as avg_order_value,
  MIN(amount) as min_order,
  MAX(amount) as max_order,
  STDDEV(amount) as order_value_stddev,
  COUNT(DISTINCT DATE_TRUNC('day', created_at)) as active_days,
  MAX(created_at) as last_order_date
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 0;

Conditional Aggregation

-- Revenue by order status
SELECT 
  SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) as completed_revenue,
  SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) as pending_revenue,
  SUM(CASE WHEN status = 'cancelled' THEN amount ELSE 0 END) as cancelled_revenue,
  COUNT(CASE WHEN status = 'completed' THEN 1 END) as completed_orders,
  COUNT(CASE WHEN status = 'pending' THEN 1 END) as pending_orders
FROM orders;

Result:

completed_revenue | pending_revenue | cancelled_revenue | completed_orders | pending_orders
------------------|-----------------|-------------------|------------------|---------------
500000            | 50000           | 5000              | 250              | 50

Distribution Analysis

-- Distribution of order values
SELECT 
  CASE 
    WHEN amount < 50 THEN 'Small (< $50)'
    WHEN amount < 100 THEN 'Medium ($50-$100)'
    WHEN amount < 500 THEN 'Large ($100-$500)'
    ELSE 'Enterprise (> $500)'
  END as order_size,
  COUNT(*) as order_count,
  SUM(amount) as revenue,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as percent_of_orders
FROM orders
GROUP BY order_size
ORDER BY COUNT(*) DESC;

Chapter 5: Time Series Analysis

Period Comparisons

-- Daily, weekly, monthly comparison
SELECT 
  DATE_TRUNC('day', created_at) as day,
  DATE_TRUNC('week', created_at) as week,
  DATE_TRUNC('month', created_at) as month,
  COUNT(*) as order_count,
  SUM(amount) as daily_revenue
FROM orders
GROUP BY 
  DATE_TRUNC('day', created_at),
  DATE_TRUNC('week', created_at),
  DATE_TRUNC('month', created_at)
ORDER BY day DESC;

Year-over-Year (YoY) Comparison

-- Compare same month last year
SELECT 
  DATE_PART('month', created_at) as month,
  DATE_PART('year', created_at) as year,
  COUNT(*) as order_count,
  SUM(amount) as revenue,
  LAG(SUM(amount)) OVER (
    PARTITION BY DATE_PART('month', created_at) 
    ORDER BY DATE_PART('year', created_at)
  ) as previous_year_revenue
FROM orders
GROUP BY 
  DATE_PART('month', created_at),
  DATE_PART('year', created_at)
ORDER BY year DESC, month;

Cohort Analysis

-- Analyze customer cohorts by signup month
SELECT 
  DATE_TRUNC('month', u.created_at) as signup_month,
  DATE_TRUNC('month', o.created_at) as order_month,
  COUNT(DISTINCT u.id) as customers
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY 
  DATE_TRUNC('month', u.created_at),
  DATE_TRUNC('month', o.created_at)
ORDER BY signup_month DESC, order_month DESC;

Chapter 6: Join Patterns for Analysis

Multi-Join Analysis

-- Order details with user and product info
SELECT 
  o.id as order_id,
  u.email,
  p.name as product_name,
  p.category,
  o.quantity,
  o.amount,
  o.created_at,
  CASE 
    WHEN o.created_at >= NOW() - INTERVAL '30 days' THEN 'Recent'
    WHEN o.created_at >= NOW() - INTERVAL '90 days' THEN 'Moderate'
    ELSE 'Old'
  END as recency
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN products p ON o.product_id = p.id
WHERE o.created_at >= NOW() - INTERVAL '180 days'
ORDER BY o.created_at DESC;

Left Join for Missing Data Analysis

-- Find users who never ordered
SELECT 
  u.id,
  u.email,
  u.created_at,
  COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.email, u.created_at
HAVING COUNT(o.id) = 0
ORDER BY u.created_at DESC;

Self-Join for Comparisons

-- Find users with similar spending patterns
SELECT 
  u1.user_id as user1,
  u2.user_id as user2,
  u1.total_spent as user1_spent,
  u2.total_spent as user2_spent,
  ABS(u1.total_spent - u2.total_spent) as difference
FROM (
  SELECT user_id, SUM(amount) as total_spent
  FROM orders
  GROUP BY user_id
) u1
CROSS JOIN (
  SELECT user_id, SUM(amount) as total_spent
  FROM orders
  GROUP BY user_id
) u2
WHERE u1.user_id < u2.user_id
  AND ABS(u1.total_spent - u2.total_spent) < 100
ORDER BY difference;

Chapter 7: Subqueries for Complex Analysis

Subquery in WHERE

-- Find orders from top 10 customers
SELECT 
  o.*
FROM orders o
WHERE o.user_id IN (
  SELECT user_id
  FROM orders
  GROUP BY user_id
  ORDER BY SUM(amount) DESC
  LIMIT 10
)
ORDER BY o.created_at DESC;

Subquery in FROM (Derived Table)

-- Analyze customer segments
SELECT 
  segment,
  COUNT(*) as customer_count,
  AVG(total_spent) as avg_spending,
  MIN(total_spent) as min_spending,
  MAX(total_spent) as max_spending
FROM (
  SELECT 
    user_id,
    SUM(amount) as total_spent,
    CASE 
      WHEN SUM(amount) > 10000 THEN 'VIP'
      WHEN SUM(amount) > 1000 THEN 'Regular'
      ELSE 'Casual'
    END as segment
  FROM orders
  GROUP BY user_id
) customer_segments
GROUP BY segment
ORDER BY AVG(total_spent) DESC;

Correlated Subquery

-- For each order, find its rank within that day
SELECT 
  o.id,
  o.amount,
  o.created_at,
  (
    SELECT COUNT(*)
    FROM orders o2
    WHERE DATE(o2.created_at) = DATE(o.created_at)
      AND o2.amount >= o.amount
  ) as rank_in_day
FROM orders o
ORDER BY o.created_at DESC;

Chapter 8: Query Optimization for Analysis

Index Strategy

-- Good indexes for analytical queries
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
CREATE INDEX idx_orders_amount_created ON orders(amount DESC, created_at);
CREATE INDEX idx_orders_status_date ON orders(status, DATE(created_at));

EXPLAIN PLAN

-- Understand query performance
EXPLAIN ANALYZE
SELECT 
  user_id,
  COUNT(*) as order_count,
  SUM(amount) as total_spent
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY user_id;

Look for:

Seq Scan (slow, full table scan)
Index Scan (fast, uses index)
High cost numbers (slow)

Query Optimization Techniques

-- SLOW: Subquery in SELECT (runs per row)
SELECT 
  user_id,
  (SELECT COUNT(*) FROM orders o2 WHERE o2.user_id = u.id) as orders
FROM users u;

-- FAST: Use JOIN and aggregation
SELECT 
  u.id,
  COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id;

Materialized Views

-- For complex, frequently-used queries
CREATE MATERIALIZED VIEW customer_summary AS
SELECT 
  user_id,
  COUNT(*) as order_count,
  SUM(amount) as total_spent,
  AVG(amount) as avg_order_value,
  MAX(created_at) as last_order_date
FROM orders
GROUP BY user_id;

-- Refresh when data changes
REFRESH MATERIALIZED VIEW customer_summary;

-- Query is now fast
SELECT * FROM customer_summary WHERE total_spent > 1000;

Chapter 9: Real-World Analysis Examples

Example 1: Churn Analysis

-- Identify at-risk customers (no order in last 60 days)
WITH last_order_date AS (
  SELECT 
    user_id,
    MAX(created_at) as last_order,
    CURRENT_DATE - MAX(created_at)::date as days_since_order
  FROM orders
  GROUP BY user_id
)
SELECT 
  u.id,
  u.email,
  u.created_at as signup_date,
  lod.last_order,
  lod.days_since_order,
  CASE 
    WHEN lod.days_since_order > 60 THEN 'At Risk'
    WHEN lod.days_since_order > 30 THEN 'Warning'
    ELSE 'Active'
  END as status
FROM users u
LEFT JOIN last_order_date lod ON u.id = lod.user_id
ORDER BY days_since_order DESC;

Example 2: Product Performance

-- Analyze product performance metrics
SELECT 
  p.id,
  p.name,
  p.category,
  COUNT(o.id) as times_ordered,
  SUM(o.quantity) as total_quantity,
  SUM(o.amount) as total_revenue,
  AVG(o.amount) as avg_order_value,
  ROUND(100.0 * COUNT(DISTINCT o.user_id) / 
    (SELECT COUNT(DISTINCT user_id) FROM users), 2) as percent_of_customers,
  RANK() OVER (PARTITION BY p.category ORDER BY SUM(o.amount) DESC) as rank_in_category
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
GROUP BY p.id, p.name, p.category
ORDER BY total_revenue DESC;

Example 3: Customer Lifetime Value (CLV)

-- Calculate and segment by lifetime value
WITH customer_metrics AS (
  SELECT 
    u.id,
    u.email,
    u.created_at,
    COUNT(o.id) as lifetime_orders,
    SUM(o.amount) as lifetime_value,
    AVG(o.amount) as avg_order_value,
    MAX(o.created_at) as last_purchase,
    CURRENT_DATE - MAX(o.created_at)::date as days_since_purchase
  FROM users u
  LEFT JOIN orders o ON u.id = o.user_id
  GROUP BY u.id, u.email, u.created_at
)
SELECT 
  id,
  email,
  created_at,
  lifetime_orders,
  lifetime_value,
  avg_order_value,
  days_since_purchase,
  CASE 
    WHEN lifetime_value >= (SELECT percentile_cont(0.75) WITHIN GROUP (ORDER BY lifetime_value) FROM customer_metrics) THEN 'VIP'
    WHEN lifetime_value >= (SELECT percentile_cont(0.50) WITHIN GROUP (ORDER BY lifetime_value) FROM customer_metrics) THEN 'Regular'
    WHEN lifetime_value >= (SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY lifetime_value) FROM customer_metrics) THEN 'Casual'
    ELSE 'New/Inactive'
  END as segment
FROM customer_metrics
ORDER BY lifetime_value DESC;

Chapter 10: Performance Patterns

Avoid N+1 Queries

-- WRONG: Multiple queries
-- Query 1: SELECT * FROM users
-- Query 2 (per user): SELECT COUNT(*) FROM orders WHERE user_id = ?
-- Result: 1 + N queries

-- CORRECT: Single query with aggregation
SELECT 
  u.id,
  u.email,
  COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.email;

Use UNION for Multiple Conditions

-- Combine multiple analyses
SELECT 'High Value' as segment, COUNT(*) as count
FROM orders
WHERE amount > 1000
UNION ALL
SELECT 'Medium Value', COUNT(*)
FROM orders
WHERE amount BETWEEN 100 AND 1000
UNION ALL
SELECT 'Low Value', COUNT(*)
FROM orders
WHERE amount < 100;

Batch Comparisons

-- Compare multiple periods at once
SELECT 
  'January 2026' as period,
  COUNT(*) as orders,
  SUM(amount) as revenue
FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01'
UNION ALL
SELECT 
  'February 2026',
  COUNT(*),
  SUM(amount)
FROM orders
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01'
UNION ALL
SELECT 
  'March 2026',
  COUNT(*),
  SUM(amount)
FROM orders
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01';

Chapter 11: Statistical Analysis with SQL

Percentiles and Quartiles

-- Distribution analysis
SELECT 
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) as q1,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) as median,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) as q3,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) as p95,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY amount) as p99,
  STDDEV(amount) as std_dev,
  AVG(amount) as mean
FROM orders;

Outlier Detection

-- Find statistical outliers
WITH stats AS (
  SELECT 
    AVG(amount) as mean_amount,
    STDDEV(amount) as std_amount
  FROM orders
)
SELECT 
  o.id,
  o.amount,
  ROUND((o.amount - s.mean_amount) / s.std_amount, 2) as z_score
FROM orders o, stats s
WHERE ABS((o.amount - s.mean_amount) / s.std_amount) > 3
ORDER BY z_score DESC;

Correlation Analysis

-- Simple correlation approximation
SELECT 
  SUM((amount - (SELECT AVG(amount) FROM orders)) * 
      (quantity - (SELECT AVG(quantity) FROM orders))) /
  SQRT(
    SUM(POWER(amount - (SELECT AVG(amount) FROM orders), 2)) *
    SUM(POWER(quantity - (SELECT AVG(quantity) FROM orders), 2))
  ) as correlation
FROM orders;

Chapter 12: Building Analytical Dashboards

Executive Summary Query

-- Dashboard: Key metrics
SELECT 
  (SELECT COUNT(*) FROM users) as total_users,
  (SELECT COUNT(*) FROM orders) as total_orders,
  (SELECT SUM(amount) FROM orders) as total_revenue,
  (SELECT AVG(amount) FROM orders) as avg_order_value,
  (SELECT COUNT(DISTINCT user_id) FROM orders) as customers_with_orders,
  (SELECT COUNT(*) FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days') as orders_last_30_days,
  (SELECT SUM(amount) FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days') as revenue_last_30_days;

Trend Report Query

-- Dashboard: Monthly trends
SELECT 
  DATE_TRUNC('month', created_at) as month,
  COUNT(*) as order_count,
  SUM(amount) as revenue,
  COUNT(DISTINCT user_id) as unique_customers,
  ROUND(SUM(amount) / COUNT(*), 2) as avg_order_value,
  LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) as prev_month_orders,
  LAG(SUM(amount)) OVER (ORDER BY DATE_TRUNC('month', created_at)) as prev_month_revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month DESC;

Appendix A: SQL Functions Reference

Aggregates:

COUNT(), SUM(), AVG(), MIN(), MAX()
STDDEV(), VARIANCE()
PERCENTILE_CONT(), PERCENTILE_DISC()

Window Functions:

ROW_NUMBER(), RANK(), DENSE_RANK()
LAG(), LEAD()
SUM() OVER(), AVG() OVER()
PARTITION BY, ORDER BY

Date Functions:

DATE_TRUNC(), DATE_PART()
CURRENT_DATE, NOW()
INTERVAL

Conditional:

CASE WHEN ... THEN ... END
COALESCE(), NULLIF()

String:

CONCAT(), SUBSTR(), UPPER(), LOWER()

Appendix B: Query Writing Checklist

Appendix C: Common Query Patterns

Top N per Group:

SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) as rn
  FROM products
) WHERE rn <= 3;

Running Total:

SELECT date, sales, SUM(sales) OVER (ORDER BY date) as cumulative FROM daily_sales;

Period-over-Period:

SELECT month, revenue, 
  LAG(revenue) OVER (ORDER BY month) as prev_month,
  revenue - LAG(revenue) OVER (ORDER BY month) as change
FROM monthly_revenue;

Customer Segments:

SELECT user_id,
  CASE 
    WHEN total_spent > 10000 THEN 'VIP'
    WHEN total_spent > 1000 THEN 'Regular'
    ELSE 'Casual'
  END as segment
FROM (SELECT user_id, SUM(amount) as total_spent FROM orders GROUP BY user_id) sub;

Conclusion: SQL Mastery is a Superpower

Most developers think SQL is just about CRUD operations. Fetch data, manipulate it, move on.

They miss the real power: analytical thinking through SQL.

When you can write complex analytical queries, you become someone who understands their system deeply. You can answer business questions with data. You can spot trends, anomalies, opportunities.

That’s not a technical skill. That’s business acumen.

Start practicing these patterns. Write queries that answer real questions about your business. The deeper you go, the more valuable you become.

SQL is waiting. Master it.