
If you write SQL regularly, there's a good chance you've run into situations where GROUP BY forces you to lose row-level detail. You want a total for each department, but you also want to see every employee's salary alongside it. Or you need month-over-month growth without a self-join. Or you need to deduplicate records while keeping the most recent one.

Window functions solve all of these problems. They let you perform calculations across a set of rows that are related to the current row — without collapsing your result set. I use them in almost every analytics project I work on, and in this guide, I'll walk you through the ones that matter most with realistic examples you can adapt to your own data.

For every example below, I'll be working with this sample sales table:

-- Sample data: sales table
-- id | salesperson | region  | sale_date  | amount
-- ---|-------------|---------|------------|-------
--  1 | Alice       | East    | 2025-01-15 |  1200
--  2 | Bob         | West    | 2025-01-18 |   850
--  3 | Alice       | East    | 2025-02-10 |  2100
--  4 | Carol       | East    | 2025-02-14 |  1750
--  5 | Bob         | West    | 2025-02-22 |   900
--  6 | Alice       | East    | 2025-03-05 |  1800
--  7 | Carol       | East    | 2025-03-12 |  2200
--  8 | Bob         | West    | 2025-03-20 |  1100
--  9 | Alice       | East    | 2025-04-02 |  1500
-- 10 | Carol       | East    | 2025-04-18 |  1900
-- 11 | Bob         | West    | 2025-04-25 |  1300
-- 12 | Dave        | West    | 2025-04-28 |   750

1 What Are Window Functions and Why They Matter

A window function performs a calculation across a "window" of rows defined by the OVER() clause. Unlike GROUP BY, it does not collapse rows into groups. Every row in your result set is preserved, and the computed value is added as a new column.

Here's the difference. Suppose you want each salesperson's total alongside their individual sales. With GROUP BY, you'd lose the individual rows:

-- GROUP BY: you get one row per salesperson (individual rows are gone)
SELECT salesperson, SUM(amount) AS total_sales
FROM sales
GROUP BY salesperson;

-- salesperson | total_sales
-- ------------|------------
-- Alice       |        6600
-- Bob         |        4150
-- Carol       |        5850
-- Dave        |         750

With a window function, you keep every row and add the total as a new column:

-- Window function: every row is preserved
SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER (PARTITION BY salesperson) AS total_sales
FROM sales;

-- salesperson | sale_date  | amount | total_sales
-- ------------|------------|--------|------------
-- Alice       | 2025-01-15 |   1200 |        6600
-- Alice       | 2025-02-10 |   2100 |        6600
-- Alice       | 2025-03-05 |   1800 |        6600
-- Alice       | 2025-04-02 |   1500 |        6600
-- Bob         | 2025-01-18 |    850 |        4150
-- Bob         | 2025-02-22 |    900 |        4150
-- ...         | ...        |    ... |         ...

That's the core idea. You get the analytical power of aggregation without sacrificing the granularity of your data. This is why window functions are indispensable for reporting, dashboards, and any analysis where context matters.

2 The Anatomy of a Window Function

Every window function follows this general structure:

function_name(...) OVER (
    [PARTITION BY column1, column2, ...]
    [ORDER BY column3, column4, ...]
    [ROWS BETWEEN ... AND ...]
)

Let me break down each component:

  • function_name — The function to apply: ROW_NUMBER(), RANK(), SUM(), LAG(), etc.
  • OVER() — This is what makes it a window function. Without it, SUM(amount) is just a regular aggregate. With OVER(), it becomes a window function.
  • PARTITION BY — Optional. Divides the result set into groups (partitions). The function resets for each partition. Think of it as a "group by" that doesn't collapse rows.
  • ORDER BY — Optional (but required for ranking and offset functions). Defines the order of rows within each partition.
  • ROWS BETWEEN — Optional. Defines the exact "frame" of rows the function considers. Useful for running totals and moving averages.

If you skip PARTITION BY, the window spans the entire result set. If you skip ORDER BY, the function considers all rows in the partition at once (no running calculation).
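
For example, a bare OVER() with nothing inside it treats the entire result set as one window. Here's a quick sketch against the sample table above, computing each sale's share of the grand total:

-- Empty OVER(): the window is the whole result set
SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER () AS grand_total,
    ROUND(amount::DECIMAL / SUM(amount) OVER () * 100, 1) AS pct_of_total
FROM sales;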

3 ROW_NUMBER() — Assigning Unique Row Numbers

ROW_NUMBER() assigns a sequential integer to each row within a partition. No ties — every row gets a unique number. This is one of the most useful window functions because it unlocks a powerful deduplication pattern.

Basic Usage

SELECT
    salesperson,
    sale_date,
    amount,
    ROW_NUMBER() OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS sale_sequence
FROM sales;

-- salesperson | sale_date  | amount | sale_sequence
-- ------------|------------|--------|---------------
-- Alice       | 2025-01-15 |   1200 |             1
-- Alice       | 2025-02-10 |   2100 |             2
-- Alice       | 2025-03-05 |   1800 |             3
-- Alice       | 2025-04-02 |   1500 |             4
-- Bob         | 2025-01-18 |    850 |             1
-- Bob         | 2025-02-22 |    900 |             2
-- ...

The Deduplication Pattern

This is a pattern I use constantly. Say you have a customer_emails table with duplicate records, and you want to keep only the most recent email per customer:

-- Deduplicate: keep only the latest email per customer
WITH ranked AS (
    SELECT
        customer_id,
        email,
        updated_at,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM customer_emails
)
SELECT customer_id, email, updated_at
FROM ranked
WHERE rn = 1;

By partitioning by customer_id and ordering by updated_at DESC, the most recent record gets rn = 1. Filter for that, and duplicates disappear. This is cleaner than GROUP BY with a subquery, and it works reliably in every modern SQL engine (PostgreSQL, MySQL 8+, SQL Server, SQLite, and the major cloud warehouses).

4 RANK() and DENSE_RANK() — Ranking With Ties

While ROW_NUMBER() always assigns unique numbers, RANK() and DENSE_RANK() handle ties differently:

  • RANK() — Tied rows get the same rank, but the next rank skips. (1, 2, 2, 4)
  • DENSE_RANK() — Tied rows get the same rank, and the next rank does not skip. (1, 2, 2, 3)

Leaderboard Example

SELECT
    salesperson,
    SUM(amount) AS total_sales,
    RANK() OVER (ORDER BY SUM(amount) DESC) AS rank_position,
    DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dense_rank_position
FROM sales
GROUP BY salesperson;

-- salesperson | total_sales | rank_position | dense_rank_position
-- ------------|-------------|---------------|--------------------
-- Alice       |        6600 |             1 |                  1
-- Carol       |        5850 |             2 |                  2
-- Bob         |        4150 |             3 |                  3
-- Dave        |         750 |             4 |                  4

When would you use each? I reach for DENSE_RANK() when I want the "top N" tiers without gaps. Say you want the salespeople in the top 3 tiers: with RANK(), if two people tie for 2nd, the next rank jumps to 4, so a filter on rank <= 3 never returns a third tier. DENSE_RANK() avoids that gap.
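
Here's a sketch of that top-N-tiers pattern. The rank has to go into a CTE first, because you can't filter on a window function directly in WHERE (more on that in the pitfalls section):

-- Top 3 tiers by total sales, ties included
WITH ranked AS (
    SELECT
        salesperson,
        SUM(amount) AS total_sales,
        DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS tier
    FROM sales
    GROUP BY salesperson
)
SELECT salesperson, total_sales, tier
FROM ranked
WHERE tier <= 3;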

Ranking Within Groups

-- Rank salespeople within each region
SELECT
    salesperson,
    region,
    SUM(amount) AS total_sales,
    DENSE_RANK() OVER (
        PARTITION BY region
        ORDER BY SUM(amount) DESC
    ) AS region_rank
FROM sales
GROUP BY salesperson, region;

-- salesperson | region | total_sales | region_rank
-- ------------|--------|-------------|------------
-- Alice       | East   |        6600 |           1
-- Carol       | East   |        5850 |           2
-- Bob         | West   |        4150 |           1
-- Dave        | West   |         750 |           2

5 LAG() and LEAD() — Accessing Previous and Next Rows

LAG() looks at the previous row. LEAD() looks at the next row. These are perfect for calculating differences between consecutive records — think month-over-month growth, day-over-day changes, or time between events.

Month-Over-Month Revenue Growth

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', sale_date) AS month,
        SUM(amount) AS revenue
    FROM sales
    GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS revenue_change,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))::DECIMAL
        / LAG(revenue) OVER (ORDER BY month) * 100, 1
    ) AS growth_pct
FROM monthly_revenue;

-- month      | revenue | prev_month_revenue | revenue_change | growth_pct
-- -----------|---------|--------------------|-----------------|-----------
-- 2025-01-01 |    2050 |               NULL |           NULL |       NULL
-- 2025-02-01 |    4750 |               2050 |           2700 |      131.7
-- 2025-03-01 |    5100 |               4750 |            350 |        7.4
-- 2025-04-01 |    5450 |               5100 |            350 |        6.9

The first row has NULL for LAG() because there's no previous row. You can provide a default value: LAG(revenue, 1, 0) would return 0 instead of NULL.
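
For example, reusing the monthly_revenue CTE from the query above, a default of 0 keeps the first month's change numeric instead of NULL. (Whether treating the missing prior month as zero is the right call depends on your reporting convention.)

-- LAG with an explicit offset and default: the first row gets 0, not NULL
SELECT
    month,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS revenue_change
FROM monthly_revenue;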

LEAD() for Forward-Looking Analysis

-- How many days until the next sale for each salesperson?
SELECT
    salesperson,
    sale_date,
    amount,
    LEAD(sale_date) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS next_sale_date,
    LEAD(sale_date) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) - sale_date AS days_until_next_sale
FROM sales;

-- salesperson | sale_date  | amount | next_sale_date | days_until_next_sale
-- ------------|------------|--------|----------------|---------------------
-- Alice       | 2025-01-15 |   1200 | 2025-02-10     |                  26
-- Alice       | 2025-02-10 |   2100 | 2025-03-05     |                  23
-- Alice       | 2025-03-05 |   1800 | 2025-04-02     |                  28
-- Alice       | 2025-04-02 |   1500 | NULL           |                NULL
-- Bob         | 2025-01-18 |    850 | 2025-02-22     |                  35
-- ...

6 SUM() OVER() — Running Totals and Cumulative Sums

A running total is one of the most common analytics requirements, and SUM() OVER() makes it trivial. The key is combining ORDER BY with the window function — this tells SQL to accumulate the sum as it moves through the ordered rows.

Cumulative Revenue by Date

SELECT
    sale_date,
    salesperson,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;

-- sale_date  | salesperson | amount | running_total
-- -----------|-------------|--------|---------------
-- 2025-01-15 | Alice       |   1200 |          1200
-- 2025-01-18 | Bob         |    850 |          2050
-- 2025-02-10 | Alice       |   2100 |          4150
-- 2025-02-14 | Carol       |   1750 |          5900
-- 2025-02-22 | Bob         |    900 |          6800
-- 2025-03-05 | Alice       |   1800 |          8600
-- 2025-03-12 | Carol       |   2200 |         10800
-- 2025-03-20 | Bob         |   1100 |         11900
-- 2025-04-02 | Alice       |   1500 |         13400
-- 2025-04-18 | Carol       |   1900 |         15300
-- 2025-04-25 | Bob         |   1300 |         16600
-- 2025-04-28 | Dave        |    750 |         17350

Running Total Per Salesperson

SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS cumulative_sales
FROM sales;

-- salesperson | sale_date  | amount | cumulative_sales
-- ------------|------------|--------|------------------
-- Alice       | 2025-01-15 |   1200 |             1200
-- Alice       | 2025-02-10 |   2100 |             3300
-- Alice       | 2025-03-05 |   1800 |             5100
-- Alice       | 2025-04-02 |   1500 |             6600
-- Bob         | 2025-01-18 |    850 |              850
-- Bob         | 2025-02-22 |    900 |             1750
-- Bob         | 2025-03-20 |   1100 |             2850
-- Bob         | 2025-04-25 |   1300 |             4150
-- ...

When you add ORDER BY inside OVER(), the default frame becomes RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — which is exactly what gives you a running total. Note that the default is RANGE, not ROWS: rows that tie on the ORDER BY value are treated as peers and summed together, which only matters when the ordering column has duplicates.
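
If you want the row-by-row behavior regardless of ties, spell the frame out. This sketch produces the same running total as the date-ordered query above, since sale_date has no duplicates in the sample data:

-- Explicit frame: accumulate strictly one row at a time
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales;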

7 AVG() OVER() — Moving Averages

Moving averages smooth out noise in your data and reveal trends. They're essential in financial analysis, sales forecasting, and performance monitoring. You control the window size with the ROWS BETWEEN clause.

3-Sale Moving Average per Salesperson

SELECT
    salesperson,
    sale_date,
    amount,
    ROUND(
        AVG(amount) OVER (
            PARTITION BY salesperson
            ORDER BY sale_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ), 0
    ) AS moving_avg_3
FROM sales;

-- salesperson | sale_date  | amount | moving_avg_3
-- ------------|------------|--------|-------------
-- Alice       | 2025-01-15 |   1200 |         1200
-- Alice       | 2025-02-10 |   2100 |         1650
-- Alice       | 2025-03-05 |   1800 |         1700
-- Alice       | 2025-04-02 |   1500 |         1800
-- Bob         | 2025-01-18 |    850 |          850
-- Bob         | 2025-02-22 |    900 |          875
-- Bob         | 2025-03-20 |   1100 |          950
-- Bob         | 2025-04-25 |   1300 |         1100

The ROWS BETWEEN 2 PRECEDING AND CURRENT ROW clause tells SQL to average the current row and the two rows before it. For the first couple of rows in each partition, where fewer than three rows are available, it averages whatever is there — no NULLs, no errors.

You can also compute a trailing moving average that excludes the current row (ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) or a centered moving average (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING). The frame clause gives you full control.
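
As a sketch, here's the centered variant on the same sample data, averaging each sale with its immediate neighbors:

-- Centered 3-sale moving average: one row before, the current row, one row after
SELECT
    salesperson,
    sale_date,
    amount,
    ROUND(
        AVG(amount) OVER (
            PARTITION BY salesperson
            ORDER BY sale_date
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        ), 0
    ) AS centered_avg_3
FROM sales;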

8 NTILE() — Bucketing Data Into Groups

NTILE(n) divides an ordered partition into n roughly equal buckets and assigns a bucket number to each row. This is incredibly useful for percentile analysis, tiering customers, or splitting data for A/B testing.

Quartile Analysis of Sales

SELECT
    salesperson,
    sale_date,
    amount,
    NTILE(4) OVER (ORDER BY amount) AS quartile
FROM sales;

-- salesperson | sale_date  | amount | quartile
-- ------------|------------|--------|----------
-- Dave        | 2025-04-28 |    750 |        1
-- Bob         | 2025-01-18 |    850 |        1
-- Bob         | 2025-02-22 |    900 |        1
-- Bob         | 2025-03-20 |   1100 |        2
-- Alice       | 2025-01-15 |   1200 |        2
-- Bob         | 2025-04-25 |   1300 |        2
-- Alice       | 2025-04-02 |   1500 |        3
-- Carol       | 2025-02-14 |   1750 |        3
-- Alice       | 2025-03-05 |   1800 |        3
-- Carol       | 2025-04-18 |   1900 |        4
-- Alice       | 2025-02-10 |   2100 |        4
-- Carol       | 2025-03-12 |   2200 |        4

Customer Tiering

-- Tier customers into Gold, Silver, Bronze based on total spend
WITH customer_spend AS (
    SELECT
        salesperson AS customer,
        SUM(amount) AS total_spend,
        NTILE(3) OVER (ORDER BY SUM(amount) DESC) AS tier
    FROM sales
    GROUP BY salesperson
)
SELECT
    customer,
    total_spend,
    CASE tier
        WHEN 1 THEN 'Gold'
        WHEN 2 THEN 'Silver'
        WHEN 3 THEN 'Bronze'
    END AS tier_label
FROM customer_spend;

-- customer | total_spend | tier_label
-- ---------|-------------|-----------
-- Alice    |        6600 | Gold
-- Carol    |        5850 | Gold
-- Bob      |        4150 | Silver
-- Dave     |         750 | Bronze

With four customers and three buckets, the rows don't divide evenly, so NTILE() puts the extra row in the first bucket and both Alice and Carol land in tier 1.
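
For the A/B-testing use case, a common sketch is to order by a random value so NTILE(2) assigns rows to two equal-sized groups arbitrarily. RANDOM() is the PostgreSQL spelling; other engines have their own random functions:

-- Randomly split rows into two equal-sized test groups
SELECT
    id,
    salesperson,
    amount,
    NTILE(2) OVER (ORDER BY RANDOM()) AS test_group
FROM sales;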

9 Real-World Example: Complete Sales Analytics Query

Now let's put it all together. This is the kind of query I build for clients who need a comprehensive sales analytics view — combining multiple window functions in a single query to answer several questions at once.

WITH monthly_sales AS (
    SELECT
        salesperson,
        region,
        DATE_TRUNC('month', sale_date) AS sale_month,
        SUM(amount) AS monthly_revenue,
        COUNT(*) AS num_deals
    FROM sales
    GROUP BY salesperson, region, DATE_TRUNC('month', sale_date)
),
enriched AS (
    SELECT
        salesperson,
        region,
        sale_month,
        monthly_revenue,
        num_deals,

        -- Running total per salesperson
        SUM(monthly_revenue) OVER (
            PARTITION BY salesperson
            ORDER BY sale_month
        ) AS ytd_revenue,

        -- Month-over-month change
        monthly_revenue - LAG(monthly_revenue) OVER (
            PARTITION BY salesperson
            ORDER BY sale_month
        ) AS mom_change,

        -- Rank within region for each month
        DENSE_RANK() OVER (
            PARTITION BY region, sale_month
            ORDER BY monthly_revenue DESC
        ) AS region_rank,

        -- Percentage of region total
        ROUND(
            monthly_revenue::DECIMAL
            / SUM(monthly_revenue) OVER (PARTITION BY region, sale_month)
            * 100, 1
        ) AS pct_of_region,

        -- Overall row number for ordering
        ROW_NUMBER() OVER (
            PARTITION BY salesperson
            ORDER BY sale_month
        ) AS month_seq

    FROM monthly_sales
)
SELECT
    salesperson,
    region,
    sale_month,
    monthly_revenue,
    num_deals,
    ytd_revenue,
    mom_change,
    region_rank,
    pct_of_region,
    month_seq
FROM enriched
ORDER BY region, sale_month, region_rank;

This single query gives you:

  • Monthly revenue per salesperson with deal count
  • Year-to-date cumulative revenue (running total)
  • Month-over-month change to spot growth or decline
  • Ranking within each region per month
  • Percentage contribution to the regional total
  • A clean sequence number for each salesperson's timeline

Without window functions, building this would require multiple self-joins, correlated subqueries, and probably a headache. With window functions, it's a single pass through the data — readable, maintainable, and fast.

10 Performance Tips and Common Pitfalls

Window functions are powerful, but they can also trip you up if you're not careful. Here are the things I've learned from using them on real production data sets.

Performance Tips

  1. Index your PARTITION BY and ORDER BY columns. Window functions need to sort data. If those columns are indexed, the database can skip a full sort. This is the single biggest performance lever you have.
  2. Pre-aggregate before windowing. If you're computing window functions on top of millions of rows, aggregate to a coarser grain first (like monthly totals), then apply the window function. This reduces the window size dramatically.
  3. Reuse the same window definition. If multiple functions share the same OVER() clause, use a named window so you define it once and let the engine share a single sort (most modern engines support the WINDOW clause, but check yours):
    SELECT
        salesperson,
        sale_date,
        amount,
        ROW_NUMBER() OVER w AS rn,
        SUM(amount) OVER w AS running_total,
        LAG(amount) OVER w AS prev_amount
    FROM sales
    WINDOW w AS (PARTITION BY salesperson ORDER BY sale_date);
  4. Be explicit about frame clauses. The default frame depends on whether you have ORDER BY in the OVER() clause. With ORDER BY, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Without it, it's the entire partition. If you want a specific behavior, spell it out.
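
To see why it matters, here's a sketch comparing the two frames on the running-total query. With RANGE, rows that tie on the ORDER BY value are treated as peers and summed together; with ROWS, the total advances strictly one row at a time. The sample data has unique dates, so the two columns match here, but they diverge as soon as sale_date has duplicates:

-- RANGE vs ROWS: identical on unique dates, different whenever dates tie
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS range_total,
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_total
FROM sales;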

Common Pitfalls

  • Filtering on window function results in WHERE. You cannot use a window function in a WHERE clause. Wrap it in a CTE or subquery and filter on the outer query. This is why the deduplication pattern uses WITH ranked AS (...) SELECT ... WHERE rn = 1.
  • Confusing RANK() and ROW_NUMBER() for deduplication. If you're deduplicating, always use ROW_NUMBER(). RANK() can assign the same number to tied rows, which means your WHERE rn = 1 filter might return multiple rows per partition.
  • Forgetting that ORDER BY inside OVER() changes the frame. SUM(amount) OVER (PARTITION BY region) gives you a partition total. SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) gives you a running total. Same function, very different results — and it catches people off guard (see the side-by-side sketch after this list).
  • Using window functions with DISTINCT. Mixing SELECT DISTINCT with window functions often produces unexpected results because DISTINCT is applied after the window function. If you need distinct values, deduplicate in a CTE first.
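
Here's that third pitfall side by side: the same SUM(), producing a partition total in one column and a running total in the other, purely because of ORDER BY:

-- Same function, different windows: partition total vs running total
SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER (PARTITION BY salesperson) AS partition_total,
    SUM(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) AS running_total
FROM sales;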

When in doubt, start with a small result set and verify your window function output row by row. It's much easier to debug window functions on 20 rows than on 2 million.

Need Help Building Analytical Queries?

If you're working with complex data and need help writing performant SQL — whether it's window functions, CTEs, data pipelines, or full analytics dashboards — I'd love to help. I've built SQL solutions for businesses of all sizes on Upwork, from ad-hoc reporting queries to production-grade data warehouses.

Get in touch for a free consultation, or hire me on Upwork to get started right away.
