The Fastest Way to Get the Most Recent Row Per Group in Postgres

Jan 7, 2023 · 364 words · 2 minutes read datapostgressql

A common problem for backend engineers, data scientists, and data engineers is to find the most recent record for each group, or a single group. For example, the database might have a table of user login events and you want to find the latest login for each user.

Often people will write a SQL query with the ROW NUMBER window function in a CTE or sub query , but it is faster to use SELECT DISTINCT ON.


If you want to test this for yourself, you can follow along

First we need some data. We can generate synthetic data for testing and insert it into a table.

CREATE TABLE login_event AS
  random() AS user,
  random() AS timestamp
FROM generate_series(1, 1000000);

Then we can benchmark the queries below by running them with EXPLAIN ANALYZE.

The Slow Way

The often recommended, yet slow, method is to use a CTE or sub query to add a row number (ordered by the timestamp column) and then select the rows where the row number is one.

Using our login_event table, the query would look like this:

WITH tbl AS (
    row number() over(partition by user order by timestamp desc) as rownum
  FROM login_event
  GROUP BY user
  ORDER BY timestamp DESC
FROM tbl
WHERE rownum=1;

The Fast Way

The fastest way to select the most recent row is to use SELECT DISTINCT ON. To use SELECT DISTINCT ON, provide which column(s) should be used for the DISTINCT clause (eg user). Then, order the table by the DISTINCT column and the timestamp column. For example,

FROM login_event
ORDER BY user, timestamp DESC;

This is not only faster, but also less code!

The Fastest Way

Notice when we created the table, we didn’t add indexes. Using the proper indexes can dramatically speed up the queries.

CREATE INDEX login_event_user_six ON login_event (user);
CREATE INDEX login_event_user_timestamp_desc_fix ON login_event (timestamp DESC);

Adding these indexes changes the slow query time from X to Y and the fast query time from X to Y.


The fastest way to get the most recent row per group is to use SELECT DISTINCT ON.