Calculating the average of a PostGIS raster

Aug 19, 2015 · 476 words · 3 minutes read postgispostgresraster

In programming there are many ways to accomplish the same task. Take the case of calculating the average (mean) value of a raster stored in a PostGIS database. The task seems simple enough – surely there is a built-in function to do exactly this.

Such a function, ST_Avg, would be used like this:

SELECT ST_Avg(my_rast) FROM my_table WHERE rast_id = 7514;

however the function ST_Avg does not exist so we cannot calculate the average of the raster directly. Instead we can consider using ST_DumpValues to dump the raster values to an array, which we can then average ourselves (either using subqueries or with Numpy). Alternative we can use ST_SummaryStats and extract the mean.

These are all valid approaches (they all give the same result) so we want to choose the approach that optimizes speed. I selected a raster (1000x1000 16-bytes unsigned integer pixels) from the database at random and used the iPython (Jupyter?) magic command %%timeit to compare 30 runs of each approach.


The ST_DumpValues function returns the raster values in an array. Postgres does not have a built-in function for averaging arrays. But we can still calculate the average using subqueries or pass the array along to Python.


Working our way out from the middle, we use (ST_DumpValues(rast)).valarray to get the array and then have to unnest it in order to calculate the average. I wrote this using two subqueries for readability but it can be condensed to one (Postgres actually does behind the scenes).

query = """
SELECT AVG(values) AS mean FROM (
    SELECT UNNEST(valarray) AS values FROM (
        SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514)
    AS subquery)
AS subquery;"""
raster = cur.fetchone()

Results: 183 ms per loop


Another approach is to return the array and perform the calculation in Python using Numpy.

query = """SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514;"""
raster = cur.fetchone()

Results: 2.09 s per loop


The third approach is to extract the mean from the results of ST_SummaryStats.

query = """SELECT (ST_SummaryStats(rast)).mean FROM my_table WHERE ID = 7514;"""
raster = cur.fetchone()

Results: 50.2 ms per loop


These are just some of the possible ways to calculate the average of a raster in PostGIS. It was surprising to me that PostGIS does not have a ST_Avg function (or similar). Calculating the average of a raster can be thought of as the areal average and is trivial to compute. The ST_SummaryStats function had by far the best performance and admittedly is not too far off from a built-in raster averging function in that it just returns a few extra components. As it turns out ST_SummaryStats isn’t just a convenience function, as I expected, but actually is tuned for performance. Given that this approach is the most readable it was a little surprising that it was also the fastest.