Calculating the average of a PostGIS raster

Aug 19, 2015 · 476 words · 3 minutes read postgispostgresraster

In programming there are many ways to accomplish the same task. Take the case of calculating the average (mean) value of a raster stored in a PostGIS database. The task seems simple enough – surely there is a built-in function to do exactly this.

Such a function, ST_Avg, would be used like this:

SELECT ST_Avg(my_rast) FROM my_table WHERE rast_id = 7514;

however the function ST_Avg does not exist so we cannot calculate the average of the raster directly. Instead we can consider using ST_DumpValues to dump the raster values to an array, which we can then average ourselves (either using subqueries or with Numpy). Alternative we can use ST_SummaryStats and extract the mean.

These are all valid approaches (they all give the same result) so we want to choose the approach that optimizes speed. I selected a raster (1000x1000 16-bytes unsigned integer pixels) from the database at random and used the iPython (Jupyter?) magic command %%timeit to compare 30 runs of each approach.

ST_DumpValues

The ST_DumpValues function returns the raster values in an array. Postgres does not have a built-in function for averaging arrays. But we can still calculate the average using subqueries or pass the array along to Python.

Subqueries

Working our way out from the middle, we use (ST_DumpValues(rast)).valarray to get the array and then have to unnest it in order to calculate the average. I wrote this using two subqueries for readability but it can be condensed to one (Postgres actually does behind the scenes).

query = """
SELECT AVG(values) AS mean FROM (
    SELECT UNNEST(valarray) AS values FROM (
        SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514)
    AS subquery)
AS subquery;"""
cur.execute(query)
raster = cur.fetchone()
raster['mean']

Results: 183 ms per loop

Numpy

Another approach is to return the array and perform the calculation in Python using Numpy.

query = """SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514;"""
cur.execute(query)
raster = cur.fetchone()
np.mean(raster['valarray'])

Results: 2.09 s per loop

ST_SummaryStats

The third approach is to extract the mean from the results of ST_SummaryStats.

query = """SELECT (ST_SummaryStats(rast)).mean FROM my_table WHERE ID = 7514;"""
cur.execute(query)
raster = cur.fetchone()
raster['mean']

Results: 50.2 ms per loop

Conclusion

These are just some of the possible ways to calculate the average of a raster in PostGIS. It was surprising to me that PostGIS does not have a ST_Avg function (or similar). Calculating the average of a raster can be thought of as the areal average and is trivial to compute. The ST_SummaryStats function had by far the best performance and admittedly is not too far off from a built-in raster averging function in that it just returns a few extra components. As it turns out ST_SummaryStats isn’t just a convenience function, as I expected, but actually is tuned for performance. Given that this approach is the most readable it was a little surprising that it was also the fastest.