In programming there are many ways to accomplish the same task. Take the case of calculating the average (mean) value of a raster stored in a PostGIS database. The task seems simple enough – surely there is a built-in function to do exactly this.
Such a function,
ST_Avg, would be used like this:
SELECT ST_Avg(my_rast) FROM my_table WHERE rast_id = 7514;
however the function
ST_Avg does not exist so we cannot calculate the average of the raster directly. Instead we can consider using
ST_DumpValues to dump the raster values to an array, which we can then average ourselves (either using subqueries or with Numpy). Alternative we can use
ST_SummaryStats and extract the mean.
These are all valid approaches (they all give the same result) so we want to choose the approach that optimizes speed. I selected a raster (1000x1000 16-bytes unsigned integer pixels) from the database at random and used the iPython (Jupyter?) magic command
%%timeit to compare 30 runs of each approach.
ST_DumpValues function returns the raster values in an array. Postgres does not have a built-in function for averaging arrays. But we can still calculate the average using subqueries or pass the array along to Python.
Working our way out from the middle, we use
(ST_DumpValues(rast)).valarray to get the array and then have to unnest it in order to calculate the average. I wrote this using two subqueries for readability but it can be condensed to one (Postgres actually does behind the scenes).
query = """ SELECT AVG(values) AS mean FROM ( SELECT UNNEST(valarray) AS values FROM ( SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514) AS subquery) AS subquery;""" cur.execute(query) raster = cur.fetchone() raster['mean']
Results: 183 ms per loop
Another approach is to return the array and perform the calculation in Python using Numpy.
query = """SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514;""" cur.execute(query) raster = cur.fetchone() np.mean(raster['valarray'])
Results: 2.09 s per loop
The third approach is to extract the mean from the results of
query = """SELECT (ST_SummaryStats(rast)).mean FROM my_table WHERE ID = 7514;""" cur.execute(query) raster = cur.fetchone() raster['mean']
Results: 50.2 ms per loop
These are just some of the possible ways to calculate the average of a raster in PostGIS. It was surprising to me that PostGIS does not have a
ST_Avg function (or similar). Calculating the average of a raster can be thought of as the areal average and is trivial to compute. The
ST_SummaryStats function had by far the best performance and admittedly is not too far off from a built-in raster averging function in that it just returns a few extra components. As it turns out
ST_SummaryStats isn’t just a convenience function, as I expected, but actually is tuned for performance. Given that this approach is the most readable it was a little surprising that it was also the fastest.