# Calculating the average of a PostGIS raster

## Aug 19, 2015 · 476 words · 3 minutes read

In programming there are many ways to accomplish the same task. Take the case of calculating the average (mean) value of a raster stored in a PostGIS database. The task seems simple enough – surely there is a built-in function to do *exactly* this.

Such a function, `ST_Avg`

, would be used like this:

```
SELECT ST_Avg(my_rast) FROM my_table WHERE rast_id = 7514;
```

however the function `ST_Avg`

does not exist so we cannot calculate the average of the raster directly. Instead we can consider using `ST_DumpValues`

to dump the raster values to an array, which we can then average ourselves (either using subqueries or with Numpy). Alternative we can use `ST_SummaryStats`

and extract the mean.

These are all valid approaches (they all give the same result) so we want to choose the approach that optimizes speed. I selected a raster (1000x1000 16-bytes unsigned integer pixels) from the database at random and used the iPython (Jupyter?) magic command `%%timeit`

to compare 30 runs of each approach.

## ST_DumpValues

The `ST_DumpValues`

function returns the raster values in an array. Postgres does not have a built-in function for averaging arrays. But we can still calculate the average using subqueries or pass the array along to Python.

### Subqueries

Working our way out from the middle, we use `(ST_DumpValues(rast)).valarray`

to get the array and then have to unnest it in order to calculate the average. I wrote this using two subqueries for readability but it can be condensed to one (Postgres actually does behind the scenes).

```
query = """
SELECT AVG(values) AS mean FROM (
SELECT UNNEST(valarray) AS values FROM (
SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514)
AS subquery)
AS subquery;"""
cur.execute(query)
raster = cur.fetchone()
raster['mean']
```

Results: 183 ms per loop

### Numpy

Another approach is to return the array and perform the calculation in Python using Numpy.

```
query = """SELECT (ST_DumpValues(rast)).valarray FROM my_table WHERE ID = 7514;"""
cur.execute(query)
raster = cur.fetchone()
np.mean(raster['valarray'])
```

Results: 2.09 s per loop

## ST_SummaryStats

The third approach is to extract the mean from the results of `ST_SummaryStats`

.

```
query = """SELECT (ST_SummaryStats(rast)).mean FROM my_table WHERE ID = 7514;"""
cur.execute(query)
raster = cur.fetchone()
raster['mean']
```

Results: 50.2 ms per loop

# Conclusion

These are just some of the possible ways to calculate the average of a raster in PostGIS. It was surprising to me that PostGIS does not have a `ST_Avg`

function (or similar). Calculating the average of a raster can be thought of as the areal average and is trivial to compute. The `ST_SummaryStats`

function had by far the best performance and admittedly is not too far off from a built-in raster averging function in that it just returns a few extra components. As it turns out `ST_SummaryStats`

isn’t just a convenience function, as I expected, but actually is tuned for performance. Given that this approach is the most readable it was a little surprising that it was also the fastest.