Mar 21, 2017 · 259 words · 2 minutes read data processingpython

Recently I was faced with the problem of reshaping a single long vector into an array of overlapping sequences. There are a lot of different ways to tackle this problem but I wanted a fast and elegant solution that could generalize to different window lengths and step sizes.

For a concrete example, suppose we have a vector with digits 1 to 100:

We want to cut it into overlapping sequences with some step size, so the last element of one vector is the first element of the next.

The simple, straightforward way to do this would be to iteratively index the vector, incrementing the index positions. For example, start by indexing 0-9, then add 9, and continue until you’ve reached the end of the vector.

But there is a better way!

def rolling_window(a, window, step=1):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)[::step]

The function takes a numpy array, window length, and step size and returns a new array. The array has one more dimension than the original array shape. The window and step size determine the exact shape. If the step size is one then the output is the same as a moving window and if the step size is the same as the window size then the sequences don’t overlap.

This is based on a stackoverflow answer to using strides for an efficient moving average filter. But also checkout these other posts: