acgc.stats.timeseries

timeseries analysis functions

  1#!/usr/bin/env python3
  2'''timeseries analysis functions'''
  3
  4import numpy as np
  5import pandas as pd
  6import statsmodels.api as sm
  7from statsmodels.tsa.seasonal import STL, DecomposeResult
  8from statsmodels.nonparametric.smoothers_lowess import lowess
  9
 10__all__ = ['STL_harmonic',
 11           'STL_smoother',
 12           'DecomposeResult_to_DataFrame',
 13           'boxcar',
 14           'boxcarpoly']
 15
 16def STL_smoother( data, period=None, smooth=4, **kwargs):
 17    '''STL decomposition of time series into trend, seasonal, and residual components
 18    with additional smoothing of the seasonal component
 19    
 20    Parameters
 21    ----------
 22    data : array_like
 23        time series data to be decomposed
 24    period : int
 25        The length of the seasonal cycle
 26    smooth : int, optional
 27        length of the lowess filter applied to the STL seasonal cycle, default is 4
 28    **kwargs :
 29        Additional keyword arguments to pass to the STL function. 
 30        For example, robust=True. See the statsmodels documentation for more details
 31        
 32    Returns
 33    -------
 34    result : DecomposeResult
 35        object containing the original data, seasonal component, trend component, residuals, and weights (if robust=True)
 36    '''
 37
 38    result = STL(data, period, **kwargs).fit()
 39    
 40    #Additional smoothing of the seasonal component
 41    seasonal_smooth = lowess(result.seasonal, 
 42                             result.seasonal.index, 
 43                             frac=smooth/len(data), 
 44                             return_sorted=False)
 45    seasonal_smooth = pd.Series(seasonal_smooth, 
 46                                index=result.seasonal.index,
 47                                name='seasonal')
 48
 49    # Adjust residuals
 50    resid_adjusted = result.resid + (result.seasonal - seasonal_smooth)
 51    resid_adjusted = resid_adjusted.rename('resid')
 52    
 53    return DecomposeResult(data, 
 54                            seasonal_smooth, 
 55                            result.trend, 
 56                            resid_adjusted, 
 57                            weights=result.weights)
 58
 59def STL_harmonic( y, x=None, data=None, 
 60         period=None, trend_ratio=1.5,
 61         nharmonics=4, 
 62         maxiter=4, rtol=1e-2, robust=False, verbose=False,
 63         **kwargs):
 64    '''Decompose time series into seasonal, trend, and residual components 
 65    using harmonic regression for the seasonal component and lowess smoothing/filtering to separate the trend and residual components.
 66    
 67    This is similar STL decomposition with the following differences:
 68    * The seasonal component is modeled using harmonic regression, which guarantees a smooth seasonal cycle. 
 69    * The seasonal cycle is fixed in this decomposition, while it can vary in STL. 
 70    * STL requires regularly spaced data, while this decomposition can handle irregularly spaced data, 
 71    as long as the time variable `x` is provided. Nevertheless, the quality is likely to be better if the time series has no large gaps.
 72
 73    Like STL, the decomposition is iterative. The seasonal cycle is fitted to detredended data,
 74    then the seasonal cycle is subtracted from the data. Lowess is applied to the deseasonalized data to estimate
 75    the trend. The updated trend is subtracted from the original data and an updated seasonal cycle
 76    is fitted to the updated detrended data. This process is repeated until the seasonal cycle converges 
 77    or the maximum number of iterations is reached.
 78
 79    Parameters
 80    ----------
 81    y : array-like, pandas.Series or str
 82        The time series to decompose. If a string, it is treated as a column name in `data`.
 83    x : array-like or str, optional
 84        The time variable corresponding to `y`. If a string, it is treated as a column name in `data`. 
 85        If None (default), the index of `y` is used if it has one, otherwise a range from 0 to len(y)-1 is used.
 86    data : DataFrame, optional
 87        A DataFrame containing the data. Required if `y` or `x` are strings referring to column names.
 88        Not needed if `y` and `x` are array-like.
 89    period :  float or np.timedelta64 or pd.Timedelta
 90        The length of the seasonal cycle in the same units as `x`. 
 91        If `x` is datetime-like, `period` should be a np.timedelta64 or pandas.Timedelta, e.g. `pd.Timedelta(days=365.25)` for annual seasonal cycles.
 92        If `x` is in months, `period` should be 12, and if `x` is in days, then `period` should be 365.25.
 93    trend_ratio : float, optional
 94        The ratio of the trend smoother span to the seasonal cycle length (period). Default is 1.5, which means the trend smoother span will be 1.5 times the period.
 95    nharmonics : int, optional
 96        The number of harmonics to include in the seasonal model. Default is 4, which includes the annual, semiannual, triannual, and quadriannual harmonics for yearly seasonality.
 97    maxiter : int, optional
 98        The maximum number of iterations for the decomposition. Default is 4.
 99    rtol : float, optional
100        The relative tolerance for convergence of the seasonal component. Default is 1e-2.
101    robust : bool or int, optional
102        Specifies whether robust lowess fitting should be used for trend smoothing.
103        If False (default), no robust fitting is done. If True, 3 iterations of robust fitting are done within the lowess filter. 
104        If an integer >= 1, lowess uses that many iterations of robust fitting.
105    verbose : bool, optional
106        If True, prints convergence information. Default is False.
107    **kwargs :
108        Additional keyword arguments to pass to the lowess function for trend smoothing. For example, `
109    
110    Returns
111    -------
112    DecomposeResult
113        An object containing the original data, seasonal component, trend component, and residuals.
114    '''
115
116    if period is None:
117        raise ValueError('Keyword `period` must be specified')
118
119    # If y is a string, treat it as a column name in data
120    if isinstance(y, str):
121        try:
122            y = data[y]
123        except KeyError:
124            raise ValueError(f'y must be a 1-D array or a column name in data. {y} not found in data.')
125
126    # If x is a string, treat it as a column name in data.
127    if isinstance(x, str):
128        try:
129            x = data[x]
130        except KeyError:
131            raise ValueError(f'x must be a 1-D array or a column name in data. {x} not found in data.')
132    # If x is None, use the index of y or a range if y has no index.
133    elif x is None:
134        try:
135            x = y.index
136        except AttributeError:
137            x = np.arange(len(y))
138
139    # Ensure nharmonics is a positive integer
140    if not isinstance(nharmonics, int) or nharmonics < 1:
141        raise ValueError('nharmonics must be a positive integer')
142
143    # Robust fitting iterations
144    if robust is False:
145        robust_iterations=1
146    elif robust is True:
147        robust_iterations=3
148    elif isinstance(robust, int) and robust >= 1:
149        robust_iterations=robust
150        robust = True
151    else:
152        raise ValueError('Invalid value for robust parameter. Must be True, False, or an integer >= 1.')
153
154    # Normalize x, so that it is the number of periods since the start of the time series
155    # xnorm = ( x - np.min(x) ) / period
156    xnorm = ( x - x[0] ) / period
157
158    # Initialize trend and seasonal_old
159    trend = np.ones_like(y)*np.mean(y)
160    seasonal_old = np.zeros_like(y)
161    converged = False
162
163    for i in range(1,maxiter+1):
164          
165        # Detrend
166        detrended = y - trend
167
168        # Fit seasonal cycle with nharmonics harmonics (annual, semiannual, triannual, etc.)
169        X = np.column_stack([func(2*i*np.pi*xnorm) for i in range(1, nharmonics+1) for func in (np.sin, np.cos)])
170        if robust:
171            res = sm.RLM(detrended, X, M=sm.robust.norms.TukeyBiweight()).fit()
172        else:
173            res = sm.OLS(detrended, X).fit()
174        seasonal = res.predict(X)
175
176        # Update trend by fitting lowess to data - seasonal
177        trend = lowess(y-seasonal, xnorm, 
178                       frac = trend_ratio/np.max(xnorm), it=robust_iterations,
179                       #    frac=ratio*period/len(y), it=robust_iterations,
180                       return_sorted=False, **kwargs)
181        
182        # Exit if seasonal cycle has converged
183        if np.max(np.abs(seasonal - seasonal_old)) < rtol*np.mean(np.abs(seasonal)):
184            converged = True
185            break
186
187        # Update seasonal_old for next iteration
188        seasonal_old = seasonal    
189
190    # Print convergence message if verbose
191    if verbose:
192        if converged:
193            print(f'STL_harmonic converged in {i} iterations')
194        else:
195            print(f'STL_harmonic did not converge in {maxiter} iterations')
196
197    # Compute residual
198    resid = y - trend - seasonal
199
200    # Weights are not computed in this implementation, so set to NaN
201    weights = np.full_like(resid, np.nan)
202
203    if isinstance(y, pd.Series):
204        seasonal = pd.Series(seasonal, index=y.index, name='seasonal')
205        trend = pd.Series(trend, index=y.index, name='trend')
206        resid = pd.Series(resid, index=y.index, name='resid')
207        weights = pd.Series(weights, index=y.index, name='weights')
208
209    # return trend, seasonal, resid
210    return DecomposeResult(y,
211                           seasonal, 
212                           trend, 
213                           resid,
214                           weights)
215
216def DecomposeResult_to_DataFrame(result):
217    '''Convert a DecomposeResult object to a pandas DataFrame.
218    
219    Parameters
220    ----------
221    result : DecomposeResult
222    
223    Returns
224    -------
225    DataFrame
226        A DataFrame with columns 'original', 'seasonal', 'trend', 'residual', and 'weights'.
227    '''
228    try:
229        index = result.observed.index
230    except AttributeError:
231        index = np.arange(len(result.data))
232    return pd.DataFrame({'original': result.observed, 
233                         'seasonal': result.seasonal, 
234                         'trend': result.trend, 
235                         'residual': result.resid, 
236                         'weights': result.weights}, 
237                         index=index)
238
239
240def boxcarpoly( array, width, order=2, align='center'):
241    '''Calculate a boxcar polynomial (i.e. running polynomial fit) of an array. 
242    
243    See `boxcar` for parameter definitions
244    '''
245    return boxcar( array, width, order=order, align=align, method='polynomial')
246
247def boxcar( array, width, align='center', method='mean', order=2 ):
248    '''Calculate a boxcar average (i.e. running mean, median, or polynomial fit) of an array. 
249    
250    Elements of input array are assumed to be equally spaced along an (unspecified) 
251    coordinate dimension.
252
253    A centered boxcar average with even widths is traditionally
254    not allowed. This BOXCAR program allows even widths by
255    giving half weights to elements on either end of the
256    averaging window and full weights to all others. A centered
257    average with even width therefore uses uses WIDTH+1
258    elements in its averaging kernel.  
259
260    Parameters
261    ----------
262    array : array of float (N,)
263        1D array of values to average
264    width : int
265        number of elements to include in the average. 
266    align : str
267        specifies how the averaging window for each element of the output array
268        aligns with the input array. 
269        Values: center (default), forward (trailing elements), backward (leading elements)
270    method : {'mean' (default), 'median', 'polynomial'}
271        specifies the averaging kernel function
272    order : int, default=2
273        specifies the polynomial order to fit within each boxcar window.
274        This has no effect unless `method`='polynomial'
275        A parabola is order=2; order=0 is equivalent to method="mean"
276            
277    Returns
278    -------
279    result : array of float (N,)
280        1D array with averages with same size as input array. 
281        Elements on either end may be NaN
282    '''
283
284    N = array.size
285
286    # Initialize smoothed array
287    smootharray = np.empty_like(array)
288    smootharray[:] = np.NaN
289
290    # uniform averaging kernel
291    kernel = np.ones(width)
292
293    # Setup for backward boxcar
294    if align=='backward':
295        # Half widths before and after element I
296
297        HW1 = width-1
298        HW2 = width-HW1-1
299
300    # Setup for forward boxcar
301    elif align=='forward':
302        HW1 = 0
303        HW2 = width-HW1-1
304
305    # Setup for centered boxcar
306    elif align=='center':    
307        # Separate treatments for odd and even widths
308        if np.mod(width,2)==1:
309            # Odd widths
310            HW1 = np.floor(width/2)
311            HW2 = width-HW1-1
312
313        else:
314            # Even widths
315            HW1 = width/2
316            HW2 = width-HW1
317
318            # Uniform kernel in middle
319            kernel = np.ones(width+1)
320
321            # Half weight kernel for ends
322            kernel[0] = 0.5
323            kernel[width] = 0.5
324
325            # Normalize the kernel
326            kernel=kernel/kernel.sum()
327    else:
328        raise ValueError( f'align={align} not implemented. '
329                         + 'Value should be "center", "forward" or "backward".' )
330
331    # Convert to integer type
332    HW1 = int(HW1)
333    HW2 = int(HW2)
334
335    # Do boxcar
336    for i in range(HW1,N-HW2-1):
337
338        # Sub array that we operate on
339        sub = array[i-HW1:i+HW2+1]
340
341        if method=='median':
342            # Running median
343            smootharray[i] = np.nanmedian(sub)
344
345        elif method=='mean':
346            # Running mean
347            # Kernel average, only over finite elements
348            #(NaNs removed from numerator and denominator
349            smootharray[i] = np.nansum( sub*kernel ) / np.sum(kernel[np.where(np.isfinite(sub))])
350
351        elif method=='polynomial':
352
353            # Local x coordinate
354            x = np.arange(-HW1,HW2+1)
355
356            # Fit with polynomial of specified order
357            # Avoid NaNs
358            idx = np.isfinite(sub)
359
360            # number of valid points (non NaN)
361            nvalid = np.sum(idx)
362
363            if nvalid==0:
364                # smoothed value is NaN when there are no valid points
365                smootharray[i] = np.nan
366
367            else:
368                # A polynomial fit requires at least (order+1) points.
369                # When there are fewer points, use the highest possible order.
370                ordmax = np.max([np.min([order,np.sum(idx)-1]),0])
371
372                p = np.polyfit(x[idx],sub[idx],ordmax,w=kernel[idx])
373
374                # The fitted value at local x=0 is just the constant term
375                smootharray[i] = p[order]
376
377        else:
378            raise ValueError( f'method={method} not implemented. '
379                             + 'Value should be "mean", "median" or "polynomial"')
380
381    return smootharray
def STL_harmonic( y, x=None, data=None, period=None, trend_ratio=1.5, nharmonics=4, maxiter=4, rtol=0.01, robust=False, verbose=False, **kwargs):
 60def STL_harmonic( y, x=None, data=None, 
 61         period=None, trend_ratio=1.5,
 62         nharmonics=4, 
 63         maxiter=4, rtol=1e-2, robust=False, verbose=False,
 64         **kwargs):
 65    '''Decompose time series into seasonal, trend, and residual components 
 66    using harmonic regression for the seasonal component and lowess smoothing/filtering to separate the trend and residual components.
 67    
 68    This is similar STL decomposition with the following differences:
 69    * The seasonal component is modeled using harmonic regression, which guarantees a smooth seasonal cycle. 
 70    * The seasonal cycle is fixed in this decomposition, while it can vary in STL. 
 71    * STL requires regularly spaced data, while this decomposition can handle irregularly spaced data, 
 72    as long as the time variable `x` is provided. Nevertheless, the quality is likely to be better if the time series has no large gaps.
 73
 74    Like STL, the decomposition is iterative. The seasonal cycle is fitted to detredended data,
 75    then the seasonal cycle is subtracted from the data. Lowess is applied to the deseasonalized data to estimate
 76    the trend. The updated trend is subtracted from the original data and an updated seasonal cycle
 77    is fitted to the updated detrended data. This process is repeated until the seasonal cycle converges 
 78    or the maximum number of iterations is reached.
 79
 80    Parameters
 81    ----------
 82    y : array-like, pandas.Series or str
 83        The time series to decompose. If a string, it is treated as a column name in `data`.
 84    x : array-like or str, optional
 85        The time variable corresponding to `y`. If a string, it is treated as a column name in `data`. 
 86        If None (default), the index of `y` is used if it has one, otherwise a range from 0 to len(y)-1 is used.
 87    data : DataFrame, optional
 88        A DataFrame containing the data. Required if `y` or `x` are strings referring to column names.
 89        Not needed if `y` and `x` are array-like.
 90    period :  float or np.timedelta64 or pd.Timedelta
 91        The length of the seasonal cycle in the same units as `x`. 
 92        If `x` is datetime-like, `period` should be a np.timedelta64 or pandas.Timedelta, e.g. `pd.Timedelta(days=365.25)` for annual seasonal cycles.
 93        If `x` is in months, `period` should be 12, and if `x` is in days, then `period` should be 365.25.
 94    trend_ratio : float, optional
 95        The ratio of the trend smoother span to the seasonal cycle length (period). Default is 1.5, which means the trend smoother span will be 1.5 times the period.
 96    nharmonics : int, optional
 97        The number of harmonics to include in the seasonal model. Default is 4, which includes the annual, semiannual, triannual, and quadriannual harmonics for yearly seasonality.
 98    maxiter : int, optional
 99        The maximum number of iterations for the decomposition. Default is 4.
100    rtol : float, optional
101        The relative tolerance for convergence of the seasonal component. Default is 1e-2.
102    robust : bool or int, optional
103        Specifies whether robust lowess fitting should be used for trend smoothing.
104        If False (default), no robust fitting is done. If True, 3 iterations of robust fitting are done within the lowess filter. 
105        If an integer >= 1, lowess uses that many iterations of robust fitting.
106    verbose : bool, optional
107        If True, prints convergence information. Default is False.
108    **kwargs :
109        Additional keyword arguments to pass to the lowess function for trend smoothing. For example, `
110    
111    Returns
112    -------
113    DecomposeResult
114        An object containing the original data, seasonal component, trend component, and residuals.
115    '''
116
117    if period is None:
118        raise ValueError('Keyword `period` must be specified')
119
120    # If y is a string, treat it as a column name in data
121    if isinstance(y, str):
122        try:
123            y = data[y]
124        except KeyError:
125            raise ValueError(f'y must be a 1-D array or a column name in data. {y} not found in data.')
126
127    # If x is a string, treat it as a column name in data.
128    if isinstance(x, str):
129        try:
130            x = data[x]
131        except KeyError:
132            raise ValueError(f'x must be a 1-D array or a column name in data. {x} not found in data.')
133    # If x is None, use the index of y or a range if y has no index.
134    elif x is None:
135        try:
136            x = y.index
137        except AttributeError:
138            x = np.arange(len(y))
139
140    # Ensure nharmonics is a positive integer
141    if not isinstance(nharmonics, int) or nharmonics < 1:
142        raise ValueError('nharmonics must be a positive integer')
143
144    # Robust fitting iterations
145    if robust is False:
146        robust_iterations=1
147    elif robust is True:
148        robust_iterations=3
149    elif isinstance(robust, int) and robust >= 1:
150        robust_iterations=robust
151        robust = True
152    else:
153        raise ValueError('Invalid value for robust parameter. Must be True, False, or an integer >= 1.')
154
155    # Normalize x, so that it is the number of periods since the start of the time series
156    # xnorm = ( x - np.min(x) ) / period
157    xnorm = ( x - x[0] ) / period
158
159    # Initialize trend and seasonal_old
160    trend = np.ones_like(y)*np.mean(y)
161    seasonal_old = np.zeros_like(y)
162    converged = False
163
164    for i in range(1,maxiter+1):
165          
166        # Detrend
167        detrended = y - trend
168
169        # Fit seasonal cycle with nharmonics harmonics (annual, semiannual, triannual, etc.)
170        X = np.column_stack([func(2*i*np.pi*xnorm) for i in range(1, nharmonics+1) for func in (np.sin, np.cos)])
171        if robust:
172            res = sm.RLM(detrended, X, M=sm.robust.norms.TukeyBiweight()).fit()
173        else:
174            res = sm.OLS(detrended, X).fit()
175        seasonal = res.predict(X)
176
177        # Update trend by fitting lowess to data - seasonal
178        trend = lowess(y-seasonal, xnorm, 
179                       frac = trend_ratio/np.max(xnorm), it=robust_iterations,
180                       #    frac=ratio*period/len(y), it=robust_iterations,
181                       return_sorted=False, **kwargs)
182        
183        # Exit if seasonal cycle has converged
184        if np.max(np.abs(seasonal - seasonal_old)) < rtol*np.mean(np.abs(seasonal)):
185            converged = True
186            break
187
188        # Update seasonal_old for next iteration
189        seasonal_old = seasonal    
190
191    # Print convergence message if verbose
192    if verbose:
193        if converged:
194            print(f'STL_harmonic converged in {i} iterations')
195        else:
196            print(f'STL_harmonic did not converge in {maxiter} iterations')
197
198    # Compute residual
199    resid = y - trend - seasonal
200
201    # Weights are not computed in this implementation, so set to NaN
202    weights = np.full_like(resid, np.nan)
203
204    if isinstance(y, pd.Series):
205        seasonal = pd.Series(seasonal, index=y.index, name='seasonal')
206        trend = pd.Series(trend, index=y.index, name='trend')
207        resid = pd.Series(resid, index=y.index, name='resid')
208        weights = pd.Series(weights, index=y.index, name='weights')
209
210    # return trend, seasonal, resid
211    return DecomposeResult(y,
212                           seasonal, 
213                           trend, 
214                           resid,
215                           weights)

Decompose time series into seasonal, trend, and residual components using harmonic regression for the seasonal component and lowess smoothing/filtering to separate the trend and residual components.

This is similar STL decomposition with the following differences:

  • The seasonal component is modeled using harmonic regression, which guarantees a smooth seasonal cycle.
  • The seasonal cycle is fixed in this decomposition, while it can vary in STL.
  • STL requires regularly spaced data, while this decomposition can handle irregularly spaced data, as long as the time variable x is provided. Nevertheless, the quality is likely to be better if the time series has no large gaps.

Like STL, the decomposition is iterative. The seasonal cycle is fitted to detredended data, then the seasonal cycle is subtracted from the data. Lowess is applied to the deseasonalized data to estimate the trend. The updated trend is subtracted from the original data and an updated seasonal cycle is fitted to the updated detrended data. This process is repeated until the seasonal cycle converges or the maximum number of iterations is reached.

Parameters
  • y (array-like, pandas.Series or str): The time series to decompose. If a string, it is treated as a column name in data.
  • x (array-like or str, optional): The time variable corresponding to y. If a string, it is treated as a column name in data. If None (default), the index of y is used if it has one, otherwise a range from 0 to len(y)-1 is used.
  • data (DataFrame, optional): A DataFrame containing the data. Required if y or x are strings referring to column names. Not needed if y and x are array-like.
  • period (float or np.timedelta64 or pd.Timedelta): The length of the seasonal cycle in the same units as x. If x is datetime-like, period should be a np.timedelta64 or pandas.Timedelta, e.g. pd.Timedelta(days=365.25) for annual seasonal cycles. If x is in months, period should be 12, and if x is in days, then period should be 365.25.
  • trend_ratio (float, optional): The ratio of the trend smoother span to the seasonal cycle length (period). Default is 1.5, which means the trend smoother span will be 1.5 times the period.
  • nharmonics (int, optional): The number of harmonics to include in the seasonal model. Default is 4, which includes the annual, semiannual, triannual, and quadriannual harmonics for yearly seasonality.
  • maxiter (int, optional): The maximum number of iterations for the decomposition. Default is 4.
  • rtol (float, optional): The relative tolerance for convergence of the seasonal component. Default is 1e-2.
  • robust (bool or int, optional): Specifies whether robust lowess fitting should be used for trend smoothing. If False (default), no robust fitting is done. If True, 3 iterations of robust fitting are done within the lowess filter. If an integer >= 1, lowess uses that many iterations of robust fitting.
  • verbose (bool, optional): If True, prints convergence information. Default is False.
  • **kwargs :: Additional keyword arguments to pass to the lowess function for trend smoothing. For example, `
Returns
  • DecomposeResult: An object containing the original data, seasonal component, trend component, and residuals.
def STL_smoother(data, period=None, smooth=4, **kwargs):
17def STL_smoother( data, period=None, smooth=4, **kwargs):
18    '''STL decomposition of time series into trend, seasonal, and residual components
19    with additional smoothing of the seasonal component
20    
21    Parameters
22    ----------
23    data : array_like
24        time series data to be decomposed
25    period : int
26        The length of the seasonal cycle
27    smooth : int, optional
28        length of the lowess filter applied to the STL seasonal cycle, default is 4
29    **kwargs :
30        Additional keyword arguments to pass to the STL function. 
31        For example, robust=True. See the statsmodels documentation for more details
32        
33    Returns
34    -------
35    result : DecomposeResult
36        object containing the original data, seasonal component, trend component, residuals, and weights (if robust=True)
37    '''
38
39    result = STL(data, period, **kwargs).fit()
40    
41    #Additional smoothing of the seasonal component
42    seasonal_smooth = lowess(result.seasonal, 
43                             result.seasonal.index, 
44                             frac=smooth/len(data), 
45                             return_sorted=False)
46    seasonal_smooth = pd.Series(seasonal_smooth, 
47                                index=result.seasonal.index,
48                                name='seasonal')
49
50    # Adjust residuals
51    resid_adjusted = result.resid + (result.seasonal - seasonal_smooth)
52    resid_adjusted = resid_adjusted.rename('resid')
53    
54    return DecomposeResult(data, 
55                            seasonal_smooth, 
56                            result.trend, 
57                            resid_adjusted, 
58                            weights=result.weights)

STL decomposition of time series into trend, seasonal, and residual components with additional smoothing of the seasonal component

Parameters
  • data (array_like): time series data to be decomposed
  • period (int): The length of the seasonal cycle
  • smooth (int, optional): length of the lowess filter applied to the STL seasonal cycle, default is 4
  • **kwargs :: Additional keyword arguments to pass to the STL function. For example, robust=True. See the statsmodels documentation for more details
Returns
  • result (DecomposeResult): object containing the original data, seasonal component, trend component, residuals, and weights (if robust=True)
def DecomposeResult_to_DataFrame(result):
217def DecomposeResult_to_DataFrame(result):
218    '''Convert a DecomposeResult object to a pandas DataFrame.
219    
220    Parameters
221    ----------
222    result : DecomposeResult
223    
224    Returns
225    -------
226    DataFrame
227        A DataFrame with columns 'original', 'seasonal', 'trend', 'residual', and 'weights'.
228    '''
229    try:
230        index = result.observed.index
231    except AttributeError:
232        index = np.arange(len(result.data))
233    return pd.DataFrame({'original': result.observed, 
234                         'seasonal': result.seasonal, 
235                         'trend': result.trend, 
236                         'residual': result.resid, 
237                         'weights': result.weights}, 
238                         index=index)

Convert a DecomposeResult object to a pandas DataFrame.

Parameters
  • result (DecomposeResult):
Returns
  • DataFrame: A DataFrame with columns 'original', 'seasonal', 'trend', 'residual', and 'weights'.
def boxcar(array, width, align='center', method='mean', order=2):
248def boxcar( array, width, align='center', method='mean', order=2 ):
249    '''Calculate a boxcar average (i.e. running mean, median, or polynomial fit) of an array. 
250    
251    Elements of input array are assumed to be equally spaced along an (unspecified) 
252    coordinate dimension.
253
254    A centered boxcar average with even widths is traditionally
255    not allowed. This BOXCAR program allows even widths by
256    giving half weights to elements on either end of the
257    averaging window and full weights to all others. A centered
258    average with even width therefore uses uses WIDTH+1
259    elements in its averaging kernel.  
260
261    Parameters
262    ----------
263    array : array of float (N,)
264        1D array of values to average
265    width : int
266        number of elements to include in the average. 
267    align : str
268        specifies how the averaging window for each element of the output array
269        aligns with the input array. 
270        Values: center (default), forward (trailing elements), backward (leading elements)
271    method : {'mean' (default), 'median', 'polynomial'}
272        specifies the averaging kernel function
273    order : int, default=2
274        specifies the polynomial order to fit within each boxcar window.
275        This has no effect unless `method`='polynomial'
276        A parabola is order=2; order=0 is equivalent to method="mean"
277            
278    Returns
279    -------
280    result : array of float (N,)
281        1D array with averages with same size as input array. 
282        Elements on either end may be NaN
283    '''
284
285    N = array.size
286
287    # Initialize smoothed array
288    smootharray = np.empty_like(array)
289    smootharray[:] = np.NaN
290
291    # uniform averaging kernel
292    kernel = np.ones(width)
293
294    # Setup for backward boxcar
295    if align=='backward':
296        # Half widths before and after element I
297
298        HW1 = width-1
299        HW2 = width-HW1-1
300
301    # Setup for forward boxcar
302    elif align=='forward':
303        HW1 = 0
304        HW2 = width-HW1-1
305
306    # Setup for centered boxcar
307    elif align=='center':    
308        # Separate treatments for odd and even widths
309        if np.mod(width,2)==1:
310            # Odd widths
311            HW1 = np.floor(width/2)
312            HW2 = width-HW1-1
313
314        else:
315            # Even widths
316            HW1 = width/2
317            HW2 = width-HW1
318
319            # Uniform kernel in middle
320            kernel = np.ones(width+1)
321
322            # Half weight kernel for ends
323            kernel[0] = 0.5
324            kernel[width] = 0.5
325
326            # Normalize the kernel
327            kernel=kernel/kernel.sum()
328    else:
329        raise ValueError( f'align={align} not implemented. '
330                         + 'Value should be "center", "forward" or "backward".' )
331
332    # Convert to integer type
333    HW1 = int(HW1)
334    HW2 = int(HW2)
335
336    # Do boxcar
337    for i in range(HW1,N-HW2-1):
338
339        # Sub array that we operate on
340        sub = array[i-HW1:i+HW2+1]
341
342        if method=='median':
343            # Running median
344            smootharray[i] = np.nanmedian(sub)
345
346        elif method=='mean':
347            # Running mean
348            # Kernel average, only over finite elements
349            #(NaNs removed from numerator and denominator
350            smootharray[i] = np.nansum( sub*kernel ) / np.sum(kernel[np.where(np.isfinite(sub))])
351
352        elif method=='polynomial':
353
354            # Local x coordinate
355            x = np.arange(-HW1,HW2+1)
356
357            # Fit with polynomial of specified order
358            # Avoid NaNs
359            idx = np.isfinite(sub)
360
361            # number of valid points (non NaN)
362            nvalid = np.sum(idx)
363
364            if nvalid==0:
365                # smoothed value is NaN when there are no valid points
366                smootharray[i] = np.nan
367
368            else:
369                # A polynomial fit requires at least (order+1) points.
370                # When there are fewer points, use the highest possible order.
371                ordmax = np.max([np.min([order,np.sum(idx)-1]),0])
372
373                p = np.polyfit(x[idx],sub[idx],ordmax,w=kernel[idx])
374
375                # The fitted value at local x=0 is just the constant term
376                smootharray[i] = p[order]
377
378        else:
379            raise ValueError( f'method={method} not implemented. '
380                             + 'Value should be "mean", "median" or "polynomial"')
381
382    return smootharray

Calculate a boxcar average (i.e. running mean, median, or polynomial fit) of an array.

Elements of input array are assumed to be equally spaced along an (unspecified) coordinate dimension.

A centered boxcar average with even widths is traditionally not allowed. This BOXCAR program allows even widths by giving half weights to elements on either end of the averaging window and full weights to all others. A centered average with even width therefore uses uses WIDTH+1 elements in its averaging kernel.

Parameters
  • array (array of float (N,)): 1D array of values to average
  • width (int): number of elements to include in the average.
  • align (str): specifies how the averaging window for each element of the output array aligns with the input array. Values: center (default), forward (trailing elements), backward (leading elements)
  • method ({'mean' (default), 'median', 'polynomial'}): specifies the averaging kernel function
  • order (int, default=2): specifies the polynomial order to fit within each boxcar window. This has no effect unless method='polynomial' A parabola is order=2; order=0 is equivalent to method="mean"
Returns
  • result (array of float (N,)): 1D array with averages with same size as input array. Elements on either end may be NaN
def boxcarpoly(array, width, order=2, align='center'):
241def boxcarpoly( array, width, order=2, align='center'):
242    '''Calculate a boxcar polynomial (i.e. running polynomial fit) of an array. 
243    
244    See `boxcar` for parameter definitions
245    '''
246    return boxcar( array, width, order=order, align=align, method='polynomial')

Calculate a boxcar polynomial (i.e. running polynomial fit) of an array.

See boxcar for parameter definitions