acgc.stats.bivariate_lines

Specialized methods of bivariate line fitting

  • Standard major axis (SMA) also called reduced major axis (RMA)
  • York regression, for data with errors in x and y
  • Theil-Sen, non-parametric slope estimation (use numba to accelerate the function in this module)
# -*- coding: utf-8 -*-
"""Specialized methods of bivariate line fitting

* Standard major axis (SMA) also called reduced major axis (RMA)
* York regression, for data with errors in x and y
* Theil-Sen, non-parametric slope estimation (use numba to accelerate the function in this module)
"""
import warnings
import numpy as np
import scipy.stats as stats
from sklearn.covariance import MinCovDet
import statsmodels.formula.api as smf
import statsmodels.robust.norms as norms
from scipy.stats import theilslopes
#from numba import jit

__all__ = [
    "bivariate_line_equation",
    "sma",
    "smafit",
    "sen",
    "sen_slope",
    "sen_numba",
    "york"
]

# Aliases
def sen_slope(*args,**kwargs):
    '''Alias for `sen`'''
    return sen(*args,**kwargs)

def smafit(*args,**kwargs):
    '''Alias for `sma`'''
    return sma(*args,**kwargs)

def bivariate_line_equation(fitresult,
                    floatformat='{:.3f}',
                    ystring='include',
                    include_error=False ):
    '''Write the equation for the fitted line as a string

    Parameters
    ----------
    fitresult : dict
        results of the line fit
    floatformat : str
        format string for the numerical values (default='{:.3f}')
    ystring : {'include' (default), 'separate', 'none'}
        specifies whether "y =" is included in the result, returned as a separate item in a tuple, or omitted
    include_error : bool
        specifies whether uncertainty terms should be included in the equation

    Returns
    -------
    fitline_string : str or (str, str) tuple
        equation for the fitted line, in the form "y = a x + b" or "y = a x".
        If uncertainty terms are included, then "y = (a ± c) x + (b ± d)" or "y = (a ± c) x"
    '''

    # Left-hand side
    lhs = "y_"+fitresult['method']

    # Right-hand side
    if fitresult['fitintercept']:
        if include_error:
            rhs = f'({floatformat:s} ± {floatformat:s}) x + ({floatformat:s} ± {floatformat:s})'.\
                    format( fitresult['slope'], fitresult['slope_ste'], fitresult['intercept'], fitresult['intercept_ste'] )
        else:
            rhs = f'{floatformat:s} x + {floatformat:s}'.\
                    format( fitresult['slope'], fitresult['intercept'] )
    else:
        if include_error:
            rhs = f'({floatformat:s} ± {floatformat:s}) x'.\
                    format( fitresult['slope'], fitresult['slope_ste'] )
        else:
            rhs = f'{floatformat:s} x'.\
                    format( fitresult['slope'] )

    # Combine the left- and right-hand sides
    if ystring=='include':
        equation = f'{lhs:s} = {rhs:s}'
    elif ystring=='separate':
        equation = (lhs,rhs)
    elif ystring=='none':
        equation = rhs
    else:
        raise ValueError('Unrecognized value of ystring: '+ystring)

    return equation

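The two-step formatting used in `bivariate_line_equation` (an f-string first interpolates `floatformat` to build a template, then `.format` fills in the numeric values) can be checked in isolation. A minimal sketch with hypothetical slope and intercept values:

```python
# Build a template string from a float format spec, then fill in values,
# mirroring the two-step formatting in bivariate_line_equation.
floatformat = '{:.3f}'

# The f-string inserts the literal format spec, producing the template
template = f'({floatformat:s} ± {floatformat:s}) x + ({floatformat:s} ± {floatformat:s})'

# .format then substitutes slope, slope error, intercept, intercept error
equation = 'y = ' + template.format(2.0, 0.1, 1.0, 0.05)
print(equation)  # y = (2.000 ± 0.100) x + (1.000 ± 0.050)
```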
def sma(X,Y,W=None,
        data=None,
        alpha=0.95,
        intercept=True,
        robust=False,robust_method='FastMCD'):
    '''Standard Major-Axis (SMA) line fitting

    Calculate the standard major axis, aka reduced major axis, fit to
    data X and Y. The main advantage of this over ordinary least squares is
    that the best fit of Y to X will be the same as the best fit of X to Y.

    The fit equations and confidence intervals are implemented following
    Warton et al. (2006). Robust fits use the FastMCD covariance estimate
    from Rousseeuw and Van Driessen (1999). While there are many alternative
    robust covariance estimators (e.g. other papers by D.I. Warton using M-estimators),
    the FastMCD algorithm is the default in Matlab. When the standard error or
    uncertainty of each point is known, weighted SMA may be preferable to
    robust SMA. The conventional choice of weights for each point i is
    W_i = 1 / ( var(X_i) + var(Y_i) ), where var() is the variance
    (squared standard error).

    References
    ----------
    Warton, D. I., Wright, I. J., Falster, D. S. and Westoby, M.:
        Bivariate line-fitting methods for allometry, Biol. Rev., 81(02), 259,
        doi:10.1017/S1464793106007007, 2006.
    Rousseeuw, P. J. and Van Driessen, K.: A Fast Algorithm for the Minimum
        Covariance Determinant Estimator, Technometrics, 41(3), 1999.

    Parameters
    ----------
    X, Y : array_like or str
        Input values. Must have the same length.
    W : array_like or str, optional
        array of weights for each X-Y point, typically W_i = 1/(var(X_i)+var(Y_i))
    data : dict_like, optional
        data structure containing variables. Used when X, Y, or W are str.
    alpha : float (default = 0.95)
        Desired confidence level [0,1] for output.
    intercept : bool, default=True
        Specify whether the fitted model should include a non-zero intercept.
        The model will be forced through the origin (0,0) if intercept=False.
    robust : bool, default=False
        Use statistical methods that are robust to the presence of outliers
    robust_method : {'FastMCD' (default), 'Huber', 'Biweight'}
        Method for calculating robust variance and covariance. Options:
        - 'MCD' or 'FastMCD' for Fast MCD
        - 'Huber' for Huber's T: reduces, but does not eliminate, the influence of outliers
        - 'Biweight' for Tukey's Biweight: reduces then eliminates the influence of outliers

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            Slope or Gradient of Y vs. X
        - intercept (float)
            Y intercept.
        - slope_ste (float)
            standard error of slope estimate
        - intercept_ste (float)
            standard error of intercept estimate
        - slope_interval ([float, float])
            confidence interval for slope at confidence level alpha
        - intercept_interval ([float, float])
            confidence interval for intercept at confidence level alpha
        - alpha (float)
            confidence level [0,1] for slope and intercept intervals
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float,float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
        - method (str)
            name of the fit method
    '''

    def str2var( v, data ):
        '''Extract the variable named v from the DataFrame named data'''
        try:
            return data[v]
        except Exception as exc:
            raise ValueError( 'Argument data must be provided with a key named '+v ) from exc

    # If variables are provided as strings, get values from the data structure
    if isinstance( X, str ):
        X = str2var( X, data )
    if isinstance( Y, str ):
        Y = str2var( Y, data )
    if isinstance( W, str ):
        W = str2var( W, data )

    # Make sure the arrays have the same length
    assert ( len(X) == len(Y) ), 'Arrays X and Y must have the same length'
    if W is None:
        W = np.ones_like(X)
    else:
        assert ( len(W) == len(X) ), 'Array W must have the same length as X and Y'

    # Make sure alpha is within the range 0-1
    assert (alpha < 1), 'alpha must be less than 1'
    assert (alpha > 0), 'alpha must be greater than 0'

    # Drop any NaN elements of X, Y, or W
    # Infinite values are allowed but will make the result undefined
    idx = ~np.isnan(X) * ~np.isnan(Y) * ~np.isnan(W)

    X0 = X[idx]
    Y0 = Y[idx]
    W0 = W[idx]

    # Number of observations
    N = len(X0)

    include_intercept = intercept

    # Degrees of freedom for the model
    if include_intercept:
        dfmod = 2
    else:
        dfmod = 1

    method = 'SMA'

    # Choose whether to use methods robust to outliers
    if robust:

        method = 'rSMA'

        # Choose the robust method
        if ((robust_method.lower() == 'mcd') or (robust_method.lower() == 'fastmcd') ):
            # FAST MCD

            if not include_intercept:
                # intercept=False could possibly be supported by
                # using mcd.support_ as weights in an explicit variance/covariance calculation
                raise NotImplementedError('FastMCD method only supports SMA with intercept')

            # Fit a robust model of the mean and covariance
            mcd = MinCovDet().fit( np.array([X0,Y0]).T )

            # Robust mean
            Xmean = mcd.location_[0]
            Ymean = mcd.location_[1]

            # Robust variance of X, Y
            Vx    = mcd.covariance_[0,0]
            Vy    = mcd.covariance_[1,1]

            # Robust covariance
            Vxy   = mcd.covariance_[0,1]

            # Number of observations used in the mean and covariance estimate,
            # excluding observations marked as outliers
            N = mcd.support_.sum()

        elif ((robust_method.lower() == 'biweight') or (robust_method.lower() == 'huber') ):

            # Tukey's Biweight and Huber's T
            if robust_method.lower() == 'biweight':
                norm = norms.TukeyBiweight()
            else:
                norm = norms.HuberT()

            # Get weights for downweighting outliers.
            # Fitting a linear model is the easiest way to get these.
            # Options include "TukeyBiweight" (totally removes large deviates)
            # and "HuberT" (linear, not squared, weighting of large deviates)
            rweights = smf.rlm('y~x+1',{'x':X0,'y':Y0},M=norm).fit().weights

            # Sums of the weights and squared weights, for convenience
            rsum  = np.sum( rweights )
            rsum2 = np.sum( rweights**2 )

            # Mean
            Xmean = np.sum( X0 * rweights ) / rsum
            Ymean = np.sum( Y0 * rweights ) / rsum

            # Force the intercept through zero, if requested
            if not include_intercept:
                Xmean = 0
                Ymean = 0

            # Variance & Covariance
            Vx    = np.sum( (X0-Xmean)**2 * rweights**2 ) / rsum2
            Vy    = np.sum( (Y0-Ymean)**2 * rweights**2 ) / rsum2
            Vxy   = np.sum( (X0-Xmean) * (Y0-Ymean) * rweights**2 ) / rsum2

            # Effective number of observations
            N = rsum

        else:

            raise NotImplementedError("sma hasn't implemented robust_method={:s}".\
                                      format(robust_method))
    else:

        if include_intercept:

            wsum = np.sum(W0)

            # Average values
            Xmean = np.sum(X0 * W0) / wsum
            Ymean = np.sum(Y0 * W0) / wsum

            # Covariance matrix
            cov = np.cov( X0, Y0, ddof=1, aweights=W0**2 )

            # Variance
            Vx = cov[0,0]
            Vy = cov[1,1]

            # Covariance
            Vxy = cov[0,1]

        else:

            # Force the line to pass through the origin by setting the means to zero
            Xmean = 0
            Ymean = 0

            wsum = np.sum(W0)

            # Sums of squares in place of variance and covariance
            Vx = np.sum( X0**2 * W0 ) / wsum
            Vy = np.sum( Y0**2 * W0 ) / wsum
            Vxy= np.sum( X0*Y0 * W0 ) / wsum

    # Standard deviation
    Sx = np.sqrt( Vx )
    Sy = np.sqrt( Vy )

    # Correlation coefficient (equivalent to np.corrcoef()[1,0] for non-robust cases)
    R = Vxy / np.sqrt( Vx * Vy )

    #############
    # SLOPE

    Slope  = np.sign(R) * Sy / Sx

    # Standard error of the slope estimate
    ste_slope = np.sqrt( 1/(N-dfmod) * Sy**2 / Sx**2 * (1-R**2) )

    # If variables are DataArrays, convert to scalar
    if hasattr(Slope,'values'):
        Slope = Slope.values
    if hasattr(R,'values'):
        R = R.values
    if hasattr(N,'values'):
        N = N.values

    # Confidence interval for the slope
    B = (1-R**2)/(N-dfmod) * stats.f.isf(1-alpha, 1, N-dfmod)
    ci_grad = Slope * ( np.sqrt( B+1 ) + np.sqrt(B)*np.array([-1,+1]) )

    #############
    # INTERCEPT

    if include_intercept:
        Intercept = Ymean - Slope * Xmean

        # Standard deviation of the residuals
        # Formula from the smatr R package (Warton);
        # it avoids the large residuals of outliers when using robust=True
        Sr = np.sqrt((Vy - 2 * Slope * Vxy + Slope**2 * Vx ) * (N-1) / (N-dfmod) )

        # Standard error of the intercept estimate
        ste_int = np.sqrt( Sr**2/N + Xmean**2 * ste_slope**2 )

        # If Intercept and ste_int are DataArrays, convert to float
        if hasattr(Intercept,'values'):
            Intercept = Intercept.values
        if hasattr(ste_int,'values'):
            ste_int = ste_int.values

        # Confidence interval for the intercept
        tcrit = stats.t.isf((1-alpha)/2,N-dfmod)
        ci_int = Intercept + ste_int * np.array([-tcrit,tcrit])

    else:

        # Set intercept quantities to zero
        Intercept = 0
        ste_int   = 0
        ci_int    = np.array([0,0])

    result = dict( method           = method,
                   fitintercept     = include_intercept,
                   slope            = Slope,
                   intercept        = Intercept,
                   slope_ste        = ste_slope,
                   intercept_ste    = ste_int,
                   slope_interval   = ci_grad,
                   intercept_interval = ci_int,
                   alpha            = alpha,
                   df_model         = dfmod,
                   df_resid         = N-dfmod,
                   params           = np.array([Slope,Intercept]),
                   nobs             = N,
                   fittedvalues     = Intercept + Slope * X0,
                   resid            = Y0 - ( Intercept + Slope * X0 )
                )

    return result

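The symmetry property noted in the docstring follows directly from the SMA slope formula, Slope = sign(R) * Sy / Sx. A minimal numpy sketch on synthetic data (not part of the module) verifies that the SMA slope of Y vs. X is the reciprocal of the slope of X vs. Y:

```python
import numpy as np

# Synthetic data: y = 2x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# SMA slope: sign of the correlation times the ratio of standard deviations
r = np.corrcoef(x, y)[0, 1]
slope_yx = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
slope_xy = np.sign(r) * np.std(x, ddof=1) / np.std(y, ddof=1)

# Unlike OLS, the two fits describe the same line:
# one slope is the reciprocal of the other
assert np.isclose(slope_yx, 1.0 / slope_xy)
```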
def york( x, y, err_x=1, err_y=1, rerr_xy=0 ):
    '''York regression accounting for error in x and y

    Follows the notation and algorithm of York et al. (2004) Section III

    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    err_x, err_y : float or ndarray (default=1)
        standard deviation of errors/uncertainty in x and y
    rerr_xy : float or ndarray (default=0)
        correlation coefficient for the errors in x and y.
        The default rerr_xy=0 means that the errors in x are unrelated to the errors in y.
        err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            Slope or Gradient of Y vs. X
        - intercept (float)
            Y intercept.
        - slope_ste (float)
            standard error of slope estimate
        - intercept_ste (float)
            standard error of intercept estimate
        - slope_interval ([None, None])
            confidence interval for slope; not implemented for this method
        - intercept_interval ([None, None])
            confidence interval for intercept; not implemented for this method
        - alpha (None)
            confidence level; not implemented for this method
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float,float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    # relative error tolerance required for convergence
    rtol = 1e-15

    # Initial guess for the slope, from ordinary least squares
    b = stats.linregress( x, y )[0]

    # Weights for x and y
    wx = 1 / err_x**2
    wy = 1 / err_y**2

    # Combined weight, alpha_i in York et al. (2004)
    alpha_xy = np.sqrt( wx * wy )

    # Iterate until the solution converges, but not more than 50 times
    maxiter = 50
    for _ in range(maxiter):

        # Weight for each point
        W = wx * wy / ( wx + b**2 * wy - 2 * b * rerr_xy * alpha_xy )
        Wsum = np.sum( W )

        # Weighted means
        Xbar = np.sum( W * x ) / Wsum
        Ybar = np.sum( W * y ) / Wsum

        # Deviations from the weighted means
        U = x - Xbar
        V = y - Ybar

        # Parameter needed for the slope
        beta = W * ( U / wy + b*V / wx - (b*U + V) * rerr_xy / alpha_xy )

        # Updated slope estimate
        bnew = np.sum( W * beta * V ) / np.sum( W * beta * U )

        # Stop iterating once the new value is very close to the old value
        if np.abs( (bnew-b)/b ) < rtol:
            break
        b = bnew
    else:
        raise ValueError( f'York regression failed to converge in {maxiter:d} iterations' )

    # Intercept
    a = Ybar - b * Xbar

    # Least-squares adjusted points, expectation values of X and Y
    xa = Xbar + beta
    ya = Ybar + b*beta

    # Mean of the adjusted points
    xabar = np.sum( W * xa ) / Wsum
    yabar = np.sum( W * ya ) / Wsum

    # Deviation of the adjusted points from their means
    u = xa - xabar
    v = ya - yabar

    # Variance of the slope and intercept estimates
    varb = 1 / np.sum( W * u**2 )
    vara = 1 / Wsum + xabar**2 * varb

    # Standard error of the slope and intercept
    sigb = np.sqrt( varb )
    siga = np.sqrt( vara )

    dfmod = 2
    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )

    result = dict( method           = 'York',
                   fitintercept     = True,
                   slope            = b,
                   intercept        = a,
                   slope_ste        = sigb,
                   intercept_ste    = siga,
                   slope_interval   = [None,None],
                   intercept_interval = [None,None],
                   alpha            = None,
                   df_model         = dfmod,
                   df_resid         = N-dfmod,
                   params           = np.array([b,a]),
                   nobs             = N,
                   fittedvalues     = a + b * x,
                   resid            = y - ( a + b * x )
                )

    return result

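As a sanity check on the iteration in `york`: for exactly collinear data, a York update step leaves the ordinary-least-squares starting slope unchanged regardless of the error weights. A self-contained sketch of a single update with rerr_xy=0, using synthetic data (not the module function):

```python
import numpy as np
from scipy import stats

# Exactly collinear data, y = 1 + 2x, with unequal x and y uncertainties
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
wx = 1.0 / np.array([0.1, 0.2, 0.1, 0.3, 0.2])**2   # 1/err_x**2
wy = 1.0 / np.array([0.2, 0.1, 0.3, 0.1, 0.2])**2   # 1/err_y**2

b = stats.linregress(x, y)[0]        # OLS starting slope

# One York update step (York et al. 2004, Section III, rerr_xy = 0)
W = wx * wy / (wx + b**2 * wy)
Xbar = np.sum(W * x) / np.sum(W)
Ybar = np.sum(W * y) / np.sum(W)
U, V = x - Xbar, y - Ybar
beta = W * (U / wy + b * V / wx)
bnew = np.sum(W * beta * V) / np.sum(W * beta * U)

# The update reproduces the exact slope
assert np.isclose(bnew, 2.0)
```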
def sen( x, y, alpha=0.95, method='separate' ):
    '''Theil-Sen slope estimate

    This function wraps `scipy.stats.theilslopes` and provides
    results in the same dict format as the other line fitting methods
    in this module

    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    alpha : float (default = 0.95)
        Desired confidence level [0,1] for output.
    method : {'separate' (default), 'joint'}
        Method for estimating the intercept.
        - 'separate' uses np.median(y) - slope * np.median(x)
        - 'joint' uses np.median( y - slope * x )

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            Slope or Gradient of Y vs. X
        - intercept (float)
            Y intercept.
        - slope_ste (None)
            standard error of slope; not provided by this method
        - intercept_ste (None)
            standard error of intercept; not provided by this method
        - slope_interval ([float, float])
            confidence interval for slope at confidence level alpha
        - intercept_interval ([None, None])
            confidence interval for intercept; not provided by this method
        - alpha (float)
            confidence level [0,1] for the slope interval
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float,float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    slope, intercept, low_slope, high_slope = theilslopes( y, x, alpha=alpha, method=method )

    dfmod = 2
    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )

    result = dict( method           = 'Theil-Sen',
                   fitintercept     = True,
                   slope            = slope,
                   intercept        = intercept,
                   slope_ste        = None,
                   intercept_ste    = None,
                   slope_interval   = [low_slope,high_slope],
                   intercept_interval = [None,None],
                   alpha            = alpha,
                   df_model         = dfmod,
                   df_resid         = N-dfmod,
                   params           = np.array([slope,intercept]),
                   nobs             = N,
                   fittedvalues     = intercept + slope * x,
                   resid            = y - ( intercept + slope * x )
                )

    return result

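The median-of-pairwise-slopes estimate that `sen` obtains from `scipy.stats.theilslopes` can be written out directly. A short sketch with synthetic data and a deliberate outlier shows the robustness of the estimate:

```python
import numpy as np
from scipy.stats import theilslopes

x = np.arange(10.0)
y = 3.0 * x + 1.0
y[3] = 100.0                     # a single outlier

# Slope between every pair of points; the Theil-Sen slope is their median
pairwise = [(y[j] - y[i]) / (x[j] - x[i])
            for i in range(len(x)) for j in range(i + 1, len(x))]
slope = np.median(pairwise)

# Matches scipy's estimate and is unaffected by the outlier
assert np.isclose(slope, theilslopes(y, x)[0])
assert np.isclose(slope, 3.0)
```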
#@jit(nopython=True)
def sen_numba( x, y ):
    '''Estimate linear trend using the Theil-Sen method

    This non-parametric method finds the median slope among all
    pairwise combinations of points.
    scipy.stats.theilslopes provides the same slope estimate, with
    confidence intervals. However, this function is faster for
    large datasets when compiled with Numba.

    Parameters
    ----------
    x : array_like (N,)
        independent variable
    y : array_like (N,)
        dependent variable

    Returns
    -------
    sen : float
        the median slope
    slopes : array (N*(N-1)/2,)
        slope estimates from all pairwise combinations of points
    '''

    with warnings.catch_warnings():
        warnings.simplefilter('always', DeprecationWarning)
        warnings.warn('sen_numba is slow unless numba.jit is used. Use scipy.stats.theilslopes instead.',
                    DeprecationWarning, stacklevel=2)

    if len( x ) != len( y ):
        raise ValueError('Inputs x and y must have the same length')

    # Number of points
    n = len( x )

    # Array to hold all pairwise slope estimates
    slopes = np.full( n * (n-1) // 2, np.nan )

    count = 0

    for i in range(n):
        for j in range(i+1, n):

            # Slope between elements i and j
            slopes[count] = ( y[j] - y[i] ) / ( x[j] - x[i] )

            count += 1

    # The Theil-Sen estimate is the median slope, neglecting NaN
    sen = np.nanmedian( slopes )

    return sen, slopes
234
235            # Fit robust model of mean and covariance
236            mcd = MinCovDet().fit( np.array([X0,Y0]).T )
237
238            # Robust mean
239            Xmean = mcd.location_[0]
240            Ymean = mcd.location_[1]
241
242            # Robust variance of X, Y
243            Vx    = mcd.covariance_[0,0]
244            Vy    = mcd.covariance_[1,1]
245
246            # Robust covariance
247            Vxy   = mcd.covariance_[0,1]
248
249            # Number of observations used in mean and covariance estimate
250            # excludes observations marked as outliers
251            N = mcd.support_.sum()
252
253        elif ((robust_method.lower() =='biweight') or (robust_method.lower() == 'huber') ):
254
255            # Tukey's Biweight and Huber's T
256            if robust_method.lower()=='biweight':
257                norm = norms.TukeyBiweight()
258            else:
259                norm = norms.HuberT()
260
261            # Get weights for downweighting outliers
262            # Fitting a linear model is the easiest way to get these.
263            # Options include "TukeyBiweight" (totally removes large deviates)
264            # and "HuberT" (linear, not squared, weighting of large deviates)
265            rweights = smf.rlm('y~x+1',{'x':X0,'y':Y0},M=norm).fit().weights
266
267            # Sum of weights and weights squared, for convenience
268            rsum  = np.sum( rweights )
269            rsum2 = np.sum( rweights**2 )
270
271            # Mean
272            Xmean = np.sum( X0 * rweights ) / rsum
273            Ymean = np.sum( Y0 * rweights ) / rsum
274
275            # Force intercept through zero, if requested
276            if not include_intercept:
277                Xmean = 0
278                Ymean = 0
279
280            # Variance & Covariance
281            Vx    = np.sum( (X0-Xmean)**2 * rweights**2 ) / rsum2
282            Vy    = np.sum( (Y0-Ymean)**2 * rweights**2 ) / rsum2
283            Vxy   = np.sum( (X0-Xmean) * (Y0-Ymean) * rweights**2 ) / rsum2
284
285            # Effective number of observations
286            N = rsum
287
288        else:
289
290            raise NotImplementedError("sma hasn't implemented robust_method={:s}".\
291                                      format(robust_method))
292    else:
293
294        if include_intercept:
295
296            wsum = np.sum(W0)
297
298            # Average values
299            Xmean = np.sum(X0 * W0) / wsum
300            Ymean = np.sum(Y0 * W0) / wsum
301
302            # Covariance matrix
303            cov = np.cov( X0, Y0, ddof=1, aweights=W0**2 )
304
305            # Variance
306            Vx = cov[0,0]
307            Vy = cov[1,1]
308
309            # Covariance
310            Vxy = cov[0,1]
311
312        else:
313
314            # Force the line to pass through origin by setting means to zero
315            Xmean = 0
316            Ymean = 0
317
318            wsum = np.sum(W0)
319
320            # Sum of squares in place of variance and covariance
321            Vx = np.sum( X0**2 * W0 ) / wsum
322            Vy = np.sum( Y0**2 * W0 ) / wsum
323            Vxy= np.sum( X0*Y0 * W0 ) / wsum
324
325    # Standard deviation
326    Sx = np.sqrt( Vx )
327    Sy = np.sqrt( Vy )
328
329    # Correlation coefficient (equivalent to np.corrcoef()[1,0] for non-robust cases)
330    R = Vxy / np.sqrt( Vx * Vy )
331
332    #############
333    # SLOPE
334
335    Slope  = np.sign(R) * Sy / Sx
336
337    # Standard error of slope estimate
338    ste_slope = np.sqrt( 1/(N-dfmod) * Sy**2 / Sx**2 * (1-R**2) )
339
340    # If variables are DataArrays, convert to scalar
341    if hasattr(Slope,'values'):
342        Slope = Slope.values
343    if hasattr(R,'values'):
344        R = R.values
345    if hasattr(N,'values'):
346        N = N.values
347
348    # Confidence interval for Slope
349    B = (1-R**2)/(N-dfmod) * stats.f.isf(1-alpha, 1, N-dfmod)
350    ci_grad = Slope * ( np.sqrt( B+1 ) + np.sqrt(B)*np.array([-1,+1]) )
351
352    #############
353    # INTERCEPT
354
355    if include_intercept:
356        Intercept = Ymean - Slope * Xmean
357
358        # Standard deviation of residuals
359        # New Method: Formula from smatr R package (Warton)
360        # This formula avoids large residuals of outliers when using robust=True
361        Sr = np.sqrt((Vy - 2 * Slope * Vxy + Slope**2 *  Vx ) * (N-1) / (N-dfmod) )
362
363        # OLD METHOD
364        # Standard deviation of residuals
365        #resid = Y0 - (Intercept + Slope * X0 )
366        # Population standard deviation of the residuals
367        #Sr = np.std( resid, ddof=0 )
368
369        # Standard error of the intercept estimate
370        ste_int = np.sqrt( Sr**2/N + Xmean**2 * ste_slope**2  )
371
372        # If Intercept and ste_int are DataArrays, convert to float
373        if hasattr(Intercept,'values'):
374            Intercept = Intercept.values
375        if hasattr(ste_int,'values'):
376            ste_int = ste_int.values
377
378        # Confidence interval for Intercept
379        tcrit = stats.t.isf((1-alpha)/2,N-dfmod)
380        ci_int = Intercept + ste_int * np.array([-tcrit,tcrit])
381
382    else:
383
384        # Set Intercept quantities to zero
385        Intercept = 0
386        ste_int   = 0
387        ci_int    = np.array([0,0])
388
389    result = dict( method           = method,
390                   fitintercept     = include_intercept,
391                   slope            = Slope,
392                   intercept        = Intercept,
393                   slope_ste        = ste_slope,
394                   intercept_ste    = ste_int,
395                   slope_interval   = ci_grad,
396                   intercept_interval = ci_int,
397                   alpha            = alpha,
398                   df_model         = dfmod,
399                   df_resid         = N-dfmod,
400                   params           = np.array([Slope,Intercept]),
401                   nobs             = N,
402                   fittedvalues     = Intercept + Slope * X0,
403                   resid            = Y0 - ( Intercept + Slope * X0  )
404                )
405
406    # return Slope, Intercept, ste_slope, ste_int, ci_grad, ci_int
407    return result

Standard Major-Axis (SMA) line fitting

Calculate the standard major axis (aka reduced major axis) fit to data X and Y. The main advantage of this over ordinary least squares is that the best fit of Y to X is the same as the best fit of X to Y.

The fit equations and confidence intervals are implemented following Warton et al. (2006). Robust fits use the FastMCD covariance estimate from Rousseeuw and Van Driessen (1999). While there are many alternative robust covariance estimators (e.g. other papers by D.I. Warton using M-estimators), FastMCD is the default in Matlab. When the standard error or uncertainty of each point is known, weighted SMA may be preferable to robust SMA. The conventional choice of weights for each point i is W_i = 1 / ( var(X_i) + var(Y_i) ), where var() is the variance (squared standard error).

References
  • Warton, D. I., Wright, I. J., Falster, D. S. and Westoby, M.: Bivariate line-fitting methods for allometry, Biol. Rev., 81(02), 259, doi:10.1017/S1464793106007007, 2006.
  • Rousseeuw, P. J. and Van Driessen, K.: A Fast Algorithm for the Minimum Covariance Determinant Estimator, Technometrics, 41(3), 1999.
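The symmetry of SMA can be checked numerically. A minimal sketch in plain NumPy (the helper `sma_slope` is hypothetical, not part of this module), using the SMA slope formula sign(r)·Sy/Sx:

```python
import numpy as np

def sma_slope(a, b):
    # SMA slope of b vs. a: sign of the correlation times the ratio of std devs
    r = np.corrcoef(a, b)[0, 1]
    return np.sign(r) * np.std(b, ddof=1) / np.std(a, ddof=1)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

# SMA is symmetric: the fit of X to Y is the exact reciprocal of the fit of Y to X
assert np.isclose(sma_slope(x, y), 1.0 / sma_slope(y, x))

# Ordinary least squares is not: the two OLS slopes multiply to r^2 < 1
b_yx = np.polyfit(x, y, 1)[0]
b_xy = np.polyfit(y, x, 1)[0]
assert b_yx * b_xy < 1.0
```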

Parameters
  • X, Y (array_like or str): Input values. Must have the same length.
  • W (array_like or str, optional): array of weights for each X-Y point, typically W_i = 1/(var(X_i)+var(Y_i))
  • data (dict_like, optional): data structure containing variables. Used when X, Y, or W are str.
  • alpha (float (default = 0.95)): Desired confidence level [0,1] for output.
  • intercept (bool, default=True): Specify if the fitted model should include a non-zero intercept. The model will be forced through the origin (0,0) if intercept=False.
  • robust (bool, default=False): Use statistical methods that are robust to the presence of outliers
  • robust_method ({'FastMCD' (default), 'Huber', 'Biweight'}): Method for calculating robust variance and covariance. Options:
    • 'MCD' or 'FastMCD' for Fast MCD
    • 'Huber' for Huber's T: reduces, but does not eliminate, the influence of outliers
    • 'Biweight' for Tukey's Biweight: reduces and then eliminates the influence of outliers
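The contrast between the two norms is easiest to see in their weight functions. A plain-NumPy sketch (hypothetical helpers; the tuning constants 1.345 and 4.685 are the conventional defaults, as used in statsmodels):

```python
import numpy as np

def huber_weight(r, c=1.345):
    # Huber's T: full weight inside |r| <= c, then decays as c/|r| (never zero)
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r <= c, 1.0, c / np.maximum(r, c))

def tukey_biweight(r, c=4.685):
    # Tukey's biweight: smooth taper that reaches exactly zero beyond |r| = c
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) < c, (1.0 - (r / c) ** 2) ** 2, 0.0)

resid = np.array([0.0, 1.0, 5.0, 50.0])
print(huber_weight(resid))    # large residuals are downweighted but kept
print(tukey_biweight(resid))  # residuals beyond c are removed entirely
```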
Returns
  • fitresult (dict): Contains the following keys:
    • slope (float) Slope or Gradient of Y vs. X
    • intercept (float) Y intercept.
    • slope_ste (float) Standard error of slope estimate
    • intercept_ste (float) standard error of intercept estimate
    • slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
    • intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
    • alpha (float) confidence level [0,1] for slope and intercept intervals
    • df_model (float) degrees of freedom for model
    • df_resid (float) degrees of freedom for residuals
    • params ([float,float]) array of fitted parameters
    • fittedvalues (ndarray) array of fitted values
    • resid (ndarray) array of residual values
    • method (str) name of the fit method
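As an illustration of the formulas behind these results, a minimal NumPy stand-in (hypothetical `sma_sketch`, not a call to `sma` itself) recovers the slope and intercept of an exact line:

```python
import numpy as np

def sma_sketch(x, y):
    # SMA: slope = sign(r) * Sy/Sx; intercept passes through the means
    vx = np.var(x, ddof=1)
    vy = np.var(y, ddof=1)
    vxy = np.cov(x, y, ddof=1)[0, 1]
    r = vxy / np.sqrt(vx * vy)
    slope = np.sign(r) * np.sqrt(vy / vx)
    intercept = np.mean(y) - slope * np.mean(x)
    return {'slope': slope, 'intercept': intercept}

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0          # exact line: slope 2, intercept 1
fit = sma_sketch(x, y)
```

On noise-free data every bivariate method agrees; the methods differ in how errors in X and Y pull the fitted slope.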
def smafit(*args, **kwargs):
32def smafit(*args,**kwargs):
33    '''Alias for `sma`'''
34    return sma(*args,**kwargs)

Alias for sma

def sen(x, y, alpha=0.95, method='separate'):
550def sen( x, y, alpha=0.95, method='separate' ):
551    '''Theil-Sen slope estimate
552    
553    This function wraps `scipy.stats.theilslopes` and provides
554    results in the same dict format as the other line fitting methods 
555    in this module
556    
557    Parameters
558    ----------
559    x, y : ndarray
560        independent (x) and dependent (y) variables for fitting
561    alpha : float (default = 0.95)
562        Desired confidence level [0,1] for output. 
563    method : {'separate' (default), 'joint'}
564        Method for estimating intercept. 
565        - 'separate' uses np.median(y) - slope * np.median(x)
566        - 'joint' uses np.median( y - slope * x )
567            
568    Returns
569    -------
570    fitresult : dict 
571        Contains the following keys:
572        - slope (float)
573            Slope or Gradient of Y vs. X
574        - intercept (float)
575            Y intercept.
576        - slope_ste (float)
577            Standard error of slope estimate
578        - intercept_ste (float)
579            standard error of intercept estimate
580        - slope_interval ([float, float])
581            confidence interval for gradient at confidence level alpha
582        - intercept_interval ([float, float])
583            confidence interval for intercept at confidence level alpha
584        - alpha (float)
585            confidence level [0,1] for slope and intercept intervals
586        - df_model (float)
587            degrees of freedom for model
588        - df_resid (float)
589            degrees of freedom for residuals
590        - params ([float,float])
591            array of fitted parameters
592        - fittedvalues (ndarray)
593            array of fitted values
594        - resid (ndarray)
595            array of residual values
596    '''
597
598    slope, intercept, low_slope, high_slope = theilslopes(y,x,alpha,method)
599
600    dfmod = 2
601    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )
602
603    result = dict( method        = 'Theil-Sen',
604                fitintercept     = True,
605                slope            = slope,
606                intercept        = intercept,
607                slope_ste        = None,
608                intercept_ste    = None,
609                slope_interval   = [low_slope,high_slope],
610                intercept_interval = [None,None],
611                alpha            = alpha,
612                df_model         = dfmod,
613                df_resid         = N-dfmod,
614                params           = np.array([slope,intercept]),
615                nobs             = N,
616                fittedvalues     = intercept + slope * x,
617                resid            = y - ( intercept + slope * x ) 
618                )
619
620    return result

Theil-Sen slope estimate

This function wraps scipy.stats.theilslopes and provides results in the same dict format as the other line fitting methods in this module

Parameters
  • x, y (ndarray): independent (x) and dependent (y) variables for fitting
  • alpha (float (default = 0.95)): Desired confidence level [0,1] for output.
  • method ({'separate' (default), 'joint'}): Method for estimating intercept.
    • 'separate' uses np.median(y) - slope * np.median(x)
    • 'joint' uses np.median( y - slope * x )
Returns
  • fitresult (dict): Contains the following keys:
    • slope (float) Slope or Gradient of Y vs. X
    • intercept (float) Y intercept.
    • slope_ste (float) Standard error of slope estimate
    • intercept_ste (float) standard error of intercept estimate
    • slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
    • intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
    • alpha (float) confidence level [0,1] for slope and intercept intervals
    • df_model (float) degrees of freedom for model
    • df_resid (float) degrees of freedom for residuals
    • params ([float,float]) array of fitted parameters
    • fittedvalues (ndarray) array of fitted values
    • resid (ndarray) array of residual values
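The median-of-pairwise-slopes estimator, with both intercept options, can be sketched without scipy (hypothetical helper, not this module's `sen`):

```python
import numpy as np

def theil_sen_sketch(x, y, method='separate'):
    # Slope: median over the slopes of all point pairs with distinct x
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    slope = np.median(slopes)
    if method == 'separate':
        intercept = np.median(y) - slope * np.median(x)
    else:  # 'joint'
        intercept = np.median(y - slope * x)
    return slope, intercept

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 100.0, 9.0])  # y = 2x + 1 with one gross outlier
slope, intercept = theil_sen_sketch(x, y)
# the outlier does not move the estimate: slope 2, intercept 1
```

Because the estimate is a median, up to roughly 29% of the points can be corrupted before the slope breaks down.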
def sen_slope(*args, **kwargs):
29def sen_slope(*args,**kwargs):
30    '''Alias for `sen`'''
31    return sen(*args,**kwargs)

Alias for sen

def sen_numba(x, y):
623def sen_numba( x, y ):
624    '''Estimate linear trend using the Theil-Sen method
625    
626    This non-parametric method finds the median slope among all
627    combinations of time points. 
628    scipy.stats.theilslopes provides the same slope estimate, with
629    confidence intervals. However, this function can be faster for
630    large datasets when accelerated with numba.jit
631    
632    Parameters
633    ----------
634    x : array_like (N,)
635        independent variable
636    y : array_like (N,)
637        dependent variable
638    
639    Returns
640    -------
641    sen : float
642        the median slope
643    slopes : array (N*(N-1)/2,)
644        slope estimates from all pairwise combinations of x and y
645    '''
646
647    with warnings.catch_warnings():
648        warnings.simplefilter('always', DeprecationWarning)
649        warnings.warn('Sen function is slow unless numba.jit is used. Use scipy.stats.theilslopes instead.',
650                    DeprecationWarning, stacklevel=2)
651        
652    if len( x ) != len( y ):
653        raise ValueError( 'Inputs x and y must have same dimension' )
655
656    # Find number of time points
657    n = len( x )
658
659    # Array to hold all slope estimates
660    slopes = np.zeros(  np.ceil( n * ( n-1 ) / 2 ).astype('int') )
661    slopes[:] = np.nan
662
663    count = 0
664
665    for i in range(n):
666        for j in range(i+1, n):
667
668            # Slope between elements i and j
669            slopeij = ( y[j] - y[i] ) / ( x[j] - x[i] )
670
671            slopes[count] = slopeij
672
673            count += 1
674
675    # Theil-Sen estimate is the median slope, neglecting NaN
676    sen = np.nanmedian( slopes )
677
678    return sen, slopes

Estimate linear trend using the Theil-Sen method

This non-parametric method finds the median slope among all combinations of time points. scipy.stats.theilslopes provides the same slope estimate, with confidence intervals. However, this function can be faster for large datasets when accelerated with numba.jit.

Parameters
  • x (array_like (N,)): independent variable
  • y (array_like (N,)): dependent variable
Returns
  • sen (float): the median slope
  • slopes (array (N*(N-1)/2,)): slope estimates from all pairwise combinations of x and y
def york(x, y, err_x=1, err_y=1, rerr_xy=0):
409def york( x, y, err_x=1, err_y=1, rerr_xy=0 ):
410    '''York regression accounting for errors in x and y
411    Follows the notation and algorithm of York et al. (2004), Section III
412    
413    Parameters
414    ----------
415    x, y : ndarray
416        independent (x) and dependent (y) variables for fitting
417    err_x, err_y : ndarray (default=1)
418        standard deviation of errors/uncertainty in x and y
419    rerr_xy : float (default=0)
420        correlation coefficient for errors in x and y, 
421        defaults to rerr_xy=0, meaning that errors in x are unrelated to errors in y
422        err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y
423    
424    Returns
425    -------
426    fitresult : dict 
427        Contains the following keys:
428        - slope (float)
429            Slope or Gradient of Y vs. X
430        - intercept (float)
431            Y intercept.
432        - slope_ste (float)
433            Standard error of slope estimate
434        - intercept_ste (float)
435            standard error of intercept estimate
436        - slope_interval ([float, float])
437            confidence interval for gradient at confidence level alpha
438        - intercept_interval ([float, float])
439            confidence interval for intercept at confidence level alpha
440        - alpha (float)
441            confidence level [0,1] for slope and intercept intervals
442        - df_model (float)
443            degrees of freedom for model
444        - df_resid (float)
445            degrees of freedom for residuals
446        - params ([float,float])
447            array of fitted parameters
448        - fittedvalues (ndarray)
449            array of fitted values
450        - resid (ndarray)
451            array of residual values
452    '''
453
454    # relative error tolerance required for convergence
455    rtol = 1e-15
456
457    # Initial guess for slope, from ordinary least squares
458    result = stats.linregress( x, y )
459    b = result[0]
460
461    # Weights for x and y
462    wx = 1 / err_x**2
463    wy = 1 / err_y**2
464
465    # Combined weights
466    alpha = np.sqrt( wx * wy )
467
468    # Iterate until solution converges, but not more 50 times
469    maxiter=50
470    for i in range(1,maxiter+1):
471
472        # Weight for point i
473        W = wx * wy / ( wx + b**2 * wy - 2 * b * rerr_xy * alpha )
474        Wsum = np.sum( W )
475
476        # Weighted means
477        Xbar = np.sum( W * x ) / Wsum
478        Ybar = np.sum( W * y ) / Wsum
479
480        # Deviation from weighted means
481        U = x - Xbar
482        V = y - Ybar
483
484        # parameter needed for slope
485        beta = W * ( U / wy + b*V / wx - (b*U + V) * rerr_xy / alpha )
486
487        # Update slope estimate
488        bnew = np.sum( W * beta * V ) / np.sum( W * beta * U )
489
490        # Break from loop if new value is very close to old value
491        if np.abs( (bnew-b)/b ) < rtol:
492            break
493        else:
494            b = bnew
495
496    if i==maxiter:
497        raise ValueError( f'York regression failed to converge in {maxiter:d} iterations' )
498
499    # Intercept
500    a = Ybar - b * Xbar
501
502    # least-squares adjusted points, expectation values of X and Y
503    xa = Xbar + beta
504    ya = Ybar + b*beta
505
506    # Mean of adjusted points
507    xabar = np.sum( W * xa ) / Wsum
508    yabar = np.sum( W * ya ) / Wsum
509
510    # Deviation of adjusted points from their means
511    u = xa - xabar
512    v = ya - yabar
513
514    # Variance of slope and intercept estimates
515    varb = 1 / np.sum( W * u**2 )
516    vara = 1 / Wsum + xabar**2 * varb
517
518    # Standard error of slope and intercept
519    siga = np.sqrt( vara )
520    sigb = np.sqrt( varb )
521
522    # Define a named tuple type that will contain the results
523    # result = namedtuple( 'result', 'slope intercept sigs sigi params sigma' )
524
525    # Return results as a named tuple, User can access as a regular tuple too
526    # return result( b, a, sigb, siga, [b,a], [sigb, siga] )
527
528    dfmod = 2
529    N = np.sum( ~np.isnan(x) * ~np.isnan(y) )
530
531    result = dict( method        = 'York',
532                fitintercept     = True,
533                slope            = b,
534                intercept        = a,
535                slope_ste        = sigb,
536                intercept_ste    = siga,
537                slope_interval   = [None,None],
538                intercept_interval = [None,None],
539                alpha            = None,  # confidence level not used; local alpha holds combined weights
540                df_model         = dfmod,
541                df_resid         = N-dfmod,
542                params           = np.array([b,a]),
543                nobs             = N,
544                fittedvalues     = a + b * x,
545                resid            = y - ( a + b * x ) 
546                )
547
548    return result

York regression accounting for errors in x and y. Follows the notation and algorithm of York et al. (2004), Section III.

Parameters
  • x, y (ndarray): independent (x) and dependent (y) variables for fitting
  • err_x, err_y (ndarray (default=1)): standard deviation of errors/uncertainty in x and y
  • rerr_xy (float (default=0)): correlation coefficient for errors in x and y; defaults to rerr_xy=0, meaning that errors in x are unrelated to errors in y. err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y.
Returns
  • fitresult (dict): Contains the following keys:
    • slope (float) Slope or Gradient of Y vs. X
    • intercept (float) Y intercept.
    • slope_ste (float) Standard error of slope estimate
    • intercept_ste (float) standard error of intercept estimate
    • slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
    • intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
    • alpha (float) confidence level [0,1] for slope and intercept intervals
    • df_model (float) degrees of freedom for model
    • df_resid (float) degrees of freedom for residuals
    • params ([float,float]) array of fitted parameters
    • fittedvalues (ndarray) array of fitted values
    • resid (ndarray) array of residual values
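A self-contained sketch of this iteration for scalar, uncorrelated errors (rerr_xy = 0); it mirrors the algorithm described above but is a hypothetical stand-in, not the module's `york`:

```python
import numpy as np

def york_sketch(x, y, err_x=1.0, err_y=1.0, maxiter=50, rtol=1e-15):
    # Per-point weights from the stated x and y uncertainties (scalars here)
    wx = np.full(len(x), 1.0 / err_x**2)
    wy = np.full(len(x), 1.0 / err_y**2)
    b = np.polyfit(x, y, 1)[0]            # initial slope from ordinary least squares
    for _ in range(maxiter):
        W = wx * wy / (wx + b**2 * wy)    # combined weight for each point (rerr_xy = 0)
        Xbar = np.average(x, weights=W)
        Ybar = np.average(y, weights=W)
        U, V = x - Xbar, y - Ybar
        beta = W * (U / wy + b * V / wx)
        bnew = np.sum(W * beta * V) / np.sum(W * beta * U)
        if abs((bnew - b) / b) < rtol:    # converged
            b = bnew
            break
        b = bnew
    a = Ybar - b * Xbar                   # intercept from the weighted means
    return b, a

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slope, intercept = york_sketch(x, y, err_x=0.5, err_y=0.3)
```

For exact linear data the iteration recovers the line regardless of the stated error magnitudes; the weights matter once the scatter exceeds the errors.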