acgc.stats.bivariate_lines
Specialized methods of bivariate line fitting
- Standard major axis (SMA) also called reduced major axis (RMA)
- York regression, for data with errors in x and y
- Theil-Sen, non-parametric slope estimation (use numba to accelerate the function in this module)
# -*- coding: utf-8 -*-
"""Specialized methods of bivariate line fitting

* Standard major axis (SMA) also called reduced major axis (RMA)
* York regression, for data with errors in x and y
* Theil-Sen, non-parametric slope estimation (use numba to accelerate the function in this module)
"""
import warnings
import numpy as np
import scipy.stats as stats
from sklearn.covariance import MinCovDet
import statsmodels.formula.api as smf
import statsmodels.robust.norms as norms
from scipy.stats import theilslopes
#from numba import jit

__all__ = [
    "bivariate_line_equation",
    "sma",
    "smafit",
    "sen",
    "sen_slope",
    "sen_numba",
    "york"
]

# Aliases
def sen_slope(*args, **kwargs):
    '''Alias for `sen`'''
    return sen(*args, **kwargs)

def smafit(*args, **kwargs):
    '''Alias for `sma`'''
    return sma(*args, **kwargs)

def bivariate_line_equation(fitresult,
                            floatformat='{:.3f}',
                            ystring='include',
                            include_error=False):
    '''Write the equation for the fitted line as a string

    Parameters
    ----------
    fitresult : dict
        results of the line fit
    floatformat : str
        format string for the numerical values (default='{:.3f}')
    ystring : {'include' (default), 'separate', 'none'}
        specifies whether "y =" should be included in the result, returned
        as a separate item in a tuple, or omitted
    include_error : bool
        specifies whether uncertainty terms should be included in the equation

    Returns
    -------
    fitline_string : str
        equation for the fitted line, in the form "y = a x + b" or "y = a x".
        If uncertainty terms are included, then "y = (a ± c) x + (b ± d)" or "y = (a ± c) x"
    '''

    # Left-hand side
    lhs = "y_" + fitresult['method']

    # Right-hand side
    if fitresult['fitintercept']:
        if include_error:
            rhs = f'({floatformat:s} ± {floatformat:s}) x + ({floatformat:s} ± {floatformat:s})'.\
                format( fitresult['slope'], fitresult['slope_ste'],
                        fitresult['intercept'], fitresult['intercept_ste'] )
        else:
            rhs = f'{floatformat:s} x + {floatformat:s}'.\
                format( fitresult['slope'], fitresult['intercept'] )
    else:
        if include_error:
            rhs = f'({floatformat:s} ± {floatformat:s}) x'.\
                format( fitresult['slope'], fitresult['slope_ste'] )
        else:
            rhs = f'{floatformat:s} x'.\
                format( fitresult['slope'] )

    # Combine left- and right-hand sides
    if ystring == 'include':
        equation = f'{lhs:s} = {rhs:s}'
    elif ystring == 'separate':
        equation = (lhs, rhs)
    elif ystring == 'none':
        equation = rhs
    else:
        raise ValueError('Unrecognized value of ystring: ' + ystring)

    return equation

def sma(X, Y, W=None,
        data=None,
        alpha=0.95,
        intercept=True,
        robust=False, robust_method='FastMCD'):
    '''Standard Major-Axis (SMA) line fitting

    Calculate the standard major axis, aka reduced major axis, fit to
    data X and Y. The main advantage of this over ordinary least squares is
    that the best fit of Y to X is the same as the best fit of X to Y.

    The fit equations and confidence intervals are implemented following
    Warton et al. (2006). Robust fits use the FastMCD covariance estimate
    from Rousseeuw and Van Driessen (1999). While there are many alternative
    robust covariance estimators (e.g. other papers by D.I. Warton using
    M-estimators), the FastMCD algorithm is the default in Matlab. When the
    standard error or uncertainty of each point is known, weighted SMA may be
    preferable to robust SMA. The conventional choice of weights for each
    point i is W_i = 1 / ( var(X_i) + var(Y_i) ), where var() is the variance
    (squared standard error).

    References
    ----------
    Warton, D. I., Wright, I. J., Falster, D. S. and Westoby, M.:
    Bivariate line-fitting methods for allometry, Biol. Rev., 81(02), 259,
    doi:10.1017/S1464793106007007, 2006.

    Rousseeuw, P. J. and Van Driessen, K.: A Fast Algorithm for the Minimum
    Covariance Determinant Estimator, Technometrics, 41(3), 1999.

    Parameters
    ----------
    X, Y : array_like or str
        Input values; must have the same length.
    W : array_like or str, optional
        array of weights for each X-Y point, typically W_i = 1/(var(X_i)+var(Y_i))
    data : dict_like, optional
        data structure containing variables. Used when X, Y, or W are str.
    alpha : float (default = 0.95)
        Desired confidence level [0,1] for output.
    intercept : bool, default=True
        Specify whether the fitted model should include a non-zero intercept.
        The model will be forced through the origin (0,0) if intercept=False.
    robust : bool, default=False
        Use statistical methods that are robust to the presence of outliers
    robust_method : {'FastMCD' (default), 'Huber', 'Biweight'}
        Method for calculating robust variance and covariance. Options:
        - 'MCD' or 'FastMCD' for Fast MCD
        - 'Huber' for Huber's T: reduces, but does not eliminate, the influence of outliers
        - 'Biweight' for Tukey's Biweight: reduces then eliminates the influence of outliers

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            slope or gradient of Y vs. X
        - intercept (float)
            Y intercept
        - slope_ste (float)
            standard error of slope estimate
        - intercept_ste (float)
            standard error of intercept estimate
        - slope_interval ([float, float])
            confidence interval for slope at confidence level alpha
        - intercept_interval ([float, float])
            confidence interval for intercept at confidence level alpha
        - alpha (float)
            confidence level [0,1] for slope and intercept intervals
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float, float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
        - method (str)
            name of the fit method
    '''

    def str2var( v, data ):
        '''Extract variable named v from the data structure'''
        try:
            return data[v]
        except Exception as exc:
            raise ValueError( 'Argument data must be provided with a key named ' + v ) from exc

    # If variables are provided as strings, get values from the data structure
    if isinstance( X, str ):
        X = str2var( X, data )
    if isinstance( Y, str ):
        Y = str2var( Y, data )
    if isinstance( W, str ):
        W = str2var( W, data )

    # Make sure arrays have the same length
    assert ( len(X) == len(Y) ), 'Arrays X and Y must have the same length'
    if W is None:
        W = np.ones_like(X)
    else:
        assert ( len(W) == len(X) ), 'Array W must have the same length as X and Y'

    # Make sure alpha is within the range 0-1
    assert (alpha < 1), 'alpha must be less than 1'
    assert (alpha > 0), 'alpha must be greater than 0'

    # Drop any NaN elements of X, Y, or W
    # Infinite values are allowed but will make the result undefined
    idx = ~np.isnan(X) & ~np.isnan(Y) & ~np.isnan(W)

    X0 = X[idx]
    Y0 = Y[idx]
    W0 = W[idx]

    # Number of observations
    N = len(X0)

    include_intercept = intercept

    # Degrees of freedom for the model
    if include_intercept:
        dfmod = 2
    else:
        dfmod = 1

    method = 'SMA'

    # Choose whether to use methods robust to outliers
    if robust:

        method = 'rSMA'

        # Choose the robust method
        if robust_method.lower() in ('mcd', 'fastmcd'):
            # FAST MCD

            if not include_intercept:
                # intercept=False could possibly be supported by using
                # mcd.support_ as weights in an explicit variance/covariance calculation
                raise NotImplementedError('FastMCD method only supports SMA with intercept')

            # Fit robust model of mean and covariance
            mcd = MinCovDet().fit( np.array([X0, Y0]).T )

            # Robust mean
            Xmean = mcd.location_[0]
            Ymean = mcd.location_[1]

            # Robust variance of X, Y
            Vx = mcd.covariance_[0,0]
            Vy = mcd.covariance_[1,1]

            # Robust covariance
            Vxy = mcd.covariance_[0,1]

            # Number of observations used in mean and covariance estimate;
            # excludes observations marked as outliers
            N = mcd.support_.sum()

        elif robust_method.lower() in ('biweight', 'huber'):

            # Tukey's Biweight and Huber's T.
            # "TukeyBiweight" totally removes large deviates;
            # "HuberT" applies linear, not squared, weighting to large deviates.
            if robust_method.lower() == 'biweight':
                norm = norms.TukeyBiweight()
            else:
                norm = norms.HuberT()

            # Get weights for downweighting outliers.
            # Fitting a robust linear model is the easiest way to get these.
            rweights = smf.rlm('y~x+1', {'x': X0, 'y': Y0}, M=norm).fit().weights

            # Sum of weights and weights squared, for convenience
            rsum = np.sum( rweights )
            rsum2 = np.sum( rweights**2 )

            # Mean
            Xmean = np.sum( X0 * rweights ) / rsum
            Ymean = np.sum( Y0 * rweights ) / rsum

            # Force intercept through zero, if requested
            if not include_intercept:
                Xmean = 0
                Ymean = 0

            # Variance & covariance
            Vx  = np.sum( (X0-Xmean)**2 * rweights**2 ) / rsum2
            Vy  = np.sum( (Y0-Ymean)**2 * rweights**2 ) / rsum2
            Vxy = np.sum( (X0-Xmean) * (Y0-Ymean) * rweights**2 ) / rsum2

            # Effective number of observations
            N = rsum

        else:

            raise NotImplementedError("sma hasn't implemented robust_method={:s}".\
                                      format(robust_method))
    else:

        if include_intercept:

            wsum = np.sum(W0)

            # Average values
            Xmean = np.sum(X0 * W0) / wsum
            Ymean = np.sum(Y0 * W0) / wsum

            # Covariance matrix
            cov = np.cov( X0, Y0, ddof=1, aweights=W0**2 )

            # Variance
            Vx = cov[0,0]
            Vy = cov[1,1]

            # Covariance
            Vxy = cov[0,1]

        else:

            # Force the line to pass through the origin by setting means to zero
            Xmean = 0
            Ymean = 0

            wsum = np.sum(W0)

            # Sum of squares in place of variance and covariance
            Vx  = np.sum( X0**2  * W0 ) / wsum
            Vy  = np.sum( Y0**2  * W0 ) / wsum
            Vxy = np.sum( X0*Y0 * W0 ) / wsum

    # Standard deviation
    Sx = np.sqrt( Vx )
    Sy = np.sqrt( Vy )

    # Correlation coefficient (equivalent to np.corrcoef()[1,0] for non-robust cases)
    R = Vxy / np.sqrt( Vx * Vy )

    #############
    # SLOPE

    Slope = np.sign(R) * Sy / Sx

    # Standard error of slope estimate
    ste_slope = np.sqrt( 1/(N-dfmod) * Sy**2 / Sx**2 * (1-R**2) )

    # If variables are DataArrays, convert to scalar
    if hasattr(Slope, 'values'):
        Slope = Slope.values
    if hasattr(R, 'values'):
        R = R.values
    if hasattr(N, 'values'):
        N = N.values

    # Confidence interval for Slope
    B = (1-R**2)/(N-dfmod) * stats.f.isf(1-alpha, 1, N-dfmod)
    ci_grad = Slope * ( np.sqrt( B+1 ) + np.sqrt(B)*np.array([-1,+1]) )

    #############
    # INTERCEPT

    if include_intercept:
        Intercept = Ymean - Slope * Xmean

        # Standard deviation of residuals.
        # New method: formula from the smatr R package (Warton).
        # This formula avoids large residuals of outliers when using robust=True.
        Sr = np.sqrt( (Vy - 2 * Slope * Vxy + Slope**2 * Vx ) * (N-1) / (N-dfmod) )

        # Old method: population standard deviation of the residuals
        # resid = Y0 - (Intercept + Slope * X0 )
        # Sr = np.std( resid, ddof=0 )

        # Standard error of the intercept estimate
        ste_int = np.sqrt( Sr**2/N + Xmean**2 * ste_slope**2 )

        # If Intercept and ste_int are DataArrays, convert to float
        if hasattr(Intercept, 'values'):
            Intercept = Intercept.values
        if hasattr(ste_int, 'values'):
            ste_int = ste_int.values

        # Confidence interval for Intercept
        tcrit = stats.t.isf((1-alpha)/2, N-dfmod)
        ci_int = Intercept + ste_int * np.array([-tcrit, tcrit])

    else:

        # Set Intercept quantities to zero
        Intercept = 0
        ste_int = 0
        ci_int = np.array([0, 0])

    result = dict( method = method,
                   fitintercept = include_intercept,
                   slope = Slope,
                   intercept = Intercept,
                   slope_ste = ste_slope,
                   intercept_ste = ste_int,
                   slope_interval = ci_grad,
                   intercept_interval = ci_int,
                   alpha = alpha,
                   df_model = dfmod,
                   df_resid = N-dfmod,
                   params = np.array([Slope, Intercept]),
                   nobs = N,
                   fittedvalues = Intercept + Slope * X0,
                   resid = Y0 - ( Intercept + Slope * X0 )
                   )

    return result

def york( x, y, err_x=1, err_y=1, rerr_xy=0 ):
    '''York regression accounting for error in x and y

    Follows the notation and algorithm of York et al. (2004) Section III.

    References
    ----------
    York, D., Evensen, N. M., Martínez, M. L. and De Basabe Delgado, J.:
    Unified equations for the slope, intercept, and standard errors of the
    best straight line, Am. J. Phys., 72(3), 367-375, 2004.

    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    err_x, err_y : ndarray or float (default=1)
        standard deviation of errors/uncertainty in x and y
    rerr_xy : ndarray or float (default=0)
        correlation coefficient for errors in x and y;
        the default rerr_xy=0 means that the errors in x are unrelated to the errors in y.
        err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            slope or gradient of Y vs. X
        - intercept (float)
            Y intercept
        - slope_ste (float)
            standard error of slope estimate
        - intercept_ste (float)
            standard error of intercept estimate
        - slope_interval ([None, None])
            confidence interval is not computed by this method
        - intercept_interval ([None, None])
            confidence interval is not computed by this method
        - alpha (None)
            placeholder; confidence intervals are not computed by this method
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float, float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    # relative error tolerance required for convergence
    rtol = 1e-15

    # Initial guess for slope, from ordinary least squares
    result = stats.linregress( x, y )
    b = result[0]

    # Weights for x and y
    wx = 1 / err_x**2
    wy = 1 / err_y**2

    # Combined weights (alpha_i in York et al. notation)
    alpha_xy = np.sqrt( wx * wy )

    # Iterate until the solution converges, but no more than maxiter times
    maxiter = 50
    for i in range(maxiter):

        # Weight for each point
        W = wx * wy / ( wx + b**2 * wy - 2 * b * rerr_xy * alpha_xy )
        Wsum = np.sum( W )

        # Weighted means
        Xbar = np.sum( W * x ) / Wsum
        Ybar = np.sum( W * y ) / Wsum

        # Deviation from weighted means
        U = x - Xbar
        V = y - Ybar

        # Parameter needed for slope
        beta = W * ( U / wy + b*V / wx - (b*U + V) * rerr_xy / alpha_xy )

        # Update slope estimate
        bnew = np.sum( W * beta * V ) / np.sum( W * beta * U )

        # Break from loop if the new value is very close to the old value
        if np.abs( (bnew-b)/b ) < rtol:
            break
        b = bnew
    else:
        # Loop completed without breaking, i.e. without converging
        raise ValueError( f'York regression failed to converge in {maxiter:d} iterations' )

    # Intercept
    a = Ybar - b * Xbar

    # Least-squares adjusted points, expectation values of X and Y
    xa = Xbar + beta
    ya = Ybar + b*beta

    # Mean of adjusted points
    xabar = np.sum( W * xa ) / Wsum
    yabar = np.sum( W * ya ) / Wsum

    # Deviation of adjusted points from their means
    u = xa - xabar
    v = ya - yabar

    # Variance of slope and intercept estimates
    varb = 1 / np.sum( W * u**2 )
    vara = 1 / Wsum + xabar**2 * varb

    # Standard error of slope and intercept
    siga = np.sqrt( vara )
    sigb = np.sqrt( varb )

    dfmod = 2
    N = np.sum( ~np.isnan(x) & ~np.isnan(y) )

    result = dict( method = 'York',
                   fitintercept = True,
                   slope = b,
                   intercept = a,
                   slope_ste = sigb,
                   intercept_ste = siga,
                   slope_interval = [None, None],
                   intercept_interval = [None, None],
                   alpha = None,    # confidence intervals are not computed
                   df_model = dfmod,
                   df_resid = N-dfmod,
                   params = np.array([b, a]),
                   nobs = N,
                   fittedvalues = a + b * x,
                   resid = y - ( a + b * x )
                   )

    return result

def sen( x, y, alpha=0.95, method='separate' ):
    '''Theil-Sen slope estimate

    This function wraps `scipy.stats.theilslopes` and provides
    results in the same dict format as the other line fitting methods
    in this module.

    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    alpha : float (default = 0.95)
        Desired confidence level [0,1] for output.
    method : {'separate' (default), 'joint'}
        Method for estimating the intercept.
        - 'separate' uses np.median(y) - slope * np.median(x)
        - 'joint' uses np.median( y - slope * x )

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            slope or gradient of Y vs. X
        - intercept (float)
            Y intercept
        - slope_ste (None)
            standard error is not computed by this method
        - intercept_ste (None)
            standard error is not computed by this method
        - slope_interval ([float, float])
            confidence interval for slope at confidence level alpha
        - intercept_interval ([None, None])
            confidence interval is not computed by this method
        - alpha (float)
            confidence level [0,1] for the slope interval
        - df_model (float)
            degrees of freedom for model
        - df_resid (float)
            degrees of freedom for residuals
        - params ([float, float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    slope, intercept, low_slope, high_slope = theilslopes( y, x, alpha, method )

    dfmod = 2
    N = np.sum( ~np.isnan(x) & ~np.isnan(y) )

    result = dict( method = 'Theil-Sen',
                   fitintercept = True,
                   slope = slope,
                   intercept = intercept,
                   slope_ste = None,
                   intercept_ste = None,
                   slope_interval = [low_slope, high_slope],
                   intercept_interval = [None, None],
                   alpha = alpha,
                   df_model = dfmod,
                   df_resid = N-dfmod,
                   params = np.array([slope, intercept]),
                   nobs = N,
                   fittedvalues = intercept + slope * x,
                   resid = y - ( intercept + slope * x )
                   )

    return result

#@jit(nopython=True)
def sen_numba( x, y ):
    '''Estimate linear trend using the Theil-Sen method

    This non-parametric method finds the median slope among all
    pairwise combinations of time points.
    `scipy.stats.theilslopes` provides the same slope estimate, with
    confidence intervals. However, this function is faster for
    large datasets when accelerated with Numba.

    Parameters
    ----------
    x : array_like (N,)
        independent variable
    y : array_like (N,)
        dependent variable

    Returns
    -------
    sen : float
        the median slope
    slopes : array (N*(N-1)/2,)
        slope estimates from all pairwise combinations of x and y
    '''

    with warnings.catch_warnings():
        warnings.simplefilter('always', DeprecationWarning)
        warnings.warn('sen_numba is slow unless numba.jit is used. Use scipy.stats.theilslopes instead.',
                      DeprecationWarning, stacklevel=2)

    if len( x ) != len( y ):
        raise ValueError('Inputs x and y must have the same length')

    # Number of time points
    n = len( x )

    # Array to hold all pairwise slope estimates
    slopes = np.full( n * ( n-1 ) // 2, np.nan )

    count = 0
    for i in range(n):
        for j in range(i+1, n):

            # Slope between elements i and j
            slopes[count] = ( y[j] - y[i] ) / ( x[j] - x[i] )

            count += 1

    # Theil-Sen estimate is the median slope, neglecting NaN
    sen = np.nanmedian( slopes )

    return sen, slopes
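The two-stage formatting in `bivariate_line_equation` may be non-obvious: the f-string first interpolates `floatformat` into a template, and only then does `.format` substitute the numeric values. A minimal standalone sketch of that technique (the sample slope values are invented):

```python
# Stage 1: the f-string expands floatformat into a template string.
# Stage 2: .format() fills in the numeric values.
floatformat = '{:.3f}'
slope, slope_ste = 2.0, 0.1  # hypothetical fit results

template = f'({floatformat:s} ± {floatformat:s}) x'
rhs = template.format(slope, slope_ste)
print(rhs)  # (2.000 ± 0.100) x
```

This keeps the number format configurable by the caller while the equation layout stays fixed in the function.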
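The heart of the SMA fit is `Slope = sign(R) * Sy / Sx`. A minimal unweighted sketch (using a hypothetical helper `sma_slope`, not part of this module) demonstrates the symmetry property claimed in the docstring: swapping X and Y inverts the slope, so both fits describe the same line.

```python
import numpy as np

def sma_slope(x, y):
    # Unweighted SMA slope: sign of the correlation times sy/sx
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

b_yx = sma_slope(x, y)   # fit of Y against X
b_xy = sma_slope(y, x)   # fit of X against Y

# SMA symmetry: b_yx == 1 / b_xy, unlike ordinary least squares
assert np.isclose(b_yx, 1.0 / b_xy)
```

The symmetry holds exactly because sy/sx and sx/sy are reciprocals; OLS, by contrast, shrinks both slopes toward zero by a factor of |r|.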
Standard Major-Axis (SMA) line fitting
Calculate the standard major axis (aka reduced major axis) fit to data X and Y. The main advantage of this over ordinary least squares is that the best fit of Y to X is the same as the best fit of X to Y.
The fit equations and confidence intervals are implemented following Warton et al. (2006). Robust fits use the FastMCD covariance estimate from Rousseeuw and Van Driessen (1999). While there are many alternative robust covariance estimators (e.g. other papers by D.I. Warton using M-estimators), the FastMCD algorithm is the default in Matlab. When the standard error or uncertainty of each point is known, weighted SMA may be preferable to robust SMA. The conventional choice of weights for each point i is W_i = 1 / ( var(X_i) + var(Y_i) ), where var() is the variance (squared standard error).
References
- Warton, D. I., Wright, I. J., Falster, D. S. and Westoby, M.: Bivariate line-fitting methods for allometry, Biol. Rev., 81(02), 259, doi:10.1017/S1464793106007007, 2006.
- Rousseeuw, P. J. and Van Driessen, K.: A Fast Algorithm for the Minimum Covariance Determinant Estimator, Technometrics, 41(3), 1999.
Parameters
- X, Y (array_like or str): Input values; must have the same length.
- W (array_like or str, optional): array of weights for each X-Y point, typically W_i = 1/(var(X_i)+var(Y_i))
- data (dict_like, optional): data structure containing variables. Used when X, Y, or W are str.
- alpha (float (default = 0.95)): Desired confidence level [0,1] for output.
- intercept (bool, default=True): Specify if the fitted model should include a non-zero intercept. The model will be forced through the origin (0,0) if intercept=False.
- robust (bool, default=False): Use statistical methods that are robust to the presence of outliers
- robust_method ({'FastMCD' (default), 'Huber', 'Biweight'}):
Method for calculating robust variance and covariance. Options:
- 'MCD' or 'FastMCD' for Fast MCD
    - 'Huber' for Huber's T: reduces, but does not eliminate, the influence of outliers
    - 'Biweight' for Tukey's biweight: reduces, then eliminates, the influence of outliers
Returns
- fitresult (dict):
Contains the following keys:
- slope (float) Slope or Gradient of Y vs. X
- intercept (float) Y intercept.
- slope_ste (float) Standard error of slope estimate
- intercept_ste (float) standard error of intercept estimate
- slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
- intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
- alpha (float) confidence level [0,1] for slope and intercept intervals
- df_model (float) degrees of freedom for model
- df_resid (float) degrees of freedom for residuals
- params ([float,float]) array of fitted parameters
- fittedvalues (ndarray) array of fitted values
- resid (ndarray) array of residual values
- method (str) name of the fit method
smafit: Alias for sma
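The symmetry property described above can be sketched in a few lines of NumPy. This is a minimal illustration of the core SMA slope formula, slope = sign(r) * Sy / Sx, from Warton et al. (2006) — not the module's full `sma` (no weights, robust options, or confidence intervals), and the `sma_slope` helper name is hypothetical:

```python
import numpy as np

def sma_slope(x, y):
    """Minimal standard major axis fit: slope = sign(r) * Sy / Sx."""
    r = np.corrcoef(x, y)[0, 1]                     # Pearson correlation
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m_yx, b_yx = sma_slope(x, y)
m_xy, b_xy = sma_slope(y, x)
# Symmetry: the Y-vs-X slope is exactly the reciprocal of the X-vs-Y slope,
# which is not true for ordinary least squares.
```

Unlike OLS, where regressing Y on X and X on Y gives slopes whose product is r² rather than 1, the two SMA fits describe the same line.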
def sen(x, y, alpha=0.95, method='separate'):
    '''Theil-Sen slope estimate

    This function wraps `scipy.stats.theilslopes` and provides
    results in the same dict format as the other line-fitting methods
    in this module.

    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    alpha : float (default = 0.95)
        Desired confidence level [0,1] for output.
    method : {'separate' (default), 'joint'}
        Method for estimating intercept.
        - 'separate' uses np.median(y) - slope * np.median(x)
        - 'joint' uses np.median( y - slope * x )

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            slope or gradient of Y vs. X
        - intercept (float)
            Y intercept
        - slope_ste (float)
            standard error of the slope estimate
        - intercept_ste (float)
            standard error of the intercept estimate
        - slope_interval ([float, float])
            confidence interval for the slope at confidence level alpha
        - intercept_interval ([float, float])
            confidence interval for the intercept at confidence level alpha
        - alpha (float)
            confidence level [0,1] for the slope and intercept intervals
        - df_model (float)
            degrees of freedom for the model
        - df_resid (float)
            degrees of freedom for the residuals
        - params ([float, float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    slope, intercept, low_slope, high_slope = theilslopes(y, x, alpha, method=method)

    dfmod = 2
    N = np.sum(~np.isnan(x) & ~np.isnan(y))

    result = dict(method='Theil-Sen',
                  fitintercept=True,
                  slope=slope,
                  intercept=intercept,
                  slope_ste=None,
                  intercept_ste=None,
                  slope_interval=[low_slope, high_slope],
                  intercept_interval=[None, None],
                  alpha=alpha,
                  df_model=dfmod,
                  df_resid=N - dfmod,
                  params=np.array([slope, intercept]),
                  nobs=N,
                  fittedvalues=intercept + slope * x,
                  resid=y - (intercept + slope * x)
                  )

    return result
Theil-Sen slope estimate
This function wraps scipy.stats.theilslopes and provides results in the same dict format as the other line-fitting methods in this module.
Parameters
- x, y (ndarray): independent (x) and dependent (y) variables for fitting
- alpha (float (default = 0.95)): Desired confidence level [0,1] for output.
- method ({'separate' (default), 'joint'}):
Method for estimating intercept.
- 'separate' uses np.median(y) - slope * np.median(x)
- 'joint' uses np.median( y - slope * x )
Returns
- fitresult (dict):
Contains the following keys:
- slope (float) Slope or Gradient of Y vs. X
- intercept (float) Y intercept.
- slope_ste (float) Standard error of slope estimate
- intercept_ste (float) standard error of intercept estimate
- slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
- intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
- alpha (float) confidence level [0,1] for slope and intercept intervals
- df_model (float) degrees of freedom for model
- df_resid (float) degrees of freedom for residuals
- params ([float,float]) array of fitted parameters
- fittedvalues (ndarray) array of fitted values
- resid (ndarray) array of residual values
sen_slope: Alias for sen
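Since `sen` wraps `scipy.stats.theilslopes`, the two intercept conventions can be illustrated directly with scipy. A sketch with illustrative values, where a single outlier leaves the median pairwise slope untouched:

```python
import numpy as np
from scipy.stats import theilslopes

x = np.arange(10.0)
y = 3.0 * x + 2.0
y[7] += 50.0          # one outlier; the median of the pairwise slopes is unaffected

slope, intercept, lo, hi = theilslopes(y, x, alpha=0.95, method='separate')
# 'separate': intercept = np.median(y) - slope * np.median(x)
# 'joint':    intercept = np.median(y - slope * x)
# Here both conventions recover slope = 3.0 and intercept = 2.0 despite the outlier.
```

For this data the 'separate' and 'joint' intercepts coincide; in general they differ when the residual distribution is asymmetric.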
def sen_numba(x, y):
    '''Estimate linear trend using the Theil-Sen method

    This non-parametric method finds the median slope among all
    pairwise combinations of time points.
    scipy.stats.theilslopes provides the same slope estimate, with
    confidence intervals. However, this function is faster for
    large datasets when accelerated with Numba.

    Parameters
    ----------
    x : array_like (N,)
        independent variable
    y : array_like (N,)
        dependent variable

    Returns
    -------
    sen : float
        the median slope
    slopes : array (N*(N-1)/2,)
        slope estimates from all pairwise combinations of x and y
    '''

    with warnings.catch_warnings():
        warnings.simplefilter('always', DeprecationWarning)
        warnings.warn('sen_numba is slow unless numba.jit is used. '
                      'Use scipy.stats.theilslopes instead.',
                      DeprecationWarning, stacklevel=2)

    if len(x) != len(y):
        raise ValueError('Inputs x and y must have the same length')

    # Number of time points
    n = len(x)

    # Array to hold all pairwise slope estimates
    slopes = np.full(n * (n - 1) // 2, np.nan)

    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Slope between elements i and j
            slopes[count] = (y[j] - y[i]) / (x[j] - x[i])
            count += 1

    # Theil-Sen estimate is the median slope, neglecting NaN
    sen = np.nanmedian(slopes)

    return sen, slopes
Estimate linear trend using the Theil-Sen method
This non-parametric method finds the median slope among all pairwise combinations of time points. scipy.stats.theilslopes provides the same slope estimate, with confidence intervals; however, this function is faster for large datasets when accelerated with Numba.
Parameters
- x (array_like (N,)): independent variable
- y (array_like (N,)): dependent variable
Returns
- sen (float): the median slope
- slopes (array (N*(N-1)/2,)): slope estimates from all pairwise combinations of x and y
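The double loop above can also be vectorized with NumPy broadcasting, which is often fast enough without Numba. A minimal sketch (the `sen_vectorized` name is hypothetical):

```python
import numpy as np

def sen_vectorized(x, y):
    """Median of all pairwise slopes, computed without explicit loops."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)   # indices of all unique pairs with i < j
    return np.median((y[j] - y[i]) / (x[j] - x[i]))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])   # last point is a gross outlier
# The median pairwise slope recovers the underlying slope of 2.0
```

Note the memory cost: broadcasting materializes all N(N-1)/2 pairwise slopes at once, so for very large N the loop-plus-Numba approach uses less memory per step.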
def york(x, y, err_x=1, err_y=1, rerr_xy=0):
    '''York regression accounting for error in x and y

    Follows the notation and algorithm of York et al. (2004), Section III.

    Parameters
    ----------
    x, y : ndarray
        independent (x) and dependent (y) variables for fitting
    err_x, err_y : ndarray (default=1)
        standard deviation of errors/uncertainty in x and y
    rerr_xy : float (default=0)
        correlation coefficient for errors in x and y.
        The default rerr_xy=0 means that errors in x are unrelated to errors in y.
        err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y.

    Returns
    -------
    fitresult : dict
        Contains the following keys:
        - slope (float)
            slope or gradient of Y vs. X
        - intercept (float)
            Y intercept
        - slope_ste (float)
            standard error of the slope estimate
        - intercept_ste (float)
            standard error of the intercept estimate
        - slope_interval ([float, float])
            confidence interval for the slope (not computed; [None, None])
        - intercept_interval ([float, float])
            confidence interval for the intercept (not computed; [None, None])
        - alpha (float)
            confidence level for intervals (None; intervals are not computed)
        - df_model (float)
            degrees of freedom for the model
        - df_resid (float)
            degrees of freedom for the residuals
        - params ([float, float])
            array of fitted parameters
        - fittedvalues (ndarray)
            array of fitted values
        - resid (ndarray)
            array of residual values
    '''

    # Relative error tolerance required for convergence
    rtol = 1e-15

    # Initial guess for slope, from ordinary least squares
    b = stats.linregress(x, y)[0]

    # Weights for x and y
    wx = 1 / err_x**2
    wy = 1 / err_y**2

    # Combined weights
    # (named alpha_xy here to avoid shadowing the alpha confidence level
    # used elsewhere in this module)
    alpha_xy = np.sqrt(wx * wy)

    # Iterate until the solution converges, but no more than 50 times
    maxiter = 50
    for i in range(1, maxiter + 1):

        # Weight for each point
        W = wx * wy / (wx + b**2 * wy - 2 * b * rerr_xy * alpha_xy)
        Wsum = np.sum(W)

        # Weighted means
        Xbar = np.sum(W * x) / Wsum
        Ybar = np.sum(W * y) / Wsum

        # Deviations from the weighted means
        U = x - Xbar
        V = y - Ybar

        # Parameter needed for the slope
        beta = W * (U / wy + b * V / wx - (b * U + V) * rerr_xy / alpha_xy)

        # Updated slope estimate
        bnew = np.sum(W * beta * V) / np.sum(W * beta * U)

        # Exit the loop if the new value is very close to the old value
        if np.abs((bnew - b) / b) < rtol:
            break
        b = bnew
    else:
        raise ValueError(f'York regression failed to converge in {maxiter:d} iterations')

    # Intercept
    a = Ybar - b * Xbar

    # Least-squares adjusted points, expectation values of X and Y
    xa = Xbar + beta
    ya = Ybar + b * beta

    # Means of the adjusted points
    xabar = np.sum(W * xa) / Wsum
    yabar = np.sum(W * ya) / Wsum

    # Deviations of the adjusted points from their means
    u = xa - xabar
    v = ya - yabar

    # Variances of the slope and intercept estimates
    varb = 1 / np.sum(W * u**2)
    vara = 1 / Wsum + xabar**2 * varb

    # Standard errors of the slope and intercept
    siga = np.sqrt(vara)
    sigb = np.sqrt(varb)

    dfmod = 2
    N = np.sum(~np.isnan(x) & ~np.isnan(y))

    result = dict(method='York',
                  fitintercept=True,
                  slope=b,
                  intercept=a,
                  slope_ste=sigb,
                  intercept_ste=siga,
                  slope_interval=[None, None],
                  intercept_interval=[None, None],
                  alpha=None,   # confidence intervals are not computed
                  df_model=dfmod,
                  df_resid=N - dfmod,
                  params=np.array([b, a]),
                  nobs=N,
                  fittedvalues=a + b * x,
                  resid=y - (a + b * x)
                  )

    return result
York regression accounting for error in x and y. Follows the notation and algorithm of York et al. (2004), Section III.
Parameters
- x, y (ndarray): independent (x) and dependent (y) variables for fitting
- err_x, err_y (ndarray (default=1)): standard deviation of errors/uncertainty in x and y
- rerr_xy (float (default=0)): correlation coefficient for errors in x and y. The default rerr_xy=0 means that errors in x are unrelated to errors in y. err_x, err_y, and rerr_xy can be constants or arrays of the same length as x and y.
Returns
- fitresult (dict):
Contains the following keys:
- slope (float) Slope or Gradient of Y vs. X
- intercept (float) Y intercept.
- slope_ste (float) Standard error of slope estimate
- intercept_ste (float) standard error of intercept estimate
- slope_interval ([float, float]) confidence interval for gradient at confidence level alpha
- intercept_interval ([float, float]) confidence interval for intercept at confidence level alpha
- alpha (float) confidence level [0,1] for slope and intercept intervals
- df_model (float) degrees of freedom for model
- df_resid (float) degrees of freedom for residuals
- params ([float,float]) array of fitted parameters
- fittedvalues (ndarray) array of fitted values
- resid (ndarray) array of residual values
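The iterative reweighting at the heart of York's algorithm can be sketched with NumPy alone. This is a simplified illustration assuming uncorrelated x and y errors (rerr_xy = 0), not the module's full `york` implementation, and the `york_sketch` name is hypothetical:

```python
import numpy as np

def york_sketch(x, y, err_x, err_y, maxiter=50, rtol=1e-15):
    """Iterative York (2004) slope estimate, assuming uncorrelated x/y errors."""
    # Initial slope guess from ordinary least squares
    b = np.polyfit(x, y, 1)[0]
    wx, wy = 1.0 / err_x**2, 1.0 / err_y**2
    for _ in range(maxiter):
        # Per-point weight combining x and y uncertainties for the current slope
        W = wx * wy / (wx + b**2 * wy)
        xbar = np.sum(W * x) / np.sum(W)
        ybar = np.sum(W * y) / np.sum(W)
        U, V = x - xbar, y - ybar
        beta = W * (U / wy + b * V / wx)
        bnew = np.sum(W * beta * V) / np.sum(W * beta * U)
        if abs((bnew - b) / b) < rtol:
            break
        b = bnew
    a = ybar - b * xbar
    return b, a

# On noise-free data the iteration recovers the true line
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
slope, intercept = york_sketch(x, y,
                               err_x=np.full_like(x, 0.1),
                               err_y=np.full_like(x, 0.2))
```

The key idea is that the weight W of each point depends on the current slope b, so the weights and the slope must be solved self-consistently by iteration.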