Module linearregression

Easy linear fitting. Author: Joris De Ridder

A crash course

The following example illustrates some of the core functionality of the package. We first generate some "fake" observations obs, measured as a function of x.

>>> from numpy import *
>>> noise = array([0.44, -0.48, 0.26, -2.00, -0.93, 2.21, -0.57, -2.04, -1.09, 1.53])
>>> x = linspace(0, 5, 10)
>>> obs = 2.0 + 3.0 * exp(x) + noise          # our "observations"

The model we want to fit is y = a_0 + a_1 * exp(x) = a_0 * 1 + a_1 * exp(x):

>>> myModel = LinearModel([ones(10), exp(x)], ["1", "exp(x)"])
>>> print(myModel)
Model: y = a_0 + a_1 * exp(x)
Expected number of observations: 10

Now we fit the observations:

>>> myFit = myModel.fitData(obs)
>>> myFit.summary()
Model: y = a_0 + a_1 * exp(x)
Regression coefficients:
    a_0 = 1.519644 +/- 0.574938
    a_1 = 3.006151 +/- 0.010033
Sum of squared residuals: 16.755934
Estimated variance of the residuals: 2.094492
Coefficient of determination R^2: 0.999911
F-statistic: 89767.825042
AIC: 15.161673
BIC: 12.069429
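
The numbers above can be cross-checked with plain numpy. A minimal sketch, assuming fitData performs an ordinary least-squares fit (the design matrix A is our own helper name, not part of the package):

>>> from numpy.linalg import lstsq
>>> A = column_stack([ones(10), exp(x)])   # design matrix: one column per regressor
>>> coeffs = lstsq(A, obs, rcond=None)[0]  # ordinary least-squares solution
>>> coeffs                                 # expect roughly [1.519644, 3.006151], as in the summary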

More advanced features

A lot of interesting information can be extracted from the LinearModel and LinearFit classes. A simple way to list the available methods at the prompt is:

>>> [method for method in dir(myModel) if not method.startswith("_")]

With the LinearModel class we can compute, for example, the following quantities (and more):

>>> myModel.conditionNumber()
71.994675255533011
>>> myModel.singularValues()
array([ 181.21519026,    2.51706379])
>>> myModel.degreesOfFreedom()
8
>>> myModel.traceHatMatrix()
1.9999999999999996
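
These quantities derive directly from the design matrix. As a rough cross-check (a sketch, reusing the matrix A built above): the singular values and condition number come from the SVD of the design matrix, the degrees of freedom equal the number of observations minus the number of regressors, and the trace of the hat matrix equals the number of fitted parameters.

>>> from numpy.linalg import svd
>>> s = svd(A, compute_uv=False)    # singular values of the design matrix
>>> s[0] / s[-1]                    # condition number: expect roughly 71.99
>>> len(obs) - A.shape[1]           # degrees of freedom: 10 - 2 = 8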

Some examples with the LinearFit class:

>>> myFit.regressionCoefficients()
array([ 1.51964437,  3.00615141])
>>> myFit.errorBars()
array([ 0.57493781,  0.01003345])
>>> myFit.covarianceMatrix()            # correlation matrix also available
array([[  3.30553480e-01,  -3.49164676e-03],
       [ -3.49164676e-03,   1.00670216e-04]])
>>> myFit.confidenceIntervals(0.05)
(array([ 0.19383541,  2.98301422]), array([ 2.84545333,  3.0292886 ]))

The 95% confidence interval of coefficient a_0 is thus [0.19, 2.85]. See the list of methods for more functionality. Note that weights can also be taken into account.
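
Under standard least-squares assumptions (Gaussian residuals), these intervals are simply coefficient +/- t-quantile * error bar. A minimal sketch with scipy, reusing the numbers obtained above:

>>> from scipy.stats import t
>>> halfwidth = t.ppf(1 - 0.05/2., myModel.degreesOfFreedom()) * myFit.errorBars()
>>> myFit.regressionCoefficients() - halfwidth   # lower bounds: expect roughly [0.194, 2.983]
>>> myFit.regressionCoefficients() + halfwidth   # upper bounds: expect roughly [2.845, 3.029]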

To evaluate your linear model at a new set of covariates xnew, use the method evaluate(). Be aware, however, that you need to pass the regressors evaluated at the new covariates, not the covariates themselves. For example:

>>> xnew = linspace(-5.0, +5.0, 20)
>>> y = myFit.evaluate([ones_like(xnew), exp(xnew)])
>>> print(y)
[   1.53989966    1.55393018    1.57767944    1.61787944    1.68592536
    1.80110565    1.99606954    2.32608192    2.8846888     3.83023405
    5.43074394    8.13990238   12.72565316   20.48788288   33.62688959
   55.86708392   93.51271836  157.23490405  265.09646647  447.67207215]
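
Evaluation of a linear model is simply the new design matrix times the fitted coefficients, which is why the regressors themselves have to be supplied. A hedged equivalent in plain numpy (Anew is our own helper name):

>>> Anew = column_stack([ones_like(xnew), exp(xnew)])      # design matrix at the new covariates
>>> allclose(y, Anew.dot(myFit.regressionCoefficients()))  # expect: True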

Assessing the models

Test whether we can reject the null hypothesis that all coefficients are zero, at a 5% significance level:

>>> myFit.FstatisticTest(0.05)
True

The "True" result means that H0 can indeed be rejected. Now we test if we can reject the null hypothesis that one particular coefficient is zero with a 5% significance level

>>> myFit.regressorTtest(0.05)
array([ True,  True], dtype=bool)

We obtained "True" for both coefficients, so both a_0 and a_1 are significant.
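
Under the hood, both tests compare a statistic against a critical value of its reference distribution. A hedged reconstruction with scipy, assuming the F-test uses 1 and 8 degrees of freedom (regressors beyond the constant, and residual degrees of freedom) and the t-test is two-sided:

>>> from scipy.stats import f, t
>>> 89767.825042 > f.ppf(1 - 0.05, 1, 8)    # F-statistic from the summary: expect True
>>> tvalues = myFit.regressionCoefficients() / myFit.errorBars()
>>> abs(tvalues) > t.ppf(1 - 0.05/2., myModel.degreesOfFreedom())   # expect [ True,  True]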

Some handy derived classes

There is a shortcut to define polynomials:

>>> myPolynomial = PolynomialModel(x, "x", [3,1])
>>> print(myPolynomial)
Model: y = a_0 * x^3 + a_1 * x^1
Expected number of observations: 10
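
PolynomialModel is a convenience wrapper: the same model can presumably be built by hand with LinearModel by passing the powers of x as regressors (a sketch; the labels are free-form strings):

>>> samePolynomial = LinearModel([x**3, x**1], ["x^3", "x^1"])
>>> print(samePolynomial)                  # should match the description of myPolynomial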

There is a similar shortcut to define harmonic models:

>>> myHarmonicModel = HarmonicModel(x, "x", [2.13, 3.88], ["f1", "f2"], maxNharmonics=2)
>>> print(myHarmonicModel)
Model: y = a_0 * sin(2*pi*1*f1*x) + a_1 * cos(2*pi*1*f1*x) + a_2 * sin(2*pi*2*f1*x) + a_3 * cos(2*pi*2*f1*x) + a_4 * sin(2*pi*1*f2*x) + a_5 * cos(2*pi*1*f2*x) + a_6 * sin(2*pi*2*f2*x) + a_7 * cos(2*pi*2*f2*x)
Expected number of observations: 10

One can even add the two models:

>>> myCombinedModel = myPolynomial + myHarmonicModel
>>> print(myCombinedModel)
Model: y = a_0 * x^3 + a_1 * x^1 + a_2 * sin(2*pi*1*f1*x) + a_3 * cos(2*pi*1*f1*x) + a_4 * sin(2*pi*2*f1*x) + a_5 * cos(2*pi*2*f1*x) + a_6 * sin(2*pi*1*f2*x) + a_7 * cos(2*pi*1*f2*x) + a_8 * sin(2*pi*2*f2*x) + a_9 * cos(2*pi*2*f2*x)
Expected number of observations: 10

Notes

The package aims to be robust and efficient.

Classes

LinearModel
    A linear model class.
PolynomialModel
    Class to implement a polynomial model.
HarmonicModel
    Class to implement a harmonic model function.
LinearFit
    A class for easily extracting the fit coefficients, covariance matrix, predictions, residuals, etc.