Package ivs :: Package statistics :: Module linearregression :: Class LinearModel

Class LinearModel



A linear model class

The class implements a linear model of the form

y(x) = a_0 * f_0(x) + a_1 * f_1(x) + ... + a_{n-1} * f_{n-1}(x)

where the a_i are the fit coefficients and the f_i(x) are the regressors evaluated in the covariates x.

Note: LinearModel instances can be added and printed.
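As an illustration, the model form above can be written out directly with NumPy; the regressors and coefficient values below are made up for the example and are not part of the class API:

```python
import numpy as np

# Illustrative covariates and regressor functions f_0, f_1 (made-up example)
x = np.linspace(0, 10, 100)
regressors = [np.sin(x), x * x]          # f_0(x), f_1(x)
coefficients = np.array([2.0, 0.5])      # a_0, a_1 (arbitrary values)

# y(x) = a_0 * f_0(x) + a_1 * f_1(x): the column-stacked regressors
# (the design matrix) times the coefficient vector
design = np.column_stack(regressors)     # shape (100, 2)
y = design @ coefficients

# Same result written out term by term
y_check = 2.0 * np.sin(x) + 0.5 * x * x
assert np.allclose(y, y_check)
```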

Instance Methods

  • __init__(self, regressors, nameList, covMatrix=None, regressorsAreWeighted=False) → LinearModel
    Initialiser of the LinearModel class.
  • __add__(self, linearModel) → LinearModel
    Defines the '+' operator for two LinearModel instances.
  • designMatrix(self) → ndarray
    Returns the design matrix of the linear model.
  • _singularValueDecompose(self)
    Performs and stores the singular value decomposition of the design matrix.
  • pseudoInverse(self) → ndarray
    Computes the pseudo-inverse in a robust way.
  • standardCovarianceMatrix(self) → ndarray
    Computes the standard covariance matrix.
  • conditionNumber(self) → double
    Computes the condition number.
  • singularValues(self) → ndarray
    Returns the non-zero singular values obtained with the singular value decomposition.
  • hatMatrix(self) → ndarray
    Computes the hat matrix.
  • traceHatMatrix(self) → double
    Returns the trace of the (possibly weighted) hat matrix.
  • degreesOfFreedom(self) → integer
    Returns the effective number of degrees of freedom.
  • regressorNames(self) → list
    Returns the regressor names.
  • containsRegressorName(self, regressorName) → boolean
    Checks if the linear model contains the given regressor.
  • someRegressorNameContains(self, someString) → boolean
    Checks if a regressor name contains a given substring.
  • withIntercept(self) → boolean
    Checks if there is a constant regressor.
  • nObservations(self) → integer
    Returns the number of observations.
  • nParameters(self) → integer
    Returns the number of fit coefficients.
  • fitData(self, observations) → LinearFit
    Creates and returns a LinearFit object.
  • submodels(self, Nmin=1, Nmax=None, nested=True, ranks=None, simpleFirst=True) → generator
    Returns a generator capable of generating submodels.
  • copy(self) → LinearModel
    Returns a duplicate of the current LinearModel.
  • __str__(self) → string
    Generates what is written to the console if the LinearModel is printed.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, regressors, nameList, covMatrix=None, regressorsAreWeighted=False)
(Constructor)


Initialiser of the LinearModel class.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
Parameters:
  • regressors (list or ndarray) - either a list of equally-sized numpy arrays with the regressors evaluated in the covariates: [f_0(x), f_1(x), f_2(x), ...], or an M x N design matrix (numpy array) in which these regressor arrays are column-stacked, with M the number of data points and N the number of regressors.
  • nameList (list) - list of strings of the regressors names. E.g. ["sin(x)", "cos(x)"]
  • covMatrix (ndarray) - square array containing the covariance matrix of the observations. Weights, and correlations can hence be taken into account.
  • regressorsAreWeighted (boolean) - False if regressors are not yet weighted, True otherwise
Returns: LinearModel
a LinearModel instance
Overrides: object.__init__

__add__(self, linearModel)
(Addition operator)


Defines operator '+' of two LinearModel instances.

Example:

>>> x = np.linspace(0,10,100)
>>> lm1 = LinearModel([x], ["x"])
>>> lm2 = LinearModel([x**2], ["x^2"])
>>> lm3 = lm1 + lm2    # the same as LinearModel([x, x**2], ["x", "x^2"])
Parameters:
  • linearModel (LinearModel) - a LinearModel instance with the same number of observations as the linear model in 'self'
Returns: LinearModel
a newly created LinearModel object representing a linear model that is the sum of 'self' and 'linearModel'. 'self' and 'linearModel' are unaltered.

designMatrix(self)


Returns the design matrix of the linear model

Example:

>>> x = np.linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.designMatrix()
array([[   0.  ,    0.  ],
       [   2.5 ,    6.25],
       [   5.  ,   25.  ],
       [   7.5 ,   56.25],
       [  10.  ,  100.  ]])
Returns: ndarray
an M x N design matrix. If the model is written as y(x) = a_0 * f_0(x) + a_1 * f_1(x) + ... + a_{n-1} * f_{n-1}(x), then the design matrix A is defined by A_{i,j} = f_j(x_i), with M the number of data points and N the number of regressors.

_singularValueDecompose(self)


Performs and stores the singular value decomposition of the design matrix

Private method, not to be used directly by the user.

The singular value decomposition of the design matrix is defined as

X = U * W * V^t

where

  • X: M x N design matrix.
  • U: M x M unitary matrix
  • W: M x N diagonal matrix
  • V: N x N unitary matrix

with M the number of observations, and N the number of regressors. Note that for a unitary matrix A: A^{-1} = A^t

Returns:
nothing
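Although this method is private, the decomposition it stores is easy to reproduce with NumPy's SVD. This is a sketch, not the class's actual code; it assumes the design matrix is built by column-stacking the regressors, and uses NumPy's reduced form (full_matrices=False), which keeps only the first N columns of U:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])          # M x N design matrix (M=5, N=2)

# Reduced SVD: U is M x N, w holds the singular values, Vt = V^t
U, w, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruct X = U * W * V^t
assert np.allclose(U @ np.diag(w) @ Vt, X)

# Orthonormal columns: U^t U = I and V V^t = I
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(Vt @ Vt.T, np.eye(2))
```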

pseudoInverse(self)


Computes the pseudo-inverse in a robust way

Remarks:

  • The "robustness" is obtained by using a singular value decomposition of X, and treating the singular values in order to avoid numerical problems.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.pseudoInverse()
array([[  8.95460520e-17,   1.63870968e-01,   1.98709677e-01,
          1.04516129e-01,  -1.18709677e-01],
       [ -1.07455262e-17,  -1.80645161e-02,  -2.06451613e-02,
         -7.74193548e-03,   2.06451613e-02]])
Returns: ndarray
A numerically safe version of the N x M matrix (X^t X)^{-1} X^t, where X is the M x N design matrix, ^t the transpose, and ^{-1} the matrix inverse. All multiplications are matrix multiplications.
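The SVD-based construction can be sketched as follows; the tolerance choice below is a common convention, not necessarily the one used internally:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])   # 5 x 2 design matrix

# SVD-based pseudo-inverse: invert only singular values above a tolerance,
# so near-zero singular values do not amplify numerical noise
U, w, Vt = np.linalg.svd(X, full_matrices=False)
tol = max(X.shape) * np.finfo(float).eps * w.max()   # a common cutoff convention
w_inv = np.where(w > tol, 1.0 / w, 0.0)
pinv = Vt.T @ np.diag(w_inv) @ U.T                   # shape (2, 5), i.e. N x M

# NumPy's built-in pinv uses the same SVD idea
assert np.allclose(pinv, np.linalg.pinv(X))
```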

standardCovarianceMatrix(self)


Computes the standard covariance matrix.

Remarks:

  • The "safeness" is obtained by using a singular value decomposition of X, and treating the singular values to avoid numerical problems.
  • The matrix C is the covariance matrix of the regression coefficients if the signal noise would be i.i.d. with sigma = 1.
  • In the case of sigma != 1, the actual covariance matrix can be obtained by simply rescaling the standard covariance matrix. See LinearFit.covarianceMatrix.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.standardCovarianceMatrix()
array([[ 0.09135484, -0.01032258],
       [-0.01032258,  0.00123871]])
Returns: ndarray
A numerically safe version of C = (X^t X)^{-1}, where X is the design matrix, ^t the transpose, and ^{-1} the matrix inverse. All multiplications are matrix multiplications.
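The SVD route to C can be sketched as follows (illustrative, not the class's actual code); the result reproduces the values printed in the example above:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# If X = U W V^t, then C = (X^t X)^{-1} = V diag(1/w^2) V^t,
# which avoids forming and inverting X^t X directly
_, w, Vt = np.linalg.svd(X, full_matrices=False)
C = Vt.T @ np.diag(1.0 / w**2) @ Vt

assert np.allclose(C, np.linalg.inv(X.T @ X))
assert np.allclose(C, [[0.09135484, -0.01032258],
                       [-0.01032258, 0.00123871]], atol=1e-7)
```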

conditionNumber(self)


Computes the condition number.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.conditionNumber()
35.996606504814814
Returns: double
The ratio of the highest to the lowest singular value, where the singular values are obtained by singular value decomposition of the design matrix. A large condition number implies an ill-conditioned design matrix.
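The definition can be checked directly with NumPy (an illustrative sketch using the same design matrix as the example above):

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# Ratio of largest to smallest singular value of the design matrix
w = np.linalg.svd(X, compute_uv=False)
cond = w.max() / w.min()

# NumPy's 2-norm condition number is defined the same way
assert np.isclose(cond, np.linalg.cond(X))
```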

singularValues(self)


Returns the non-zero singular values obtained with the singular value decomposition of the design matrix

Remarks:

  • the singular values can be used as a diagnostic to see how ill-conditioned the matrix inversion is.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.singularValues()
array([ 118.34194851,    3.28758625])
Returns: ndarray
The non-zero singular values.

hatMatrix(self)


Computes the hat matrix.

The hat matrix is defined by H = X (X^t X)^{-1} X^t with

  • X the (weighted) design matrix
  • ^t the transpose,
  • ^{-1} the inverse matrix

and where all multiplications are matrix multiplications. It has the property that \hat{y} = H * y where

  • \hat{y} are the (weighted) predictions
  • y the (weighted) observations.

So, it "puts a hat" on the (weighted) observation vector.

Remarks:

  • as the number of data points may be large, this matrix may become so huge that it no longer fits into your computer's memory
  • for weighted (and/or correlated) observations, the resulting weighted hat matrix does not put a hat on the unweighted observations vector, only on the weighted one.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.hatMatrix()
array([[  9.32150093e-32,   1.56705591e-16,   1.79092104e-16,
          6.71595390e-17,  -1.79092104e-16],
       [  1.56705591e-16,   2.96774194e-01,   3.67741935e-01,
          2.12903226e-01,  -1.67741935e-01],
       [  1.79092104e-16,   3.67741935e-01,   4.77419355e-01,
          3.29032258e-01,  -7.74193548e-02],
       [  6.71595390e-17,   2.12903226e-01,   3.29032258e-01,
          3.48387097e-01,   2.70967742e-01],
       [ -1.79092104e-16,  -1.67741935e-01,  -7.74193548e-02,
          2.70967742e-01,   8.77419355e-01]])
Returns: ndarray
The hat matrix.
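The defining property and the projection structure of H can be verified with a small NumPy sketch (the observation vector below is made up for the example):

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# Hat matrix (forming it explicitly is fine for a small, well-conditioned X)
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H "puts a hat" on y: H @ y equals the least-squares predictions X @ a
y = 3.0 * x + 0.1 * x**2          # made-up observations inside the model space
a, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(H @ y, X @ a)

# H is a projection matrix: symmetric and idempotent
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
```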

traceHatMatrix(self)


Returns the trace of the (possibly weighted) hat matrix.

To calculate the trace, the computation of the entire M x M hat matrix (with M the number of observations) is avoided. This is useful when the number of observations is so large that the hat matrix no longer fits into memory.

Remarks:

  • If the hat matrix was computed anyway it will be used.
  • The trace of the hat matrix that puts a hat on the original observations equals the trace of the hat matrix that puts a hat on the weighted observations.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.traceHatMatrix()
1.9999999999999993
Returns: double
The trace of the hat matrix.
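One way to obtain the trace without forming the full M x M hat matrix is to contract the design matrix with its pseudo-inverse; this is a sketch and may differ from the class's internal shortcut:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# trace(H) = trace(X @ pinv(X)); the einsum contraction computes
# sum_i X[i, :] . pinv(X)[:, i] without ever building the M x M hat matrix
pinv = np.linalg.pinv(X)
trace_H = np.einsum('ij,ji->', X, pinv)

# For an unweighted, full-rank model this equals the number of regressors
assert np.isclose(trace_H, 2.0)
```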

degreesOfFreedom(self)


Returns the effective number of degrees of freedom.

The effective number of degrees of freedom is defined as the number of observations minus the trace of the hat matrix. See also: Wikipedia

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.degreesOfFreedom()
3
Returns: integer
The effective number of degrees of freedom.

regressorNames(self)


Returns the regressor names.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"]) 
>>> lm.regressorNames()
["sin(x)", "x^2"]
Returns: list
list with strings containing the names of the regressors.

containsRegressorName(self, regressorName)


Checks if the linear model contains the given regressor.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x), x*x], ["sin(x)", "x^2"])
>>> lm.containsRegressorName("x^2")
True
>>> lm.containsRegressorName("cos(x)")
False
Parameters:
  • regressorName (string) - name of a regressor
Returns: boolean
True if one of the regressor names is identical to the input 'regressorName', else False.

someRegressorNameContains(self, someString)


Checks if a regressor name contains a given substring.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x), x*x], ["sin(x)", "x^2"])
>>> lm.someRegressorNameContains("sin")
True
Parameters:
  • someString (string) - substring to be checked
Returns: boolean
True if at least one of the regressor names contains the substring 'someString', False otherwise.

withIntercept(self)


Checks if there is a constant regressor

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([ones_like(x), x*x], ["1", "x^2"])
>>> lm.withIntercept()
True
Returns: boolean
True if there is a constant regressor (the intercept). False otherwise.

nObservations(self)


Returns the number of observations

Example:

>>> x = linspace(0,10,23)
>>> lm = LinearModel([x], ["x"])
>>> lm.nObservations()
23
Returns: integer
the number of observations that the LinearModel expects

nParameters(self)


Returns the number of fit coefficients

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
>>> lm.nParameters()
2
Returns: integer
the number of fit coefficients of the linear model

fitData(self, observations)


Create and return a LinearFit object.

From this object the user can extract all kinds of quantities related to the linear fit. See LinearFit.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([x], ["x"])
>>> obs = x + normal(0.0, 1.0, 100)         # some simulated observations
>>> linearfit = lm.fitData(obs)
Parameters:
  • observations (ndarray) - array with the observations y_i. The size of the array should be compatible with the number of expected observations in the LinearModel.
Returns: LinearFit
a LinearFit instance, containing the fit of the observations with the linear model.

submodels(self, Nmin=1, Nmax=None, nested=True, ranks=None, simpleFirst=True)


Returns a generator capable of generating submodels

The generator sequentially yields submodels: LinearModel instances containing only a subset of the regressors of the parent model. Submodels with fewer regressors are generated first (when simpleFirst is True). Within each submodel, the regressors appear in order of increasing rank.

Examples:

If the regressors are [1, x, x**2] then

  • Nmin=1, nested=False, ranks=None gives the submodels [1], [x], [x**2], [1,x], [1,x**2], [x,x**2], [1,x,x**2]
  • Nmin=1, Nmax=2, nested=False, ranks=None gives the submodels [1], [x], [x**2], [1,x], [1,x**2], [x,x**2]
  • Nmin=2, nested=False, ranks=None gives the submodels [1,x], [1,x**2], [x,x**2], [1,x,x**2]
  • Nmin=3, nested=True, ranks=None gives the submodel [1,x,x**2]
  • Nmin=3, nested=True, ranks=[1,0,3] gives the submodel [x,1,x**2]
  • Nmin=2, nested=False, ranks=[1,0,3] gives the submodels [x,1], [x,x**2], [1,x**2], [x,1,x**2]
  • Nmin=2, nested=True, ranks=[1,0,3] gives the submodels [x,1], [x,1,x**2]

If the regressors are [1,x,sin(2x),cos(2x)] then

  • Nmin=3, nested=True, ranks=[0,1,2,2] gives the submodel [1,x,sin(2x),cos(2x)]
  • Nmin=2, nested=True, ranks=[0,1,2,2] gives the submodels [1,x], [1,x,sin(2x),cos(2x)]
  • Nmin=2, nested=False, ranks=[0,1,2,2] gives the submodels [1,x], [1,sin(2x),cos(2x)], [x,sin(2x),cos(2x)], [1,x,sin(2x),cos(2x)]
  • Nmin=1, Nmax=2, nested=False, ranks=[0,1,2,2] gives the submodels [1], [x], [sin(2x),cos(2x)], [1,x], [1,sin(2x),cos(2x)], [x,sin(2x),cos(2x)]
Parameters:
  • Nmin (integer) - Minimum number of regressors (or groups of equally ranked regressors). Must be at least 1.
  • Nmax (integer) - Maximum number of regressors (or groups of equally ranked regressors). If None, the maximum possible number is used. Note: if ranks = [0,1,2,2] there are at most 3 "groups" of regressors, as the last 2 regressors are always taken together.
  • nested (boolean) - If False: return all possible submodels, i.e. all possible combinations of regressors. If True: return only nested submodels, i.e. combinations of regressors (or groups of equally ranked regressors) in which a higher-ranked regressor never appears without all lower-ranked regressors.
  • ranks (list of integers) - List with integer rankings of the regressors. Only the relative ranking of the numbers is used, so the exact values are not important; for example, [1,2,3] gives the same result as [10,24,35]. Regressors with the same rank are always included together in a submodel, regardless of how 'nested' is set. For nested submodels, a regressor with a higher rank never occurs without the lower-ranked regressors in the same submodel.
  • simpleFirst (boolean) - If True: generate the simpler submodels first. If False: generate the more complex submodels first.
Returns: generator
a Python generator yielding LinearModel instances
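For the non-nested case without ranks, the generation order can be mimicked with itertools.combinations. This is a simplified sketch that ignores the ranks/grouping machinery and works on regressor names rather than actual LinearModel instances:

```python
from itertools import combinations

def submodel_name_sets(names, nmin=1, nmax=None):
    """Yield regressor-name subsets, smallest submodels first (non-nested case)."""
    if nmax is None:
        nmax = len(names)
    for size in range(nmin, nmax + 1):
        for subset in combinations(names, size):
            yield list(subset)

# Matches the first documented example for regressors [1, x, x**2]
subsets = list(submodel_name_sets(["1", "x", "x^2"], nmin=1))
assert subsets == [["1"], ["x"], ["x^2"],
                   ["1", "x"], ["1", "x^2"], ["x", "x^2"],
                   ["1", "x", "x^2"]]
```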

copy(self)


Returns a duplicate of the current LinearModel

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([x], ["x"])
>>> lm2 = lm.copy()
Returns: LinearModel
a duplicate of self: a genuine copy, not a reference.

__str__(self)
(Informal representation operator)


Generates what is written to the console if the LinearModel is printed.

Returns: string
the string representation of the linear model
Overrides: object.__str__