Package ivs :: Package statistics :: Module linearregression :: Class LinearModel

Class LinearModel



A linear model class

The class implements a linear model of the form

y(x) = a_0 * f_0(x) + a_1 * f_1(x) + ... + a_{n-1} * f_{n-1}(x)

where the a_i are the fit coefficients and the f_i(x) are the regressors evaluated in the covariates x.

Note: LinearModel instances can be added and printed.
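As an illustration, the model form above can be written out directly with NumPy; the regressors and coefficient values below are made up for the example and are not part of the class API:

```python
import numpy as np

# Illustrative covariates and regressor functions f_0, f_1 (made-up example)
x = np.linspace(0, 10, 100)
regressors = [np.sin(x), x * x]          # f_0(x), f_1(x)
coefficients = np.array([2.0, 0.5])      # a_0, a_1 (arbitrary values)

# y(x) = a_0 * f_0(x) + a_1 * f_1(x): the column-stacked regressors
# (the design matrix) times the coefficient vector
design = np.column_stack(regressors)     # shape (100, 2)
y = design @ coefficients

# Same result written out term by term
y_check = 2.0 * np.sin(x) + 0.5 * x * x
assert np.allclose(y, y_check)
```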

Instance Methods

  • __init__(self, regressors, nameList, covMatrix=None, regressorsAreWeighted=False) → LinearModel
    Initialiser of the LinearModel class.
  • __add__(self, linearModel) → LinearModel
    Defines the '+' operator for two LinearModel instances.
  • designMatrix(self) → ndarray
    Returns the design matrix of the linear model.
  • _singularValueDecompose(self)
    Performs and stores the singular value decomposition of the design matrix.
  • pseudoInverse(self) → ndarray
    Computes the pseudo-inverse in a robust way.
  • standardCovarianceMatrix(self) → ndarray
    Computes the standard covariance matrix.
  • conditionNumber(self) → double
    Computes the condition number.
  • singularValues(self) → ndarray
    Returns the non-zero singular values obtained with the singular value decomposition.
  • hatMatrix(self) → ndarray
    Computes the hat matrix.
  • traceHatMatrix(self) → double
    Returns the trace of the (possibly weighted) hat matrix.
  • degreesOfFreedom(self) → integer
    Returns the effective number of degrees of freedom.
  • regressorNames(self) → list
    Returns the regressor names.
  • containsRegressorName(self, regressorName) → boolean
    Checks if the linear model contains the given regressor.
  • someRegressorNameContains(self, someString) → boolean
    Checks if a regressor name contains a given substring.
  • withIntercept(self) → boolean
    Checks if there is a constant regressor.
  • nObservations(self) → integer
    Returns the number of observations.
  • nParameters(self) → integer
    Returns the number of fit coefficients.
  • fitData(self, observations) → LinearFit
    Creates and returns a LinearFit object.
  • submodels(self, Nmin=1, Nmax=None, nested=True, ranks=None, simpleFirst=True) → generator
    Returns a generator capable of generating submodels.
  • copy(self) → LinearModel
    Returns a duplicate of the current LinearModel.
  • __str__(self) → string
    Generates what is written to the console if the LinearModel is printed.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, regressors, nameList, covMatrix=None, regressorsAreWeighted=False)
(Constructor)


Initialiser of the LinearModel class.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
Parameters:
  • regressors (list or ndarray) - either a list of equally-sized numpy arrays with the regressors evaluated in the covariates: [f_0(x), f_1(x), f_2(x), ...], or an M x N design matrix (numpy array) in which these regressor arrays are column-stacked, with M the number of data points and N the number of regressors.
  • nameList (list) - list of strings of the regressors names. E.g. ["sin(x)", "cos(x)"]
  • covMatrix (ndarray) - square array containing the covariance matrix of the observations. Weights, and correlations can hence be taken into account.
  • regressorsAreWeighted (boolean) - False if regressors are not yet weighted, True otherwise
Returns: LinearModel
a LinearModel instance
Overrides: object.__init__

__add__(self, linearModel)
(Addition operator)


Defines operator '+' of two LinearModel instances.

Example:

>>> x = np.linspace(0,10,100)
>>> lm1 = LinearModel([x], ["x"])
>>> lm2 = LinearModel([x**2], ["x^2"])
>>> lm3 = lm1 + lm2    # the same as LinearModel([x, x**2], ["x", "x^2"])
Parameters:
  • linearModel (LinearModel) - a LinearModel instance with the same number of observations as the linear model in 'self'
Returns: LinearModel
a newly created LinearModel object representing a linear model that is the sum of 'self' and 'linearModel'. 'self' and 'linearModel' are unaltered.

designMatrix(self)


Returns the design matrix of the linear model

Example:

>>> x = np.linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.designMatrix()
array([[   0.  ,    0.  ],
       [   2.5 ,    6.25],
       [   5.  ,   25.  ],
       [   7.5 ,   56.25],
       [  10.  ,  100.  ]])
Returns: ndarray
an M x N design matrix. If the model is written as y(x) = a_0 * f_0(x) + a_1 * f_1(x) + ... + a_{n-1} * f_{n-1}(x), then the design matrix A is defined by A_{i,j} = f_j(x_i), with M the number of data points and N the number of regressors.

_singularValueDecompose(self)


Performs and stores the singular value decomposition of the design matrix

Private method, not to be used directly by the user.

The singular value decomposition of the design matrix is defined as

X = U * W * V^t

where

  • X: M x N design matrix.
  • U: M x M unitary matrix
  • W: M x N diagonal matrix
  • V: N x N unitary matrix

with M the number of observations, and N the number of regressors. Note that for a unitary matrix A: A^{-1} = A^t

Returns:
nothing
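Although this method is private, the decomposition it stores is easy to reproduce with NumPy's SVD. This is a sketch, not the class's actual code; it assumes the design matrix is built by column-stacking the regressors, and uses NumPy's reduced form (full_matrices=False), which keeps only the first N columns of U:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])          # M x N design matrix (M=5, N=2)

# Reduced SVD: U is M x N, w holds the singular values, Vt = V^t
U, w, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruct X = U * W * V^t
assert np.allclose(U @ np.diag(w) @ Vt, X)

# Orthonormal columns: U^t U = I and V V^t = I
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(Vt @ Vt.T, np.eye(2))
```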

pseudoInverse(self)


Computes the pseudo-inverse in a robust way

Remarks:

  • The "robustness" is obtained by using a singular value decomposition of X, and treating the singular values in order to avoid numerical problems.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.pseudoInverse()
array([[  8.95460520e-17,   1.63870968e-01,   1.98709677e-01,
          1.04516129e-01,  -1.18709677e-01],
       [ -1.07455262e-17,  -1.80645161e-02,  -2.06451613e-02,
         -7.74193548e-03,   2.06451613e-02]])
Returns: ndarray
A numerically safe version of the N x M matrix (X^t X)^{-1} X^t, where X is the M x N design matrix, ^t the transpose, and ^{-1} the matrix inverse. All multiplications are matrix multiplications.
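The SVD-based construction can be sketched as follows; the tolerance choice below is a common convention, not necessarily the one used internally:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])   # 5 x 2 design matrix

# SVD-based pseudo-inverse: invert only singular values above a tolerance,
# so near-zero singular values do not amplify numerical noise
U, w, Vt = np.linalg.svd(X, full_matrices=False)
tol = max(X.shape) * np.finfo(float).eps * w.max()   # a common cutoff convention
w_inv = np.where(w > tol, 1.0 / w, 0.0)
pinv = Vt.T @ np.diag(w_inv) @ U.T                   # shape (2, 5), i.e. N x M

# NumPy's built-in pinv uses the same SVD idea
assert np.allclose(pinv, np.linalg.pinv(X))
```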

standardCovarianceMatrix(self)


Computes the standard covariance matrix.

Remarks:

  • The "safeness" is obtained by using a singular value decomposition of X, and treating the singular values to avoid numerical problems.
  • The matrix C is the covariance matrix of the regression coefficients if the signal noise would be i.i.d. with sigma = 1.
  • In the case of sigma != 1, the actual covariance matrix can be obtained by simply rescaling the standard covariance matrix. See LinearFit.covarianceMatrix.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.standardCovarianceMatrix()
array([[ 0.09135484, -0.01032258],
       [-0.01032258,  0.00123871]])
Returns: ndarray
A numerically safe version of C = (X^t X)^{-1}, where X is the design matrix, ^t the transpose, and ^{-1} the matrix inverse. All multiplications are matrix multiplications.
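The SVD route to C can be sketched as follows (illustrative, not the class's actual code); the result reproduces the values printed in the example above:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# If X = U W V^t, then C = (X^t X)^{-1} = V diag(1/w^2) V^t,
# which avoids forming and inverting X^t X directly
_, w, Vt = np.linalg.svd(X, full_matrices=False)
C = Vt.T @ np.diag(1.0 / w**2) @ Vt

assert np.allclose(C, np.linalg.inv(X.T @ X))
assert np.allclose(C, [[0.09135484, -0.01032258],
                       [-0.01032258, 0.00123871]], atol=1e-7)
```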

conditionNumber(self)


Computes the condition number.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.conditionNumber()
35.996606504814814
Returns: double
The ratio of the highest to the lowest singular value, where the singular values are obtained by singular value decomposition of the design matrix. A large condition number implies an ill-conditioned design matrix.
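The definition can be checked directly with NumPy (an illustrative sketch using the same design matrix as the example above):

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# Ratio of largest to smallest singular value of the design matrix
w = np.linalg.svd(X, compute_uv=False)
cond = w.max() / w.min()

# NumPy's 2-norm condition number is defined the same way
assert np.isclose(cond, np.linalg.cond(X))
```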

singularValues(self)


Returns the non-zero singular values obtained with the singular value decomposition of the design matrix

Remarks:

  • the singular values can be used as a diagnostic to see how ill-conditioned the matrix inversion is.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.singularValues()
array([ 118.34194851,    3.28758625])
Returns: ndarray
The non-zero singular values.

hatMatrix(self)


Computes the hat matrix.

The hat matrix is defined by H = X (X^t X)^{-1} X^t with

  • X the (weighted) design matrix
  • ^t the transpose,
  • ^{-1} the inverse matrix

and where all multiplications are matrix multiplications. It has the property that \hat{y} = H * y where

  • \hat{y} are the (weighted) predictions
  • y the (weighted) observations.

So, it "puts a hat" on the (weighted) observation vector.

Remarks:

  • as the number of data points may be large, this matrix may become so huge that it no longer fits into your computer's memory
  • for weighted (and/or correlated) observations, the resulting weighted hat matrix does not put a hat on the unweighted observations vector, only on the weighted one.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.hatMatrix()
array([[  9.32150093e-32,   1.56705591e-16,   1.79092104e-16,
          6.71595390e-17,  -1.79092104e-16],
       [  1.56705591e-16,   2.96774194e-01,   3.67741935e-01,
          2.12903226e-01,  -1.67741935e-01],
       [  1.79092104e-16,   3.67741935e-01,   4.77419355e-01,
          3.29032258e-01,  -7.74193548e-02],
       [  6.71595390e-17,   2.12903226e-01,   3.29032258e-01,
          3.48387097e-01,   2.70967742e-01],
       [ -1.79092104e-16,  -1.67741935e-01,  -7.74193548e-02,
          2.70967742e-01,   8.77419355e-01]])
Returns: ndarray
The hat matrix.
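The defining property and the projection structure of H can be verified with a small NumPy sketch (the observation vector below is made up for the example):

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# Hat matrix (forming it explicitly is fine for a small, well-conditioned X)
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H "puts a hat" on y: H @ y equals the least-squares predictions X @ a
y = 3.0 * x + 0.1 * x**2          # made-up observations inside the model space
a, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(H @ y, X @ a)

# H is a projection matrix: symmetric and idempotent
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
```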

traceHatMatrix(self)


Returns the trace of the (possibly weighted) hat matrix.

To calculate the trace, the computation of the entire M x M hat matrix (with M the number of observations) is avoided. This is useful when the number of observations is so large that the hat matrix no longer fits into memory.

Remarks:

  • If the hat matrix was computed anyway it will be used.
  • The trace of the hat matrix that puts a hat on the original observations equals the trace of the hat matrix that puts a hat on the weighted observations.

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.traceHatMatrix()
1.9999999999999993
Returns: double
The trace of the hat matrix.
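One way to obtain the trace without forming the full M x M hat matrix is to contract the design matrix with its pseudo-inverse; this is a sketch and may differ from the class's internal shortcut:

```python
import numpy as np

x = np.linspace(0, 10, 5)
X = np.column_stack([x, x**2])

# trace(H) = trace(X @ pinv(X)); the einsum contraction computes
# sum_i X[i, :] . pinv(X)[:, i] without ever building the M x M hat matrix
pinv = np.linalg.pinv(X)
trace_H = np.einsum('ij,ji->', X, pinv)

# For an unweighted, full-rank model this equals the number of regressors
assert np.isclose(trace_H, 2.0)
```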

degreesOfFreedom(self)


Returns the effective number of degrees of freedom.

The effective number of degrees of freedom is defined as the number of observations minus the trace of the hat matrix. See also: Wikipedia

Example:

>>> x = linspace(0,10,5)
>>> lm = LinearModel([x, x**2], ["x", "x^2"])
>>> lm.degreesOfFreedom()
3
Returns: integer
The effective number of degrees of freedom.

regressorNames(self)


Returns the regressor names.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"]) 
>>> lm.regressorNames()
["sin(x)", "x^2"]
Returns: list
list with strings containing the names of the regressors.

containsRegressorName(self, regressorName)


Checks if the linear model contains the given regressor.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x), x*x], ["sin(x)", "x^2"])
>>> lm.containsRegressorName("x^2")
True
>>> lm.containsRegressorName("cos(x)")
False
Parameters:
  • regressorName (string) - name of a regressor
Returns: boolean
True if one of the regressor names is identical to the input 'regressorName', else False.

someRegressorNameContains(self, someString)


Checks if a regressor name contains a given substring.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x), x*x], ["sin(x)", "x^2"])
>>> lm.someRegressorNameContains("sin")
True
Parameters:
  • someString (string) - substring to be checked
Returns: boolean
True if at least one of the regressor names contains the substring 'someString', False otherwise.

withIntercept(self)


Checks if there is a constant regressor

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([ones_like(x), x*x], ["1", "x^2"])
>>> lm.withIntercept()
True
Returns: boolean
True if there is a constant regressor (the intercept). False otherwise.

nObservations(self)


Returns the number of observations

Example:

>>> x = linspace(0,10,23)
>>> lm = LinearModel([x], ["x"])
>>> lm.nObservations()
23
Returns: integer
the number of observations that the LinearModel expects

nParameters(self)


Returns the number of fit coefficients

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
>>> lm.nParameters()
2
Returns: integer
the number of fit coefficients of the linear model

fitData(self, observations)


Create and return a LinearFit object.

From this object the user can extract all kinds of quantities related to the linear fit. See LinearFit.

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([x], ["x"])
>>> obs = x + normal(0.0, 1.0, 100)         # some simulated observations
>>> linearfit = lm.fitData(obs)
Parameters:
  • observations (ndarray) - array with the observations y_i. The size of the array should be compatible with the number of expected observations in the LinearModel.
Returns: LinearFit
a LinearFit instance, containing the fit of the observations with the linear model.

submodels(self, Nmin=1, Nmax=None, nested=True, ranks=None, simpleFirst=True)


Returns a generator capable of generating submodels

The generator sequentially yields submodels: LinearModel instances containing only a subset of the regressors of the parent model. Submodels with fewer regressors are generated first (when simpleFirst is True). Within each submodel, the regressors appear in order of increasing rank.

Examples:

If the regressors are [1, x, x**2] then

  • Nmin=1, nested=False, ranks=None gives the submodels [1], [x], [x**2], [1,x], [1,x**2], [x,x**2], [1,x,x**2]
  • Nmin=1, Nmax=2, nested=False, ranks=None gives the submodels [1], [x], [x**2], [1,x], [1,x**2], [x,x**2]
  • Nmin=2, nested=False, ranks=None gives the submodels [1,x], [1,x**2], [x,x**2], [1,x,x**2]
  • Nmin=3, nested=True, ranks=None gives the submodel [1,x,x**2]
  • Nmin=3, nested=True, ranks=[1,0,3] gives the submodel [x,1,x**2]
  • Nmin=2, nested=False, ranks=[1,0,3] gives the submodels [x,1], [x,x**2], [1,x**2], [x,1,x**2]
  • Nmin=2, nested=True, ranks=[1,0,3] gives the submodels [x,1], [x,1,x**2]

If the regressors are [1,x,sin(2x),cos(2x)] then

  • Nmin=3, nested=True, ranks=[0,1,2,2] gives the submodel [1,x,sin(2x),cos(2x)]
  • Nmin=2, nested=True, ranks=[0,1,2,2] gives the submodels [1,x], [1,x,sin(2x),cos(2x)]
  • Nmin=2, nested=False, ranks=[0,1,2,2] gives the submodels [1,x], [1,sin(2x),cos(2x)], [x,sin(2x),cos(2x)], [1,x,sin(2x),cos(2x)]
  • Nmin=1, Nmax=2, nested=False, ranks=[0,1,2,2] gives the submodels [1], [x], [sin(2x),cos(2x)], [1,x], [1,sin(2x),cos(2x)], [x,sin(2x),cos(2x)]
Parameters:
  • Nmin (integer) - Minimum number of regressors (or groups of equally ranked regressors). Must be at least 1.
  • Nmax (integer) - Maximum number of regressors (or groups of equally ranked regressors). If None, the maximum possible number is used. Note: if ranks = [0,1,2,2] there are at most 3 "groups" of regressors, as the last 2 regressors are always taken together.
  • nested (boolean) - If False: return all possible submodels, i.e. all possible combinations of regressors. If True: return only nested submodels, i.e. combinations of regressors (or groups of equally ranked regressors) in which a higher-ranked regressor never appears without all lower-ranked regressors.
  • ranks (list of integers) - List with integer rankings of the regressors. Only the relative ranking of the numbers is used, so the exact values are not important; for example, [1,2,3] gives the same result as [10,24,35]. Regressors with the same rank are always included together in a submodel, regardless of how 'nested' is set. For nested submodels, a regressor with a higher rank never occurs without the lower-ranked regressors in the same submodel.
  • simpleFirst (boolean) - If True: generate the simpler submodels first. If False: generate the more complex submodels first.
Returns: generator
a Python generator yielding LinearModel instances
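For the non-nested case without ranks, the generation order can be mimicked with itertools.combinations. This is a simplified sketch that ignores the ranks/grouping machinery and works on regressor names rather than actual LinearModel instances:

```python
from itertools import combinations

def submodel_name_sets(names, nmin=1, nmax=None):
    """Yield regressor-name subsets, smallest submodels first (non-nested case)."""
    if nmax is None:
        nmax = len(names)
    for size in range(nmin, nmax + 1):
        for subset in combinations(names, size):
            yield list(subset)

# Matches the first documented example for regressors [1, x, x**2]
subsets = list(submodel_name_sets(["1", "x", "x^2"], nmin=1))
assert subsets == [["1"], ["x"], ["x^2"],
                   ["1", "x"], ["1", "x^2"], ["x", "x^2"],
                   ["1", "x", "x^2"]]
```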

copy(self)


Returns a duplicate of the current LinearModel

Example:

>>> x = linspace(0,10,100)
>>> lm = LinearModel([x], ["x"])
>>> lm2 = lm.copy()
Returns: LinearModel
a duplicate of self: a genuine copy, not a reference.

__str__(self)
(Informal representation operator)


Generates what is written to the console if the LinearModel is printed.

Returns: string
the string representation of the linear model
Overrides: object.__str__