1
2 """
3 Easy linear fitting.
4 Author: Joris De Ridder
5
6 A crash course
7 ==============
8 The following example will illustrate some of the core functionality of the package.
9 We first make some "fake" observations I{obs} that were measured as a function of I{x}.
10
11 >>> from numpy import *
12 >>> noise = array([0.44, -0.48, 0.26, -2.00, -0.93, 2.21, -0.57, -2.04, -1.09, 1.53])
13 >>> x = linspace(0, 5, 10)
14 >>> obs = 2.0 + 3.0 * exp(x) + noise # our "observations"
15
16 The model we want to fit is y = a_0 + a_1 * exp(x) = a_0 * 1 + a_1 * exp(x):
17
18 >>> myModel = LinearModel([ones(10), exp(x)], ["1", "exp(x)"])
19 >>> print(myModel)
20 Model: y = a_0 + a_1 * exp(x)
21 Expected number of observations: 10
22
23 Now we fit the observations:
24
25 >>> myFit = myModel.fitData(obs)
26 >>> myFit.summary()
27 Model: y = a_0 + a_1 * exp(x)
28 Regression coefficients:
29 a_0 = 1.519644 +/- 0.574938
30 a_1 = 3.006151 +/- 0.010033
31 Sum of squared residuals: 16.755934
32 Estimated variance of the residuals: 2.094492
33 Coefficient of determination R^2: 0.999911
34 F-statistic: 89767.825042
35 AIC: 15.161673
36 BIC: 12.069429
37
38
39 The more advanced features
40 ==========================
41
42 A lot of interesting information can be extracted from the LinearModel and LinearFit classes.
43 A simple way to list the available methods at the prompt is:
44
45 >>> [method for method in dir(myModel) if not method.startswith("_")]
46
47 With the LinearModel class we can compute, for example:
48
49 >>> myModel.conditionNumber()
50 71.994675255533011
51 >>> myModel.singularValues()
52 array([ 181.21519026, 2.51706379])
53 >>> myModel.degreesOfFreedom()
54 8
55 >>> myModel.traceHatMatrix()
56 1.9999999999999996
57
58 and more. Some examples with the LinearFit class:
59
60 >>> myFit.regressionCoefficients()
61 array([ 1.51964437, 3.00615141])
62 >>> myFit.errorBars()
63 array([ 0.57493781, 0.01003345])
64 >>> myFit.covarianceMatrix() # correlation matrix also available
65 array([[ 3.30553480e-01, -3.49164676e-03],
66 [ -3.49164676e-03, 1.00670216e-04]])
67 >>> myFit.confidenceIntervals(0.05)
68 (array([ 0.19383541, 2.98301422]), array([ 2.84545333, 3.0292886 ]))
69
70 The 95% confidence interval of coefficient a_0 is thus [0.19, 2.85]. See the list
71 of methods for more functionality. Note that weights can also be taken into account.
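
As a sketch of how weights could enter (the diagonal covariance matrix below is built from
purely illustrative, assumed per-point uncertainties 'sigma'):

>>> sigma = 0.5 * ones(10)          # hypothetical measurement uncertainties
>>> myWeightedModel = LinearModel([ones(10), exp(x)], ["1", "exp(x)"], covMatrix=diag(sigma**2))
>>> myWeightedFit = myWeightedModel.fitData(obs)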
72
73 To evaluate your linear model in a new set of covariates 'xnew', use the method evaluate().
74 Be aware, however, that you need to give the regressors evaluated in the new covariates,
75 not only the covariates. For example:
76
77 >>> xnew = linspace(-5.0, +5.0, 20)
78 >>> y = myFit.evaluate([ones_like(xnew), exp(xnew)])
79 >>> print(y)
80 [ 1.53989966 1.55393018 1.57767944 1.61787944 1.68592536
81 1.80110565 1.99606954 2.32608192 2.8846888 3.83023405
82 5.43074394 8.13990238 12.72565316 20.48788288 33.62688959
83 55.86708392 93.51271836 157.23490405 265.09646647 447.67207215]
84
85
86 Assessing the models
87 ====================
88
89 Test if we can reject the null hypothesis that all coefficients are zero
90 with a 5% significance level:
91
92 >>> myFit.FstatisticTest(0.05)
93 True
94
95 The "True" result means that H0 can indeed be rejected. Now we test if we can
96 reject the null hypothesis that one particular coefficient is zero with a 5%
97 significance level
98
99 >>> myFit.regressorTtest(0.05)
100 array([ True, True], dtype=bool)
101
102 We obtained "True" for both coefficients, so both a_0 and a_1 are significant.
103
104
105 Some handy derived classes
106 ==========================
107
108 There is a shortcut to define polynomials:
109
110 >>> myPolynomial = PolynomialModel(x, "x", [3,1])
111 >>> print(myPolynomial)
112 Model: y = a_0 * x^3 + a_1 * x^1
113 Expected number of observations: 10
114
115 and to define harmonic models:
116
117 >>> myHarmonicModel = HarmonicModel(x, "x", [2.13, 3.88], ["f1", "f2"], maxNharmonics=2)
118 >>> print(myHarmonicModel)
119 Model: y = a_0 * sin(2*pi*1*f1*x) + a_1 * cos(2*pi*1*f1*x) + a_2 * sin(2*pi*2*f1*x) + a_3 * cos(2*pi*2*f1*x) + a_4 * sin(2*pi*1*f2*x) + a_5 * cos(2*pi*1*f2*x) + a_6 * sin(2*pi*2*f2*x) + a_7 * cos(2*pi*2*f2*x)
120 Expected number of observations: 10
121
122 One can even add the two models:
123
124 >>> myCombinedModel = myPolynomial + myHarmonicModel
125 >>> print(myCombinedModel)
126 Model: y = a_0 * x^3 + a_1 * x^1 + a_2 * sin(2*pi*1*f1*x) + a_3 * cos(2*pi*1*f1*x) + a_4 * sin(2*pi*2*f1*x) + a_5 * cos(2*pi*2*f1*x) + a_6 * sin(2*pi*1*f2*x) + a_7 * cos(2*pi*1*f2*x) + a_8 * sin(2*pi*2*f2*x) + a_9 * cos(2*pi*2*f2*x)
127 Expected number of observations: 10
128
129
130 Notes
131 =====
132 The package aims to be robust and efficient.
133 - B{Robust}: Singular value decomposition is used to do the fit, and the
134 singular values are thresholded to avoid numerical problems. This is
135 especially useful when the model tends to overfit the data.
136 - B{Efficient}: asking twice for the same information will not lead to a
137 recomputation. Results are kept internally. The reason for
138 having a separate LinearModel and LinearFit class is that
139 the bulk of the fitting computations can actually be done
140 without the observations. Fitting the same model to different
141 sets of observations can therefore be done a lot faster this
142 way.
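
A small doctest-style sketch of the efficiency point above, re-using x, obs and myModel from
the crash course (the second data set below is assumed, purely for illustration):

>>> obs2 = 2.0 + 3.0 * exp(x) + random.normal(0.0, 1.0, 10)   # a second, hypothetical data set
>>> myFit1 = myModel.fitData(obs)    # first fit: performs the SVD of the design matrix
>>> myFit2 = myModel.fitData(obs2)   # second fit: re-uses the decomposition stored in myModel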
143
144 """
145
146
147 import sys
148 import copy
149 from math import sqrt,log,pi
150 from itertools import combinations
151 import numpy as np
152 import scipy as sp
153 import scipy.stats as stats
154
155
156
157
158
159 class LinearModel(object):
160
161 """
162 A linear model class
163
164 The class implements a linear model of the form
165
166 C{y(x) = a_0 * f_0(x) + a_1 * f_1(x) + ... + a_{n-1} f_{n-1}(x)}
167
168 where
169 - y are the responses (observables)
170 - x are the covariates
171 - a_i are the regression coefficients (to be determined)
172 - f_i(x) are the regressors evaluated in the covariates.
173
174 Note: LinearModel instances can be added and printed.
175 """
176
177
178 def __init__(self, regressors, nameList, covMatrix=None, regressorsAreWeighted=False):
179
180 """
181 Initialiser of the LinearModel class.
182
183 Example:
184
185 >>> x = linspace(0,10,100)
186 >>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
187
188 @param regressors: either a list of equally-sized numpy arrays with the regressors
189 evaluated in the covariates: [f_0(x),f_1(x),f_2(x),...],
190 or an M x N design matrix (numpy array) in which these regressor arrays
191 are the columns, with M the number of data points and N the number
192 of regressors.
193 @type regressors: list or ndarray
194 @param nameList: list of strings of the regressors names. E.g. ["sin(x)", "cos(x)"]
195 @type nameList: list
196 @param covMatrix: square array containing the covariance matrix of the
197 observations. Weights, and correlations can hence be
198 taken into account.
199 @type covMatrix: ndarray
200 @param regressorsAreWeighted: False if regressors are not yet weighted, True otherwise
201 @type regressorsAreWeighted: boolean
202 @return: a LinearModel instance
203 @rtype: LinearModel
204
205 """
206
207
208
209 if isinstance(regressors, list):
210 self._designMatrix = np.column_stack(np.double(regressors))
211 self._nParameters = len(regressors)
212 self._nObservations = len(regressors[0])
213 elif isinstance(regressors, np.ndarray):
214 self._designMatrix = np.double(regressors)
215 self._nParameters = regressors.shape[1]
216 self._nObservations = regressors.shape[0]
217 else:
218 raise TypeError("LinearModel only accepts a list of regressors, or a design matrix")
219
220
221
222
223 if len(nameList) != self._nParameters:
224 raise ValueError("Number of names not equal to number of regressors")
225 else:
226 self._regressorNames = copy.copy(nameList)
227
228
229
230
231 if covMatrix is None:
232 self._covMatrixObserv = None
233 elif not isinstance(covMatrix, np.ndarray):
234 raise TypeError("Covariance matrix of observations needs to be an ndarray object")
235 else:
236 if covMatrix.ndim != 2:
237 raise TypeError("Covariance matrix is not a 2-dimensional array")
238 elif covMatrix.shape[0] != self._nObservations:
239 raise TypeError("Size of covariance matrix not compatible with number of observations")
240 elif covMatrix.shape[1] != covMatrix.shape[0]:
241 raise TypeError("Covariance matrix is not a square matrix")
242 else:
243 self._covMatrixObserv = covMatrix
244
245
246
247
248
249
250
251
252
253
254 if (self._covMatrixObserv is not None) and (not regressorsAreWeighted):
255 self._choleskyLower = np.linalg.cholesky(self._covMatrixObserv)
256 for n in range(self._designMatrix.shape[1]):
257 self._designMatrix[:,n] = np.linalg.solve(self._choleskyLower, self._designMatrix[:,n])
258 else:
259 self._choleskyLower = None
260
261
262
263
264
265 self._pseudoInverse = None
266 self._standardCovarianceMatrix = None
267 self._conditionNumber = None
268 self._hatMatrix = None
269 self._traceHatMatrix = None
270 self._degreesOfFreedom = None
271 self._withIntercept = None
272 self._U = None
273 self._V = None
274 self._w = None
275 self._B = None
276 self._svdTOL = np.finfo(np.double).eps * self._nObservations
277
278
279
280
281
282 def __add__(self, linearModel):
283
284 """
285 Defines operator '+' of two LinearModel instances.
286
287 Example:
288
289 >>> x = np.linspace(0,10,100)
290 >>> lm1 = LinearModel([x], ["x"])
291 >>> lm2 = LinearModel([x**2], ["x^2"])
292 >>> lm3 = lm1 + lm2 # the same as LinearModel([x, x**2], ["x", "x^2"])
293
294
295 @param linearModel: a LinearModel instance with the same number of observations as the
296 linear model in 'self'
297 @type linearModel: LinearModel class
298 @return: a newly created LinearModel object representing a linear model that is the sum
299 of 'self' and 'linearModel'. 'self' and 'linearModel' are unaltered.
300 @rtype: LinearModel
301
302 """
303
304 if not isinstance(linearModel, LinearModel):
305 raise TypeError("Only a LinearModel can be added to a LinearModel")
306
307 if linearModel.designMatrix().shape[0] != self._nObservations:
308 raise ValueError("Linear model has incompatible design matrix")
309
310 if not np.all(linearModel._covMatrixObserv == self._covMatrixObserv):
311 raise ValueError("Linear model has a different covariance matrix")
312
313 designMatrix = np.hstack([self._designMatrix, linearModel.designMatrix()])
314 regressorNames = self._regressorNames + linearModel.regressorNames()
315
316 if self._covMatrixObserv is None:
317 return LinearModel(designMatrix, regressorNames)
318 else:
319 return LinearModel(designMatrix, regressorNames, self._covMatrixObserv, regressorsAreWeighted = True)
320
321
322
323
324
325
326
327
328
329
330
331 def designMatrix(self):
332
333 """
334 Returns the design matrix of the linear model
335
336 Example:
337
338 >>> x = np.linspace(0,10,5)
339 >>> lm = LinearModel([x], ["x"])
340 >>> lm.designMatrix()
341 array([[ 0. , 0. ],
342 [ 2.5 , 6.25],
343 [ 5. , 25. ],
344 [ 7.5 , 56.25],
345 [ 10. , 100. ]])
346
347 @return: an M x N design matrix. If the model is written as
348 y(x) = a_0 * f_0(x) + a_1 * f_1(x) + ... + a_{n-1} f_{n-1}(x)
349 then the design matrix A is defined by A_{i,j} = f_j(x_i), with M
350 the number of data points and N the number of regressors.
351 @rtype: ndarray
352
353 """
354
355 return self._designMatrix
356
357
358
359
360
361
362
363
364
365
366
367
368 def _singularValueDecompose(self):
369
370 """
371 Performs and stores the singular value decomposition of the design matrix
372
373 Private class method, not to be used by the user.
374
375 The singular value decomposition of the design matrix is defined as
376
377 C{X = U * W * V^t }
378
379 where
380 - X: M x N design matrix.
381 - U: M x M unitary matrix
382 - W: M x N diagonal matrix
383 - V: N x N unitary matrix
384 with M the number of observations, and N the number of regressors. Note that for a
385 unitary matrix A: A^{-1} = A^t. (The code below computes the thin SVD, so only the first N columns of U and the N singular values are stored.)
386
387 @return: nothing
388
389 """
390
391
392
393
394
395
396
397
398
399 self._U,self._w, Vt = np.linalg.svd(self._designMatrix, full_matrices=0)
400 self._V = Vt.T
401
402
403
404
405 wmax = self._w.max()
406 self._invSingularValues = np.zeros_like(self._w)
407 for n in range(len(self._w)):
408 if self._w[n] > self._svdTOL * wmax:
409 self._invSingularValues[n] = 1.0 / self._w[n]
410
411
412
413
414
415
416
417
418
419
420 def pseudoInverse(self):
421
422 """
423 Computes the pseudo-inverse in a robust way
424
425 Remarks:
426 - The "robustness" is obtained by using a singular value decomposition
427 of X, and treating the singular values in order to avoid numerical
428 problems.
429
430 Example:
431
432 >>> x = linspace(0,10,5)
433 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
434 >>> lm.pseudoInverse()
435 array([[ 8.95460520e-17, 1.63870968e-01, 1.98709677e-01,
436 1.04516129e-01, -1.18709677e-01],
437 [ -1.07455262e-17, -1.80645161e-02, -2.06451613e-02,
438 -7.74193548e-03, 2.06451613e-02]])
439
440 @return: A numerically safe version of the N x M matrix
441 (X^t X)^{-1} X^t where X is the M x N design matrix,
442 ^t the transpose, ^{-1} the inverse matrix.
443 The multiplications are matrix multiplications.
444 @rtype: ndarray
445
446
447 """
448
449 if self._pseudoInverse is not None:
450 return self._pseudoInverse
451 else:
452 if self._U is None: self._singularValueDecompose()
453 M = self._U.shape[0]
454 N = self._U.shape[1]
455 temp = np.zeros((N,M))
456 for n in range(N):
457 temp[n] = self._invSingularValues[n] * self._U[:,n]
458 self._pseudoInverse = np.dot(self._V, temp)
459
460 return self._pseudoInverse
461
462
463
464
465
466
467
468
469
470
471
472
473
474 def standardCovarianceMatrix(self):
475
476 """
477 Computes the standard covariance matrix.
478
479 Remarks:
480 - The "safeness" is obtained by using a singular value
481 decomposition of X, and treating the singular values to avoid
482 numerical problems.
483 - The matrix C is the covariance matrix of the regression
484 coefficients if the signal noise would be i.i.d. with sigma = 1.
485 - In the case of sigma != 1, the actual covariance matrix
486 can be obtained by simply rescaling the standard covariance
487 matrix. See L{LinearFit.covarianceMatrix}.
488
489 Example:
490
491 >>> x = linspace(0,10,5)
492 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
493 >>> lm.standardCovarianceMatrix()
494 array([[ 0.09135484, -0.01032258],
495 [-0.01032258, 0.00123871]])
496
497 @return: A numerically safe version of C = (X^t X)^{-1} where
498 X is the design matrix, ^t the transpose, ^{-1} the
499 inverse matrix. The multiplications are matrix
500 multiplications.
501 @rtype: ndarray
502
503
504 """
505
506 if self._standardCovarianceMatrix is not None:
507 return self._standardCovarianceMatrix
508 else:
509 if self._V is None: self._singularValueDecompose()
510 self._standardCovarianceMatrix = np.dot(self._V, np.dot(np.diag(self._invSingularValues**2), self._V.T))
511 return self._standardCovarianceMatrix
512
513
514
515
516
517
518
519
520
521
522
523 def conditionNumber(self):
524
525 """
526 Computes the condition number.
527
528 Example:
529
530 >>> x = linspace(0,10,5)
531 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
532 >>> lm.conditionNumber()
533 35.996606504814814
534
535 @return: The ratio of the highest to the lowest singular value,
536 where the singular values are obtained by singular value
537 decomposition of the design matrix. A large condition number
538 implies an ill-conditioned design matrix.
539 @rtype: double
540
541
542 """
543
544 if self._conditionNumber is not None:
545 return self._conditionNumber
546 else:
547 if self._w is None: self._singularValueDecompose()
548 self._conditionNumber = self._w.max() / self._w.min()
549 return self._conditionNumber
550
551
552
553
554
555
556
557
558
559
560
561 def singularValues(self):
562
563 """
564 Returns the non-zero singular values
565 as obtained with the singular value decomposition.
567 Remarks:
568 - the singular values can be used as a diagnostic to see how
569 ill-conditioned the matrix inversion is.
569 ill-conditioned the matrix inversion is.
570
571 Example:
572
573 >>> x = linspace(0,10,5)
574 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
575 >>> lm.singularValues()
576 array([ 118.34194851, 3.28758625])
577
578 @return: The non-zero singular values.
579 @rtype: ndarray
580
581 """
582
583 if self._w is not None:
584 return self._w
585 else:
586 self._singularValueDecompose()
587 return self._w
588
589
590
591
592
593
594
595
596
597
598
599
600 def hatMatrix(self):
601
602 """
603 Computes the hat matrix.
604
605 The hat matrix is defined by H = X (X^t X)^{-1} X^t with
606 - X the (weighted) design matrix
607 - ^t the transpose,
608 - ^{-1} the inverse matrix
609 and where all multiplications are matrix multiplications.
610 It has the property that \hat{y} = H * y where
611 - \hat{y} are the (weighted) predictions
612 - y the (weighted) observations.
613 So, it "puts a hat" on the (weighted) observation vector.
614
615 Remarks:
616 - as the number of data points may be large, this matrix may
617 become so huge that it no longer fits into your computer's
618 memory
619 - for weighted (and/or correlated) observations, the resulting
620 weighted hat matrix does not put a hat on the unweighted
621 observations vector, only on the weighted one.
622
623 Example:
624
625 >>> x = linspace(0,10,5)
626 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
627 >>> lm.hatMatrix()
628 array([[ 9.32150093e-32, 1.56705591e-16, 1.79092104e-16,
629 6.71595390e-17, -1.79092104e-16],
630 [ 1.56705591e-16, 2.96774194e-01, 3.67741935e-01,
631 2.12903226e-01, -1.67741935e-01],
632 [ 1.79092104e-16, 3.67741935e-01, 4.77419355e-01,
633 3.29032258e-01, -7.74193548e-02],
634 [ 6.71595390e-17, 2.12903226e-01, 3.29032258e-01,
635 3.48387097e-01, 2.70967742e-01],
636 [ -1.79092104e-16, -1.67741935e-01, -7.74193548e-02,
637 2.70967742e-01, 8.77419355e-01]])
638
639 @return: The hat matrix.
640 @rtype: ndarray
641
642 """
643
644 if self._hatMatrix is not None:
645 return self._hatMatrix
646 else:
647 if self._U is None: self._singularValueDecompose()
648 self._hatMatrix = np.dot(self._U, self._U.T)
649 return self._hatMatrix
650
651
652
653
654
655
656
657
658
659
660
661
662 def traceHatMatrix(self):
663
664 """
665 Returns the trace of the (possibly weighted) hat matrix.
666
667 The trace is computed without constructing the entire MxM hat
668 matrix (M the number of observations). This is useful when the
669 number of observations is so high that the hat matrix no longer
670 fits into memory.
671
672 Remarks:
673 - If the hat matrix was computed anyway it will be used.
674 - The trace of the hat matrix that puts a hat on the original
675 observations equals the trace of the hat matrix that puts a
676 hat on the weighted observations.
677
678 Example:
679
680 >>> x = linspace(0,10,5)
681 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
682 >>> lm.traceHatMatrix()
683 1.9999999999999993
684
685 @return: The trace of the hat matrix.
686 @rtype: ndarray
687
688 """
689
690 if self._traceHatMatrix is not None:
691 return self._traceHatMatrix
692 else:
693 if self._hatMatrix is not None:
694 self._traceHatMatrix = self._hatMatrix.trace()
695 else:
696 if self._U is None: self._singularValueDecompose()
697 self._traceHatMatrix = 0.0
698 for k in range(self._U.shape[1]):
699 self._traceHatMatrix += np.sum(np.square(self._U[:,k]))
700 return self._traceHatMatrix
701
702
703
704
705
706
707
708
709
710
711
712
713
714 def degreesOfFreedom(self):
715
716 """
717 Returns the effective number of degrees of freedom.
718
719 The effective d.o.f. is defined as the number of observations minus the (rounded) trace of the hat matrix. See also:
720 U{Wikipedia<http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)>}
721
722 Example:
723
724 >>> x = linspace(0,10,5)
725 >>> lm = LinearModel([x, x**2], ["x", "x^2"])
726 >>> lm.degreesOfFreedom()
727 3
728
729 @return: The effective number of degrees of freedom.
730 @rtype: integer
731
732 """
733
734 if self._degreesOfFreedom is not None:
735 return self._degreesOfFreedom
736 else:
737 self._degreesOfFreedom = self._nObservations - int(round(self.traceHatMatrix()))
738 return self._degreesOfFreedom
739
740
741
742
743
744
745
746
747
748
749
750 def regressorNames(self):
751
752 """
753 Returns the regressor names.
754
755 Example:
756
757 >>> x = linspace(0,10,100)
758 >>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
759 >>> lm.regressorNames()
760 ["sin(x)", "x^2"]
761
762 @return: list with strings containing the names of the regressors.
763 @rtype: list
764
765 """
766
767 return self._regressorNames
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782 def containsRegressorName(self, regressorName):
783
784 """
785 Checks if the linear model contains the given regressor.
786
787 Example:
788
789 >>> x = linspace(0,10,100)
790 >>> lm = LinearModel([sin(x), x*x], ["sin(x)", "x^2"])
791 >>> lm.containsRegressorName("x^2")
792 True
793 >>> lm.containsRegressorName("cos(x)")
794 False
795
796 @param regressorName: name of a regressor
797 @type regressorName: string
798 @return: True if one of the regressor names is identical
799 to the input 'regressorName', else False.
800 @rtype: boolean
801
802 """
803
804 return regressorName in self._regressorNames
805
806
807
808
809
810
811
812
813
814 def someRegressorNameContains(self, someString):
815
816 """
817 Checks if a regressor name contains a given substring.
818
819 Example:
820
821 >>> x = linspace(0,10,100)
822 >>> lm = LinearModel([sin(x), x*x], ["sin(x)", "x^2"])
823 >>> lm.someRegressorNameContains("sin")
824 True
825
826 @param someString: substring to be checked
827 @type someString: string
828
829 @return: True if at least one of the regressor names contains
830 the substring someString, else return false.
831 @rtype: boolean
832
833 """
834
835 for name in self._regressorNames:
836 if someString in name: return True
837
838 return False
839
840
841
842
843
844
845
846
847
848
849 def withIntercept(self):
850
851 """
852 Checks if there is a constant regressor
853
854 Example:
855
856 >>> x = linspace(0,10,100)
857 >>> lm = LinearModel([ones(100), x*x], ["1", "x^2"])
858 >>> lm.withIntercept()
859 True
860
861 @return: True if there is a constant regressor (the intercept).
862 False otherwise.
863 @rtype: boolean
864
865 """
866
867 if self._withIntercept is not None:
868 return self._withIntercept
869 else:
870 self._withIntercept = False
871 for n in range(self._nParameters):
872 if len(np.unique(self._designMatrix[:,n])) == 1:
873 self._withIntercept = True
874 break
875 return self._withIntercept
876
877
878
879
880
881
882
883
884
885
886 def nObservations(self):
887
888 """
889 Returns the number of observations
890
891 Example:
892
893 >>> x = linspace(0,10,23)
894 >>> lm = LinearModel([x], ["x"])
895 >>> lm.nObservations()
896 23
897
898 @return: the number of observations that the LinearModel expects
899 @rtype: integer
900
901 """
902
903 return self._nObservations
904
905
906
907
908
909
910
911
912
913
914
915
916 def nParameters(self):
917
918 """
919 Returns the number of fit coefficients
920
921 Example:
922
923 >>> x = linspace(0,10,100)
924 >>> lm = LinearModel([sin(x),x*x], ["sin(x)", "x^2"])
925 >>> lm.nParameters()
926 2
927
928 @return: the number of fit coefficients of the linear model
929 @rtype: integer
930
931 """
932
933 return self._nParameters
934
935
936
937
938
939
940
941
942
943
944
945
946 def fitData(self, observations):
947
948 """
949 Create and return a LinearFit object.
950
951 From this object the user can extract all kinds of quantities
952 related to the linear fit. See L{LinearFit}.
953
954 Example:
955
956 >>> x = linspace(0,10,100)
957 >>> lm = LinearModel([x], ["x"])
958 >>> obs = x + random.normal(0.0, 1.0, 100) # some simulated observations
959 >>> linearfit = lm.fitData(obs)
960
961 @param observations: array with the observations y_i. The size
962 of the array should be compatible with the number
963 of expected observations in the LinearModel.
964 @type observations: ndarray
965 @return: a LinearFit instance, containing the fit of the observations
966 with the linear model.
967 @rtype: LinearFit
968
969 """
970
971 if len(observations) != self._nObservations:
972 raise ValueError("Number of observations should be %d, got %d" % (self._nObservations, len(observations)))
973
974 return LinearFit(self, observations)
975
976
977
978
979
980
981
982
983
984
985
986
987
988 def submodels(self, Nmin=1, Nmax=None, nested=True, ranks=None, simpleFirst=True):
989
990 """
991 Returns a generator capable of generating submodels
992
993 The generator sequentially returns the submodels which are instances
994 of LinearModel with only a subset of the regressors of the parent model.
995 The submodels with fewer regressors appear first when simpleFirst is True (the default).
996 The order in which the regressors appear in the submodels will be from
997 low rank to high rank.
998
999 Examples:
1000
1001 If the regressors are [1, x, x**2] then
1002 - Nmin=1, nested=False, ranks=None gives the submodels
1003 [1], [x], [x**2], [1,x], [1,x**2], [x,x**2], [1,x,x**2]
1004 - Nmin=1, Nmax=2, nested=False, ranks=None gives the submodels
1005 [1], [x], [x**2], [1,x], [1,x**2], [x,x**2]
1006 - Nmin=2, nested=False, ranks=None gives the submodels
1007 [1,x], [1,x**2], [x,x**2], [1,x,x**2]
1008 - Nmin=3, nested=True, ranks=None gives the submodel
1009 [1,x,x**2]
1010 - Nmin=3, nested=True, ranks=[1,0,3] gives the submodel
1011 [x,1,x**2]
1012 - Nmin=2, nested=False, ranks=[1,0,3] gives the submodels
1013 [x,1], [x,x**2], [1,x**2], [x,1,x**2]
1014 - Nmin=2, nested=True, ranks=[1,0,3] gives the submodels
1015 [x,1], [x,1,x**2]
1016 If the regressors are [1,x,sin(2x),cos(2x)] then
1017 - Nmin=3, nested=True, ranks=[0,1,2,2] gives the submodel
1018 [1,x,sin(2x),cos(2x)]
1019 - Nmin=2, nested=True, ranks=[0,1,2,2] gives the submodels
1020 [1,x], [1,x,sin(2x),cos(2x)]
1021 - Nmin=2, nested=False, ranks=[0,1,2,2] gives the submodels
1022 [1,x], [1,sin(2x),cos(2x)], [x,sin(2x),cos(2x)], [1,x,sin(2x),cos(2x)]
1023 - Nmin=1, Nmax=2, nested=False, ranks=[0,1,2,2] gives the submodels
1024 [1], [x], [sin(2x),cos(2x)], [1,x], [1,sin(2x),cos(2x)], [x,sin(2x),cos(2x)]
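
A short doctest-style sketch (the model below is assumed, purely for illustration):

>>> x = linspace(0, 10, 100)
>>> lm = LinearModel([ones(100), x, x**2], ["1", "x", "x^2"])
>>> for sub in lm.submodels(Nmin=1, Nmax=2, nested=True):
...     print(sub.regressorNames())
['1']
['1', 'x']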
1025
1026 @param Nmin: Minimum number of regressors (or groups of equally ranking
1027 regressors). Should be at least 1.
1028 @type Nmin: integer
1029 @param Nmax: Maximum number of regressors (or groups of equally ranking
1030 regressors). If None, the maximum possible number is taken.
1031 Note: if the ranks = [0,1,2,2] then there are at most
1032 3 "groups" of regressors, as the latter 2 regressors
1033 will always be taken together.
1034 @type Nmax: integer
1035 @param nested: if False: return all possible submodels, i.e. all possible
1036 combinations of regressors. If true: return only nested
1037 submodels with combinations of regressors (or groups of
1038 equally ranked regressors) so that a higher-ranked regressor
1039 will never appear without the lower-ranked regressors
1040 @type nested: boolean
1041 @param ranks: list with (integer) rankings of the regressors, only the
1042 relative ranking of the numbers will be used, so the exact
1043 numbers are not important. For example, [1,2,3] has the
1044 same result as [10,24,35]. Regressors having the same rank
1045 should always be included together in a submodel, regardless
1046 how the input parameter 'nested' has been set. In case of
1047 nested submodels, a regressor with a higher rank should
1048 never occur without the regressors with lower rank in the
1049 same submodel.
1050 @type ranks: list with integers
1051 @param simpleFirst: if True: generate the more simple submodels first.
1052 if False: generate the complicated submodels first.
1053 @type simpleFirst: boolean
1054 @return: a python generator implementing the yield method
1055 @rtype: generator
1056
1057 """
1058
1059
1060
1061
1062
1063 if ranks is None:
1064 ranks = np.arange(self._nParameters)
1065
1066
1067
1068 uniqueRanks = np.unique(ranks)
1069 if Nmax is None:
1070 Nmax = len(uniqueRanks)
1071
1072
1073
1074
1075
1076 if simpleFirst:
1077 nRegressorRange = range(Nmin,Nmax+1)
1078 else:
1079 nRegressorRange = range(Nmax,Nmin-1,-1)
1080
1081
1082
1083 if nested == True:
1084 for n in nRegressorRange:
1085 nameListSub = []
1086 regressorListSub = []
1087 indices = [k for m in uniqueRanks[:n] for k in np.where(ranks == m)[0]]
1088 nameListSub += [self._regressorNames[k] for k in indices]
1089 regressorListSub += [self._designMatrix[:,k] for k in indices]
1090 if self._covMatrixObserv is not None:
1091 covMatrixObservSub = self._covMatrixObserv[indices,:][:,indices]
1092 yield LinearModel(regressorListSub, nameListSub, covMatrixObservSub, regressorsAreWeighted=True)
1093 else:
1094 yield LinearModel(regressorListSub, nameListSub)
1095 else:
1096 comboList = [combo for comboSize in nRegressorRange for combo in combinations(uniqueRanks, comboSize)]
1097 for combo in comboList:
1098 indices = [k for n in combo for k in np.where(ranks==n)[0]]
1099 nameListSub = [self._regressorNames[k] for k in indices]
1100 regressorListSub = [self._designMatrix[:,k] for k in indices]
1101 if self._covMatrixObserv is not None:
1102 covMatrixObservSub = self._covMatrixObserv[indices,:][:,indices]
1103 yield LinearModel(regressorListSub, nameListSub, covMatrixObservSub, regressorsAreWeighted=True)
1104 else:
1105 yield LinearModel(regressorListSub, nameListSub)
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117 def copy(self):
1118
1119 """
1120 Returns a duplicate of the current LinearModel
1121
1122 Example:
1123
1124 >>> x = linspace(0,10,100)
1125 >>> lm = LinearModel([x], ["x"])
1126 >>> lm2 = lm.copy()
1127
1128 @return: a duplicate of self. Not a reference, a real copy.
1129 @rtype: LinearModel
1130
1131 """
1132
1133 if self._covMatrixObserv is not None:
1134 return LinearModel(self._designMatrix.copy(), copy.copy(self._regressorNames), self._covMatrixObserv.copy())
1135 else:
1136 return LinearModel(self._designMatrix.copy(), copy.copy(self._regressorNames))
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149 def __str__(self):
1150
1151 """
1152 Generates what is written to the console if the LinearModel is printed
1153
1154 @return: the string representation of the linear model
1155
1156 """
1157
1158 if self._regressorNames[0] == "1":
1159 modelString = "Model: y = a_0"
1160 else:
1161 modelString = "Model: y = a_0 * %s" % self._regressorNames[0]
1162
1163 for n in range(1, self._nParameters):
1164 if self._regressorNames[n] == "1":
1165 modelString += " + a_%d" % n
1166 else:
1167 modelString += " + a_%d * %s" % (n, self._regressorNames[n])
1168
1169 modelString += "\nExpected number of observations: %d" % self._nObservations
1170
1171 return modelString
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186 class PolynomialModel(LinearModel):
1187
1188 """
1189 Class to implement a polynomial model
1190
1191 The class implements a polynomial model function of the form
1192
1193 C{y(x) = a_0 + a_1 * x + a_2 * x^2 + ...}
1194
1195 where y are the responses (observables), x are the covariates,
1196 the a_i are the regression coefficients (to be determined).
1197 Such a class provides a convenient way of creating a polynomial model.
1198
1199 Remarks:
1200 - PolynomialModel instances can be added to other PolynomialModel
1201 or LinearModel instances to give a new LinearModel instance.
1202 """
1203
1204
1205 def __init__(self, covariate, covariateName, orderList, covMatrix=None):
1206
1207 """
1208 Initialiser of the PolynomialModel class.
1209
1210 Example:
1211
1212 >>> time = linspace(0,10,100)
1213 >>> pm = PolynomialModel(time, "t", [2,1])
1214
1215 @param covariate: an array of size N with the covariate values,
1216 where N is the number of observations
1217 @type covariate: ndarray
1218 @param covariateName: the (preferably short) name of the covariate.
1219 E.g. "t", or "x".
1220 @type covariateName: string
1221 @param orderList: list of the orders that should be included in the
1222 model. E.g. [3,0,1] implies the model
1223 a_0 * x^3 + a_1 + a_2 * x
1224 @type orderList: list
1225 @param covMatrix: square array containing the covariance matrix of the
1226 observations. This way, weights and correlations can
1227 be taken into account.
1228 @type covMatrix: ndarray
1229 @return: an instance of PolynomialModel, subclass of LinearModel
1230 @rtype: PolynomialModel
1231
1232 """
1233
1234
1235
1236 designMatrix = np.empty((len(covariate), len(orderList)))
1237 for n in range(len(orderList)):
1238 designMatrix[:,n] = covariate**orderList[n]
1239
1240
1241
1242
1243 regressorNames = []
1244 for n in range(len(orderList)):
1245 if orderList[n] == 0:
1246 regressorNames += ["1"]
1247 else:
1248 regressorNames += ["%s^%d" % (covariateName, orderList[n])]
1249
1250
1251
1252
1253 super(PolynomialModel, self).__init__(designMatrix, regressorNames, covMatrix, regressorsAreWeighted=False)
1254
1255
1256
1257
1258 if 0 in orderList:
1259 self._withIntercept = True
1260 else:
1261 self._withIntercept = False
1262
1263
1264
1265
1266
1267 class HarmonicModel(LinearModel):
1268
1269 """
1270 Class to implement a harmonic model function.
1271
1272 The class implements a harmonic model of the form
1273
1274 C{y(x) = a_0 * sin(2*pi*f1*x) + a_1 * cos(2*pi*f1*x)
1275 + a_2 * sin(2*pi*f2*x) + a_3 * cos(2*pi*f2*x) + ...}
1276
1277 where y are the responses (observables), x are the covariates,
1278 the a_i are the regression coefficients (to be determined).
1279 Such a class provides a more convenient short way of creating a
1280 harmonic model.
1281
1282 Remarks:
1283 - HarmonicModel instances can be added to other HarmonicModel
1284 or LinearModel instances to give a new LinearModel instance.
1285 """
1286
1287
1288 def __init__(self, covariate, covariateName, freqList, freqNameList,
1289 maxNharmonics=1, covMatrix=None):
1290
1291 """
1292 Initialiser of HarmonicModel class, with *known* frequencies.
1293
1294 Example:
1295
1296 >>> time = linspace(0,10,100)
1297 >>> hm = HarmonicModel(time, "t", [2.34, 8.9], ["f_1", "f_2"], 1)
1298
1299
1300 @param covariate: an array of size N with the covariate values,
1301 where N is the number of observations.
1302 @type covariate: ndarray
1303 @param covariateName: a (preferably short) name of the covariate.
1304 E.g. "t", or "x".
1305 @type covariateName: string
1306 @param freqList: the frequencies (unit: inverse unit of covariate)
1307 @type freqList: list
1308 @param freqNameList: the (preferably short) names of the frequencies.
1309 E.g. "f_1" or "nu_1".
1310 @type freqNameList: list
1311 @param maxNharmonics: integer value >= 1. The number of harmonics for
1312 each frequency. For example:
1313 1 : f_1, f_2, ...
1314 2 : f_1, 2*f_1, f_2, 2*f_2, ...
1315 @type maxNharmonics: integer
1316 @param covMatrix: square array containing the covariance matrix of the
1317 observations. Weights, and correlations can hence be
1318 taken into account.
1319 @type covMatrix: ndarray
1320 @return: an instance of HarmonicModel, subclass of LinearModel
1321 @rtype: HarmonicModel
1322
1323 """
1324
1325
1326
1327 nParam = 2*len(freqList)*maxNharmonics
1328 designMatrix = np.empty((len(covariate), nParam))
1329 runningIndex = 0
1330 for n in range(len(freqList)):
1331 for k in range(1,maxNharmonics+1):
1332 designMatrix[:,runningIndex] = np.sin(2.*pi*k*freqList[n]*covariate)
1333 designMatrix[:,runningIndex+1] = np.cos(2.*pi*k*freqList[n]*covariate)
1334 runningIndex += 2
1335
1336
1337
1338 regressorNames = []
1339 for n in range(len(freqList)):
1340 for k in range(1,maxNharmonics+1):
1341 regressorNames += ["sin(2*pi*%d*%s*%s)" % (k, freqNameList[n], covariateName),
1342 "cos(2*pi*%d*%s*%s)" % (k, freqNameList[n], covariateName)]
1343
1344
1345
1346
1347 super(HarmonicModel, self).__init__(designMatrix, regressorNames, covMatrix, regressorsAreWeighted=False)
1348
1349
1350
1351
1352 self._withIntercept = False
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369 class LinearFit(object):
1370
1371 """
1372 A class that allows to easily extract the fit coefficients,
1373 covariance matrix, predictions, residuals, etc.
1374
1375 Remarks: LinearFit instances can be printed
1376 """
1377
1378
1379 def __init__(self, linearModel, observations):
1380
1381 """
1382 Initialises the LinearFit instance
1383
1384 @param linearModel: a LinearModel instance
1385 @type linearModel: LinearModel
1386 @param observations: the observations. The size of the array must be
1387 compatible with the number of observations that
1388 the linearModel expects.
1389 @type observations: ndarray
1390 @return: a LinearFit instance
1391 @rtype: LinearFit
1392
1393 """
1394
1395
1396
1397 self._linearModel = linearModel
1398 self._nObservations = len(observations)
1399 self._nParameters = linearModel.designMatrix().shape[1]
1400
1401
1402
1403
1404 self._regressionCoefficients = None
1405 self._residuals = None
1406 self._weightedResiduals = None
1407 self._predictions = None
1408 self._weightedPredictions = None
1409 self._sumSqResiduals = None
1410 self._sumSqWeightedResiduals = None
1411 self._covarianceMatrix = None
1412 self._correlationMatrix = None
1413 self._errorBars = None
1414 self._t_values = None
1415 self._coefficientOfDetermination = None
1416 self._weightedCoeffOfDetermination = None
1417 self._residualVariance = None
1418 self._weightedResidualVariance = None
1419 self._BICvalue = None
1420 self._AICvalue = None
1421 self._Fstatistic = None
1422 self._weightedFstatistic = None
1423
1424
1425
1426
1427 self._originalObservations = observations
1428 if linearModel._covMatrixObserv is not None:
1429 self._weightedObservations = np.linalg.solve(linearModel._choleskyLower, observations)
1430 else:
1431 self._weightedObservations = observations
1432
1433
1434
1435 def observations(self, weighted=False):
1436
1437 """
1438 Returns the observations
1439
1440 Remarks:
1441 - if no covariance matrix of the observations was specified for
1442 the model, the "decorrelated" observations are identical to the
1443 original observations.
1444
1445 @param weighted: If False, return the original observations; if
1446 True, return the decorrelated observations.
1447 @type weighted: boolean
1448 @return: the observations that were used to initialize the LinearFit instance
1449 @rtype: ndarray
1450
1451 """
1452
1453 if weighted:
1454 return self._weightedObservations
1455 else:
1456 return self._originalObservations
1457
1458
1459
1460
1461 def regressionCoefficients(self):
1462
1463 """
1464 Returns the regression coefficients.
1465
1466 @return: the fit coefficients
1467 @rtype: ndarray
1468
1469 """
1470
1471 if self._regressionCoefficients is not None:
1472 return self._regressionCoefficients
1473 else:
1474 self._regressionCoefficients = np.dot(self._linearModel.pseudoInverse(),
1475 self._weightedObservations)
1476 return self._regressionCoefficients
1477
1478
1479
1480
1481 def sumSqResiduals(self, weighted=False):
1482
1483 """
1484 Returns the sum of the squared residuals
1485
1486 @param weighted: If false the unweighted residuals are used,
1487 if True, the weighted ones are used.
1488 @type weighted: boolean
1489 @return: the sum of the squared residuals
1490 @rtype: double
1491
1492 """
1493
1494 if weighted:
1495 if self._sumSqWeightedResiduals is not None:
1496 return self._sumSqWeightedResiduals
1497 else:
1498 self._sumSqWeightedResiduals = np.sum(np.square(self.residuals(weighted=True)))
1499 return self._sumSqWeightedResiduals
1500 else:
1501 if self._sumSqResiduals is not None:
1502 return self._sumSqResiduals
1503 else:
1504 self._sumSqResiduals = np.sum(np.square(self.residuals(weighted=False)))
1505 return self._sumSqResiduals
1506
1507
1508
1509
1510
1511
1512 def residualVariance(self, weighted=False):
1513
1514 """
1515 Estimates the variance of the residuals of the data.
1516
1517 The number of degrees of freedom is used as the normalization factor.
1518
1519 @param weighted: If false the unweighted residuals are used,
1520 if True, the weighted ones are used.
1521 @type weighted: boolean
1522 @return: the variance of the residuals
1523 @rtype: double
1524
1525 """
1526
1527 if weighted:
1528 if self._weightedResidualVariance is not None:
1529 return self._weightedResidualVariance
1530 else:
1531 self._weightedResidualVariance = self.sumSqResiduals(weighted=True) \
1532 / self._linearModel.degreesOfFreedom()
1533 return self._weightedResidualVariance
1534 else:
1535 if self._residualVariance is not None:
1536 return self._residualVariance
1537 else:
1538 self._residualVariance = self.sumSqResiduals(weighted=False) \
1539 / self._linearModel.degreesOfFreedom()
1540 return self._residualVariance
1541
1542
1543
1544
1545 def covarianceMatrix(self):
1546
1547 """
1548 Returns the covariance matrix of the fit coefficients
1549
1550 @return: the MxM covariance matrix of the fit coefficients with
1551 M the number of fit coefficients
1552 @rtype: ndarray
1553
1554 """
1555
1556 if self._covarianceMatrix is not None:
1557 return self._covarianceMatrix
1558 else:
1559 self._covarianceMatrix = self._linearModel.standardCovarianceMatrix() \
1560 * self.residualVariance(weighted=True)
1561 return self._covarianceMatrix
1562
1563
1564
1565
1566 def correlationMatrix(self):
1567
1568 """
1569 Returns the correlation matrix of the fit coefficients
1570
1571 @return: the MxM correlation matrix of the fit coefficients with
1572 M the number of fit coefficients
1573 @rtype: ndarray
1574
1575 """
1576
1577 if self._correlationMatrix is not None:
1578 return self._correlationMatrix
1579 else:
1580 cov = self.covarianceMatrix()
1581 self._correlationMatrix = np.empty_like(cov)
1582 cor = self._correlationMatrix
1583 for n in range(cor.shape[0]):
1584 for m in range(n):
1585 cor[n,m] = cov[n,m] / sqrt(cov[n,n] * cov[m,m])
1586 cor[m,n] = cor[n,m]
1587 cor[n,n] = 1.0
1588 return self._correlationMatrix
1589
1590
1591
1592
1593 def errorBars(self):
1594
1595 """
1596 Returns the formal error bars of the fit coefficients
1597
1598 @return: The square root of the diagonal of the covariance matrix
1599 @rtype: ndarray
1600
1601 """
1602
1603 if self._errorBars is not None:
1604 return self._errorBars
1605 else:
1606 self._errorBars = np.sqrt(self.covarianceMatrix().diagonal())
1607 return self._errorBars
1608
1609
1610
1611 def confidenceIntervals(self, alpha):
1612
1613 """
1614 Returns the symmetric (1-alpha) confidence interval around the fit coefficients
1615 E.g. if alpha = 0.05, the 95% symmetric confidence interval is returned.
1616
1617 Remarks:
1618 - The formula used assumes that the noise on the observations is
1619 independently, identically and gaussian distributed.
1620
1621 @param alpha: confidence level in ]0,1[
1622 @type alpha: double
1623 @return: the arrays lowerBounds[0..K-1] and upperBounds[0..K-1] which contain
1624 respectively the lower and the upper bounds of the symmetric interval
1625 around the fit coefficients, where K is the number of fit coeffs.
1626 @rtype: tuple
1627
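Example (a sketch, assuming a LinearFit instance 'myFit' as in the module-level crash course):

>>> lowerBounds, upperBounds = myFit.confidenceIntervals(0.05)
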
1628 """
1629
1630 t = stats.t.ppf(1.0-alpha/2., self._linearModel.degreesOfFreedom())
1631
1632 lowerBounds = self.regressionCoefficients() - t * self.errorBars()
1633 upperBounds = self.regressionCoefficients() + t * self.errorBars()
1634
1635 return lowerBounds, upperBounds
1636
1637
1638
1639 def t_values(self):
1640
1641 """
1642 Returns the formal t-values of the fit coefficients
1643
1644 @return: t-values = fit coefficients divided by their formal error bars
1645 @rtype: ndarray
1646
1647 """
1648
1649 if self._t_values is not None:
1650 return self._t_values
1651 else:
1652 self._t_values = self.regressionCoefficients() / self.errorBars()
1653 return self._t_values
1654
1655
1656
1657
1658 def regressorTtest(self, alpha):
1659
1660 """
1661 Performs a hypothesis T-test on each of the regressors
1662 Null hypothesis: H0 : fit coefficient == 0
1663 Alternative hypothesis : H1 : fit coefficient != 0
1664
1665 Remarks:
1666 - This test assumes that the noise is independently,
1667 identically, and gaussian distributed. It is not robust
1668 against this assumption.
1669
1670 @param alpha: significance level of the hypothesis test. In ]0,1[.
1671 E.g.: alpha = 0.05
1672 @type alpha: double
1673 @return: a boolean array of length K with K the number of regressors.
1674 "True" if the null hypothesis was rejected for the regressor,
1675 "False" otherwise.
1676 @rtype: ndarray
1677
1678 """
1679
1680 t = stats.t.ppf(1.0-alpha/2., self._linearModel.degreesOfFreedom())
1681
1682 return np.fabs(self.t_values()) > t
1683
1684
1685
1686 def predictions(self, weighted=False):
1687
1688 """
1689 Returns the predicted (fitted) values
1690
1691 Depending on 'weighted', these are the predictions for either the
1692 original or the decorrelated observations.
1693
1694 Remarks:
1695 - If no covariance matrix of the observations was specified
1696 for the model, the weighted predictions are identical to
1697 the unweighted ones.
1698
1699 @param weighted: If True/False, the predictions for the
1700 decorrelated/original observations will be used.
1701 @type weighted: boolean
1702
1703 """
1704
1705 if weighted:
1706 if self._weightedPredictions is not None:
1707 return self._weightedPredictions
1708 else:
1709 # The design matrix stored in the model is already the weighted
1710 # (decorrelated) one, so the weighted predictions follow directly from
1711 # a matrix product, without applying the Cholesky factor of the covariance matrix.
1712 self._weightedPredictions = np.dot(self._linearModel.designMatrix(), self.regressionCoefficients())
1713 return self._weightedPredictions
1714 else:
1715 if self._predictions is not None:
1716 return self._predictions
1717 else:
1718 if self._linearModel._covMatrixObserv is not None:
1719 self._predictions = np.dot(self._linearModel._choleskyLower, np.dot(self._linearModel.designMatrix(), self.regressionCoefficients()))
1720 else:
1721 self._predictions = np.dot(self._linearModel.designMatrix(), self.regressionCoefficients())
1722 return self._predictions
1723
1724
1725
1726
1727 def residuals(self, weighted=False):
1728
1729 """
1730 Returns an array with the residuals.
1731 Residuals = observations minus the predictions
1732
1733 Remarks:
1734 - If no covariance matrix of the observations was specified for the
1735 model, the weighted residuals are identical to the unweighted ones.
1736
1737 @param weighted: If True/False, the residuals for the
1738 decorrelated/original observations will be used.
1739 @type weighted: boolean
1740
1741 """
1742
1743 if weighted:
1744 if self._weightedResiduals is not None:
1745 return self._weightedResiduals
1746 else:
1747 self._weightedResiduals = self._weightedObservations - self.predictions(weighted=True)
1748 return self._weightedResiduals
1749 else:
1750 if self._residuals is not None:
1751 return self._residuals
1752 else:
1753 self._residuals = self._originalObservations - self.predictions(weighted=False)
1754 return self._residuals
1755
1756
1757
1758
1759
1760 def coefficientOfDetermination(self, weighted=False):
1761
1762 """
1763 Returns the coefficient of determination
1764
1765 The coefficient of determination is defined by R^2 = 1 - S1 / S2 with
1766 - S1 the sum of squared residuals:
1767 S1 = \sum_{i=1}^N (y_i - \hat{y}_i)^2
1768 - S2 the sample variance w.r.t. the mean:
1769 S2 = \sum_{i=1}^N (y_i - \overline{y})^2
1770 - If there is no intercept term in the model, S2 is computed as
1771 S2 = \sum_{i=1}^N y_i^2
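
A doctest-style sketch of this definition, assuming an intercept in the model and re-using
'myFit' and 'obs' from the module-level crash course (outputs not shown):

>>> R2 = myFit.coefficientOfDetermination()
>>> R2check = 1.0 - myFit.sumSqResiduals() / sum((obs - mean(obs))**2)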
1772
1773 Remarks:
1774 - If no covariance matrix of the observations was specified
1775 for the model, the weighted coeff of determination is
1776 identical to the unweighted one.
1777
1778 @param weighted: If True/False, the residuals for the
1779 decorrelated/original observations will be used.
1780 @type weighted: boolean
1781 @return: coefficient of determination
1782 @rtype: double
1783
1784 """
1785
1786 if weighted:
1787 if self._weightedCoeffOfDetermination is not None:
1788 return self._weightedCoeffOfDetermination
1789 else:
1790 if self._linearModel.withIntercept():
1791 sampleVariance = np.sum(np.square(self._weightedObservations - np.mean(self._weightedObservations)))
1792 self._weightedCoeffOfDetermination = 1.0 - self.sumSqResiduals(weighted=True) / sampleVariance
1793 else:
1794 sampleVariance = np.sum(np.square(self._weightedObservations))
1795 self._weightedCoeffOfDetermination = 1.0 - self.sumSqResiduals(weighted=True) / sampleVariance
1796
1797 return self._weightedCoeffOfDetermination
1798 else:
1799 if self._coefficientOfDetermination is not None:
1800 return self._coefficientOfDetermination
1801 else:
1802 if self._linearModel.withIntercept():
1803 sampleVariance = np.sum(np.square(self._originalObservations - np.mean(self._originalObservations)))
1804 self._coefficientOfDetermination = 1.0 - self.sumSqResiduals(weighted=False) / sampleVariance
1805 else:
1806 sampleVariance = np.sum(np.square(self._originalObservations))
1807 self._coefficientOfDetermination = 1.0 - self.sumSqResiduals(weighted=False) / sampleVariance
1808
1809 return self._coefficientOfDetermination
1810
1811
1812
1813
1814
1815 def BICvalue(self):
1816
1817 """
1818 Returns the Bayesian Information Criterion value.
1819
1820 Remarks:
1821 - Also called the Schwarz Information Criterion (SIC)
1822 - Gaussian noise is assumed, with unknown variance sigma^2
1823 - Constant terms in the log-likelihood are omitted
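- In concrete terms, the value computed below is
BIC = N * ln(SSR / N) + k * ln(N),
with N the number of observations, SSR the unweighted sum of squared
residuals, and k = nParameters + 1 (the extra parameter accounts for
the unknown noise variance sigma^2).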
1824
1825 TODO: make a weighted version
1826
1827 @return: BIC value
1828 @rtype: double
1829
1830 """
1831
1832 if self._BICvalue is not None:
1833 return self._BICvalue
1834 else:
1835
1836 nParam = self._nParameters + 1
1837 self._BICvalue = self._nObservations * log(self.sumSqResiduals(weighted=False)/self._nObservations) \
1838 + nParam * log(self._nObservations)
1839 return self._BICvalue
1840
1841
1842
1843
1844 def AICvalue(self):
1845
1846 """
1847 Returns the 2nd order Akaike Information Criterion value
1848
1849 Remarks:
1850 - Gaussian noise is assumed, with unknown variance sigma^2
1851 - Constant terms in the log-likelihood were omitted
1852 - If the number of observations equals the number of parameters + 1
1853 then the 2nd order AIC is not defined. In this case the method
1854 gives a nan.
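- In concrete terms, the value computed below is
AICc = N * ln(SSR / N) + 2*k + 2*k*(k+1) / (N - k - 1),
with N the number of observations, SSR the unweighted sum of squared
residuals, and k = nParameters + 1.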
1855
1856 TODO: make a weighted version
1857
1858 @return: AIC value
1859 @rtype: double
1860
1861 """
1862
1863 if self._AICvalue is not None:
1864 return self._AICvalue
1865 else:
1866 nParam = self._nParameters + 1
1867 if (self._nObservations - nParam - 1) == 0:
1868
1869 self._AICvalue = np.nan
1870 else:
1871
1872 self._AICvalue = self._nObservations * log(self.sumSqResiduals(weighted=False)/self._nObservations) \
1873 + 2*nParam + 2*nParam*(nParam+1)/(self._nObservations - nParam - 1)
1874 return self._AICvalue
1875
1876
1877
1878
1879 def Fstatistic(self, weighted=False):
1880
1881 """
1882 Returns the F-statistic, commonly used to assess the fit.
1883
1884 Remarks:
1885 - if no covariance matrix of the observations was specified for the
1886 model, the weighted F-statistic is identical to the unweighted one
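- In concrete terms, the value computed below is
F = R^2 / (1 - R^2) * df / p,
with R^2 the coefficient of determination, df the degrees of freedom,
and p = nParameters - 1 when the model has more than one regressor
(p = nParameters otherwise).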
1887
1888 @param weighted: If True/False, the residuals for the
1889 decorrelated/original observations will be used.
1890 @type weighted: boolean
1891 @return: the F-statistic
1892 @rtype: double
1893
1894 """
1895
1896 if weighted:
1897 if self._weightedFstatistic is not None:
1898 return self._weightedFstatistic
1899 else:
1900 Rsq = self.coefficientOfDetermination(weighted=True)
1901 df = self._linearModel.degreesOfFreedom()
1902
1903 if self._nParameters == 1:
1904 self._weightedFstatistic = Rsq / (1-Rsq) * df / self._nParameters
1905 else:
1906 self._weightedFstatistic = Rsq / (1-Rsq) * df / (self._nParameters-1)
1907
1908 return self._weightedFstatistic
1909 else:
1910 if self._Fstatistic is not None:
1911 return self._Fstatistic
1912 else:
1913 Rsq = self.coefficientOfDetermination(weighted=False)
1914 df = self._linearModel.degreesOfFreedom()
1915
1916 if self._nParameters == 1:
1917 self._Fstatistic = Rsq / (1-Rsq) * df / self._nParameters
1918 else:
1919 self._Fstatistic = Rsq / (1-Rsq) * df / (self._nParameters-1)
1920
1921 return self._Fstatistic
1922
1923
1924
1925 def FstatisticTest(self, alpha, weighted=False):
1926
1927 """
1928 Performs a hypothesis F-test on the fit
1929
1930 Null hypothesis: H0 : all fit coefficients == 0
1931 Alternative hypothesis : H1 : at least one fit coefficient != 0
1932
1933 Stated otherwise (with R^2 the coefficient of determination):
1934 Null hypothesis: H0 : R^2 == 0
1935 Alternative hypothesis: H1: R^2 != 0
1936
1937 Remarks:
1938 - if no covariance matrix of the observations was specified for the
1939 model, the weighted F-test is identical to the unweighted one
1940
1941
1942 @param alpha: significance level of the hypothesis test. In ]0,1[.
1943 @type alpha: double
1944 @param weighted: If True/False, the weighted/not-weighted
1945 F-statistic will be used.
1946 @type weighted: boolean
1947 @return: "True" if null hypothesis was rejected, "False" otherwise
1948 @rtype: boolean
1949
1950 """
1951
1952 df = self._linearModel.degreesOfFreedom()
1953
1954 if self._nParameters == 1:
1955 Fcritical = stats.f.ppf(1.0-alpha, self._nParameters, df)
1956 else:
1957 Fcritical = stats.f.ppf(1.0-alpha, self._nParameters-1, df)
1958
1959 return (self.Fstatistic(weighted) > Fcritical)
1960
1961
1962
1963 def summary(self, outputStream=sys.stdout):
1964
1965 """
1966 Writes some basic results of fitting the model to the observations.
1967
1968 @param outputStream: defaulted to the standard output console, but can be
1969 replaced by another open stream, like a file stream.
1970 @type outputStream: stream class. The .write() method is used.
1971 @return: nothing
1972
1973 """
1974
1975 regressorNames = self._linearModel.regressorNames()
1976 if regressorNames[0] == "1":
1977 outputStream.write("Model: y = a_0")
1978 else:
1979 outputStream.write("Model: y = a_0 * %s" % regressorNames[0])
1980 for n in range(1, self._nParameters):
1981 if regressorNames[n] == "1":
1982 outputStream.write(" + a_%d" % n)
1983 else:
1984 outputStream.write(" + a_%d * %s" % (n, regressorNames[n]))
1985 outputStream.write("\n")
1986
1987 coeff = self.regressionCoefficients()
1988 errors = self.errorBars()
1989 outputStream.write("Regression coefficients:\n")
1990 for n in range(self._nParameters):
1991 outputStream.write(" a_%d = %f +/- %f\n" % (n, coeff[n], errors[n]))
1992
1993 outputStream.write("Sum of squared residuals: %f\n" % self.sumSqResiduals())
1994 outputStream.write("Estimated variance of the residuals: %f\n" % self.residualVariance())
1995 outputStream.write("Coefficient of determination R^2: %f\n" % self.coefficientOfDetermination())
1996 outputStream.write("F-statistic: %f\n" % self.Fstatistic())
1997 outputStream.write("AIC: %f\n" % self.AICvalue())
1998 outputStream.write("BIC: %f\n" % self.BICvalue())
1999
2000
2001
2002 def __str__(self):
2003
2004 """
2005 Returns the string written by printing the LinearFit object
2006
2007 """
2008
2009 return "Linear fit of a dataset of %d observations, " % self._nObservations \
2010 + "using a linear model with %d regressors" % self._nParameters
2011
2012
2013
2014
2015
2016
2017
2018 def evaluate(self, regressors):
2019
2020 """
2021 Evaluates the current best fit at new covariates, given the regressors evaluated at those new covariates.
2022
2023 Remark:
2024 - The new regressor functions should be exactly the same ones as you used
2025 to define the linear model. They should only be evaluated in new covariates.
2026 This is not checked for!
2027
2028 Example:
2029
2030 >>> noise = array([0.44, -0.48, 0.26, -2.00, -0.93, 2.21, -0.57, -2.04, -1.09, 1.53])
2031 >>> x = linspace(0, 5, 10)
2032 >>> obs = 2.0 + 3.0 * exp(x) + noise
2033 >>> myModel = LinearModel([ones(10), exp(x)], ["1", "exp(x)"])
2034 >>> print(myModel)
2035 Model: y = a_0 + a_1 * exp(x)
2036 Expected number of observations: 10
2037
2038 >>> myFit = myModel.fitData(obs)
2039 >>> xnew = linspace(-5.0, +5.0, 20)
2040 >>> y = myFit.evaluate([ones_like(xnew), exp(xnew)])
2041 >>> print(y)
2042 [ 1.53989966 1.55393018 1.57767944 1.61787944 1.68592536
2043 1.80110565 1.99606954 2.32608192 2.8846888 3.83023405
2044 5.43074394 8.13990238 12.72565316 20.48788288 33.62688959
2045 55.86708392 93.51271836 157.23490405 265.09646647 447.67207215]
2046
2047 @param regressors: either a list of equally-sized numpy arrays with the regressors
2048 evaluated in the new covariates: [f_0(xnew),f_1(xnew),f_2(xnew),...],
2049 or an M x N design matrix (numpy array) in which these regressor arrays
2050 are the columns, with M the number of new data points and N the number
2051 of regressors.
2052 @type regressors: either a list or an ndarray
2053 @return: the linear model evaluated in the new regressors
2054 @rtype: ndarray
2055 """
2056
2057
2058
2059 if not isinstance(regressors, list) and not isinstance(regressors, np.ndarray):
2060 raise TypeError("LinearModel only accepts a list of regressors, or a design matrix")
2061
2062
2063
2064 if isinstance(regressors, list):
2065 designMatrix = np.column_stack(np.double(regressors))
2066 else:
2067 designMatrix = regressors
2068
2069
2070
2071 if designMatrix.ndim != 2:
2072 raise TypeError("Design matrix is not 2-dimensional")
2073
2074 if designMatrix.shape[1] != self._nParameters:
2075 raise TypeError("Number of regressors not equal to that of the original model")
2076
2077
2078
2079 return np.dot(designMatrix, self.regressionCoefficients())
2080
2081
2082
2083
2084 __all__ = ["LinearModel", "PolynomialModel", "HarmonicModel", "LinearFit"]
2085