Class MillerUpdatingRegression

java.lang.Object
org.apache.commons.math3.stat.regression.MillerUpdatingRegression
All Implemented Interfaces:
UpdatingMultipleLinearRegression

public class MillerUpdatingRegression extends Object implements UpdatingMultipleLinearRegression
This class is a concrete implementation of the UpdatingMultipleLinearRegression interface.

The algorithm is described in:

 Algorithm AS 274: Least Squares Routines to Supplement Those of Gentleman
 Author(s): Alan J. Miller
 Source: Journal of the Royal Statistical Society.
 Series C (Applied Statistics), Vol. 41, No. 2
 (1992), pp. 458-478
 Published by: Blackwell Publishing for the Royal Statistical Society
 Stable URL: http://www.jstor.org/stable/2347583 

This method for multiple regression forms the solution to the OLS problem by updating the QR decomposition as described by Gentleman.

Since:
3.0
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private final double[]
    d
    diagonals of cross products matrix
    private final double
    epsilon
    zero tolerance
    private boolean
    hasIntercept
    boolean flag whether a regression constant is added
    private final boolean[]
    lindep
    flags for variables with linear dependency problems
    private long
    nobs
    number of observations entered
    private final int
    nvars
    number of variables in regression
    private final double[]
    r
    the off diagonal portion of the R matrix
    private final double[]
    rhs
    the elements of the R`Y
    private final double[]
    rss
    residual sum of squares for all nested regressions
    private boolean
    rss_set
    has rss been called?
    private double
    sserr
    sum of squared errors of largest regression
    private double
    sumsqy
    summation of squared Y values
    private double
    sumy
    summation of Y variable
    private final double[]
    tol
    the tolerance for each of the variables
    private boolean
    tol_set
    has the tolerance setting method been called
    private final int[]
    vorder
    order of the regressors
    private final double[]
    work_sing
    workspace for singularity method
    private final double[]
    work_tolset
    scratch space for tolerance calc
    private final double[]
    x_sing
    singular x values
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    MillerUpdatingRegression()
    Set the default constructor to private access to prevent inadvertent instantiation.
     
    MillerUpdatingRegression(int numberOfVariables, boolean includeConstant)
    Primary constructor for the MillerUpdatingRegression.
     
    MillerUpdatingRegression(int numberOfVariables, boolean includeConstant, double errorTolerance)
    This is the augmented constructor for the MillerUpdatingRegression class.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addObservation(double[] x, double y)
    Adds an observation to the regression model.
    void
    addObservations(double[][] x, double[] y)
    Adds multiple observations to the model.
    void
    clear()
    As the name suggests, clear wipes the internals and reorders everything in the canonical order.
    private double[]
    cov(int nreq)
    Calculates the cov matrix assuming only the first nreq variables are included in the calculation.
    double
    getDiagonalOfHatMatrix(double[] row_data)
    Gets the diagonal of the Hat matrix also known as the leverage matrix.
    long
    getN()
    Gets the number of observations added to the regression model.
    int[]
    getOrderOfRegressors()
    Gets the order of the regressors, useful if some type of reordering has been called.
    double[]
    getPartialCorrelations(int in)
    In the original algorithm only the partial correlations of the regressors are returned to the user.
    boolean
    hasIntercept()
    A getter method which determines whether a constant is included.
    private void
    include(double[] x, double wi, double yi)
    The include method is where the QR decomposition occurs.
    private void
    inverse(double[] rinv, int nreq)
    This internal method calculates the inverse of the upper-triangular portion of the R matrix.
    private double[]
    regcf(int nreq)
    The regcf method conducts the linear regression and extracts the parameter vector.
    RegressionResults
    regress()
    Conducts a regression on the data in the model, using all regressors.
    RegressionResults
    regress(int numberOfRegressors)
    Conducts a regression on the data in the model, using a subset of regressors.
    RegressionResults
    regress(int[] variablesToInclude)
    Conducts a regression on the data in the model, using the regressors in the array. Calling this method changes the internal order of the regressors, so care is required in interpreting the hat matrix.
    private int
    reorderRegressors(int[] list, int pos1)
    ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Re-orders the variables in an orthogonal reduction.
    private void
    singcheck()
    The method which checks for singularities and then eliminates the offending columns.
    private double
    smartAdd(double a, double b)
    Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
    private void
    ss()
    Calculates the sum of squared errors for the full regression and all subsets in the following manner:
    private void
    tolset()
    This sets up tolerances for singularity testing.
    private void
    vmove(int from, int to)
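    ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Moves a variable from one position to another in an orthogonal reduction.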

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • nvars

      private final int nvars
      number of variables in regression
    • d

      private final double[] d
      diagonals of cross products matrix
    • rhs

      private final double[] rhs
      the elements of the R`Y
    • r

      private final double[] r
      the off diagonal portion of the R matrix
    • tol

      private final double[] tol
      the tolerance for each of the variables
    • rss

      private final double[] rss
      residual sum of squares for all nested regressions
    • vorder

      private final int[] vorder
      order of the regressors
    • work_tolset

      private final double[] work_tolset
      scratch space for tolerance calc
    • nobs

      private long nobs
      number of observations entered
    • sserr

      private double sserr
      sum of squared errors of largest regression
    • rss_set

      private boolean rss_set
      has rss been called?
    • tol_set

      private boolean tol_set
      has the tolerance setting method been called
    • lindep

      private final boolean[] lindep
      flags for variables with linear dependency problems
    • x_sing

      private final double[] x_sing
      singular x values
    • work_sing

      private final double[] work_sing
      workspace for singularity method
    • sumy

      private double sumy
      summation of Y variable
    • sumsqy

      private double sumsqy
      summation of squared Y values
    • hasIntercept

      private boolean hasIntercept
      boolean flag whether a regression constant is added
    • epsilon

      private final double epsilon
      zero tolerance
  • Constructor Details

    • MillerUpdatingRegression

      private MillerUpdatingRegression()
      Set the default constructor to private access to prevent inadvertent instantiation
    • MillerUpdatingRegression

      public MillerUpdatingRegression(int numberOfVariables, boolean includeConstant, double errorTolerance) throws ModelSpecificationException
      This is the augmented constructor for the MillerUpdatingRegression class.
      Parameters:
      numberOfVariables - number of regressors to expect, not including constant
      includeConstant - include a constant automatically
      errorTolerance - zero tolerance, how machine zero is determined
      Throws:
      ModelSpecificationException - if numberOfVariables is less than 1
    • MillerUpdatingRegression

      public MillerUpdatingRegression(int numberOfVariables, boolean includeConstant) throws ModelSpecificationException
      Primary constructor for the MillerUpdatingRegression.
      Parameters:
      numberOfVariables - maximum number of potential regressors
      includeConstant - include a constant automatically
      Throws:
      ModelSpecificationException - if numberOfVariables is less than 1
  • Method Details

    • hasIntercept

      public boolean hasIntercept()
      A getter method which determines whether a constant is included.
      Specified by:
      hasIntercept in interface UpdatingMultipleLinearRegression
      Returns:
      true if the regression has an intercept, false otherwise
    • getN

      public long getN()
      Gets the number of observations added to the regression model.
      Specified by:
      getN in interface UpdatingMultipleLinearRegression
      Returns:
      number of observations
    • addObservation

      public void addObservation(double[] x, double y) throws ModelSpecificationException
      Adds an observation to the regression model.
      Specified by:
      addObservation in interface UpdatingMultipleLinearRegression
      Parameters:
      x - the array with regressor values
      y - the value of dependent variable given these regressors
      Throws:
      ModelSpecificationException - if the length of x does not equal the number of independent variables in the model
    • addObservations

      public void addObservations(double[][] x, double[] y) throws ModelSpecificationException
      Adds multiple observations to the model.
      Specified by:
      addObservations in interface UpdatingMultipleLinearRegression
      Parameters:
      x - observations on the regressors
      y - observations on the regressand
      Throws:
      ModelSpecificationException - if x is not rectangular, does not match the length of y or does not contain sufficient data to estimate the model
    • include

      private void include(double[] x, double wi, double yi)
      The include method is where the QR decomposition occurs. It forms all intermediate data used by the derivative measures. In the original implementation described in the Miller paper, the x vector is overwritten; in this implementation, include is passed a copy of the original data vector, so the data is not contaminated. This method also differs slightly from Gentleman's method in that it assumes a dense design matrix; the original Gentleman algorithm has some advantage on sparse matrices.
      Parameters:
      x - observations on the regressors
      wi - weight of this observation (-1,1)
      yi - observation on the regressand
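      The update can be illustrated with a square-root-free Givens step in plain Java. This is only a sketch under stated assumptions, not the class's code: the name QrUpdateSketch and its fields are hypothetical, weights are assumed positive, the caller supplies any constant column explicitly, and the rounding safeguards of the real method are left out.

```java
// Hypothetical sketch of a square-root-free Givens update, one observation at a time.
// Not the library's implementation; names and layout are assumptions for illustration.
final class QrUpdateSketch {
    final int nvars;
    final double[] d;    // diagonal scaling factors
    final double[] r;    // strict upper triangle of R, packed by rows (unit diagonal implied)
    final double[] rhs;  // projected response, one entry per variable
    double sserr;        // residual sum of squares of the full model

    QrUpdateSketch(int nvars) {
        this.nvars = nvars;
        this.d = new double[nvars];
        this.r = new double[nvars * (nvars - 1) / 2];
        this.rhs = new double[nvars];
    }

    /** Folds one observation into the factorization. Assumes w > 0. */
    void include(double[] xIn, double w, double y) {
        double[] x = xIn.clone(); // work on a copy so the caller's array is untouched
        int nextr = 0;
        for (int i = 0; i < nvars; i++) {
            if (w == 0.0) {
                return; // the observation has been fully absorbed
            }
            double xi = x[i];
            if (xi == 0.0) {
                nextr += nvars - i - 1; // skip row i of the packed triangle
                continue;
            }
            double di = d[i];
            double wxi = w * xi;
            double dpi = di + wxi * xi;
            double cbar = di / dpi;
            double sbar = wxi / dpi;
            w = cbar * w;
            d[i] = dpi;
            for (int k = i + 1; k < nvars; k++) {
                double xk = x[k];
                x[k] = xk - xi * r[nextr];
                r[nextr] = cbar * r[nextr] + sbar * xk;
                nextr++;
            }
            double yk = y;
            y = yk - xi * rhs[i];
            rhs[i] = cbar * rhs[i] + sbar * yk;
        }
        sserr += w * y * y;
    }

    public static void main(String[] args) {
        // Three points on the line y = 1 + 2x, with an explicit constant column.
        QrUpdateSketch q = new QrUpdateSketch(2);
        q.include(new double[] {1.0, 0.0}, 1.0, 1.0);
        q.include(new double[] {1.0, 1.0}, 1.0, 3.0);
        q.include(new double[] {1.0, 2.0}, 1.0, 5.0);
        System.out.println(java.util.Arrays.toString(q.d));
        System.out.println(java.util.Arrays.toString(q.rhs));
        System.out.println(q.sserr);
    }
}
```

      Because each observation only touches d, r, rhs and sserr, the raw data never needs to be stored, which is what makes the regression updatable.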
    • smartAdd

      private double smartAdd(double a, double b)
      Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
      Parameters:
      a - an addend
      b - an addend
      Returns:
      the sum of a and b
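      One plausible reading of this guarded addition is sketched below. The class name SmartAddSketch and the threshold EPS (2^-53, half the gap between 1.0 and the next double) are assumptions chosen for illustration; the actual method may use a different cutoff.

```java
// Hypothetical sketch: drop an addend that is negligible relative to the other one.
final class SmartAddSketch {
    // Assumed relative threshold: 2^-53.
    static final double EPS = Math.ulp(1.0) / 2.0;

    static double smartAdd(double a, double b) {
        double absA = Math.abs(a);
        double absB = Math.abs(b);
        if (absA > absB) {
            // b only contributes if it is not negligible relative to a
            return absB > absA * EPS ? a + b : a;
        }
        // a only contributes if it is not negligible relative to b
        return absA > absB * EPS ? a + b : b;
    }

    public static void main(String[] args) {
        System.out.println(smartAdd(1.0, 2.0));
        System.out.println(smartAdd(1.0, 1e-20));
    }
}
```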
    • clear

      public void clear()
      As the name suggests, clear wipes the internals and reorders everything in the canonical order.
      Specified by:
      clear in interface UpdatingMultipleLinearRegression
    • tolset

      private void tolset()
      This sets up tolerances for singularity testing.
    • regcf

      private double[] regcf(int nreq) throws ModelSpecificationException
      The regcf method conducts the linear regression and extracts the parameter vector. Notice that the algorithm can do subset regression with no alteration.
      Parameters:
      nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
      Returns:
      an array with the estimated slope coefficients
      Throws:
      ModelSpecificationException - if nreq is less than 1 or greater than the number of independent variables
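      Extracting the parameter vector from a triangular factorization amounts to back substitution, which is why a subset regression on the first nreq variables needs no refactoring. The sketch below assumes a unit upper-triangular R whose strict upper triangle is packed by rows; the class name, the packing formula, and the omission of singular-column handling are assumptions for illustration only.

```java
// Hypothetical sketch: back substitution on a packed unit upper-triangular R.
final class BackSubstitutionSketch {
    /** Index of R(row, col), with col > row, in a strict upper triangle packed by rows. */
    static int packedIndex(int nvars, int row, int col) {
        return row * (2 * nvars - row - 1) / 2 + (col - row - 1);
    }

    /** Solves R * beta = rhs for the leading nreq variables. */
    static double[] coefficients(int nvars, double[] r, double[] rhs, int nreq) {
        if (nreq < 1 || nreq > nvars) {
            throw new IllegalArgumentException("nreq must be between 1 and nvars");
        }
        double[] beta = new double[nreq];
        for (int i = nreq - 1; i >= 0; i--) {
            double b = rhs[i];
            for (int k = i + 1; k < nreq; k++) {
                b -= r[packedIndex(nvars, i, k)] * beta[k];
            }
            beta[i] = b;
        }
        return beta;
    }

    public static void main(String[] args) {
        // Factors for the points (0,1), (1,3), (2,5) with a constant column.
        double[] r = {1.0};
        double[] rhs = {3.0, 2.0};
        System.out.println(java.util.Arrays.toString(coefficients(2, r, rhs, 2)));
        System.out.println(java.util.Arrays.toString(coefficients(2, r, rhs, 1)));
    }
}
```

      With nreq = 2 the sketch recovers intercept 1 and slope 2; with nreq = 1 it returns the constant-only fit, the mean of y.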
    • singcheck

      private void singcheck()
      The method which checks for singularities and then eliminates the offending columns.
    • ss

      private void ss()
      Calculates the sum of squared errors for the full regression and all subsets in the following manner:
       rss[] ={
       ResidualSumOfSquares_allNvars,
       ResidualSumOfSquares_FirstNvars-1,
       ResidualSumOfSquares_FirstNvars-2,
       ..., ResidualSumOfSquares_FirstVariable} 
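      The nested sums can be accumulated in a single backward pass, since dropping the last variable raises the residual sum of squares by d[i] * rhs[i]^2. The sketch below is an assumption-laden illustration: the class name is hypothetical, and the index convention (rss[i] is the value for the first i + 1 variables) is chosen for the example and may not match the array layout of the class.

```java
// Hypothetical sketch: residual sums of squares for all nested models in one pass.
final class NestedRssSketch {
    /**
     * d and rhs come from the orthogonal reduction; sserr is the residual
     * sum of squares of the full model. In this sketch rss[i] refers to the
     * model that uses the first i + 1 variables.
     */
    static double[] nestedRss(double[] d, double[] rhs, double sserr) {
        int n = d.length;
        double[] rss = new double[n];
        double total = sserr;
        rss[n - 1] = total;
        for (int i = n - 1; i > 0; i--) {
            total += d[i] * rhs[i] * rhs[i]; // cost of dropping variable i
            rss[i - 1] = total;
        }
        return rss;
    }

    public static void main(String[] args) {
        // Factors for the points (0,1), (1,3), (2,5) with a constant column.
        double[] rss = nestedRss(new double[] {3.0, 2.0}, new double[] {3.0, 2.0}, 0.0);
        System.out.println(java.util.Arrays.toString(rss)); // prints [8.0, 0.0]
    }
}
```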
    • cov

      private double[] cov(int nreq)
      Calculates the cov matrix assuming only the first nreq variables are included in the calculation. The returned array contains a symmetric matrix stored in lower triangular form. The matrix will have ( nreq + 1 ) * nreq / 2 elements. For illustration:
       cov =
       {
        cov_00,
        cov_10, cov_11,
        cov_20, cov_21, cov_22,
        ...
       } 
      Parameters:
      nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
      Returns:
      an array with the variance covariance of the included regressors in lower triangular form
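      Reading an element back out of the packed layout illustrated above takes a small index calculation. The helper below is a hypothetical sketch (the class and method names are not part of the library) that follows the row-by-row lower triangular ordering shown in the illustration.

```java
// Hypothetical helper for a symmetric matrix packed row by row in lower triangular form.
final class PackedLowerTriangleSketch {
    /** Number of stored elements for an nreq x nreq symmetric matrix. */
    static int length(int nreq) {
        return (nreq + 1) * nreq / 2;
    }

    /** Position of element (i, j); symmetry lets either index order be used. */
    static int index(int i, int j) {
        int row = Math.max(i, j);
        int col = Math.min(i, j);
        return row * (row + 1) / 2 + col;
    }

    static double get(double[] packed, int i, int j) {
        return packed[index(i, j)];
    }

    public static void main(String[] args) {
        System.out.println(length(3));   // prints 6
        System.out.println(index(2, 1)); // prints 4
    }
}
```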
    • inverse

      private void inverse(double[] rinv, int nreq)
      This internal method calculates the inverse of the upper-triangular portion of the R matrix.
      Parameters:
      rinv - the storage for the inverse of r
      nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
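      Inverting a triangular matrix needs no pivoting and can be done column by column. The sketch below works on a dense array for readability and assumes a unit diagonal; both choices, along with the class name, are assumptions for illustration, and the class itself stores R in packed form.

```java
// Hypothetical sketch: inverse of a unit upper-triangular matrix held as a dense array.
final class UnitUpperInverseSketch {
    static double[][] invert(double[][] r) {
        int n = r.length;
        double[][] inv = new double[n][n];
        for (int j = 0; j < n; j++) {
            inv[j][j] = 1.0; // unit diagonal is assumed
            // Fill column j from the diagonal upwards, so inv[k][j] for k > i is already known.
            for (int i = j - 1; i >= 0; i--) {
                double sum = 0.0;
                for (int k = i + 1; k <= j; k++) {
                    sum += r[i][k] * inv[k][j];
                }
                inv[i][j] = -sum;
            }
        }
        return inv;
    }

    public static void main(String[] args) {
        double[][] r = {
            {1.0, 2.0, 3.0},
            {0.0, 1.0, 4.0},
            {0.0, 0.0, 1.0}
        };
        System.out.println(java.util.Arrays.deepToString(invert(r)));
    }
}
```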
    • getPartialCorrelations

      public double[] getPartialCorrelations(int in)
      In the original algorithm only the partial correlations of the regressors are returned to the user. In this implementation, we have:
       corr =
       {
         corrxx - lower triangular
         corrxy - bottom row of the matrix
       }
       Replaces subroutines PCORR and COR of:
       ALGORITHM AS274  APPL. STATIST. (1992) VOL.41, NO. 2 

      Calculate partial correlations after the variables in rows 1, 2, ..., IN have been forced into the regression. If IN = 1, and the first row of R represents a constant in the model, then the usual simple correlations are returned.

      If IN = 0, the value returned in array CORMAT for the correlation of variables Xi & Xj is:

       sum ( Xi.Xj ) / Sqrt ( sum (Xi^2) . sum (Xj^2) )

      On return, array CORMAT contains the upper triangle of the matrix of partial correlations stored by rows, excluding the 1's on the diagonal. e.g. if IN = 2, the consecutive elements returned are: (3,4) (3,5) ... (3,ncol), (4,5) (4,6) ... (4,ncol), etc. Array YCORR stores the partial correlations with the Y-variable starting with YCORR(IN+1) = partial correlation with the variable in position (IN+1).

      Parameters:
      in - how many of the regressors to include (either in canonical order, or in the current reordered state)
      Returns:
      an array with the partial correlations of the remainder of regressors with each other and the regressand, in lower triangular form
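      The IN = 0 case quoted above is an uncentered correlation: no means are subtracted before the cross products are formed. The standalone sketch below evaluates that formula directly from two data columns; the class name is hypothetical, and the class itself derives the value from the stored factorization rather than from raw data.

```java
// Hypothetical sketch of the IN = 0 formula: sum(Xi.Xj) / sqrt(sum(Xi^2) . sum(Xj^2)).
final class UncenteredCorrelationSketch {
    static double correlation(double[] xi, double[] xj) {
        if (xi.length != xj.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        double sumXiXj = 0.0;
        double sumXi2 = 0.0;
        double sumXj2 = 0.0;
        for (int k = 0; k < xi.length; k++) {
            sumXiXj += xi[k] * xj[k];
            sumXi2 += xi[k] * xi[k];
            sumXj2 += xj[k] * xj[k];
        }
        return sumXiXj / Math.sqrt(sumXi2 * sumXj2);
    }

    public static void main(String[] args) {
        System.out.println(correlation(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
        System.out.println(correlation(new double[] {1, 0}, new double[] {0, 1}));
    }
}
```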
    • vmove

      private void vmove(int from, int to)
      ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Move variable from position FROM to position TO in an orthogonal reduction produced by AS75.1.
      Parameters:
      from - initial position
      to - destination
    • reorderRegressors

      private int reorderRegressors(int[] list, int pos1)
      ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2

      Re-order the variables in an orthogonal reduction produced by AS75.1 so that the N variables in LIST start at position POS1, though will not necessarily be in the same order as in LIST. Any variables in VORDER before position POS1 are not moved. Auxiliary routine called: VMOVE.

      This internal method reorders the regressors.

      Parameters:
      list - the regressors to move
      pos1 - where the list will be placed
      Returns:
      -1 on error, 0 if everything is ok
    • getDiagonalOfHatMatrix

      public double getDiagonalOfHatMatrix(double[] row_data)
      Gets the diagonal of the Hat matrix also known as the leverage matrix.
      Parameters:
      row_data - the regressor values of the observation for which the diagonal of the hat matrix is computed
      Returns:
      the diagonal element of the hat matrix
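      Given an orthogonal reduction, the leverage of a row x can be obtained by forward substitution against the transpose of R and a weighted sum of squares. The sketch below assumes a unit upper-triangular R packed by rows, strictly positive d, and a row that already contains the constant term; the class name and these simplifications are assumptions, and the tolerance checks for singular columns are left out.

```java
// Hypothetical sketch: leverage h = sum_i wk[i]^2 / d[i], where R' * wk = x.
final class LeverageSketch {
    static double leverage(double[] d, double[] r, double[] x) {
        int n = d.length;
        double[] wk = new double[n];
        double hii = 0.0;
        for (int col = 0; col < n; col++) {
            double total = x[col];
            for (int row = 0; row < col; row++) {
                // position of R(row, col) in the strict upper triangle packed by rows
                int pos = row * (2 * n - row - 1) / 2 + (col - row - 1);
                total -= wk[row] * r[pos];
            }
            wk[col] = total;
            hii += total * total / d[col];
        }
        return hii;
    }

    public static void main(String[] args) {
        // Factors for the points x = 0, 1, 2 with a constant column.
        double[] d = {3.0, 2.0};
        double[] r = {1.0};
        System.out.println(leverage(d, r, new double[] {1.0, 0.0}));
        System.out.println(leverage(d, r, new double[] {1.0, 1.0}));
        System.out.println(leverage(d, r, new double[] {1.0, 2.0}));
    }
}
```

      For this small design the three leverages sum to 2, the number of fitted parameters, which is a quick sanity check on the calculation.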
    • getOrderOfRegressors

      public int[] getOrderOfRegressors()
      Gets the order of the regressors, useful if some type of reordering has been called. Calling regress with an int[] argument will trigger a reordering.
      Returns:
      int[] with the current order of the regressors
    • regress

      public RegressionResults regress() throws ModelSpecificationException
      Conducts a regression on the data in the model, using all regressors.
      Specified by:
      regress in interface UpdatingMultipleLinearRegression
      Returns:
      RegressionResults the structure holding all regression results
      Throws:
      ModelSpecificationException - if the number of observations is less than the number of variables
    • regress

      public RegressionResults regress(int numberOfRegressors) throws ModelSpecificationException
      Conducts a regression on the data in the model, using a subset of regressors.
      Parameters:
      numberOfRegressors - how many of the regressors to include (either in canonical order, or in the current reordered state)
      Returns:
      RegressionResults the structure holding all regression results
      Throws:
      ModelSpecificationException - if the number of observations is less than the number of variables, or the number of regressors requested is greater than the number of regressors in the model
    • regress

      public RegressionResults regress(int[] variablesToInclude) throws ModelSpecificationException
      Conducts a regression on the data in the model, using the regressors in the array. Calling this method changes the internal order of the regressors, so care is required in interpreting the hat matrix.
      Specified by:
      regress in interface UpdatingMultipleLinearRegression
      Parameters:
      variablesToInclude - array of variables to include in regression
      Returns:
      RegressionResults the structure holding all regression results
      Throws:
      ModelSpecificationException - if the number of observations is less than the number of variables, the number of regressors requested is greater than the number of regressors in the model, or a regressor index in the array does not exist