Multiple regression analysis refers to a group of techniques for studying the straight-line relationships among two or more variables. In general, multiple regression procedures estimate a linear equation of the form:
Y = a + b1X1 + b2X2 + ... + bnXn
The regression coefficients (or b coefficients) represent the independent contributions of each independent variable to the prediction of the dependent variable. Another way to express this is to say that, for example, variable X1 is correlated with the Y variable after controlling for all other independent variables. This type of correlation is also referred to as a partial correlation. Although the regression problem can be solved by a number of techniques, the most widely used method is least squares. Least squares regression selects the b's (the coefficients of the regression function) to minimize the sum of the squared e's (the residuals, that is, the differences between the actual and predicted Y's).
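The least squares idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a production routine: it assumes the fit is obtained by solving the normal equations (X'X)b = X'y with Gauss-Jordan elimination, the function names fit_least_squares and residuals are hypothetical, and the data are made up so that the true coefficients are known in advance.

```python
def fit_least_squares(rows, y):
    """Return [a, b1, ..., bn] minimizing the sum of squared residuals.

    rows: list of observations, each a list [x1, ..., xn]
    y:    list of observed Y values, same length as rows
    """
    # Design matrix: a leading 1 in every row carries the intercept a.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Normal equations (X'X) b = X'y, stored as the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(rows, y, coef):
    """e = actual Y minus predicted Y, one value per observation."""
    return [yi - (coef[0] + sum(b * x for b, x in zip(coef[1:], row)))
            for row, yi in zip(rows, y)]


# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coef = fit_least_squares(rows, y)
sse = sum(e * e for e in residuals(rows, y, coef))
print([round(c, 6) for c in coef], round(sse, 6))
```

Because the made-up Y values contain no noise, the fitted coefficients should come out very close to 1, 2, and 3, with a sum of squared residuals near zero. In practice a statistics package would normally be used instead of a hand-written solver.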
Assumptions
Linearity
Multiple regression models the linear (straight-line) relationship between Y and the X's. Linearity is most easily evaluated with scatter plots of Y against each X. If curvature in the relationships is evident, one may consider transforming the variables.
Constant Variance
The variance of the residuals (e's) is assumed to be constant for all values of the X's. Violations can be detected with residual plots of the e's versus the predicted Y's or the X's. If a residual plot shows an increasing or decreasing wedge or a bowtie shape, nonconstant variance exists and must be corrected.
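A residual plot is the usual tool, but the wedge pattern can also be screened numerically. The sketch below is a crude check under stated assumptions, not a formal test: it orders the residuals by fitted value and compares the mean squared residual in the upper half with that in the lower half. The function name spread_ratio and all numbers are hypothetical.

```python
def mean_square(values):
    """Average of the squared values (residual variance about zero)."""
    return sum(v * v for v in values) / len(values)


def spread_ratio(fitted, resid):
    """Mean squared residual for high fitted values over that for low ones."""
    pairs = sorted(zip(fitted, resid))      # order residuals by fitted value
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return mean_square(high) / mean_square(low)


# Made-up fitted values and two made-up residual patterns.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
wedge = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.6, -1.6]   # spread grows with fitted Y
flat = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]    # constant spread

wedge_ratio = spread_ratio(fitted, wedge)
flat_ratio = spread_ratio(fitted, flat)
print(round(wedge_ratio, 2), round(flat_ratio, 2))
```

A ratio near 1 is consistent with constant variance, while a ratio far above or below 1 suggests a wedge. A bowtie shape would not be caught by this two-group comparison, so it does not replace looking at the plot.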
Outliers
We assume that all outliers (extreme cases) have been removed from the data. If not, they may cause nonconstant variance or nonnormality, or seriously bias the results by "pulling" or "pushing" the regression line in a particular direction. Often, excluding just a single extreme case can yield a completely different set of results.
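The effect of a single extreme case is easy to demonstrate. The sketch below uses one predictor for brevity (the same effect occurs with several X's) and the standard closed-form least-squares slope and intercept. The function name fit_line and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


# Made-up data lying exactly on Y = 2*X.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 12]

# The same data plus one extreme case: at X = 7 the line predicts 14, not 0.
x_out = x + [7]
y_out = y + [0]

clean = fit_line(x, y)            # (0.0, 2.0)
pulled = fit_line(x_out, y_out)   # (4.0, 0.5)
print(clean, pulled)
```

One added observation moves the slope from 2 to 0.5 and the intercept from 0 to 4, which is the "pulling" described above.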
Normality
We assume the e's are normally distributed when hypothesis tests and confidence limits are to be used.
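As a rough numerical screen for normality, one can compute the sample skewness and excess kurtosis of the residuals. Both are 0 for a normal distribution, so values far from 0 hint at a problem. This is only a sketch: the function name is hypothetical, the residuals are made up, and a handful of values is far too few to judge normality in practice. A histogram or normal probability plot of the e's is the more common check.

```python
def skewness_and_excess_kurtosis(resid):
    """Moment-based sample skewness and excess kurtosis of the residuals."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n   # second central moment
    m3 = sum((e - mean) ** 3 for e in resid) / n   # third central moment
    m4 = sum((e - mean) ** 4 for e in resid) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Made-up, perfectly symmetric residuals.
resid = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt = skewness_and_excess_kurtosis(resid)
print(round(skew, 3), round(kurt, 3))
```

The symmetric residuals give a skewness of 0. The negative excess kurtosis reflects that these evenly spaced values have lighter tails than a normal distribution.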
Independence
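The e's are assumed to be uncorrelated with one another. The X's themselves need not be uncorrelated, but strong correlation among them (multicollinearity) makes the individual coefficients unstable and hard to interpret.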
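Correlation among the X's can be screened with pairwise correlation coefficients before fitting. The sketch below computes the ordinary Pearson correlation. The function name pearson_r and the three predictors are made up for illustration, and pairwise correlations will not reveal every dependence involving three or more X's.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


# Made-up predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]   # exactly 2 * x1, so perfectly correlated with x1
x3 = [2, 1, 3, 1, 2]    # constructed to be uncorrelated with x1

r12 = pearson_r(x1, x2)
r13 = pearson_r(x1, x3)
print(round(r12, 6), round(r13, 6))
```

A correlation near 1 or -1 between two predictors, as with x1 and x2 here, means their separate contributions cannot be distinguished. A value near 0, as with x1 and x3, indicates no linear overlap between that pair.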
The major conceptual limitation of all regression techniques is that one can only ascertain relationships, but can never be sure about the underlying causal mechanism.
Example: Are a product's sales (Y) affected by its price (X1) and advertising costs (X2)?