Analysis of dependent nominal variable Y on independent interval variables X

Discriminant analysis

Discriminant analysis finds a set of prediction equations, based on the independent variables, that are used to classify individuals into groups.

Discriminant analysis has two objectives:
(1) Finding a predictive equation for classifying new individuals.
(2) Interpreting the predictive equation to better understand the relationships that may exist among the variables.

Issues and guidelines (Tabachnick, 1989)

Unequal Group Size
When using discriminant analysis, you should have more observations in each group than you have independent variables. If the relative group sample sizes do not reflect the group sizes in the overall population, the classification results will be biased. (You can prevent these misclassifications by setting the prior probabilities to the group proportions in the population.)
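
The effect of the prior probabilities can be sketched with a single test score and two groups. The group means, standard deviations, and the 90/10 population split below are made-up values for illustration only, and the helper names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x, groups, priors):
    """Posterior probability of each group for a single score x.

    groups: {name: (mean, sd)}, hypothetical estimates from a sample
    priors: {name: prior probability}, expected to sum to 1
    """
    weighted = {g: priors[g] * normal_pdf(x, *groups[g]) for g in groups}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}

# Hypothetical group parameters with equal spread.
groups = {"low": (50.0, 10.0), "top": (70.0, 10.0)}

# Priors implied by a balanced sample versus priors for an assumed
# population in which only 10% of individuals belong to "top".
sample_priors = {"low": 0.5, "top": 0.5}
population_priors = {"low": 0.9, "top": 0.1}

x = 62.0
post_sample = posterior(x, groups, sample_priors)
post_population = posterior(x, groups, population_priors)

print(max(post_sample, key=post_sample.get))          # group chosen with sample priors
print(max(post_population, key=post_population.get))  # group chosen with population priors
```

With these made-up numbers the same score is assigned to a different group once the priors are changed, which is the adjustment the paragraph above refers to.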

Multivariate Normality
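Discriminant analysis assumes that the independent variables follow a multivariate normal distribution within each group. A sample size of at least twenty observations in the smallest group is usually adequate to ensure robustness of any inferential tests that may be made.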

Outliers
Outliers (extreme cases) can cause severe problems that even the robustness of discriminant analysis will not overcome. You should screen your data carefully for outliers using the various univariate and multivariate normality tests and plots to determine if the normality assumption is reasonable. You should perform these tests on one group at a time.
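
One common multivariate screening device is the squared Mahalanobis distance of each case from its own group centroid, computed one group at a time. The sketch below assumes NumPy is available; the simulated scores and the planted extreme case are illustrative, not real data, and the function name is hypothetical.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the group centroid."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # sample covariance of this group
    inv_cov = np.linalg.inv(cov)
    # Row-wise quadratic form: centered[i] @ inv_cov @ centered[i]
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Simulated scores for ONE group (illustrative only).
rng = np.random.default_rng(0)
group = rng.normal(loc=[50.0, 60.0], scale=[5.0, 8.0], size=(30, 2))
group[0] = [90.0, 10.0]                    # planted extreme case

d2 = mahalanobis_sq(group)
suspect = int(np.argmax(d2))
print("most extreme case:", suspect, "distance:", round(float(d2[suspect]), 2))
```

Cases with unusually large distances are candidates for closer inspection. A frequently used convention compares the distances with a chi-square quantile whose degrees of freedom equal the number of variables, but the cutoff is a judgment call.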

Homogeneity of Covariance Matrices
Discriminant analysis assumes that the covariance matrices of the independent variables are equal across groups. This assumption may be tested with Box's M test. If the covariance matrices appear to be grossly different, you should take some corrective action. Corrective action usually includes close screening for outliers and the use of variance-stabilizing transformations such as the logarithm.
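
A minimal sketch of the Box's M statistic and its usual chi-square approximation, assuming NumPy is available. The function name and the simulated data are illustrative. The p-value is left out because it would need a chi-square distribution function from a statistics library.

```python
import numpy as np

def box_m(groups):
    """Box's M statistic with its chi-square approximation.

    groups: list of (n_i, p) arrays, one array per group.
    Returns (M, chi2_stat, df).
    """
    groups = [np.asarray(x, dtype=float) for x in groups]
    g = len(groups)
    p = groups[0].shape[1]
    sizes = np.array([len(x) for x in groups])
    N = int(sizes.sum())

    covs = [np.cov(x, rowvar=False) for x in groups]
    pooled = sum((n - 1) * S for n, S in zip(sizes, covs)) / (N - g)

    def logdet(S):
        # Log-determinant, numerically safer than log(det(S)).
        return np.linalg.slogdet(S)[1]

    M = (N - g) * logdet(pooled) - sum(
        (n - 1) * logdet(S) for n, S in zip(sizes, covs)
    )
    # Scaling factor for the chi-square approximation.
    c = (np.sum(1.0 / (sizes - 1)) - 1.0 / (N - g)) * (
        (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))
    )
    chi2_stat = M * (1.0 - c)
    df = p * (p + 1) * (g - 1) // 2
    return float(M), float(chi2_stat), df

rng = np.random.default_rng(1)
a = rng.normal(size=(25, 2))
same_spread = a + 10.0      # shifted copy: identical covariance matrix
wider_spread = a * 3.0      # every variance and covariance multiplied by 9

M_equal, _, df = box_m([a, same_spread])
M_unequal, chi2_unequal, _ = box_m([a, wider_spread])
print(round(M_equal, 6), round(M_unequal, 2), df)
```

M is close to zero when the group covariance matrices agree and grows as they diverge. The chi-square statistic is compared with a chi-square distribution on df degrees of freedom.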

Linearity
Discriminant analysis assumes linear relations among the independent variables. You should study scatter plots of each pair of independent variables, using a different color for each group. Look carefully for curvilinear patterns and for outliers. A curvilinear relationship will reduce the power and the discriminating ability of the discriminant equation. As with unequal covariance matrices, transforming the variables can help.

Multicollinearity
Multicollinearity occurs when one predictor variable is almost a weighted average of the others. This collinearity will only show up when the data are considered one group at a time. Forms of multicollinearity may show up when you have very small group sample sizes. In this case, you must reduce the number of independent variables.
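
One way to check this within a single group is to regress each predictor on the remaining predictors and look at the tolerance (1 minus R-squared). The sketch below assumes NumPy is available; the function name and the simulated scores are illustrative only.

```python
import numpy as np

def tolerances(X):
    """Tolerance (1 - R^2) of each column regressed on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus all predictors except column j.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        out.append(ss_res / ss_tot)
    return np.array(out)

# Simulated scores for ONE group: x3 is almost a weighted average of x1 and x2,
# while x4 is unrelated to the others.
rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.01, size=n)
x4 = rng.normal(size=n)

tol = tolerances(np.column_stack([x1, x2, x3, x4]))
print(np.round(tol, 4))
```

A tolerance near zero marks a variable that is nearly determined by the others. The check should be repeated for each group separately, since the collinearity may not be visible in the combined data.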

Example: Is there a relationship between an employee's performance (Y) and the scores (X1, X2, ...) that he or she achieved in entrance tests as a candidate? Using the database of present employees and a candidate's test scores, how can a personnel manager predict whether the candidate's performance will be top, average, or low?
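
The example can be sketched as a linear discriminant classifier with a pooled covariance matrix, assuming NumPy is available. The employee scores below are simulated and the function names are hypothetical. Priors default to the sample proportions, which is itself an assumption (see Unequal Group Size above).

```python
import numpy as np

def fit_lda(X, y, priors=None):
    """Estimate linear classification functions from labeled data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == k].mean(axis=0) for k in labels])

    # Pooled within-group covariance matrix.
    pooled = np.zeros((p, p))
    for k, m in zip(labels, means):
        centered = X[y == k] - m
        pooled += centered.T @ centered
    pooled /= n - len(labels)

    if priors is None:
        priors = np.array([(y == k).mean() for k in labels])

    inv = np.linalg.inv(pooled)
    weights = means @ inv   # one row of coefficients per group
    consts = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return labels, weights, consts

def classify(model, X_new):
    """Assign each row of X_new to the group with the highest score."""
    labels, weights, consts = model
    scores = np.asarray(X_new, dtype=float) @ weights.T + consts
    return labels[np.argmax(scores, axis=1)]

# Simulated entrance-test scores (X1, X2) of present employees, 20 per group.
rng = np.random.default_rng(3)
centers = {"low": [40.0, 45.0], "average": [60.0, 60.0], "top": [80.0, 78.0]}
X = np.vstack([rng.normal(loc=c, scale=5.0, size=(20, 2)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 20)

model = fit_lda(X, y)
candidates = [[42.0, 44.0], [61.0, 58.0], [83.0, 80.0]]
predicted = classify(model, candidates).tolist()
print(predicted)
```

Each row of the weights array, together with its constant, is one prediction equation. A candidate is assigned to the group whose equation gives the highest score.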