Linear relations between two sets of interval variables

Canonical Correlation

Canonical correlation finds a weighted average (a linear combination) of the variables in the first set and correlates it with a weighted average of the variables in the second set. The weights are chosen to maximize the correlation between these two averages. This maximum correlation is called the first canonical correlation coefficient. You can then construct a second pair of weighted averages that is uncorrelated with the first pair, again choosing the weights to maximize their correlation. This correlation is the second canonical correlation coefficient. The process continues until the number of canonical correlations equals the number of variables in the smaller set.
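The procedure above can be sketched numerically. The following is a minimal illustration, not a reference implementation: it assumes NumPy, assumes both sets have full column rank, and uses one common computational route (QR decomposition followed by a singular value decomposition). The function name canonical_correlations and the simulated data are made up for this example.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two sets of variables (rows = cases)."""
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx' Qy are the canonical correlations,
    # returned in decreasing order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Simulated data (hypothetical): one shared signal links the two sets.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])      # first set: 3 variables
Y = np.column_stack([shared + rng.normal(size=n),
                     rng.normal(size=n)])      # second set: 2 variables

r = canonical_correlations(X, Y)
print(r)  # two coefficients, because the smaller set has two variables
```

With three variables in one set and two in the other, the sketch returns two canonical correlations, and the first is at least as large as the absolute correlation between any single pair of original variables.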

Canonical correlation terminology makes an important distinction between variables and variates (also called roots). The term variables is reserved for the original variables being analyzed. The term variates refers to the new variables that are constructed as weighted averages of the original variables.
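To make the distinction concrete, the sketch below builds the variates from the original variables. It is an illustration under assumptions: NumPy, full-rank data, and a QR/SVD route to the weights. The names canonical_variates, x_var, y_var, A, and B are hypothetical.

```python
import numpy as np

def canonical_variates(X, Y):
    """Return variates for both sets, their weights, and the correlations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Weights applied to the (centered) original variables.
    A = np.linalg.solve(Rx, U)
    B = np.linalg.solve(Ry, Vt.T)
    # Variates: weighted combinations of the original variables.
    return Xc @ A, Yc @ B, A, B, s

# Simulated original variables (hypothetical data).
rng = np.random.default_rng(1)
n = 300
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([shared + rng.normal(size=n),
                     rng.normal(size=n)])

x_var, y_var, A, B, s = canonical_variates(X, Y)

# The first pair of variates correlates at the first canonical correlation.
r_first = np.corrcoef(x_var[:, 0], y_var[:, 0])[0, 1]
# Variates from different pairs are uncorrelated with each other.
r_cross = np.corrcoef(x_var[:, 0], x_var[:, 1])[0, 1]
print(r_first, s[0], r_cross)
```

Here X and Y hold the variables, while the columns of x_var and y_var are the variates. The columns of A and B hold the weights that turn one into the other.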

Assumptions

Multivariate Normality
The tests of significance of the canonical correlations are based on the assumption that the distributions of the variables in the population (from which the sample was drawn) are multivariate normal. With a sufficiently large sample size (see below), the results of canonical correlation analysis are usually quite robust to departures from this assumption.

Sample sizes
To arrive at reliable estimates for two canonical roots, Barcikowski and Stevens (1975) recommend, on the basis of a Monte Carlo study, including 40 to 60 times as many cases as variables.
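As a worked example of this rule of thumb, the short sketch below turns a variable count into a range of cases. It assumes that "variables" means the total count across both sets, which the recommendation as quoted here does not spell out. The function name recommended_cases is hypothetical.

```python
def recommended_cases(n_variables, low=40, high=60):
    """Range of cases implied by the 40-to-60-cases-per-variable rule."""
    return n_variables * low, n_variables * high

# Assumption: 3 variables in the first set plus 2 in the second, 5 in total.
low_n, high_n = recommended_cases(3 + 2)
print(low_n, high_n)  # 200 300
```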

Outliers
Outliers (extreme cases) can greatly affect the magnitudes of correlation coefficients. Since canonical correlation is computed from correlation coefficients, outliers can also seriously distort the canonical correlations. It is a good idea to examine scatterplots of the variables to detect possible outliers.
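The size of this effect is easy to demonstrate. The sketch below uses a small made-up data set in which two variables are exactly uncorrelated, then adds one extreme case. The data and names are invented for illustration, and NumPy is assumed.

```python
import numpy as np

# Twenty cases constructed so that x and y are exactly uncorrelated.
x = np.tile([-2.0, -1.0, 0.0, 1.0, 2.0], 4)
y = x**2 - 2.0
r_clean = np.corrcoef(x, y)[0, 1]

# Add a single extreme case far from the rest of the data.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_out = np.corrcoef(x_out, y_out)[0, 1]

print(round(r_clean, 3), round(r_out, 3))
```

One added case out of 21 moves the correlation from zero to roughly 0.67, so any canonical correlation computed from such data would be driven largely by that one case.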

Multicollinearity
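Multicollinearity occurs when one (original) variable is almost a weighted average of the others. The correlation matrix of that set is then nearly singular, which makes the estimated weights unstable. In this case, you must reduce the number of variables.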
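One way to screen for this, offered as a sketch rather than a prescribed procedure, is to regress each variable on the others in its set and look at the resulting R-squared. Values close to 1 flag a variable that is almost a weighted average of the rest. The function name r_squared_on_others and the simulated data are assumptions of this example, and NumPy is assumed.

```python
import numpy as np

def r_squared_on_others(X):
    """R-squared from regressing each column on all remaining columns."""
    Xc = X - X.mean(axis=0)
    out = []
    for j in range(Xc.shape[1]):
        target = Xc[:, j]
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        out.append(1.0 - (resid @ resid) / (target @ target))
    return np.array(out)

# Simulated set (hypothetical): x3 is almost the average of x1 and x2,
# while x4 is unrelated to the other three.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 + 0.5 * x2 + 0.01 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

r2 = r_squared_on_others(X)
print(np.round(r2, 4))
```

In this example the first three variables all show R-squared values near 1, because any one of them can be recovered from the other two. Dropping one of the three removes the problem.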

Discriminant analysis, MANOVA, and multiple regression are all special cases of canonical correlation.
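The multiple regression case can be checked numerically. When the second set contains a single variable, there is only one canonical correlation, and it equals the multiple correlation R, the square root of the regression R-squared. The sketch below illustrates this under assumptions: NumPy, simulated data, and hypothetical names throughout.

```python
import numpy as np

# Simulated data (hypothetical): three predictors and one outcome.
rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Multiple regression: R-squared of y on the three predictors.
coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
resid = yc - Xc @ coef
r2 = 1.0 - (resid @ resid) / (yc @ yc)

# Canonical correlation with a one-variable second set (QR + SVD route).
Qx, _ = np.linalg.qr(Xc)
Qy, _ = np.linalg.qr(yc[:, None])
r_canon = np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

print(r_canon, np.sqrt(r2))  # the two values agree
```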