ANCOVA (Analysis of Covariance)

Overview

Analysis of covariance is used to test the main and interaction effects of categorical variables on a continuous dependent variable, controlling for the effects of selected other continuous variables that co-vary with the dependent. The control variables are called the "covariates."

ANCOVA is used for several purposes:

    * In experimental designs, to control for factors which cannot be randomized but which can be measured on an interval scale.
    * In observational designs, to remove the effects of variables which modify the relationship of the categorical independents to the interval dependent.
    * In regression models, to fit regressions where there are both categorical and interval independents. (This third purpose has largely been displaced by logistic regression and other methods.)
 

Key Concepts and Terms

     Covariate: An interval-level (i.e. continuous) independent variable. If there are no covariates, ANOVA is used rather than ANCOVA; if there are covariates, ANCOVA is used rather than ANOVA. Covariates are commonly used as control variables. For instance, a baseline pre-test score can be used as a covariate to control for initial group differences on math ability or whatever else is being assessed in the ANCOVA study. That is, in ANCOVA we look at the effects of the categorical independents on an interval dependent (i.e. response) variable, after the effects of the interval covariates are controlled. (This is similar to regression, where the beta weights of categorical independents, represented as dummy variables entered after the interval independents, reflect the control effect of those independents.)
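
     A minimal sketch of this setup in Python, using pandas and statsmodels; the column names "group", "pretest", and "posttest" and the simulated numbers are hypothetical:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf
        from statsmodels.stats.anova import anova_lm

        # Hypothetical data: three groups, a pre-test covariate, and a
        # post-test score as the interval dependent.
        rng = np.random.default_rng(0)
        n = 30
        df = pd.DataFrame({
            "group": np.repeat(["A", "B", "C"], n),
            "pretest": rng.normal(50, 10, 3 * n),
        })
        df["posttest"] = (5 + 0.8 * df["pretest"]
                          + df["group"].map({"A": 0, "B": 3, "C": 6})
                          + rng.normal(0, 5, 3 * n))

        # ANCOVA: effect of group on posttest, controlling for pretest.
        model = smf.ols("posttest ~ C(group) + pretest", data=df).fit()
        print(anova_lm(model, typ=2))  # F tests for the factor and the covariate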

     F-test: The F-test of significance is used to test each main and interaction effect, for the case of a single interval dependent and multiple (more than two) groups formed by a categorical independent. F is between-groups variance divided by within-groups variance. If the computed p-value is below the chosen significance level, the effect is judged statistically significant.
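
     As a rough illustration of the F ratio, here is a sketch with made-up numbers; scipy's one-way ANOVA is used only to confirm the hand computation:

        import numpy as np
        from scipy import stats

        groups = [np.array([12., 15., 14., 10.]),
                  np.array([18., 20., 17., 19.]),
                  np.array([11., 13., 12., 14.])]

        grand_mean = np.mean(np.concatenate(groups))
        k = len(groups)
        n_total = sum(len(g) for g in groups)

        # Between-groups variance (mean square between)
        ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
        ms_between = ss_between / (k - 1)

        # Within-groups variance (mean square within)
        ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
        ms_within = ss_within / (n_total - k)

        F = ms_between / ms_within
        print(F, stats.f_oneway(*groups).statistic)  # the two values agree
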
     Adjusted means are usually part of ANCOVA output and are examined if the F-test demonstrates that significant relationships exist. Comparison of the original and adjusted group means can provide insight into the role of the covariates. For k groups formed by categories of the categorical independents and measured on the dependent variable, the adjustment shows how these k means were altered to control for the covariates. Typically the adjustment is based on a linear regression of the form Yadj(i) = Ymean(i) - b*(Xmean(i) - Xgrandmean), where Y is the interval dependent, X is the covariate, i indexes one of the k groups, b is the pooled regression coefficient of Y on X, Ymean(i) and Xmean(i) are the means of Y and X in group i, and Xgrandmean is the overall mean of X. There is no constant when Y is standardized. For multiple covariates there are, of course, additional similar X terms in the equation.
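
     A sketch of that adjustment in Python (hypothetical data; the pooled slope b is taken from the fitted ANCOVA model itself):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical data: groups, a covariate x, and a dependent y.
        rng = np.random.default_rng(1)
        df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 20),
                           "x": rng.normal(0, 1, 60)})
        df["y"] = (10 + 2 * df["x"]
                   + df["group"].map({"A": 0, "B": 1, "C": 2})
                   + rng.normal(0, 1, 60))

        # Pooled within-group slope b from the ANCOVA model.
        b = smf.ols("y ~ C(group) + x", data=df).fit().params["x"]

        grand_x = df["x"].mean()
        means = df.groupby("group")[["y", "x"]].mean()
        means["y_adjusted"] = means["y"] - b * (means["x"] - grand_x)
        print(means)  # original vs. covariate-adjusted group means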

     t-test: A test of significance of the difference in the means of a single interval dependent, for the case of two groups formed by a categorical independent.
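
     For the two-group case, a minimal sketch with scipy (the score arrays are made up):

        import numpy as np
        from scipy import stats

        group_1 = np.array([12., 15., 14., 10., 13.])
        group_2 = np.array([18., 20., 17., 19., 16.])

        t, p = stats.ttest_ind(group_1, group_2)  # independent-samples t-test
        print(t, p)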

Can ANCOVA be modeled using regression? Yes, if dummy variables are used for the categorical independents. When creating dummy variables, one uses one fewer dummy than there are categories of each independent. For full ANCOVA one would also add the interaction cross-product terms for each pair of independents included in the equation, including the dummies. Then one computes a multiple regression. The resulting F tests will be the same as in classical ANCOVA. The F ratio can also be computed through the extra sum of squares, using the full-versus-reduced model approach, as sketched below.
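
A sketch of the regression formulation and the full-versus-reduced model F test with statsmodels (the data are hypothetical; C(group) expands into the k - 1 dummies automatically, which could also be built by hand with pd.get_dummies(drop_first=True)):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf
        from statsmodels.stats.anova import anova_lm

        rng = np.random.default_rng(2)
        df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 25),
                           "x": rng.normal(0, 1, 75)})
        df["y"] = (1 + 2 * df["x"]
                   + df["group"].map({"A": 0, "B": 1, "C": 3})
                   + rng.normal(0, 1, 75))

        reduced = smf.ols("y ~ x", data=df).fit()          # covariate only
        full = smf.ols("y ~ C(group) + x", data=df).fit()  # adds the dummies

        # Extra-sum-of-squares F test for the group effect (full vs. reduced).
        print(anova_lm(reduced, full))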

Assumptions

     (At least one categorical and at least one interval independent) At least one of the independent variables must be a covariate (interval level), and at least one must be categorical.

     (Interval dependent) The dependent variable is continuous and interval level.

     (Low measurement error of the covariate) The covariate variables are continuous and interval level, and are assumed to be measured without error.

     (Covariate linearly or in known relationship to the dependent) The form of the relationship between the covariate and the dependent must be known, and most computer programs assume this relationship is linear, adjusting the dependent mean based on linear regression. Scatterplots of the covariate against the dependent for each of the k groups formed by the independents are one way to assess violations of this assumption.
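
     One way to eyeball this, sketched with matplotlib on hypothetical data (a roughly linear point cloud in each panel is consistent with the assumption):

        import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(3)
        df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 30),
                           "x": rng.normal(0, 1, 90)})
        df["y"] = (2 * df["x"]
                   + df["group"].map({"A": 0, "B": 1, "C": 2})
                   + rng.normal(0, 1, 90))

        # One scatterplot of covariate vs. dependent per group.
        fig, axes = plt.subplots(1, 3, sharex=True, sharey=True)
        for ax, (name, g) in zip(axes, df.groupby("group")):
            ax.scatter(g["x"], g["y"], s=10)
            ax.set_title(f"group {name}")
            ax.set_xlabel("covariate x")
        axes[0].set_ylabel("dependent y")
        plt.show()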

     (Homogeneity of covariate regression coefficients; i.e. “parallel lines model”) The covariate coefficients (the slopes of the regression lines) are the same for each group formed by the categorical variables and measured on the dependent. The more this assumption is violated, the more conservative ANCOVA becomes (the more likely it is to make Type II errors - accepting a false null hypothesis). There is a statistical test of the assumption of homogeneity of regression coefficients.
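
     A common version of that test is to add a group-by-covariate interaction and test it; a sketch with statsmodels on hypothetical data (a significant interaction indicates unequal slopes):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf
        from statsmodels.stats.anova import anova_lm

        rng = np.random.default_rng(4)
        df = pd.DataFrame({"group": np.repeat(["A", "B", "C"], 30),
                           "x": rng.normal(0, 1, 90)})
        df["y"] = (2 * df["x"]
                   + df["group"].map({"A": 0, "B": 1, "C": 2})
                   + rng.normal(0, 1, 90))

        # Parallel-lines (ANCOVA) model vs. model with group-specific slopes.
        parallel = smf.ols("y ~ C(group) + x", data=df).fit()
        separate = smf.ols("y ~ C(group) * x", data=df).fit()  # adds C(group):x

        # F test of the interaction: a small p-value suggests the slopes
        # differ across groups, violating the assumption.
        print(anova_lm(parallel, separate))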

     (Additivity) The values of the dependent are an additive combination of its overall mean, the effect of the categorical independents, the covariate effect, and an error term. ANCOVA is robust against violations of additivity but in severe violations the researcher may transform the data, as by using a logarithmic transformation to change a multiplicative model into an additive model. Note, however, that ANCOVA automatically handles interaction effects and thus is not an additive procedure in the sense of regression models without interaction terms.
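
     For example, a toy sketch of how a multiplicative relation becomes additive on the log scale:

        import numpy as np

        rng = np.random.default_rng(5)
        baseline = 2.0
        group_effect = 1.5                   # multiplicative group effect
        error = rng.lognormal(0, 0.1, 100)   # multiplicative error term

        y = baseline * group_effect * error  # multiplicative model
        log_y = np.log(y)                    # additive on the log scale

        print(np.allclose(log_y,
                          np.log(baseline) + np.log(group_effect) + np.log(error)))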

     (Independence of the error term) The error term is independent of the covariates and the categorical independents. Randomization in experimental designs assures this assumption will be met.

     (Independent variables orthogonal to covariates) The categorical independents are orthogonal to the covariates. If the covariate is influenced by the categorical independents, then the control adjustment ANCOVA makes on the dependent variable prior to assessing the effects of the categorical independents will be biased, since some indirect effects of the independents will be removed from the dependent.

     (Homogeneity of variances) There is homogeneity of variances in the cells formed by the independent categorical variables. Heteroscedasticity, the lack of homogeneity of variances, violates this assumption.
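
     A sketch of one common check, Levene's test in scipy (the group arrays are made up; a small p-value suggests heteroscedasticity):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)
        group_a = rng.normal(0, 1, 40)
        group_b = rng.normal(0, 1, 40)
        group_c = rng.normal(0, 3, 40)  # deliberately larger spread

        stat, p = stats.levene(group_a, group_b, group_c)
        print(stat, p)  # small p: variances are not homogeneous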

     (Multivariate normality) For purposes of significance testing, variables follow multivariate normal distributions.
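
     In practice this is often checked on the model residuals; a sketch using a Shapiro-Wilk test on hypothetical data (statsmodels exposes the residuals as model.resid):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf
        from scipy import stats

        rng = np.random.default_rng(7)
        df = pd.DataFrame({"group": np.repeat(["A", "B"], 40),
                           "x": rng.normal(0, 1, 80)})
        df["y"] = (df["x"] + df["group"].map({"A": 0, "B": 1})
                   + rng.normal(0, 1, 80))

        model = smf.ols("y ~ C(group) + x", data=df).fit()
        stat, p = stats.shapiro(model.resid)  # normality test on residuals
        print(stat, p)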

     (Compound sphericity) The groups display sphericity (the variance of the difference between the estimated means for any two different groups is the same). A more restrictive assumption, called compound symmetry, is that the correlations between any two different groups are the same value. If compound symmetry exists, sphericity exists. Tests or adjustments for lack of sphericity are, in practice, usually based on possible lack of compound symmetry.