Degree Name

Doctor of Philosophy (PhD)


Educational Leadership, Research and Counseling

First Advisor

Charles Teddlie


This study investigated two questions: the effect of alternating criterion-referenced tests (CRT) with norm-referenced tests (NRT) when evaluating schools, and whether mean scores mask poor delivery of educational services to low achievers in such evaluations. The sample included 242 Louisiana public elementary schools (18,000 third graders tested in 1989). The study employed ten separate multiple regression models, each producing studentized residuals that served as school effectiveness indicators (SEIs). The independent variables for all models were students' free lunch status, mother's educational level, and father's employment level. The dependent variables were school mean and lower quartile scores on the CRT language arts and mathematics tests and on the NRT reading, language, and mathematics tests. The study used the SEIs to classify schools as effective, average, or ineffective. It classified each school under all ten models using ±1.00 standard error units (se) as the a priori decision criterion, and then reclassified the schools using ±.674 se as the post hoc criterion. The study separately analyzed eight cross-classifications: (1) CRT language arts & NRT language, (2) CRT language arts & NRT reading, (3) CRT mathematics & NRT mathematics, (4) CRT language arts mean & lower quartile, (5) CRT mathematics mean & lower quartile, (6) NRT language mean & lower quartile, (7) NRT reading mean & lower quartile, and (8) NRT mathematics mean & lower quartile. The study tested each comparison with the kappa z-test and measured agreement with the weighted kappa coefficient (chance-controlled agreement), the weighted agreement ratio (adjusted agreement), and the unweighted agreement ratio (absolute agreement). All kappa z-tests were significant beyond the .05 level. The magnitude measures showed generally moderate consistency for CRT-NRT comparisons and moderately high consistency for mean-quartile comparisons.
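The SEI procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code: it assumes internally studentized residuals (the abstract does not say whether residuals were internally or externally studentized), takes the three school-level predictors as plain numeric columns, and treats a residual exactly at the cut as falling in the outer category. The helper names `invert`, `studentized_residuals`, and `classify` are hypothetical.

```python
from math import sqrt


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def studentized_residuals(X, y):
    """Fit y on X (plus an intercept) by OLS; return internally studentized residuals.

    Assumption: r_i = e_i / (s * sqrt(1 - h_ii)), with s the residual standard error
    and h_ii the leverage of school i.
    """
    Z = [[1.0] + list(row) for row in X]  # prepend intercept column
    n, p = len(Z), len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    inv = invert(ztz)
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = [sum(inv[i][j] * zty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * zi for b, zi in zip(beta, z)) for z, yi in zip(Z, y)]
    s = sqrt(sum(e * e for e in resid) / (n - p))  # residual standard error
    out = []
    for z, e in zip(Z, resid):
        # leverage h_ii = z' (Z'Z)^-1 z
        h = sum(z[i] * inv[i][j] * z[j] for i in range(p) for j in range(p))
        out.append(e / (s * sqrt(1.0 - h)))
    return out


def classify(sei, cut=1.00):
    """Label a school: SEI at or above +cut is effective, at or below -cut ineffective."""
    if sei >= cut:
        return "effective"
    if sei <= -cut:
        return "ineffective"
    return "average"
```

Passing `cut=0.674` applies the post hoc criterion. Since ±.674 bounds the middle 50% of a standard normal distribution, roughly a quarter of schools would be expected in each outer category if the SEIs were approximately normal, compared with about a sixth under ±1.00.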
The differences between the kappa coefficients and the agreement ratios diminished when the criterion changed, suggesting that the change reduced chance agreement. The study also found that no set of SEIs was significantly related to the independent variables in the regression models. It concludes that the findings do not support alternating test modes in evaluating schools, but do indicate that school means mask little of the lower quartile achievement. Finally, it suggests that a criterion of ±.440 se best controls chance agreement.
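The agreement measures named in the abstract can likewise be sketched for a 3 × 3 cross-classification (ineffective, average, effective). The default weights are an assumption: linear agreement weights (1 for the same category, 0.5 for adjacent categories, 0 for effective versus ineffective). The z statistic divides weighted kappa by the square root of its large-sample null variance from Fleiss, Cohen & Everitt (1969), which is one standard choice; the abstract does not state which variance estimate the study used. The function name `agreement_stats` and the toy counts are hypothetical, not study data.

```python
from math import sqrt


def agreement_stats(table, weights=None):
    """Agreement measures for a k x k cross-classification of schools.

    table[i][j] counts schools placed in category i by one model and j by the
    other, with categories ordered (e.g. ineffective, average, effective).
    Returns (unweighted agreement ratio, weighted agreement ratio,
    weighted kappa, kappa z statistic).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p = [[c / n for c in row] for row in table]
    row_m = [sum(p[i]) for i in range(k)]
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]
    if weights is None:
        # Assumed linear agreement weights: 1 on the diagonal, 0.5 for
        # adjacent categories, 0 for effective vs. ineffective when k = 3.
        weights = [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    w = weights
    cells = [(i, j) for i in range(k) for j in range(k)]
    po = sum(p[i][i] for i in range(k))                          # absolute agreement
    po_w = sum(w[i][j] * p[i][j] for i, j in cells)              # adjusted agreement
    pe_w = sum(w[i][j] * row_m[i] * col_m[j] for i, j in cells)  # chance agreement
    kappa_w = (po_w - pe_w) / (1.0 - pe_w)                       # chance-controlled
    # Large-sample variance of weighted kappa under no agreement beyond chance
    # (Fleiss, Cohen & Everitt, 1969).
    wbar_row = [sum(col_m[j] * w[i][j] for j in range(k)) for i in range(k)]
    wbar_col = [sum(row_m[i] * w[i][j] for i in range(k)) for j in range(k)]
    s = sum(row_m[i] * col_m[j] * (w[i][j] - (wbar_row[i] + wbar_col[j])) ** 2
            for i, j in cells)
    var0 = (s - pe_w ** 2) / (n * (1.0 - pe_w) ** 2)
    return po, po_w, kappa_w, kappa_w / sqrt(var0)


if __name__ == "__main__":
    toy = [[30, 8, 2], [10, 120, 12], [1, 9, 50]]  # hypothetical counts
    print(agreement_stats(toy))
```

Chance agreement `pe_w` depends only on the marginals. With a wide cut such as ±1.00, most schools fall in the average category, which inflates chance agreement; narrowing the cut spreads the marginals and lowers it, which is consistent with the shrinking gap between kappa and the agreement ratios that the abstract reports.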