An Agenda for Research on Educational Testing
Marguerite Clarke, George Madaus, Joseph Pedulla, and Arnold
National Board on Educational Testing and Public Policy
Carolyn A. and Peter S. Lynch School of Education
Volume 2, Number 1 January 2001
Educational testing has become a large part of the lives of students and their families. From entry to kindergarten to admission to higher education, tests mark some of the most important educational and career decisions in a student's life. For some students, they serve as a gateway to educational opportunities; for others, as a gatekeeper, preventing or limiting their access to these opportunities. The National Board on Educational Testing and Public Policy was formed to monitor the effects of tests on students, schools, and society, and to encourage their use as gateways to education, not gatekeepers.
The National Board believes that we must as a nation conduct research that helps testing contribute to student learning, classroom practice, and state and district management of school resources. The research agenda proposed by the National Board has five priorities:
Monitoring the effects of state-level tests
Designing state systems for accountability
Understanding the role of tests in standards-based educational reform
Understanding how standardized tests are used in college admissions
Understanding the link between technology and testing
Below we explore each of these five priorities. In formulating an agenda that cuts across the fields of measurement, teaching, and educational administration on the one hand and family and state interests on the other, we hope to begin a dialogue that will serve students, families, practitioners, and elected officials.
Monitoring the Effects of State-Level Tests
Promotion and Exit Exams
More and more states are requiring a passing grade on a promotion exam before a student can move on to the next grade, or an exit exam for high school graduation, or both. So far the focus has been mainly on the exams themselves, especially their validity and reliability, and hardly at all on their effects on teaching and curriculum. The National Board urges that applied studies be done that look deeply into the effects of these tests in school systems, school districts, and individual schools, among different student populations and on different curricula. Such studies could also examine the validity of the inferences or decisions that are based on the test information. This information on the educational effects of tests will give feedback to policy makers as to what is working well and what needs to be changed, making it possible to improve instruction through the improved use of test information.
The National Board also urges that states using promotion and exit exams develop and implement so-called formative evaluations along the lines suggested here. A vital issue in any such evaluation is how the scores defining performance categories (so-called cut scores) are set. It is especially important to address carefully the validity of the constructed performance levels and the inferences drawn from them.
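To make concrete why the setting of cut scores deserves scrutiny, the following minimal sketch shows how cut scores turn test scores into performance categories, and how a small shift in one cut score reclassifies students near the boundary. The cut scores, category labels, and student scores are all hypothetical, invented for illustration; they are not drawn from any state program.

```python
from bisect import bisect_right

# Hypothetical cut scores and category labels, invented for illustration only.
CUT_SCORES = [220, 240, 260]
LEVELS = ["Failing", "Needs Improvement", "Proficient", "Advanced"]


def performance_level(score, cuts=CUT_SCORES, levels=LEVELS):
    """Return the performance category for a score.

    A score equal to a cut score falls into the higher category.
    """
    return levels[bisect_right(cuts, score)]


def level_counts(scores, cuts=CUT_SCORES, levels=LEVELS):
    """Count how many scores fall into each performance category."""
    counts = {level: 0 for level in levels}
    for score in scores:
        counts[performance_level(score, cuts, levels)] += 1
    return counts


if __name__ == "__main__":
    scores = [212, 219, 220, 238, 241, 259, 263]  # made-up student scores
    print(level_counts(scores))
    # Lowering the first cut score by two points moves a borderline student
    # out of the lowest category.
    print(level_counts(scores, cuts=[218, 240, 260]))
```

With these made-up numbers, lowering the first cut score from 220 to 218 moves one of the two students in the lowest category into the next one, although no student's score has changed. This is why the validity of the constructed performance levels matters as much as the validity of the test itself.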
Increasingly, states are testing candidates for their ability to become teachers, and those already teaching for their competence to remain teachers. A valid teacher certification test can help ensure that entry-level teachers have the skills and knowledge necessary for teaching. These tests have high stakes not only for those tested but also for students, families, and school systems. Without an adequate supply of well-prepared candidates, the pipeline of new teachers will not come close to supplying the numbers required by some 50 million young people in K-12 classrooms. Without professional development to enhance their skills, those already teaching are not likely to move school systems forward on a reform agenda that calls for excellence in teaching and learning.
Validity research on teacher testing needs to address the following four issues in particular:
Does the test measure, appropriately and in a technically sound way, teachers' mastery of subject matter?
Does the test measure, appropriately and in a technically sound way, teachers' mastery of instructional methods?
Do the scores established for passing grades and categories of proficiency represent fairly how well prepared a teacher is, in terms of content and method, to provide reasonable instruction?
How does the test affect admissions decisions at schools of education, and curriculum and instruction in teacher training programs?
In a fully developed system of testing, the schools would link teacher testing to teacher training, teacher preparation (as a product of training), and teacher supply. Such a system would in time provide an adequate number of high-quality teachers for service in our nation's schools.
Designing State Systems for Accountability
All states need to develop accountability systems that deal thoughtfully and usefully with test results, that deploy educational resources so as to aid teaching and learning, and that involve families in educational policy and in schools. Almost every state is now instituting accountability systems to measure progress in standards-based reform, and almost every such system depends heavily on testing as an indicator of student or school performance. In order to fulfill their purpose, the tests must be both technically sound and practically useful; that is, they must accurately test what has actually been taught and should be used in combination with other indicators of student, school, and district performance.
Studies on the design of state accountability systems should start with the needs of various constituencies and move to a determination of the extent to which timely, straightforward, and equitable results can be produced, and in what manner. The result will be not a compromise but a model accountability program that meets technical requirements adequately while addressing public policy concerns reasonably.
The importance of this point is reflected in a problem currently encountered in many state accountability systems. A number of states are setting goals for academic improvement that are politically desirable but educationally and technically infeasible given the time and other constraints involved. That is, they are mandating educational growth that cannot be achieved within the one or two years allotted. Technical possibility and policy desiderata need to link squarely and directly if these accountability systems are to enhance educational performance.
This is relatively new territory for measurement professionals, educators, and public policy representatives, who have tended to go it alone in deliberating educational policy. The development of accountability systems therefore requires boldness in approach and cooperation in execution if the systems are to work for all the interested constituencies.
Understanding the Role of Tests in Standards-Based Educational Reform
Regular feedback in the form of surveys is needed to understand how those charged with implementing standards-based educational reform (teachers, superintendents, parents, and policy makers) think about the uses of tests and the high-impact decisions that follow from them. Studies of this sort in the 1960s (out of the Russell Sage Foundation), in the 1980s (with National Science Foundation funding), and in the 1990s (in Texas) have produced highly useful and revealing information on standards-based reform in action. They need to be repeated every two years or so, so that trends can be documented.
Testing Programs and Dropout Rates
In addition to directly surveying the implementers of educational reform, we need to investigate a possible connection between the high stakes testing programs that are often part of standards-based reform and student dropout rates. The two may be related (see National Board publication High Stakes Testing and High School Completion), though for now all we can offer is hunches unsupported by good data. The National Board recommends rigorous applied studies to help understand how high stakes tests affect students' decisions to drop out and how schools, intentionally or otherwise, may encourage students to remain in school or discourage them from doing so.
Understanding How Standardized Tests Are Used in College Admissions
In addition to refining research into the use of standardized tests in making admissions decisions, we need to understand better the relationship between testing and the diversity of the college student body. Given challenges to affirmative action, we need to know how the admissions process works, the role of tests in admissions decisions, and the effects of alternative definitions of diversity on the composition of the admitted student body. Admissions decisions will need to be simulated with and without race as a consideration, and the admissions process as it varies from highly selective to less selective institutions will have to be studied and understood in detail.
Because of the importance of these studies, the National Board has already begun research work along the lines described.
Understanding the Link between Technology and Testing
Computer-adaptive testing and computer-based testing are coming of age. They are already in place for some standardized tests. They may soon become a regular feature in higher education and are expected to make their way into secondary education in the near future.
Evidence suggests that students' experience with computers directly affects their scores on computerized tests (see National Board publication The Gap between Testing and Technology in Schools). We need to scale up this research to take into account variation by subject, type of test, testing algorithms, and student characteristics. The results will guide measurement professionals, educators, families, students, and elected officials in (1) deciding whether to introduce computer-adaptive and computer-based testing, (2) interpreting scores, and (3) establishing when and under what conditions to avoid marrying testing with computer technology.
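For readers unfamiliar with what a computer-adaptive "testing algorithm" does, the following toy sketch may help. It chooses each question to match a running estimate of the test taker's ability, raising the estimate after a correct answer and lowering it after an incorrect one. It is a simple staircase heuristic written for illustration; operational adaptive tests typically rely on item response theory, and the function name, the difficulty scale, and the simulated test taker are all assumptions made here, not part of any actual testing program.

```python
def run_adaptive_test(item_difficulties, answers_correctly,
                      num_items=5, start=0.0, step=1.0):
    """Administer items one at a time, matching each to the running estimate.

    item_difficulties: difficulty values of the available items (hypothetical scale).
    answers_correctly: function taking a difficulty and returning True or False.
    Returns the final ability estimate and the list of (difficulty, correct) pairs.
    """
    remaining = list(item_difficulties)
    estimate = start
    administered = []
    for _ in range(min(num_items, len(remaining))):
        # Pick the unused item whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda d: abs(d - estimate))
        remaining.remove(item)
        correct = answers_correctly(item)
        administered.append((item, correct))
        # Move the estimate up after a correct answer, down after an incorrect one,
        # and halve the step so the estimate settles.
        estimate += step if correct else -step
        step /= 2
    return estimate, administered


if __name__ == "__main__":
    # Hypothetical item pool with difficulties from -2.0 to 2.0 in steps of 0.5.
    pool = [x / 2 for x in range(-4, 5)]
    # Simulated test taker who answers correctly whenever difficulty is at most 1.2.
    estimate, history = run_adaptive_test(pool, lambda d: d <= 1.2)
    print(estimate, history)
```

Because each test taker sees a different sequence of items, the choice of algorithm is one of the sources of variation that research on computerized testing would need to account for.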
A mandate of the National Board on Educational Testing and Public Policy is to help set a national agenda for research on testing. That agenda should establish the best uses of tests in educational reform, school system accountability, and college admissions and the role of computerized tests in teaching and learning. Our purpose is to begin a dialogue that defines needed research on testing and delivers high-quality information for policy decision making. We look forward to engaging educators, families, students, and measurement professionals in considering the research proposed here and in suggesting further priorities for research.