Issues in Computer Adaptive Testing of Second Language
Reading Proficiency
March 20 - 22, 1996
This seminar represented a milestone in the field of second language
assessment as the first international meeting to focus solely on second
language computer adaptive testing (CAT). Over 80 participants from around
the world came to learn about this cutting-edge area of language assessment.
The University of Minnesota has long been heralded as a leader in the
arena of language proficiency testing, where its Center for Advanced Research
on Language Acquisition (CARLA) has advanced the agenda of proficiency
assessment through test development and research. The grant "Improving
and Strengthening Proficiency-Based Testing in Foreign Language Using Computer
Adaptive Testing Technologies," awarded to CARLA in the summer of
1996, enabled the CARLA research team to begin constructing computer adaptive
tests for assessing and providing diagnostic information regarding students'
reading proficiency in French, German, and Spanish. A principal objective
of this seminar was to address key issues that will inform the construction
of these tests.
This ground-breaking seminar featured leading experts in the fields of
computer adaptive testing, technology, second language reading proficiency,
and second language assessment. The presentations addressed theoretical
issues and empirical findings involving second language reading and assessment
with computer adaptive testing.
The seminar covered the following topics:
- Computer Adaptive Reading Proficiency Test Development at CARLA
- Second Language Reading Models and Research: Their Relation to Computer
Adaptive Testing
- Newest Trends in Computer Adaptive Testing, Including Scoring Algorithms
and Item Selection Heuristics
- Computerized Testing Technology Including Multimedia, Simulations,
Item Formats, Exposure and Security
- Second Language Computer Adaptive Testing and Assessment
- Item Response Theory
- Multiple Item Pool Use for Proficiency and Diagnostic Testing
Conference Presentations
A Perspective on Computerized Second Language Testing
David Weiss, Ph.D., Professor, Dept. of Psychology, University
of Minnesota
Improvements in second language testing can result from the computerized
mode of administration and from various forms of adaptive administration.
Some advantages of computerized administration will be discussed. The
origins and approaches of adaptive testing will be described and several
applications of adaptive testing to second language testing will be presented.
Computerized Testing on a Large Network: Issues for Today and
Tomorrow
Charles Johnston, Ph.D., Vice President for Technology, Drake Prometric
Corp.
Many institutions and organizations have moved to computerized delivery
of their exams, and computer adaptive testing is an increasingly
common delivery mode. The delivery often requires a large national or
global network of delivery points. This system must reflect the latest
trends in both test development and psychometrics, including multimedia
presentation, simulations, new item/testlet formats, expert scoring systems,
and security, among others.
Exploring New Item-Types for Computerized Testing: New Possibilities
and Challenges
Michael Yoes, Ph.D., President, Assessment Systems Corp.
Computerized tests are becoming more widely used. Little consideration
has been given to opportunities for new item-types uniquely offered by
computerization. Most computerized tests (including CATs) use item-types
from printed tests. Test developers can consider new item-types. A discussion
of possible new directions and psychometric challenges will be presented.
Learning to Read in a Foreign Language and C-A Reading Assessment
William Grabe, Ph.D., Associate Professor, Dept. of English, Northern
Arizona University
This talk will first outline briefly a number of major findings from
L1 reading research which have important consequences for learning to
read in university foreign language (FL) contexts. The talk will then
present a set of issues (or dilemmas) which influence the development
of reading abilities in a university FL setting. Given these issues (or
dilemmas), and given the goals of a specific university modern languages
department, the last section will consider the concerns that need to be
addressed for implementing computer adaptive reading assessment.
If Reading is Reader-Based, Can There Be a Computer-Adaptive
Reading Test?
Elizabeth B. Bernhardt, Ph.D., Director of Language Center
& Professor of German Studies, Stanford University
This presentation reviews theories of reading in both first and second
languages. In addition, it examines the data buttressing each theory with
particular emphasis on recent re-examinations of the L1/L2 literacy relationship
data. The paper argues, from these individual perspectives and from their
syntheses, that CAT is a potentially alien endeavor when attempting to
assess reading comprehension.
Computer Adaptive Testing: An Outsider's View
Tim McNamara, Ph.D., Associate Professor, Dept. of Linguistics
and Applied Linguistics, University of Melbourne
Technologically innovative forms of assessment inevitably generate excitement,
but such innovations need to be evaluated in the context of a broad range
of assessment needs. What can CAT do, and what can it not do? This paper
evaluates CAT from the point of view of current thinking on assessment,
particularly performance assessment.
Content Considerations for Testing Reading Proficiency Via Computerized-Adaptive
Tests
Jerry Larson, Ph.D., Director of Humanities Research Center & Professor
of Spanish, Brigham Young University
This presentation will focus on issues related to content of items found
in computerized-adaptive tests of reading proficiency. Of particular concern
is the need to provide reading passages that represent current language
in a variety of language settings. CAT algorithms to achieve appropriate
item selection will be demonstrated.
Checking the Utility and Appropriacy of the Content and Measurement
Models Used to Develop L2 Listening Comprehension CATs: Implications for
Further Development of Comprehensive CATs
Patricia Dunkel, Ph.D., Professor and Chair, Dept. of Applied
Linguistics & ESL, Georgia State University
Research and development of multimedia listening comprehension CATs
in ESL and Hausa will first be described. Then, the presenter will share
the insights gained from developing the CATs and from trialing the item
banks on examinees learning (or having learned) ESL and Hausa. The insights
derived both from observed data and from experience will be discussed
largely in relation to decisions made by the CAT developers a priori concerning
the following: (1) identification of the comprehension content/task model;
(2) designation of the framework used for item writing and creation of
the item banks; (3) selection of the Rasch IRT model as the CAT measurement
model; and (4) specification of the algorithm for item-selection and stopping
the CAT.
Towards Integrated Learning and Testing Using Structured Item
Banks for CAT
John de Jong, Ph.D., Head of the Language Testing Unit, CITO-The Dutch
National Institute for Educational Measurement
From a global perspective, an ample number of instruments seems to be
available for testing foreign language reading comprehension. On closer
inspection, however, it appears that many of these instruments lack
quality and that most of them concentrate on a limited number of domains
in a restricted number of languages. Taking into account the diversity
of language needs in our present-day society, this chaotic situation leads
to the paradox that in fact the number of tests available is far from
sufficient. It is argued, therefore, that international collaboration
in building structured item banks is crucial if education wishes to meet
the marketing requirements and technological standards at the turn of
the century. Examples and suggestions will be presented to illustrate
how structured item banks can be set up for CAT.
Constructing a Reading Strength Profile with CAT
J. Michael Linacre, Ph.D., Associate Director, MESA Psychometric Laboratory,
University of Chicago
CAT offers flexibility, thoroughness, diagnosis and test security. Reading
short messages can be tested by multiple-choice paraphrases in the second
language, long texts by customized testlets of first-language multiple-choice questions.
For screening use, time is minimized. For placement, longer tests diagnose
strengths. Test theory and reports are presented.
Adaptive Assessment of Reading Comprehension for TOEFL
Daniel R. Eignor, Ph.D., Principal Measurement Specialist, Educational
Testing Service
ETS is presently in the process of assessing the feasibility of introducing
computer adaptive versions of each of the three sections of TOEFL, the
last of which currently measures reading comprehension. In this presentation,
the IRT model, item selection algorithm, and procedure for controlling
item exposure that have been chosen for use with the adaptive version
of the TOEFL reading comprehension section will be discussed, along with
reasons for making these choices.
The Practical Utility of Rasch Measurement Models
Richard Luecht, Ph.D., Senior Psychometrician, Director of Computer
Adaptive Testing, National Board of Medical Examiners
All statistical models are incomplete representations of reality; however,
some models are useful. The utility of a model depends on many factors,
including statistical fit, structural identifiability, parameter estimation
costs, and the substantive theory underlying the selection of the model.
This paper presents a comprehensive framework for evaluating the practical
utility of IRT models, in general, and empirically demonstrates the overall
usefulness of the rather parsimonious Rasch family of models, with a particular
emphasis on CAT and reading assessment applications.
Visit the Computer Adaptive Testing
project page for more information.