Writing items requires decisions about
the nature of the item or question to which we ask students to respond
(that is, whether it is discrete or integrative), how we will score the
item (for example, objectively or subjectively), the skill we purport
to test, and so on. We also consider the characteristics of the
test takers and the test-taking strategies respondents will need
to use. What follows is a short description of these considerations
for constructing items.
Test Items
A test item is a specific task test takers are asked to
perform. Test items can assess one or more points or objectives,
and the actual item itself may take on a different constellation
depending on the context. For example, an item may test one point
(understanding of a given vocabulary word) or several points (the
ability to obtain facts from a passage and then make inferences
based on the facts). Likewise, a given objective may be tested by
a series of items. For example, there could be five items all testing
one grammatical point (e.g., tag questions). Items of a similar
kind may also be grouped together to form subtests within
a given test.
Classifying Items
Discrete – A completely discrete-point item
would test simply one point or objective such as testing for the
meaning of a word in isolation. For example:
Choose the correct meaning of the word paralysis.
(A) inability to move
(B) state of unconsciousness
(C) state of shock
(D) being in pain
Integrative – An integrative item would
test more than one point or objective at a time (e.g., comprehension
of words and the ability to use them correctly in context). For example:
Demonstrate your comprehension of the following words by
using them together in a written paragraph: “paralysis,”
“accident,” and “skiing.”
Sometimes an integrative item is really more a procedure than an
item, as in the case of a free composition, which could test a number
of objectives, such as use of appropriate vocabulary, use of
sentence-level discourse, organization, and statement of thesis and
supporting evidence. For example:
Write a one-page essay describing three sports and the
relative likelihood of being injured while playing them competitively.
Objective – A multiple-choice item, for
example, is objective in that there is only one right answer.
Subjective – A free composition may be more
subjective in nature if the scorer is not looking for any one right
answer, but rather for a series of factors (creativity, style, cohesion
and coherence, grammar, and mechanics).
The Skill Tested
The language skills that we test lie on a continuum from the more
receptive skills (listening and reading) to the more productive
skills (speaking and writing). There are, of course, other
language skills that cross-cut these four skills, such as vocabulary.
Assessing vocabulary will most likely vary to a certain extent across
the four skills, with assessment of vocabulary in listening and
reading perhaps covering a broader range than assessment
of vocabulary in speaking and writing. We can also assess nonverbal
skills, such as gesturing, and this can be both receptive (interpreting
someone else’s gestures) and productive (making one’s
own gestures).
The Intellectual Operation Required
Items may require test takers to employ different levels of intellectual
operation in order to produce a response (Valette,
1969, after Bloom et
al., 1956). The following levels of intellectual operation have
been identified:
- knowledge (bringing to mind the appropriate
material);
- comprehension (understanding the basic meaning
of the material);
- application (applying the knowledge of the
elements of language and comprehension to how they interrelate
in the production of a correct oral or written message);
- analysis (breaking down a message into its
constituent parts in order to make explicit the relationships
between ideas, including tasks like recognizing the connotative
meanings of words, correctly processing a dictation, and making
inferences);
- synthesis (arranging parts so as to produce
a pattern not clearly there before, such as in effectively organizing
ideas in a written composition); and
- evaluation (making quantitative and qualitative
judgments about material).
It has been popularly held that these levels demand increasingly
greater cognitive control as one moves from knowledge to evaluation
– that, for example, effective operation at more advanced
levels, such as synthesis and evaluation, would call for more advanced
control of the second language. Yet this has not necessarily been
borne out by research (see Alderson
& Lukmani, 1989). The truth is that what makes items difficult
sometimes defies the intuitions of the test constructors.
The Tested Response Behavior
Items can also assess different types of response behavior. Respondents
may be tested for accuracy in pronunciation or grammar. Likewise,
they could be assessed for fluency, for example, without concern
for grammatical correctness. Aside from accuracy and fluency, respondents
could also be assessed for speed – namely, how quickly they
can produce a response, to determine how effectively the respondent
replies under time pressure.
In recent years, there has also been an increased concern for developing
measures of performance – that is, measures of the ability
to perform real-world tasks, with criteria for successful performance
based on a needs analysis for the given task (Brown,
1998; Norris, Brown,
Hudson, & Yoshioka, 1998).
Performance tasks might include “comparing credit card offers
and arguing for the best choice” or “maximizing the
benefits from a given dating service.” At the same time that
there is a call for tasks that are more reflective of the real world,
there is a commensurate concern for more authentic language
assessment. At least one study, however, notes that the differences
between authentic and pedagogic written and spoken texts may not
be readily apparent, even to an audience specifically listening
for differences (Lewkowicz,
1997). In addition, test takers may not necessarily concern
themselves with task authenticity in a test situation. Test familiarity
may be the overriding factor affecting performance.
Characteristics of Respondents
Items can be designed to be appropriate for groups of test takers
with differing characteristics. Bachman
and Palmer (1996: 64-78) classify these characteristics into
four categories:
- the personal characteristics of the respondents – for
example, their age, gender, and native language;
- the knowledge of the topic that they bring to the language
testing situation;
- their affective schemata (that is, their prior likes and dislikes
with regard to assessment); and
- their language ability.
Research into the impact of these characteristics continues. For
example, with regard to the age variable, researchers have suggested
that educators revisit this issue and perhaps conceive of new ways
to consider the impact of the age variable in assessing language
ability (Marinova-Todd,
Marshall, & Snow, 2000).
With regard to performance on language measures, it would appear
that age interacts with other variables such as attitudes, motivation,
and the length of exposure to the target language, as well as the nature
and quality of language instruction (see García
Mayo & García Lecumberri, 2003).
With regard to language ability, both Bachman
and Palmer (1996) and Alderson
(2000) detail the many types of knowledge that respondents may
need to draw on to perform well on a given item or task:
- world knowledge and culturally-specific knowledge,
- knowledge of how the specific grammar works,
- knowledge of different oral and written text types,
- knowledge of the subject matter or topic, and
- knowledge of how to perform well on the given task.
Item-Elicitation Format
The format for item elicitation has to be determined for any given
item. An item can have a spoken, written, or visual stimulus, as
well as any combination of the three. Thus, while an item or task
may ostensibly assess one modality, it may be testing some
other modality as well.
So, for example, a subtest referred to as “listening”
which has respondents answer oral questions by means of written
multiple-choice responses is testing reading as well as
listening.
It would be possible to avoid introducing this reading element
by having the multiple-choice alternatives presented orally as well.
But then the tester would be introducing yet another factor, namely,
short-term memory ability, since the respondents would have to remember
all the alternatives long enough to make an informed choice.
Item-Response Format
The item-response format can be fixed, structured, or open-ended.
Item responses with a fixed format include true/false,
multiple-choice, and matching items.
Item responses that call for a structured format
include ordering (where respondents are requested to arrange words
to make a sentence, and several orders are possible); duplication,
both written (such as dictation) and oral (for example,
recitation, repetition, mimicry); identification (explaining the
part of speech of a form); and completion.
Those item responses calling for an open-ended format
include composition – both written (for example, creative
fiction, expository essays) and oral (such as a speech) –
as well as other activities, such as free oral response in role-playing
situations.
Grammatical Competence
According to Canale and
Swain (1980, p. 29), grammatical competence includes
phonology, morphology, syntax, knowledge of lexical items, and semantics,
as well as matters of mechanics (spelling, punctuation, capitalization,
and handwriting). It would seem that this definition is perhaps
too broad for practical purposes.
A truly perplexing issue is determining what constitutes a grammatical
error, as well as determining the severity of this error. In
other words, will making the error stigmatize the speaker? Let
us say that we are using a grammatical scale which deals
with how acceptably words, phrases, and sentences are formed and
pronounced in the respondents' utterances. Let us assume that the
focus is on both of the following:
- clear cases of errors in form, such as the use of the present
perfect for an action completed in the past (e.g., “We have
had a great time at your house last night.”), and
- matters of style, such as the use of a passive verb form in
a context where a native speaker would use the active form (e.g.,
Question: “What happened to the CD I lent you, Jorge?”
Reply: “The CD was lost.” vs. “I lost your CD.”).
Major grammatical errors might be considered those that either
interfere with intelligibility or stigmatize the speaker. Minor
errors would be those that neither get in the way of the listener's
comprehension nor annoy the listener to any extent.
Thus, getting the tense wrong in the above example, "We have
had a great time at your house last night" could be viewed
as a minor error, whereas in another case, producing "I don't
have what to say" (intended as "I really have no excuse" but translated
directly from Hebrew) could be considered
a major error since it is not only ungrammatical but also could
stigmatize the speaker as rude and unconcerned, rather than apologetic.