Objective and subjective evaluation
If we view objectivity and subjectivity
of evaluation along a continuum, we can represent various assessment
and scoring methods along its length.

Test items that can be evaluated objectively have
one right answer (or one correct response pattern, in the case of
more complex item formats). Scorers do not need to exercise judgment
in marking responses correct or incorrect. They generally mark a
test by following an answer key. In some cases, objective tests
are scored by scanning machines and computers. Objective tests are
often constructed with selected-response
item formats, such as multiple-choice, matching, and true-false.
An advantage to including selected-response items in objectively
scored tests is that the range of possible answers is limited to
the options provided by the test writer—the test taker cannot
supply alternative, acceptable responses.

Because much of what we assess in reading and listening
comprehension measures is first interpreted by the test writer,
some degree of subjectivity is present in objectively scored items.
For that reason, assessments of the Interpretive mode, even those
composed of "one-right-answer" items, might not be placed
all the way at the objective end of the continuum.

Evaluating responses objectively can be more difficult with even
the simplest of constructed-response
item formats. An answer key may specify the correct answer for
a one-word gap-filling item, but there may in fact be multiple
acceptable alternative responses to that item that the
teacher or test developer did not anticipate. In classroom testing
situations, teachers may perceive some responses as equally or partially
correct, and apply some subjective judgment in refining their scoring
criteria as they mark tests. Informal scoring criteria for short-answer
items probably work well for classroom testing as long as they are
applied consistently and are defensible.
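
To illustrate the point about unanticipated alternatives, the following is a minimal Python sketch of an answer key that lists more than one acceptable response for a gap-filling item. The item labels, the accepted answers, and the function names are hypothetical, invented here for illustration. The normalization step (ignoring case and surrounding spaces) is likewise an assumption about what a teacher might treat as equivalent.

```python
def score_gap_fill(response, accepted):
    """Return 1 if the response matches any accepted answer, else 0.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    normalized = response.strip().lower()
    accepted_normalized = {answer.strip().lower() for answer in accepted}
    return 1 if normalized in accepted_normalized else 0


def score_test(responses, answer_key):
    """Sum the item scores; a missing response scores 0."""
    return sum(
        score_gap_fill(responses.get(item, ""), accepted)
        for item, accepted in answer_key.items()
    )


# Hypothetical answer key: each item maps to its set of acceptable answers.
answer_key = {
    "item1": {"big", "large"},  # "large" added after it appeared in responses
    "item2": {"went"},
}

# Hypothetical responses from one test taker.
responses = {"item1": " Large ", "item2": "goed"}

total = score_test(responses, answer_key)
```

Recording a newly accepted response in the key itself, rather than crediting it case by case, is one way to keep refined criteria consistent across all papers.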

Just as there may be few truly objective measures
of second language knowledge and skill, so too is it rare to find
purely subjective evaluations of performance. Allowing the subjective
impressions of scorers to determine learners' grades would not be
acceptable to most students, their parents, or other stakeholders.
By contrast, we do not usually have to justify our opinion that a work
of art is good or bad; we simply like it or we don't. Since our judgment
has no significant consequences for the artist (unless we are art
critics), a subjective evaluation is acceptable. It is also not
a matter of concern that the many viewers of the artwork do not
agree about its quality.

In assessment, we strive to ensure two types of reliability:
inter-rater (raters agree with each other) and intra-rater
(a rater gives the same score to a performance rated on separate
occasions). The higher the stakes, the more reliable (consistent)
judgments must be. Scoring criteria, in the form of rubrics, are
generally used to guide raters to arrive at the same, or nearly
the same, evaluation of a product. Thus, although it is common to
refer to scoring which requires human judgment as subjective evaluation,
in most cases we might place it near the midpoint on our objective-subjective
continuum.
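
One common way to quantify agreement between two sets of ratings is simple percent agreement, sometimes supplemented by Cohen's kappa, which corrects for the agreement expected by chance. The Python sketch below is illustrative only: the scores are invented, the function names are hypothetical, and the text above does not prescribe any particular statistic.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of performances that received identical scores."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both rating lists must be non-empty and equal in length.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    # Chance agreement: product of each rater's marginal proportions, summed.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented rubric scores (1 to 4) given by two raters to six performances.
rater_1 = [3, 2, 4, 3, 1, 2]
rater_2 = [3, 2, 3, 3, 1, 2]

agreement = round(percent_agreement(rater_1, rater_2), 2)  # 0.83
kappa = round(cohens_kappa(rater_1, rater_2), 2)           # 0.76
```

The same functions can be applied to intra-rater reliability by passing one rater's scores for the same performances from two separate occasions.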

In rated assessments, the scoring criteria form an
integral part of the evaluation. Specialists in language testing
often identify three key components in performance assessment. These
components are:
- Tasks that are effective in eliciting the performance to be
assessed.
- Rating criteria to evaluate the quality of the performance.
The criteria reflect the relative importance of various aspects
of the performance, and are appropriate for the population being
assessed.
- Raters who are trained to apply the criteria and can do so
consistently.

Rating criteria and rater training are the topics of the next pages.