Constructing rating scales for second language tests

Title	:	Constructing rating scales for second language tests
Authors	:	John A. Upshur and Carolyn E. Turner
Source	:	ELT Journal Volume 49/1 January 1995

Abstract:

Second language testing increasingly uses rated tasks in place of objectively scored items. Rating language performance is more demanding than scoring discrete-point tests. Lower reliability and validity for ratings are to be expected, especially in instructional settings where differences among students are small, and training for the use of rating scales is infrequent. The need for rating scales has not always led to effective and efficient scales, however.

Standard rating scales require the matching of examinee performance to a verbal description. An alternative type of scale consists of an empirically derived, ordered set of binary questions relating to boundaries between levels on the performance being evaluated. Rating depends on a series of decisions. Two scales of this type were developed to assess the grammatical accuracy and communicative effectiveness of a story-retell task. All scale categories were used, thus demonstrating that category range was appropriate to the learners. Reasons are offered to explain why this type of scale may ameliorate reliability and validity problems associated with standard rating scales.
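
The abstract does not reproduce the scales themselves, so the following is only a minimal sketch of how a scale built from an ordered set of binary boundary questions could be applied. The function name `rate`, the example questions, and the error thresholds are all hypothetical illustrations, not the authors' actual instrument, and the strictly linear ordering is one simple reading of "ordered set of binary questions".

```python
from typing import Any, Callable, Dict, Sequence

# A performance is represented here as a plain dict of rater observations.
# The keys used below are hypothetical, chosen only for illustration.
Performance = Dict[str, Any]
BoundaryQuestion = Callable[[Performance], bool]


def rate(performance: Performance,
         boundary_questions: Sequence[BoundaryQuestion]) -> int:
    """Return a level from 1 to len(boundary_questions) + 1.

    Each question marks the boundary between one level and the next.
    The rater answers them in order; the first "no" fixes the level.
    """
    level = 1
    for question in boundary_questions:
        if not question(performance):
            break
        level += 1
    return level


# Hypothetical boundary questions for a story-retell task (assumed, not
# taken from the paper): each "yes" moves the performance up one level.
example_questions = [
    lambda p: p["story_understood"],  # boundary between levels 1 and 2
    lambda p: p["errors"] <= 5,       # boundary between levels 2 and 3
    lambda p: p["errors"] <= 1,       # boundary between levels 3 and 4
]

mid_level = rate({"story_understood": True, "errors": 4}, example_questions)
lowest_level = rate({"story_understood": False, "errors": 0}, example_questions)
```

In this sketch the rater never matches a performance against a full verbal descriptor; each step is a single yes/no decision, which is the property the abstract contrasts with standard rating scales.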
