Lecture: Methods of Testing and Assessing Learning - Chapter 10: Criteria and Test Types

Chapter 10:
Criteria and Test Types
A. Criteria
1. Validity
2. Reliability
3. Reliability versus validity
4. Discrimination
5. Administration/Practicality
6. Test instructions to the candidates
7. Backwash effects
1. Validity
Validity: the extent to which a test measures what it is supposed to measure and nothing else (content).
Face validity
Content validity
Construct validity
Empirical validity
Face validity
• A test item that looks right to other testers, teachers, moderators and testees is described as having face validity.
• In the past, test writers regarded face validity simply as a public relations exercise.
• Now, designers of communicative tests regard face validity as the most important of all types of validity.
Content validity
• Content validity depends on a careful analysis of the language being tested and of the particular course objectives.
• When constructing tests, writers should first draw up a table of test specifications (the language skills and areas to be included).
Construct validity
• A test with construct validity is capable of measuring specific characteristics in accordance with a theory of language behavior and learning.
• For example, a test consisting of multiple-choice items will lack construct validity if the communicative approach is adopted during the language course.
Empirical/statistical validity
This kind of validity is obtained by comparing the results of the test with the results of some criterion measure, such as:
• An existing test, known to be valid and given at the same time
• The teacher's ratings or any other such form of independent assessment given at the same time
• The subsequent (later) performance of the testees on a certain task, measured by some valid test
• The teacher's ratings or any other such form of independent assessment given later
Comparison with a measure taken at the same time is known as concurrent validity; comparison with a later measure is known as predictive validity.
Summary (Validity)
The test situation and the technique used are important factors in determining the overall validity of any test.
2. Reliability (definitions)
• If a test administered to the same candidates on different occasions produces the same results, it is reliable.
• Reliability also denotes the extent to which the same marks/grades are awarded if the same test papers are marked by
(i) two or more different examiners
(ii) the same examiner on different occasions
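As an illustration of marker reliability, the sketch below counts how often two sets of marks for the same scripts agree. The function name `marker_agreement`, the `tolerance` parameter and the marks themselves are hypothetical and do not come from the lecture; this is only one simple way of quantifying agreement.

```python
def marker_agreement(marks_a, marks_b, tolerance=0):
    """Share of scripts on which two sets of marks differ by at most `tolerance`."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("both sets of marks must cover the same non-empty set of scripts")
    agreed = sum(1 for a, b in zip(marks_a, marks_b) if abs(a - b) <= tolerance)
    return agreed / len(marks_a)


# Hypothetical marks out of 20 awarded to the same five scripts.
examiner_1 = [12, 15, 9, 18, 14]
examiner_2 = [12, 14, 9, 18, 16]

exact = marker_agreement(examiner_1, examiner_2)                    # identical marks only
within_one = marker_agreement(examiner_1, examiner_2, tolerance=1)  # marks within 1 point
print(f"exact agreement: {exact:.2f}, within one mark: {within_one:.2f}")
```

The same function can compare one examiner's marks on two different occasions by passing the earlier and later marks as the two lists.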
2. Reliability (affecting factors)
• Reliability is affected by the size of the sample and the administration of the test.
• Other factors:
(1) test instructions (rubrics)
(2) personal factors such as motivation and illness
(3) scoring of the test (the most important factor; objective tests overcome this problem of marker reliability)
2. Reliability (measuring methods)
(1) Re-administering the same test to the same group of candidates after a lapse of time (test-retest).
(2) Administering parallel forms of the test to the same group (the forms must be identical in the nature of their sampling, difficulty, length and rubrics).
If the correlation between the two sets of results is high, the test can be termed reliable.
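The correlation mentioned above can be sketched as follows. The lecture does not say which coefficient to use; this example assumes the Pearson product-moment correlation, and the function name `pearson_r` and the scores are hypothetical. No cut-off for "high" is implied, since the lecture does not give one.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 candidates)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each list")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of the same six candidates on two parallel forms.
form_a = [55, 62, 70, 48, 81, 66]
form_b = [58, 60, 73, 50, 79, 68]

r = pearson_r(form_a, form_b)
print(f"correlation between the two forms: {r:.2f}")
```

The same calculation applies to the test-retest method: pass the scores from the first and second administrations instead of the two forms.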
3. Reliability versus Validity
• Validity and reliability are the two chief criteria for evaluating any test (an ideal test should be both valid and reliable).
• However, the greater the reliability of a test, the less validity it usually has.
4. Discrimination
• An important feature of a test is its capacity
(1) to discriminate among different candidates
(2) to reflect the differences in the performances of individuals in a group
• The extent of the need to discriminate will vary depending on the purpose of the test.
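One common way to check how well a single item discriminates is the upper-lower group index, sketched below. This technique is not described in the lecture and is offered only as an assumption about how discrimination might be measured; the function name `discrimination_index`, the group size and the data are hypothetical.

```python
def discrimination_index(results, group_size):
    """Upper-lower discrimination index for one item.

    results: list of (total_score, answered_item_correctly) pairs, one per candidate.
    group_size: number of candidates in each of the top and bottom groups.
    """
    if group_size < 1 or 2 * group_size > len(results):
        raise ValueError("group_size must allow two non-overlapping groups")
    # Rank candidates from highest to lowest total score.
    ranked = sorted(results, key=lambda pair: pair[0], reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(1 for _, correct in upper if correct)
    lower_correct = sum(1 for _, correct in lower if correct)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data: total test score and whether the item was answered correctly.
candidates = [
    (38, True), (35, True), (33, True),
    (30, False), (27, True), (25, False),
    (20, True), (18, False), (12, False),
]

d = discrimination_index(candidates, group_size=3)
print(f"discrimination index: {d:.2f}")
```

An index near 1 means the item was answered correctly mostly by the strongest candidates, a value near 0 means it does not separate strong from weak candidates, and a negative value means weaker candidates did better on it than stronger ones.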
5. Administration/Practicality
• A test must be practicable, i.e. fairly straightforward to administer (consider the length of time needed for administering the test, collecting the answer sheets and reading the instructions).
• Another practical consideration concerns the answer sheets and the stationery used.
6. Test instructions to the candidates
• All instructions should be clearly written.
• Samples (examples) should be given.
• Grammatical terminology should be avoided.
7. Backwash effects
• Definition: the influence of testing on teaching and learning
• Positive backwash effect (e.g. reading tests lead to the development of reading skills)
• Negative backwash effect (e.g. objective tests reduce learners' motivation)
• Implications: tests influence the compilation of syllabuses and language teaching programmes
B. Types of tests
1. Achievement/attainment tests
2. Proficiency tests
3. Aptitude tests
4. Diagnostic tests
1. Achievement/attainment tests
• Class progress tests, the most widely used type of test
• Achievement tests, which are formal tests
Class progress tests
• Designed to measure the extent to which students have mastered the material taught in the classroom, allowing them to show what they have mastered
• Used as a teaching device: they have backwash effects on teaching and motivation
• Good tests encourage students to perform well and gain confidence
Achievement tests
• Intended to measure achievement on a large scale and to show mastery of a particular syllabus
• Standardized tests: they are pre-tested, and items are analysed and revised where necessary
• A good achievement test should reflect the particular approach to learning and teaching adopted
2. Proficiency tests
• Define a student's language proficiency with reference to a particular task which he/she will be required to perform (e.g. TOEFL, TOEIC)
• In no way related to any syllabus or teaching programme
3. Aptitude tests
• Designed to measure a student's probable performance in a foreign language which he/she has not started to learn
• Generally seek to predict a student's probable strengths and weaknesses in learning a foreign language by measuring performance in an artificial language
4. Diagnostic tests
• Achievement and proficiency tests are frequently used for diagnostic purposes, such as diagnosing areas of difficulty students may have so that appropriate remedial action can be taken later.
• Diagnostic testing is frequently carried out for groups of students rather than for individuals.