Shrock and Coscarelli

Testing Human Competence

Testing: Getting Started

Criterion-referenced test development

is the process of systematically establishing objectives (also called competency statements) and then systematically creating an assessment that measures an individual's ability to meet those objectives, rather than comparing that individual against the performance of others.

Criterion-referenced test development versus Norm-referenced test development

There are two major philosophical differences in the interpretation of test scores:  criterion-referenced versus norm-referenced interpretation.

Tests should be developed to facilitate either a criterion-referenced or a norm-referenced interpretation. Norm-referenced tests, for example, need to be composed of items that will separate the scores of test-takers from one another.

Norm-Referenced Test Interpretation

A norm-referenced test (NRT) interpretation defines the performance of test-takers in relation to one another. Now, you will rarely have a test that separates everyone quite so perfectly.  Instead, most NRTs will have distributions that look like this:

This is the classic shape of the NRT distribution; you may have heard it called the bell curve or the normal distribution. Norm-referenced tests can be very useful. Medical schools use the Medical College Admission Test (MCAT) to help predict success in medical school. Because of the large number of people applying for medical school and the limited number of openings, medical schools have chosen to use the MCAT as one way of ensuring that the best students are admitted. (As a patient, you would probably prefer to know that only the best students are being admitted.) Norm-referenced tests are ideal for making this kind of selection decision, when we must choose the best test-takers among a group.
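
The selection decision described above can be sketched in a few lines of Python. This is only an illustration, not a procedure from the source: the applicant names, the scores, the number of openings, and the helper names `percentile_rank` and `select_top` are all invented for the example. The point it shows is that a norm-referenced decision depends only on where a score falls relative to the other scores.

```python
def percentile_rank(score, all_scores):
    """Percentage of test-takers who scored strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def select_top(scores_by_name, openings):
    """Norm-referenced selection: admit the highest scorers in the group."""
    ranked = sorted(scores_by_name.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:openings]]


# Hypothetical applicant pool and number of openings (assumptions for illustration).
applicants = {"Ana": 52, "Ben": 61, "Chi": 47, "Dev": 58, "Eli": 39}
admitted = select_top(applicants, openings=2)
ben_rank = percentile_rank(applicants["Ben"], list(applicants.values()))
```

Note that nothing in this sketch says what a score of 61 means in terms of skills; it only says that 61 is higher than the other scores in this particular group.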

Criterion-Referenced Test Interpretation

In contrast to NRTs, the criterion-referenced test (CRT) defines the performance of each test-taker without regard to the performance of others. Unlike the NRT, where success is defined in terms of being ahead of someone else, the CRT interpretation defines success as being able to perform a specific task or set of competencies. Unlike the NRT, a criterion-referenced test places no limit on the number of people who can succeed. Very often the CRT frequency distribution looks like this:


This distribution is often called a mastery curve.

The reason that a CRT frequency distribution often looks like the distribution above is that the test items are based on specific competencies, and the instruction that the test-takers receive in anticipation of the test is usually addressed specifically to these competencies.  Therefore, many test-takers do well on the CRT, resulting in a distribution with most test-takers clustered near the high end.


When should you use Criterion-referenced tests?

Criterion-referenced tests should be used whenever you are concerned with assessing a person’s ability to demonstrate a specific skill.  The medical boards licensing exam is an example of a test whose philosophy is criterion-referenced.  If you are being operated on, you should want to know that your surgeon is competent to perform the operation, not just that he or she is better than 90% of those who graduated.  The reason is that merely knowing more than the others in the class doesn’t guarantee that your surgeon can perform the operation; maybe nobody in the class mastered the operation.  The danger of NRTs in corporate training situations is that without reference to specific competencies, what test-takers can actually do is unverifiable.
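
The surgeon example can be made concrete with a minimal Python sketch that interprets the same set of scores both ways. Everything specific here is an assumption made for illustration: the names, the scores, the cut score of 80, and the function names `norm_referenced` and `criterion_referenced` do not come from the source. In practice a cut score would be set against the stated competencies, not picked arbitrarily.

```python
def norm_referenced(scores_by_name):
    """Rank test-takers against one another (1 = highest score)."""
    ranked = sorted(scores_by_name, key=scores_by_name.get, reverse=True)
    return {name: position for position, name in enumerate(ranked, start=1)}


def criterion_referenced(scores_by_name, cut_score):
    """Judge each test-taker against a fixed standard, ignoring everyone else."""
    return {name: score >= cut_score for name, score in scores_by_name.items()}


# Hypothetical class results and cut score (assumptions for illustration).
class_scores = {"Ana": 52, "Ben": 61, "Chi": 47, "Dev": 58, "Eli": 39}
ranks = norm_referenced(class_scores)
masters = criterion_referenced(class_scores, cut_score=80)
```

With these invented numbers, Ben is ranked first in the class, yet no one reaches the cut score. The ranking alone would have hidden the fact that nobody demonstrated the competency.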