Validity of a Test
Validity tells you how accurately a method measures something.
If a method measures what it claims to measure, and the results closely
correspond to real-world values, then it can be considered valid. There are
four main types of validity:
- Construct validity: Does the test measure the concept that it’s intended to measure?
- Content validity: Is the test fully representative of what it aims to measure?
- Face validity: Does the content of the test appear to be suitable to its aims?
- Criterion validity: Do the results correspond to a different test of the same thing?
Note that this article deals with types of test validity, which concern how
accurately a measure and its individual components capture what they are meant
to capture. If you are doing experimental research, you also need to consider
internal and external validity, which deal with the experimental design and the
generalizability of results.
What is a construct?
A construct
refers to a concept or characteristic that can’t be directly observed, but can
be measured by observing other indicators that are associated with it.
Constructs can
be characteristics of individuals, such as intelligence, obesity, job
satisfaction, or depression; they can also be broader concepts applied to
organizations or social groups, such as gender equality, corporate social
responsibility, or freedom of speech.
Example
There is no objective, observable entity called “depression” that we can
measure directly. But drawing on existing psychological research and theory, we
can measure depression through a collection of symptoms and indicators, such as
low self-confidence and low energy levels.
What is construct validity?
Construct
validity is about ensuring that the method of measurement matches the construct
you want to measure. If you develop a questionnaire to diagnose depression, you
need to know: does the questionnaire really measure the construct of
depression? Or is it actually measuring the respondent’s mood, self-esteem, or
some other construct?
To achieve
construct validity, you have to ensure that your indicators and measurements
are carefully developed based on relevant existing knowledge. The questionnaire
must include only relevant questions that measure known indicators of
depression.
The other types of validity described below can all be considered forms of
evidence for construct validity.
Content validity
Content
validity assesses whether a test is representative of all aspects of the
construct.
To produce
valid results, the content of a test, survey or measurement method must cover
all relevant parts of the subject it aims to measure. If some aspects are
missing from the measurement (or if irrelevant aspects are included), the
validity is threatened.
Example
A mathematics teacher
develops an end-of-semester algebra test for her class. The test should cover
every form of algebra that was taught in the class. If some types of algebra
are left out, then the results may not be an accurate indication of students’
understanding of the subject. Similarly, if she includes questions that are not
related to algebra, the results are no longer a valid measure of algebra
knowledge.
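As a rough illustration of the algebra example above, the two threats to content validity (missing topics and irrelevant items) can be checked with a simple comparison. This is only a sketch: the topic names, question labels, and the `content_coverage` helper are hypothetical, and matching topic labels is no substitute for a subject expert reviewing the items.

```python
def content_coverage(taught_topics, item_topics):
    """Compare the topics taught with the topic each test item targets.

    Returns the taught topics that no item covers, and the items whose
    topic was not part of what was taught.
    """
    taught = set(taught_topics)
    tested = set(item_topics.values())
    missing = taught - tested
    off_topic = {item: topic for item, topic in item_topics.items()
                 if topic not in taught}
    return missing, off_topic


# Hypothetical syllabus and test blueprint (illustrative labels only).
taught = ["linear equations", "quadratic equations", "inequalities", "factoring"]
items = {
    "Q1": "linear equations",
    "Q2": "quadratic equations",
    "Q3": "geometry proofs",
    "Q4": "linear equations",
}

missing, off_topic = content_coverage(taught, items)
print("Taught but not tested:", sorted(missing))
print("Tested but not taught:", off_topic)
```

Either list being non-empty flags a possible threat to content validity that the test author would then need to judge.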
Face validity
Face validity
considers how suitable the content of a test seems to be on the surface. It’s
similar to content validity, but face validity is a more informal and
subjective assessment.
Example
You create a survey to
measure the regularity of people’s dietary habits. You review the survey items,
which ask questions about every meal of the day and snacks eaten in between for
every day of the week. On its surface, the survey seems like a good
representation of what you want to test, so you consider it to have high face
validity.
As face
validity is a subjective measure, it’s often considered the weakest form of
validity. However, it can be useful in the initial stages of developing a
method.
Criterion validity
Criterion
validity evaluates how closely the results of your test correspond to the
results of a different test.
What is a criterion?
The criterion is an external measurement of the same thing. It is usually an
established or widely used test that is already considered valid.
What is criterion validity?
To evaluate criterion validity, you calculate the correlation between the
results of your measurement and the results of the criterion measurement. If
there is a high correlation, this gives a good indication that your test is
measuring what it is intended to measure.
Example
A university professor creates a new test to measure applicants’ English
writing ability. To assess how well the test really does measure writing
ability, she finds an existing test that is considered a valid measurement of
English writing ability, and compares the results when the same group of
applicants takes both tests. If the outcomes are very similar, the new test has
high criterion validity.
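The correlation described above can be sketched in a few lines. This example assumes the Pearson correlation coefficient, which is one common choice (the article does not name a specific coefficient), and uses made-up scores for six people who took both tests. The function name `pearson_r` and all data are hypothetical.

```python
from math import sqrt


def pearson_r(new_scores, criterion_scores):
    """Pearson correlation between paired scores from two tests."""
    if len(new_scores) != len(criterion_scores):
        raise ValueError("each person needs a score on both tests")
    n = len(new_scores)
    if n < 2:
        raise ValueError("need at least two people")
    mean_new = sum(new_scores) / n
    mean_crit = sum(criterion_scores) / n
    # Deviations of each score from its test's mean.
    dev_new = [x - mean_new for x in new_scores]
    dev_crit = [y - mean_crit for y in criterion_scores]
    covariance = sum(a * b for a, b in zip(dev_new, dev_crit))
    spread = sqrt(sum(a * a for a in dev_new) * sum(b * b for b in dev_crit))
    if spread == 0:
        raise ValueError("scores on at least one test do not vary")
    return covariance / spread


# Hypothetical scores for the same six people on both tests.
new_test = [62, 75, 81, 58, 90, 70]
established_test = [60, 78, 85, 55, 92, 68]

r = pearson_r(new_test, established_test)
print(f"Correlation with the criterion: r = {r:.2f}")
```

A value of r close to 1 means the new test ranks people much as the established test does. How high is high enough depends on the field and the purpose of the test.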