Test (student assessment)

In education, certification, counselling, and many other fields, a test or exam (short for examination) is a tool or technique intended to measure students' expression of knowledge, skills and/or abilities. A test typically has more questions, of greater difficulty, and takes more time to complete than a quiz. It is usually divided into two or more sections, each covering a different area of the domain or taking a different approach to assessing the same aspects.

A standardized test is one that compares the performance of every individual subject with a norm. The norm may be established independently, or by statistical analysis of a large number of subjects.
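When the norm comes from statistical analysis of a large number of subjects, one common way to express the comparison is as a standard score (z-score) and a percentile rank within that reference group. The sketch below is illustrative only: the norm-group scores and the function names are hypothetical, the "ties count as half" percentile convention is one of several in use, and real standardized tests apply more elaborate scaling.

```python
from statistics import mean, pstdev


def z_score(raw: float, norm_scores: list[float]) -> float:
    """Standard deviations by which `raw` lies above (+) or below (-) the norm-group mean."""
    mu = mean(norm_scores)
    sigma = pstdev(norm_scores)  # population standard deviation of the norm group
    return (raw - mu) / sigma


def percentile_rank(raw: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring below `raw`, counting exact ties as half."""
    below = sum(1 for s in norm_scores if s < raw)
    equal = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical norm group: raw scores from earlier test-takers.
norm_group = [52, 61, 47, 70, 58, 65, 55, 49, 73, 60]
print(round(z_score(65, norm_group), 2), percentile_rank(65, norm_group))
```

An individual's result is thus reported relative to the group rather than as a raw count of correct answers.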


Types of questions

Multiple-choice questions

For a multiple-choice question, the author of the test provides several possible answers (usually four or five) from which the test subjects must choose. There is only one right answer, which is usually represented by a single option; sometimes, however, it is split across two or more options, all of which subjects must identify correctly. Such a question may look like this:

  The number of planets in the solar system is:
    a) 7    b) 8    c) 9    d) 10

If subjects make an error and produce an answer that is not among the options, they will see that they are wrong. As a result, they will probably keep working until they arrive at an allowable answer. If subjects fail to produce an allowable answer after an inordinate amount of time, they will probably suffer from dramatically increased anxiety, leading to a cascade of further errors not representative of their actual level of knowledge.

To prevent this, the test author should write the incorrect answer options so that they correspond to the likeliest subject errors. This is very difficult, and only the most skilled teachers do it well. As a result, good multiple-choice questions that require more than a little thought are extremely difficult to write, and the question type is used primarily for testing factual knowledge.

On the other hand, multiple-choice questions are remarkably easy to score. They have proliferated in recent years, due to the development of machines that can score large numbers of them with minimal human effort.
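Machine scoring is easy because each response only has to be compared with an answer key. A minimal sketch, assuming the responses have already been read (for example from an optical answer sheet) into sets of option letters; the function and variable names are hypothetical, and no penalty for wrong answers is applied. Representing each answer as a set also covers the case, mentioned above, where the right answer is split across several options.

```python
def score_multiple_choice(key: dict[int, set[str]], responses: dict[int, set[str]]) -> int:
    """Count the items whose selected options exactly match the key.

    An item whose answer spans several options scores only if every correct
    option, and no incorrect one, is selected. Unanswered items score zero.
    """
    return sum(
        1 for item, correct in key.items() if responses.get(item, set()) == correct
    )


# Hypothetical three-item test; item 2 has a two-option answer.
answer_key = {1: {"b"}, 2: {"a", "c"}, 3: {"d"}}
student = {1: {"b"}, 2: {"a"}, 3: {"d"}}
print(score_multiple_choice(answer_key, student))
```

Here the student misses item 2 by selecting only one of its two required options.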

Free-response questions

Free-response questions require subjects to write. The length of the written response may be as short as a single word or mathematical expression, in which case the question acquires some of the characteristics of the multiple-choice type. However, at higher levels of education, this type of question usually requires deeper, more analytical thinking. The most difficult free-response questions may involve an essay or original composition of a page or more in length, or a scientific proof or solution requiring over an hour.

Free-response questions do not pose as much of a challenge to the test author, but evaluating the responses is a different story. Scoring may be done according to superficial qualities of the response, such as the presence of certain important terms. In this case, it is easy for test subjects to fool scorers by writing a stream of generalizations, non sequiturs and sheer nonsense that incorporates the terms that the scorers are looking for. Proper scoring involves reading the answer carefully and looking for clarity and logic. If scorers must score a large number of tests, this becomes very tiresome, especially since they usually know the material at a much higher level than that expected of the subject.
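The weakness of superficial scoring can be shown with a deliberately naive scorer that awards one point per expected term found in the response. This is a hypothetical illustration, not a description of any real scoring system; it assumes single-word key terms and ignores everything else about the answer.

```python
import re


def keyword_score(response: str, key_terms: list[str]) -> int:
    """Superficial scoring: one point for each key term that appears anywhere in the response."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    return sum(1 for term in key_terms if term.lower() in words)


terms = ["photosynthesis", "chlorophyll", "glucose", "sunlight"]
careful = "Plants use sunlight, absorbed by chlorophyll, to make glucose in photosynthesis."
nonsense = "Glucose is sunlight. Chlorophyll therefore photosynthesis, generally speaking."
print(keyword_score(careful, terms), keyword_score(nonsense, terms))
```

Both responses receive full marks, because the scorer checks only for the presence of terms and never for clarity or logic.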

Practical examination

Knowledge of how to do something does not lend itself well to either free-response or multiple-choice questions. It can only be demonstrated directly. Art, music and language fall into this category, as do non-academic disciplines such as sports and driving. Students of engineering are often required to present an original design or computer program developed over the course of days or even months.

A practical examination may be administered by an examiner in person, in which case it may be called an audition or a tryout, or by means of an audio or video recording. It may be administered on its own, or in combination with other types of questions; for instance, many driving tests in the United States include a practical examination as well as a multiple-choice section regarding traffic laws.

Tests of the natural sciences may include laboratory experiments (practica) to make sure that the student has learned not only the body of knowledge comprising the science, but also the experimental methods through which it has been developed.

The flaws and politics of testing

A primary and increasingly weighty criticism, especially from university professors regarding the standardized tests taken by college applicants, is the poor correlation between test scores and what they are intended or interpreted to predict. Much like a fitness function, an exam is a means of quantifying a test-taker's ability with respect to some success predicate. After years of analysis, many exams were found not to correlate with what they claimed to measure. This led to the SAT being renamed from the Scholastic Aptitude Test to the Scholastic Assessment Test. Still, newer evidence shows that the SAT scores of 11th and 12th graders do not correlate highly with freshman-year grades and correlate very poorly with overall undergraduate class rank. This has put pressure on ETS to re-evaluate its exams before universities begin requiring applicants to submit scores from the ACT, an exam that also does not correlate very well with freshman GPA but does correlate better than the SAT. Reasons for poor correlation include the following:

  • Questions on the exam may improperly weight the types of problems encountered in the environment the exam intends to predict. Improper weights arise from the problem of reducing a multi-dimensional score to a single orderable value. An example of improper weighting would be an exam whose ratio of geometry, calculus, and number theory questions differs from the ratio of such problems in the environment it is meant to predict. More egregiously, a mathematics exam might ask solely about the names, birthdates, and countries of origin of various mathematicians, even though such knowledge is of little importance in a mathematics curriculum.
  • People vary in their susceptibility to stress. Some are virtually unaffected and excel on tests, while others become very nervous and forget entire tracts of exam material. To compensate, some teachers and professors do not grade their students on tests alone, placing considerable weight on homework, attendance, in-class discussion, and laboratory investigations (where applicable). However, some teachers give tests containing harder-than-usual problems simply because they can. Large-scale standardized tests can usually be taken more than once; those who make decisions based on standardized-test scores generally consider a student's best score to be the truest one.
  • Through academic dishonesty (cheating), students have developed a formidable arsenal of strategies for obtaining test scores that do not represent their actual level of knowledge. On a multiple-choice test, lists of answers may be obtained beforehand. On a free-response test, the questions may be obtained beforehand, or the subject may write an answer that creates the illusion of knowledge. Cheating makes tests unreliable at best and useless at worst. Some teachers argue that stricter rules on tests will stop cheating; however, stricter rules may actually increase it, because the more pressure students are under, the more likely they are to cheat. Instead, many teachers argue that there should be no tests at all, as this would eliminate any possibility of dishonesty on them.
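
The weighting problem described in the first point can be made concrete with a small sketch. It assumes, purely for illustration, that each test-taker's performance is summarized as a fraction correct per topic and that the exam collapses these into one orderable number with a weighted mean; the topics, weights, and students are all hypothetical. When the exam's topic mix differs from that of the environment it claims to predict, the two weightings can rank the same pair of students in opposite orders.

```python
def composite(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Collapse per-topic scores (0 to 1) into a single orderable number via a weighted mean."""
    total = sum(weights.values())
    return sum(subscores[topic] * w for topic, w in weights.items()) / total


# Hypothetical fraction-correct per topic for two students.
alice = {"geometry": 0.9, "calculus": 0.5, "number_theory": 0.6}
bob = {"geometry": 0.5, "calculus": 0.9, "number_theory": 0.6}

# Hypothetical topic mixes: the exam's questions vs. the course it is meant to predict.
exam_weights = {"geometry": 6, "calculus": 2, "number_theory": 2}
course_weights = {"geometry": 2, "calculus": 6, "number_theory": 2}

for name, s in [("alice", alice), ("bob", bob)]:
    print(name, round(composite(s, exam_weights), 2), round(composite(s, course_weights), 2))
```

Under the exam's weights Alice outranks Bob, while under the course's weights the order is reversed, so the exam score would predict the wrong student to do better.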

In their defense, tests are less susceptible to cheating than other tools of learning evaluation. Laboratory results can be fabricated, and homework can be done by one student and copied by rote by others. The presence of a responsible test administrator, in a controlled environment, ensures that those who cheat on tests have at least some chance of being discovered and punished.

Additionally, in some cases, high-stakes testing induces examinees to rise to meet the exam's high expectations. High-stakes tests commonly count for more than other tests, and tests are often classified by level of importance; midterm and final exams, for example, carry much more weight than other tests. These exams have been criticized for not evaluating a student's knowledge correctly and for weighing too heavily on the student's average grade. They have other problems as well: any error or problem with the exam is unfairly magnified in the student's average grade, even when the error may not have been the student's fault. For many people, midterms and finals have become symbols of conformism.

Criticisms that have been associated with testing include:

  • Student stress and nervousness about the possibility of failing
  • Student disapproval
  • Lack of freedom
  • Increased pressure to cheat
  • Inflexible time limits, which can unfairly give low grades to students who did not finish on time
  • The fact that testing is not teaching
  • The fact that many tests do not accurately reflect a student's knowledge, for any of hundreds of reasons
  • Students treated as "numbers" based on their test scores, rather than as individuals
  • Lack of creativity, and rigidity
  • Increased school uniformity
  • The "mandatory" aspect of testing
  • Racially, socially, or regionally biased results
  • Lack of student initiative, of the ability to formulate questions, and of engagement in deep thinking
  • Lower grades due to denigration of teachers
  • Information on tests that was never taught to the students
  • Student failure on a test being blamed on the student rather than the instructor
  • Tests carrying more weight in a student's average grade than they should
  • The cost of testing (money that could be spent on other things)
  • Grading errors
  • Low test scores that obstruct a student's path to achievement instead of prompting anyone to take the student's learning difficulties seriously
  • The penalization of students who do badly on tests
  • Lack of a level playing field
  • Non-standardized tests, in which a teacher can set any content at any difficulty level
  • Testing's effect on students' time (e.g., long hours of study, little free time, and time crunches)
  • Students cramming information in order to pass a test and then forgetting everything they learned
  • Internet testing, which is often criticized because of the many technical problems that can occur
  • Students who are not prepared for a test being forced to take it anyway and receiving a failing grade
  • Teachers' frequent refusal to let students retake a test

Despite these problems, little has been done to address them, especially in schools that are rigid and unwilling to change their rules. Many have suggested banning testing, but schools with a rigid attitude are unlikely to consider the idea with an open mind. Critics regard schools that do not ban testing as rigid institutions that "play by the book", ignoring both the complaints of students and teachers and the problems listed above. In practice, opponents of testing have little influence, because most of the decision-making power is held by people who believe in testing.

The SAT and other high-stakes exams

In the United States and other countries, tests based primarily on multiple-choice questions have come to be used for assessments of great importance, with consequences including the funding levels of public schools and the admission of students to institutions of higher education. The most important such test in the U.S. is the SAT, which consists almost entirely of multiple-choice questions (though some of its questions are specifically designed to counter the inherent inaccuracies of that question type). The SAT was originally developed as a test of a student's intrinsic intelligence, but its methodology has proven highly vulnerable to specialized test-preparation programs that dramatically improve the subject's score. The SAT is owned and published by the College Board.

The SAT has also been criticized for an alleged racial bias; ethnic minorities supposedly fare worse on the exam than they should. As a result, it began to fall out of favor in the late 1990s, with increasing emphasis on standardized tests that measure actual knowledge. Some of these replacements have likewise come from the College Board, but many states have taken the initiative to design tests of their own. The ACT examination, introduced in 1959 as a competitor to the SAT, also features more knowledge-based questions, and is accepted as an alternative to the SAT for admission to many United States colleges.

High-stakes exams continue beyond college, for example the Fundamentals of Engineering (FE) exam administered by the National Council of Examiners for Engineering and Surveying (NCEES).
