Glossary of Terms: Testing & Validation

A resource for HR professionals who use pre-employment testing in their organization's hiring process:

Download the Testing & Validation Glossary of Terms (PDF)

Adverse Impact: A substantially different rate of selection in hiring, promotion, or other employment decision that works to the disadvantage of members of a race, sex, or ethnic group.
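
In practice, the Uniform Guidelines operationalize this with the "four-fifths rule": a selection rate for any group that is less than 80% of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. A minimal Python sketch (the applicant counts below are hypothetical, for illustration only):

```python
# Four-fifths (80%) rule check for adverse impact.
# Applicant counts are hypothetical, for illustration only.

def selection_rate(selected: int, applicants: int) -> float:
    """Proportion of applicants who passed the selection procedure."""
    return selected / applicants

majority_rate = selection_rate(selected=60, applicants=100)  # 0.60
focal_rate = selection_rate(selected=30, applicants=75)      # 0.40

impact_ratio = focal_rate / majority_rate                    # ~0.67

print(f"Impact ratio: {impact_ratio:.3f}")
if impact_ratio < 0.80:
    print("Below the 4/5ths threshold -- potential adverse impact.")
```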

Angoff Ratings: Ratings provided by SMEs estimating the percentage of minimally qualified applicants they expect to answer each test item correctly. These ratings are averaged into a score called the “unmodified Angoff score” (also referred to as a “Critical Score”).

Critical Score: The test score level set by averaging the SMEs’ Angoff ratings, i.e., the percentage of minimally qualified applicants they expect to answer the test items correctly (see Angoff Ratings above).

Cutoff Score: The final pass/fail score set for the test (set by reducing the Critical Score by 1, 2, or 3 CSEMs).
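
Taken together, the three entries above describe a simple pipeline: average the SME item ratings into a Critical Score, then subtract one to three CSEMs to set the final Cutoff Score. A minimal sketch with hypothetical ratings (the CSEM value is assumed rather than computed):

```python
# Hypothetical Angoff ratings: each row is one SME's ratings (0-100, the
# percentage of minimally qualified applicants expected to answer each
# item correctly); each column is one test item.
sme_ratings = [
    [70, 85, 60, 90],   # SME 1
    [65, 80, 55, 95],   # SME 2
    [75, 90, 50, 85],   # SME 3
]

n_items = len(sme_ratings[0])

# Unmodified Angoff (Critical) score: mean expected-correct percentage,
# converted to raw points on an n_items-item test.
mean_pct = sum(sum(row) for row in sme_ratings) / (len(sme_ratings) * n_items)
critical_score = mean_pct / 100 * n_items

# Final cutoff: reduce the Critical Score by 1-3 CSEMs (CSEM assumed here).
csem = 0.5                                # assumed CSEM at this score level
cutoff_score = critical_score - 2 * csem  # reducing by 2 CSEMs

print(f"Critical score: {critical_score:.2f} of {n_items} points")
print(f"Cutoff score:   {cutoff_score:.2f}")
```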

CSEM: Conditional Standard Error of Measurement. The SEM at a particular score level in the score distribution (see SEM definition below).

DCR: Decision Consistency Reliability. A type of test reliability that estimates how consistently the test classifies “masters” and “non-masters” or those who pass the test versus fail.

DIF: Differential Item Functioning. A statistical analysis that identifies test items where a focal group (usually a minority group or women) scores lower than the majority group (usually whites or men), after matching the two groups on overall test score. DIF items are therefore potentially biased or unfair.
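
A classic screening statistic for DIF is the Mantel-Haenszel common odds ratio, computed after stratifying the two groups by total test score. A minimal sketch with hypothetical counts at each score level (a pooled odds ratio far from 1.0 flags the item for review):

```python
# Mantel-Haenszel common odds ratio for one test item.
# Each stratum matches the focal and reference groups on total test score.
# Counts per stratum: (ref_correct, ref_wrong, focal_correct, focal_wrong);
# values are hypothetical, for illustration only.
strata = [
    (40, 10, 15, 10),   # low scorers
    (55, 5, 20, 8),     # middle scorers
    (30, 2, 12, 3),     # high scorers
]

num = 0.0
den = 0.0
for a, b, c, d in strata:
    n = a + b + c + d
    num += a * d / n    # reference-correct x focal-wrong
    den += b * c / n    # reference-wrong x focal-correct

or_mh = num / den
print(f"Mantel-Haenszel odds ratio: {or_mh:.2f}")
# Values near 1.0 suggest no DIF; large departures flag the item for review.
```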

ETS: Expected True Score. A person’s true score is defined as the expected number-correct score over an infinite number of independent administrations of the test.

Item Difficulty Values: The percentage of all test takers who answered the item correctly.
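
This is a one-line computation; the response vector below is hypothetical (1 = correct, 0 = incorrect):

```python
# Hypothetical responses to one item: 1 = correct, 0 = incorrect.
responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Item difficulty (p-value): proportion answering correctly.
difficulty = sum(responses) / len(responses)
print(f"Item difficulty: {difficulty:.2f}")   # 0.70
```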

Job Analysis: A document created by surveying SMEs that includes job duties (with relevant ratings such as frequency, importance, and performance differentiating), KSAPCs (with ratings such as frequency, importance, performance differentiating, and duty linkages), and other relevant information about the job (such as supervisory characteristics, licensing and certification requirements, etc.).

Job Duties: Statements of “tasks” or “work behaviors” that describe discrete aspects of work performance. Job duties typically start with an action word (e.g., drive, collate, complete, analyze, etc.) and include relevant “work products” or outcomes.

KSAPCs: Knowledges, skills, abilities, and personal characteristics. Job knowledges refer to bodies of information applied directly to the performance of a work function; skills refer to an observable competence to perform a learned psychomotor act (e.g., keyboarding is a skill because it can be observed and requires a learned process to perform); abilities refer to a present competence to perform an observable behavior or a behavior that results in an observable product (see the Uniform Guidelines, Definitions). Personal characteristics typically refer to traits or characteristics that may be more abstract in nature, but include “operational definitions” that specifically tie them to observable aspects of the job. For example, dependability is a personal characteristic (not a knowledge, skill, or ability), but it can be included in a job analysis if it is defined in terms of observable aspects of job behavior: “Dependability sufficient to show up for work on time, complete tasks in a timely manner, notify supervisory staff if delays are expected, and regularly complete critical work functions.”

Outlier: A statistical term used to define a rating, score, or some other measure that is outside the normal range of other similar ratings or scores. Several techniques are available for identifying outliers.
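
One common technique is the 1.5 × IQR rule; a minimal sketch with hypothetical SME ratings:

```python
import statistics

# Hypothetical SME ratings; one rating sits well outside the rest.
ratings = [70, 72, 75, 68, 74, 71, 30]

q1, _, q3 = statistics.quantiles(ratings, n=4)   # quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [r for r in ratings if r < lower or r > upper]
print(f"Outliers: {outliers}")   # [30]
```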

Point Biserial: A statistical correlation between a test item (scored 0 for incorrect and 1 for correct) and the overall test score (in raw points). Items with negative point biserials are inversely related to higher test scores, which indicates that they are degrading test reliability; items with positive point biserials contribute to test reliability to varying degrees.
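
The point biserial is a Pearson correlation in which one variable is dichotomous; a minimal sketch with hypothetical item responses and total scores, using SciPy's pointbiserialr:

```python
from scipy.stats import pointbiserialr

# Hypothetical data: one item scored 0/1, and each test taker's raw total.
item_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
total_scores = [52, 38, 47, 55, 35, 60, 41, 49, 58, 33]

r, p_value = pointbiserialr(item_scores, total_scores)
print(f"Point biserial: {r:.2f}")
# Positive r: test takers who answer this item correctly tend to score
# higher overall; a negative r would flag the item for review.
```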

Reliability: The consistency of the test as a whole. Tests that have high reliability are consistent internally because the items are measuring a similar trait in a way that holds together between items. Tests that have low reliability include items that are pulling away statistically from other items either because they are poor items for the trait of interest, or they are good items that are measuring a different trait.
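
For dichotomously scored tests, internal-consistency reliability is commonly estimated with KR-20 (a special case of Cronbach's alpha); a minimal sketch over a hypothetical response matrix:

```python
# KR-20 internal-consistency reliability for dichotomous (0/1) items.
# Rows = test takers, columns = items; responses are hypothetical.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
]

n_people = len(responses)
k = len(responses[0])                       # number of items

totals = [sum(row) for row in responses]    # total score per person
mean_total = sum(totals) / n_people
var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

# Sum of item variances p*(1-p), where p is each item's difficulty.
sum_pq = 0.0
for j in range(k):
    p = sum(row[j] for row in responses) / n_people
    sum_pq += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)
print(f"KR-20 reliability: {kr20:.2f}")
```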

SEM: Standard Error of Measurement. A statistic that represents the likely range of a test taker’s “true score” (or estimated “real ability level”) around any given observed score. For example, if the test’s SEM is 3 and an applicant obtained a raw score of 60, his or her true score falls between 57 and 63 with 68% likelihood, between 54 and 66 with 95% likelihood, and between 51 and 69 with 99.7% likelihood. Because test takers have “good days” and “bad days” when taking tests, this statistic is useful for adjusting the test cutoff to account for such differences that may be unrelated to a test taker’s actual ability level.
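
The SEM is typically estimated from the test's standard deviation and reliability as SEM = SD × sqrt(1 − reliability). A minimal sketch reproducing the bands in the example above (the SD and reliability values are assumed):

```python
import math

# SEM estimated from test standard deviation and reliability.
# SD and reliability below are assumed values, for illustration only.
sd = 6.0
reliability = 0.75
sem = sd * math.sqrt(1 - reliability)     # 3.0

observed = 60
# Approximate true-score bands from the normal distribution:
for z, likelihood in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    lo, hi = observed - z * sem, observed + z * sem
    print(f"{likelihood}: {lo:.0f} to {hi:.0f}")
```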

SME: Subject-matter expert. A job incumbent who has been selected to provide input on the job analysis or test validation process. SMEs should have at least one year of on-the-job experience and should not be on probationary or “light/modified duty” status. Supervisors and trainers can also serve as SMEs, provided that they know how to perform the target job.

Additional Resource:

Pre-Employment Testing Software Provides Defensibility (video)

Employment Test Validation

According to the federal government, every test or assessment device used to screen potential employees must comply with the Uniform Guidelines on Employee Selection Procedures (1978), the Americans with Disabilities Act (1990), the Civil Rights Acts of 1964 and 1991, and many other laws, regulations, and professional guidelines.

Every selection device created by Biddle Consulting Group, Inc., including the OPAC Office Skills Testing Software, addresses all of these applicable laws and regulations. You should also check with your organization’s legal advisor to make certain that local and/or state regulations in your municipality are also addressed. If there are any doubts, Biddle Consulting Group, Inc. is available to work with organizations to ensure that OPAC addresses their local and/or state regulations.

OPAC and Test Validation

OPAC tests are created by industrial and organizational psychologists based upon information gathered from subject matter experts from around the nation. Most tests in OPAC are designed to mimic the tasks performed on the job, such as using Microsoft Office programs to enter data, edit documents, manipulate spreadsheets, and create presentations. Those tests that do not mimic job tasks require the test taker to demonstrate the underlying knowledge, skill, and ability to succeed on the job. To help employers address the validation requirements of the federal Uniform Guidelines on Employee Selection Procedures (1978), the OPAC System contains a built-in Validation Wizard. The feature was designed so that practitioners who are not experts in Equal Employment methodology can validate any OPAC test. The Validation Wizard gives employers the capability to run a basic content-validation study and develop the most appropriate cutoff scores for the job as it is performed at their organization.

Test Validation Guide

According to the United States Department of Labor, all tests should be evaluated based on technical information such as:

Test reliability. A good test manual should provide detailed information on the types of reliabilities reported, how reliability studies were conducted, and the size and nature of the sample used to develop the reliability coefficients. Independent reviews also should be consulted.

Test validity. A good test manual will contain clear and complete information on the valid uses of the test, including how validation studies were conducted, and the size and characteristics of the validation samples. Independent test reviews will let you know whether the sample size was sufficient, whether statistical procedures were appropriate, and whether the test meets professional standards.

Test fairness. Read the manual and independent reviews of the test to evaluate its fairness to different groups of test takers. To secure acceptance by all test takers, the test should also appear to be fair. The test items should not reflect racial, cultural, or gender stereotypes, or overemphasize one culture over another. The rules for test administration and scoring should be clear and uniform. Does the manual indicate any modifications that may be available or needed for testing individuals with disabilities?

Potential for adverse impact. The manual and independent reviews should help you evaluate whether the test you are considering has the potential for causing adverse impact. Mental and physical ability tests in particular have the potential for causing substantial adverse impact; however, they can be an important part of your assessment program. If these tests are used in combination with other employment tests and procedures, you will obtain a better picture of an individual’s job potential and reduce the effect of average score differences between groups on any one test.

If you have any questions about Test Validation and the OPAC Office Skills Software,
please contact us today!