# Computers and Adaptive Testing {#cat}
## Computer-Administered/Online Assessment {#computerAssessment}
Computer-administered and online assessments have the potential to be both desirable and dangerous [@Buchanan2002].\index{computerized assessment}\index{online assessment!zzzzz@\igobble|seealso{computerized assessment}}
The promise is a mental health screening instrument that a person completes online, that is scored automatically, and that provides automatic feedback with suggestions for a particular course of action.\index{computerized assessment}
This would save the clinician time and could lead to a more productive therapy session.\index{computerized assessment}
As an example, I have clients complete a brief measure of behavioral issues on a tablet when they arrive and are waiting in the waiting room.\index{computerized assessment}
This is an example of measurement-based care, in which the participant's treatment response is assessed throughout treatment to help determine whether treatment is working or not, and whether to try something different.\index{computerized assessment}\index{measurement-based care}
In addition, emerging techniques in machine learning may be useful for scoring of high-dimensional data in assessment in clinical practice [@GalatzerLevy2023].\index{machine learning}
### Advantages {#computerAdvantages}
There are many advantages of computer-administered or online assessment:\index{computerized assessment}
- Computer-administered and online assessment requires less time on the part of the clinician and administrators than traditional assessment methods.\index{computerized assessment}
The clinician does not need to administer or score the assessment, and scores can be entered directly into the client's chart.\index{computerized assessment}
- Computer-administered and online assessments tend to be less costly than traditional assessment methods.\index{computerized assessment}
- Computer-administered and online assessments can increase access to services by reaching people who otherwise would not be able to complete the assessment.
For example, some people may be restricted by geographic or financial circumstances.\index{computerized assessment}
You can provide the questionnaire in multiple languages and modalities, including written, spoken, and pictorial.\index{computerized assessment}\index{questionnaire}
- People may disclose more information to a computer than to a person.\index{computerized assessment}
The degree of one's perceived anonymity matters.\index{computerized assessment}
Perceived anonymity may be especially important for information on sensitive yet important issues, such as suicidality, homicidality, abuse, neglect, criminal behavior, etc.\index{computerized assessment}
- Computerized and online assessment has the potential to be more structured and the potential to do a more comprehensive and accurate assessment than an in-person interview, which tends to be less structured and more [biased](#bias) by clinical judgment [@Garb2007].\index{computerized assessment}\index{structured!administration}\index{bias}\index{clinical judgment}
- Computerized and online assessment tends to provide more comprehensive information than is usually collected in practice.\index{computerized assessment}
Structured approaches tend to ask more questions.\index{computerized assessment}\index{structured!administration}
Computers can increase the likelihood that the client answers all questions, because they can prevent clients from skipping questions.\index{computerized assessment}
Computers can apply "branching logic" to adapt which questions are asked based on the participant's responses (i.e., [adaptive testing](#adaptiveTesting)) to more efficiently gain greater coverage.\index{computerized assessment}\index{adaptive testing}
Clinicians often fail to ask questions about important issues.\index{computerized assessment}
Clinicians also tend to overlook disorders other than the hypothesized ones (known as diagnostic overshadowing), reflecting the confirmation [bias](#bias) of seeking confirming rather than disconfirming evidence.\index{computerized assessment}\index{bias!confirmatory}
In general, computers give more diagnoses than clinicians.\index{computerized assessment}
Clinicians often miss co-occurring disorders.\index{clinical judgment}
Doing a more comprehensive assessment normalizes that people can and do experience the behavioral issues that they are asked about, and can increase self-disclosure.\index{computerized assessment}
Greater structure leads to higher [inter-rater reliability](#interrater-reliability), less [bias](#bias), and greater [validity](#validity).\index{computerized assessment}\index{reliability!inter-rater}\index{bias}\index{validity}
- Computerized and online assessment lends itself to continuous monitoring during treatment for measurement-based care.\index{computerized assessment}\index{measurement-based care}
Measurement-based care has been shown to be related to better treatment outcomes.\index{computerized assessment}\index{measurement-based care}
Computerized or online assessment may be important for detecting risk that might otherwise go unrecognized.\index{computerized assessment}
[Treatment utility](#treatmentUtility) of computerized assessment has been demonstrated for suicide and substance use.\index{computerized assessment}\index{validity!utility}
### Validity Challenges {#computerValidity}
There are a number of challenges to the [validity](#validity) of computer-administered and online assessment:\index{computerized assessment!validity challenges}
- The impersonality of computers; however, this does not seem to be a strong threat.\index{computerized assessment!validity challenges}
- Lack of control in the testing situation.\index{computerized assessment!validity challenges}
- The possibility that extraneous factors can influence responses.\index{computerized assessment!validity challenges}
- Language and cultural differences; it is difficult to ask follow-up questions to assess understanding unless a clinician is present.\index{computerized assessment!validity challenges}
- Inability to observe the client's body language or nonverbals.\index{computerized assessment!validity challenges}
- Computers are not a great medium to assess constructs that could be affected by computer-related anxiety.\index{computerized assessment!validity challenges}
- Some constructs show different ratings online versus in-person.\index{computerized assessment!validity challenges}
- Computer-administered and online assessment may require different [norms](#norm) and cutoffs than paper-and-pencil assessment.\index{computerized assessment!validity challenges}\index{norm}
Different [norms](#norm) could be necessary, for example, due to increased self-disclosure and higher ratings of negative affect in computer-administered assessment compared to in-person assessment.\index{computerized assessment!validity challenges}\index{norm}
- Careless and fraudulent responding can be common when recruiting an online sample to complete a survey [@Chandler2020].\index{computerized assessment!validity challenges}
Additional steps may need to be taken to ensure high-quality responses such as CAPTCHA (or reCAPTCHA), internet protocol (IP) verification of location, and attention or [validity](#validity) checks.\index{computerized assessment!validity challenges}
In general, online assessments of personality seem comparable to their paper-and-pencil versions, but their psychometric properties are not identical.\index{computerized assessment!validity challenges}
In general, computer-administered and online assessments yield the same factor structure as in-person assessments, but particular items may not function in the same way across computer versus in-person assessments.\index{computerized assessment!validity challenges}
### Ethical Challenges {#computerEthics}
There are important [ethical](#ethics) challenges of using computer-administered and online assessments:\index{computerized assessment!ethical challenges}
- The internet and app stores provide a medium for the dissemination and proliferation of quackery.\index{computerized assessment!ethical challenges}
Businesses sell access to assessment information despite no evidence of the [reliability](#reliability) or [validity](#validity) of the measure.\index{computerized assessment!ethical challenges}\index{reliability}\index{validity}
- When giving potentially distressing or threatening feedback, it is important to provide opportunities for follow-up sessions or counseling.\index{computerized assessment!ethical challenges}
When using online assessment in research, you can state that release of test scores is not possible, and that participants should consult a professional if they are worried or would like a professional assessment.\index{computerized assessment!ethical challenges}
- An additional [ethical](#ethics) challenge of computer-administered and online assessment deals with test security.\index{computerized assessment!ethical challenges}
The [validity](#validity) of many tests is based on the assumption that the client is not knowledgeable about the test materials.\index{computerized assessment!ethical challenges}
Detailed information on many protected tests is available on the internet [@Ruiz2002], and it could be used by malingerers to fake health or illness.\index{computerized assessment!ethical challenges}\index{malingering}
- Data security is another potential challenge of computer-administered and online assessments, including how to maintain HIPAA compliance in the clinical context.\index{computerized assessment!ethical challenges}
### Best Practices {#computerBestPractices}
Below are best practices for computer-administered and online assessments:\index{computerized assessment!best practices}
- Only use measures with established [reliability](#reliability) and [validity](#validity), unless you are studying them, in which case you should evaluate their psychometrics.\index{computerized assessment!best practices}\index{reliability}\index{validity}
- Have a trained clinician review the results of the assessment.\index{computerized assessment!best practices}
- Combine computer-administered assessment with clinical judgment.\index{computerized assessment!best practices}\index{clinical judgment}
Clients can be expected to make errors completing the assessment, so ask necessary follow-up questions after getting results from computer-administered assessments.\index{computerized assessment!best practices}
Asking follow-up questions can help avoid [false positive](#falsePositive) errors in diagnosis.\index{computerized assessment!best practices}\index{false positive}
Check whether a client mistakenly reported something, or whether they under- or over-reported psychopathology.\index{computerized assessment!best practices}
- Provide opportunities for follow-up sessions or counseling if distressing or threatening feedback is provided.\index{computerized assessment!best practices}
- It may be important to correct for a person's level of experience with computers when using computerized tasks for clinical decision-making [@LeeMeeuwKjoeInPress].\index{computerized assessment!best practices}
## Adaptive Testing {#adaptiveTesting}
Adaptive testing involves having the respondent complete only those items that are needed to answer an assessment question.\index{adaptive testing}
It involves changing which items are administered based on responses to previous items, and administering only a subset of the possible items.\index{adaptive testing}
Adaptive testing is commonly used in [intelligence testing](#intelligence), [achievement testing](#achievementAptitude), and [aptitude testing](#achievementAptitude).\index{adaptive testing}\index{intelligence!testing}\index{achievement!testing}\index{aptitude!testing}
Adaptive testing is not yet commonly used for [personality](#objective-personality) or psychopathology assessment.\index{adaptive testing}\index{personality assessment}\index{diagnosis}\index{psychopathology!assessment of}
Most approaches to assessment of [personality](#objective-personality) and psychopathology rely on conventional testing approaches using [classical test theory](#ctt).\index{adaptive testing}\index{personality assessment}\index{psychopathology!assessment of}\index{diagnosis}\index{classical test theory}
However, many [structured clinical interviews](#structuredInterview) use decision rules to skip a diagnosis, module, or section when a criterion is not met.\index{adaptive testing}\index{interview!structured}\index{Structured Clinical Interview for DSM}\index{adaptive testing!manual administration!skip rules}
Adaptive testing has also been used with observational assessments [e.g., @GranziolInPress].
A goal of adaptive testing is to get the most accurate estimate of a person's level on a construct with the fewest items possible.\index{adaptive testing}
Ideally, you would get similar results between adaptive testing and conventional testing that uses all items, when adaptive testing is done well.\index{adaptive testing}
There are multiple approaches to adaptive testing.\index{adaptive testing}
Broadly, one class of approaches uses manual administration, whereas another class of approaches uses computerized adaptive testing (CAT) based on [item response theory](#irt) (IRT).\index{adaptive testing}\index{item response theory}\index{adaptive testing!manual administration}\index{adaptive testing!computerized}
### Manual Administration of Adaptive Testing (Adaptive Testing without IRT) {#adaptiveManual}
Manual administration of adaptive testing involves moving respondents up and down in item difficulty according to their responses.\index{adaptive testing!manual administration}
Approaches to manual administration of adaptive testing include using skip rules, basal and ceiling criteria, and the countdown method.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!skip rules}\index{basal and ceiling criteria}\index{adaptive testing!manual administration!countdown method}
#### Skip Rules {#skipRules}
Many [structured clinical interviews](#structuredInterview), such as the Structured Clinical Interview for DSM Disorders (SCID) and Mini-International Neuropsychiatric Interview (MINI), use decision rules to skip a diagnosis, module, or section if a decision rule is not met.\index{adaptive testing}\index{interview!structured}\index{Structured Clinical Interview for DSM}\index{adaptive testing!manual administration!skip rules}
The skip rules allow the interviews to be more efficient and therefore clinically practical.\index{adaptive testing!manual administration!skip rules}
#### Basal and Ceiling Approach {#basalCeiling}
One approach to manual administration of adaptive testing involves establishing a respondent's basal and ceiling level.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
The basal level is the [difficulty](#itemDifficulty) level at which the examinee answers almost all items correctly.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}\index{item response theory!item difficulty}
The respondent's ceiling level is the [difficulty](#itemDifficulty) level at which the examinee answers almost all items incorrectly.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}\index{item response theory!item difficulty}
The basal and ceiling criteria set the rules for establishing the basal and ceiling level for each respondent and for when to terminate testing.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
The goal of adaptive testing is to administer the fewest items necessary to get an accurate estimate of the person's ability, to save time and prevent the respondent from becoming bored from too-easy items or frustrated from too-difficult items.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}\index{item response theory!item difficulty}
The basal and ceiling approach starts testing at the recommended starting point for a person's age or ability level.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
The examiner scores as items are administered.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
If the person gets too many items wrong in the beginning, the examiner moves to an earlier starting point, i.e., provides easier items, until the respondent gets most items correct, which establishes their basal level.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
Then, after establishing the respondent's basal level, the examiner proceeds to progressively more difficult items until the respondent gets too many items wrong in a set, which establishes their ceiling level.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
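To make the scoring logic concrete, below is a minimal sketch in R; the function `scoreBasalCeiling()`, the run length of three, and the example responses are hypothetical and do not come from any particular test manual.\index{adaptive testing!manual administration}\index{basal and ceiling criteria}
```{r, eval = FALSE}
# A hypothetical sketch of scoring with basal and ceiling rules, assuming
# dichotomous (0/1) responses ordered from easiest to hardest item, a basal
# rule of `runLength` consecutive correct responses, and a ceiling rule of
# `runLength` consecutive incorrect responses
scoreBasalCeiling <- function(responses, runLength = 3){
  n <- length(responses)
  
  # Identify runs of consecutive identical responses
  runs <- rle(responses)
  runEnds <- cumsum(runs$lengths)
  runStarts <- runEnds - runs$lengths + 1
  
  # Basal: start of the first run of at least runLength consecutive correct responses
  basalRun <- which(runs$values == 1 & runs$lengths >= runLength)[1]
  basal <- if(is.na(basalRun)) 1 else runStarts[basalRun]
  
  # Ceiling: end of the first run of at least runLength consecutive incorrect responses
  ceilingRun <- which(runs$values == 0 & runs$lengths >= runLength)[1]
  ceiling <- if(is.na(ceilingRun)) n else runEnds[ceilingRun]
  
  # Items below the basal are credited as correct; items above the ceiling are
  # not administered and are scored as incorrect
  (basal - 1) + sum(responses[basal:ceiling])
}

# Example: the basal is established at item 3 and the ceiling at item 11,
# yielding a raw score of 7
scoreBasalCeiling(c(1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0), runLength = 3)
```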
#### Countdown Approach {#adaptiveCountdown}
The countdown approach of adaptive testing is a variant of the variable termination criterion approach to adaptive testing.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
It classifies the respondent into one of two groups—elevated or not elevated—based on whether they exceed the cutoff criterion on a given scale.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
The cutoff criterion is usually the raw score on the scale that corresponds to a clinical elevation.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
Two countdown approaches include (1) the classification method and (2) the full scores on elevated scales (FSES) method.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
The countdown approach to adaptive testing is described by @Forbey2007.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
##### Classification Method {#classificationMethod}
Using the classification method of the countdown approach to adaptive testing, the examiner stops administering items once elevation is either ruled in or ruled out.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
The classification method only tells you *whether* a client produced an elevated score on the scale.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
It does not tell you their actual score on that scale.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
##### Full Scores on Elevated Scales Method {#fses}
Using the full scores on elevated scales (FSES) method of the countdown approach to adaptive testing, the examiner stops administering items only if elevation is ruled out.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
The approach generates a score on that scale only for people who produced an elevated score.\index{adaptive testing!manual administration}\index{adaptive testing!manual administration!countdown method}
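Below is a minimal sketch in R of the countdown logic for a single scale, assuming dichotomous (0/1) responses to items ordered from most to least frequently endorsed and a raw-score cutoff that defines a clinical elevation; the function `countdownClassify()` is hypothetical.\index{adaptive testing!manual administration!countdown method}
```{r, eval = FALSE}
# A hypothetical sketch of the classification method of the countdown approach:
# stop as soon as elevation is ruled in or ruled out
countdownClassify <- function(itemResponses, cutoff){
  score <- 0
  for(i in seq_along(itemResponses)){
    score <- score + itemResponses[i]
    remaining <- length(itemResponses) - i
    
    if(score >= cutoff){
      return(list(elevated = TRUE, itemsAdministered = i)) # elevation ruled in
    }
    if(score + remaining < cutoff){
      return(list(elevated = FALSE, itemsAdministered = i)) # elevation ruled out
    }
  }
  
  list(elevated = score >= cutoff, itemsAdministered = length(itemResponses))
}

# Example: with a cutoff of 5, elevation is ruled out after only 6 of 10 items
countdownClassify(c(0, 0, 0, 0, 0, 0, 1, 0, 1, 0), cutoff = 5)
```
Under the FSES method, by contrast, administration stops only when elevation is ruled out; whenever elevation is ruled in, the remaining items on that scale are still administered so that a full raw score is obtained.\index{adaptive testing!manual administration!countdown method}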
#### Summary {#manualAdaptiveTestingSummary}
When doing manual administration with clinical scales, you see the greatest item savings when ordering items from least to most frequently endorsed (i.e., from most to least [difficult](#itemDifficulty)) so you can rule people out faster.\index{adaptive testing!manual administration}\index{item response theory!item difficulty}
Ordering items in this way results in a 20–30% item savings on the [Minnesota Multiphasic Personality Inventory](#mmpi) (MMPI), and corresponding time savings.\index{adaptive testing!manual administration}\index{Minnesota Multiphasic Personality Inventory}
Comparative studies using the [MMPI](#mmpi) tend to show comparable results (i.e., similar [validity](#validity)) between the countdown adaptive method and the conventional testing method.\index{adaptive testing!manual administration}\index{Minnesota Multiphasic Personality Inventory}
That is, the reduction in items does not impair the [validity](#validity) of the adaptive scales.\index{adaptive testing!manual administration}\index{Minnesota Multiphasic Personality Inventory}\index{validity}
However, you can get even greater item savings when using an [IRT](#irt) approach to adaptive testing.\index{adaptive testing!computerized}\index{item response theory}
Using [IRT](#irt), you can order the administration of items differently for each participant to administer the next item that will provide the most [information](#irtReliability) (i.e., precision) for a respondent's ability given their responses on all previous items.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!information}
### CAT with IRT {#catIRT}
Computerized adaptive testing (CAT) based on [IRT](#irt) is widely used in educational testing, including the Graduate Record Examination (GRE) and the Graduate Management Admission Test (GMAT), but it is not widespread in mental health measurement for two reasons: (1) it works best with large item banks, which are generally unavailable for mental health constructs; and (2) many mental health constructs are multidimensional, whereas CAT using [IRT](#irt) has primarily been restricted to unidimensional constructs, such as math achievement.\index{adaptive testing!computerized}\index{item response theory}\index{unidimensional}\index{multidimensional}
However, that is changing; there are now multidimensional [IRT](#irt) approaches that allow simultaneously estimating people's scores on multiple dimensions, as described in Section \@ref(irt-twoPLmultidimensional).\index{adaptive testing!computerized}\index{item response theory}\index{unidimensional}\index{multidimensional}
For example, the higher-order construct of externalizing problems includes sub-dimensions such as aggression and rule-breaking.\index{multidimensional}
A CAT is designed to locate a person's level on the construct (theta) with as few items as possible.\index{adaptive testing!computerized}
CAT has potential for greater item savings than the [countdown approach](#adaptiveCountdown), but it has more assumptions.\index{adaptive testing!computerized}\index{adaptive testing!manual administration}
[IRT](#irt) can be used to create CATs using information gleaned from the [IRT](#irt) model's item parameters, including [discrimination](#itemDiscrimination) and [difficulty](#itemDifficulty) (or severity).\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item discrimination}\index{item response theory!item difficulty}
The item's [discrimination](#itemDiscrimination) indicates how strongly the item is related to the construct.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item discrimination}
For a strong measure, you want highly [discriminating](#itemDiscrimination) items that are strongly related to the construct.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item discrimination}
In terms of [item difficulty/severity](#itemDifficulty), you want items that span the full distribution of [difficulty/severity](#itemDifficulty) so they are non-redundant.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item discrimination}\index{item response theory!item difficulty}
An [IRT](#irt)-based CAT starts with a highly [discriminating](#itemDiscrimination) item at the 50th percentile of the [severity](#itemDifficulty) distribution (or at whatever level is the best estimate of the person's ability/severity before the CAT is administered).\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item discrimination}\index{item response theory!item difficulty}
Then, based on the participant's response, it generates a provisional estimate of the person's level on the construct and administers the item that will provide the most [information](#irtReliability).\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!information}
For example, if the respondent gets the first item correct, the CAT administers a highly [discriminating](#itemDiscrimination) item that might be at the 75th percentile of the [severity](#itemDifficulty) distribution.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item discrimination}\index{item response theory!item difficulty}
And so on, based on the participant's responses.\index{adaptive testing!computerized}\index{item response theory}
As described in Section \@ref(irtReliability), [item information](#irtReliability) is how much measurement precision for the construct is provided by a particular item.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!information}
In other words, [item information](#irtReliability) indicates how much the item reduces the [standard error of measurement](#standardErrorOfMeasurement); i.e., how much the item reduces uncertainty of our estimate of the respondent's construct level.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!information}\index{item response theory!standard error of measurement}
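In particular, the [standard error of measurement](#standardErrorOfMeasurement) at a given construct level is the inverse of the square root of the test information at that level, $SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}$, so each additional informative item shrinks the uncertainty around the estimate.\index{item response theory!information}\index{item response theory!standard error of measurement}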
An [IRT](#irt)-based CAT continues to generate provisional estimates of the person's construct level and updates them based on new responses.\index{adaptive testing!computerized}\index{item response theory}
The CAT tries to home in on the construct level at which the respondent answers items correctly (or endorses them) about 50% of the time.\index{adaptive testing!computerized}\index{item response theory}
This process continues until the uncertainty in the person's estimated construct level is smaller than a pre-defined threshold—that is, when we are fairly confident (to some threshold) about a person's construct level.\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!information}
CAT is a promising approach that saves time because it tailors which items are administered to which person and in which order based on their construct level to get the most [reliable](#reliability) estimate in the shortest time possible.\index{adaptive testing!computerized}\index{item response theory}
It administers items that are appropriate to the person's construct level, and it does not administer items that are far too easy or far too hard for the person, which saves time and reduces boredom and frustration.\index{adaptive testing!computerized}\index{item response theory!item difficulty}
It shortens the assessment because not all items are administered, and it typically results in around 50% item savings or more.\index{adaptive testing!computerized}\index{item response theory}
It also allows for increased measurement precision, because you are drilling down deeper (i.e., asking more items) near the person's construct level.\index{adaptive testing!computerized}\index{item response theory}\index{reliability!precision}\index{item response theory!information}
You can control the measurement precision of the CAT because you can specify the threshold of measurement precision (i.e., [standard error of measurement](#standardErrorOfMeasurement)) at which testing is terminated.\index{adaptive testing!computerized}\index{item response theory}\index{reliability!precision}\index{item response theory!information}\index{item response theory!standard error of measurement}
For example, you could have the CAT terminate when the [standard error of measurement](#standardErrorOfMeasurement) of a person's construct level becomes less than 0.2.\index{adaptive testing!computerized}\index{item response theory}\index{reliability!precision}\index{item response theory!information}\index{item response theory!standard error of measurement}
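To illustrate the item-selection and stopping logic outside of a full CAT engine, below is a minimal sketch that assumes a fitted unidimensional `mirt` model (`mod`, with a single factor named `F1`) and a response vector with `NA` for items not yet administered; the helper `selectNextItem()` is hypothetical and is not part of `mirtCAT`, which implements this logic (and much more) internally.\index{adaptive testing!computerized}\index{item response theory!information}\index{item response theory!standard error of measurement}
```{r, eval = FALSE}
library("mirt")

# A minimal sketch of one step of an IRT-based CAT, assuming a fitted
# unidimensional mirt model (`mod`) and a response vector (`responses`) with
# NA for items that have not yet been administered
selectNextItem <- function(mod, responses, min_SEM = 0.3){
  # Provisional EAP estimate of theta and its standard error, given the
  # responses collected so far
  responsePattern <- matrix(
    responses, nrow = 1,
    dimnames = list(NULL, extract.mirt(mod, "itemnames")))
  est <- fscores(mod, method = "EAP", response.pattern = responsePattern)
  thetaHat <- est[1, "F1"]
  thetaSE <- est[1, "SE_F1"]
  
  # Stop once the standard error of measurement falls below the threshold
  if(thetaSE < min_SEM){
    return(list(stop = TRUE, theta = thetaHat, SE = thetaSE))
  }
  
  # Among the items not yet administered, compute how much information each
  # provides at the provisional theta, and administer the most informative one
  remaining <- which(is.na(responses))
  itemInformation <- sapply(remaining, function(item){
    iteminfo(extract.item(mod, item), Theta = matrix(thetaHat))
  })
  
  list(
    stop = FALSE,
    nextItem = remaining[which.max(itemInformation)],
    theta = thetaHat,
    SE = thetaSE)
}
```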
However, CATs assume that if you get a more difficult item correct, you would have gotten easier items correct, which might not be true in all contexts (especially for constructs that are not unidimensional).\index{adaptive testing!computerized}\index{item response theory}\index{item response theory!item difficulty}
For example, just because a person does not endorse low-severity symptoms does not necessarily mean that they will not endorse higher-severity symptoms, especially when assessing a multidimensional construct.\index{adaptive testing!computerized}\index{item response theory}\index{multidimensional}
Simpler CATs are built using unidimensional [IRT](#irt).\index{adaptive testing!computerized}\index{item response theory}\index{unidimensional}
But many aspects of psychopathology are not unidimensional, and would violate assumptions of unidimensional [IRT](#irt).\index{adaptive testing!computerized}\index{item response theory}\index{unidimensional}
For example, the externalizing spectrum includes sub-dimensions such as aggression, disinhibition, and substance use.\index{multidimensional}
You can use multidimensional [IRT](#irt) to build CATs, though this is more complicated.\index{adaptive testing!computerized}\index{item response theory}\index{multidimensional}
For instance, you could estimate a [bifactor model](#bifactorModel) in which each item (e.g., "hits others") loads onto the general latent factor (e.g., externalizing problems) and its specific sub-dimension (e.g., aggression versus rule-breaking).\index{factor analysis!bifactor}
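As a minimal sketch, a bifactor model like this could be fit with the `bfactor()` function of the `mirt` package and then passed to `mirtCAT()`; the data frame `externalizingData` below is hypothetical, with the first five columns assumed to assess aggression and the next five to assess rule-breaking.\index{factor analysis!bifactor}\index{adaptive testing!computerized}
```{r, eval = FALSE}
library("mirt")

# Assign each item to its specific factor (1 = aggression, 2 = rule-breaking);
# all items also load on the general externalizing factor
specificFactors <- c(rep(1, 5), rep(2, 5))

# Fit the bifactor model to a hypothetical data frame of dichotomous items
bifactorExternalizing <- bfactor(externalizingData, model = specificFactors)

coef(bifactorExternalizing, simplify = TRUE)
```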
Computerized adaptive testing of mental health disorders is reviewed by @Gibbons2016.\index{adaptive testing!computerized}\index{item response theory}
Well-designed CATs show equivalent [reliability](#reliability) and [validity](#validity) to their full-scale counterparts.\index{adaptive testing!computerized!reliability}\index{adaptive testing!computerized!validity}\index{item response theory}
By contrast, many short forms are not as accurate as their full-scale counterparts.\index{short form}
Part of the reason that CATs tend to do better than short forms is that the CATs are adaptive; they determine which items to administer based on the participant's responses to prior items, unlike short forms.\index{adaptive testing!computerized!reliability}\index{adaptive testing!computerized!validity}\index{item response theory}\index{short form}
For guidelines on developing and evaluating short forms, see @Smith2000.\index{short form}
## Getting Started {#gettingStarted-cat}
### Load Libraries {#loadLibraries-cat}
```{r}
library("petersenlab") #to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("mirtCAT")
library("here")
library("tinytex")
```
## Example of Unidimensional CAT {#exampleUnidimensionalCAT}
The computerized adaptive test (CAT) was fit using the `mirtCAT` package [@R-mirtCAT].\index{adaptive testing!computerized}
The example of a CAT with unidimensional data is adapted from `mirtCAT` documentation:\index{adaptive testing!computerized}\index{unidimensional} https://philchalmers.github.io/mirtCAT/html/unidim-exampleGUI.html (archived at https://perma.cc/3ZW4-DAUR)
### Define Population IRT Parameters {#definePopulationIRTparameters}
For reproducibility, we set the seed below.\index{simulate data}
Using the same seed will yield the same answer every time.
There is nothing special about this particular seed.
```{r irtParameters, cache = TRUE, cache.comments = FALSE}
set.seed(52242)
nitems <- 100
itemnames <- paste("Item.", 1:nitems, sep = "")
a <- matrix(rlnorm(nitems, meanlog = .2, sdlog = .3)) #item discrimination (slope) parameters
d <- matrix(rnorm(nitems, mean = 0, sd = 1)) #item intercept (easiness) parameters
pars <- data.frame(a1 = a, d = d, g = 0.2) #g = lower asymptote (guessing) parameter
```
### Fit IRT Model {#fitIRTmodel-cat}
```{r irtModel, cache = TRUE, cache.extra = list(getRversion(), packageVersion("mirt"), packageVersion("mirtCAT")), cache.comments = FALSE, dependson = "irtParameters"}
mod <- generate.mirt_object(pars, "3PL")
```
### Model Summary {#modelOutput-cat}
```{r}
summary(mod)
coef(mod, simplify = TRUE, IRTpars = TRUE)
modItemParameters <- coef(
  mod,
  simplify = TRUE,
  IRTpars = TRUE)$items
```
### Model Plots {#modelPlots-cat}
A test characteristic curve of the measure is in Figure \@ref(fig:tcc-cat).\index{adaptive testing!computerized}\index{item response theory!test characteristic curve}
```{r tcc-cat, out.width = "100%", fig.align = "center", fig.cap = "Test Characteristic Curve."}
plot(mod)
```
The test information and [standard error of measurement](#standardErrorOfMeasurement) as a function of the person's construct level (theta; $\theta$) is in Figure \@ref(fig:infoSE-cat).\index{adaptive testing!computerized}\index{item response theory!test information curve}\index{item response theory!standard error of measurement}
The test appears to measure theta ($\theta$) with $SE < .4$ across a $\theta$ range of approximately −1.5 to 2.\index{adaptive testing!computerized}
```{r infoSE-cat, out.width = "100%", fig.align = "center", fig.cap = "Test Information and Standard Error of Measurement."}
plot(mod, type = "infoSE", theta_lim = c(-3, 3))
```
### Item Plots {#itemPlots-cat}
Item characteristic curves are in Figure \@ref(fig:icc-cat).\index{adaptive testing!computerized}\index{item response theory!item characteristic curve}
```{r icc-cat, out.width = "100%", fig.align = "center", fig.cap = "Item Characteristic Curves."}
plot(mod, type = "trace", theta_lim = c(-3, 3))
```
Item information curves are in Figure \@ref(fig:info-cat).\index{adaptive testing!computerized}\index{item response theory!item information curve}
```{r info-cat, out.width = "100%", fig.align = "center", fig.cap = "Item Information Curves."}
plot(mod, type = "infotrace", theta_lim = c(-3, 3))
```
- Item 30 would be a good starting item; it has a difficulty near the 50th percentile $(b = `r round(modItemParameters["Item.30","b"], 3)`)$ and a high discrimination $(a = `r round(modItemParameters["Item.30","a"], 3)`)$.\index{adaptive testing!computerized}\index{item response theory!item difficulty}\index{item response theory!item discrimination}
- Item 70 would also be a good starting item; it has a difficulty near the 50th percentile $(b = `r round(modItemParameters["Item.70","b"], 3)`)$ and a high discrimination $(a = `r round(modItemParameters["Item.70","a"], 3)`)$.\index{adaptive testing!computerized}\index{item response theory!item difficulty}\index{item response theory!item discrimination}
- Item 86 is an easy item $(b = `r round(modItemParameters["Item.86","b"], 3)`)$.\index{adaptive testing!computerized}\index{item response theory!item difficulty}
- Item 16 is a difficult item $(b = `r round(modItemParameters["Item.16","b"], 3)`)$.\index{adaptive testing!computerized}\index{item response theory!item difficulty}
- Item 53 has a low discrimination $(a = `r round(modItemParameters["Item.53","a"], 3)`)$.\index{adaptive testing!computerized}\index{item response theory!item discrimination}
Item characteristic curves and information curves for these items are in Figures \@ref(fig:catICCItem30)–\@ref(fig:catICCItem53).\index{adaptive testing!computerized}\index{item response theory!item characteristic curve}\index{item response theory!item information curve}
```{r catICCItem30, out.width = "100%", fig.align = "center", fig.cap = "Item Characteristic Curves and Information Curves: Item 30."}
itemplot(object = mod, item = "Item.30", "infotrace")
```
```{r catICCItem70, out.width = "100%", fig.align = "center", fig.cap = "Item Characteristic Curves and Information Curves: Item 70."}
itemplot(object = mod, item = "Item.70", "infotrace")
```
```{r catICCItem86, out.width = "100%", fig.align = "center", fig.cap = "Item Characteristic Curves and Information Curves: Item 86."}
itemplot(object = mod, item = "Item.86", "infotrace")
```
```{r catICCItem16, out.width = "100%", fig.align = "center", fig.cap = "Item Characteristic Curves and Information Curves: Item 16."}
itemplot(object = mod, item = "Item.16", "infotrace")
```
```{r catICCItem53, out.width = "100%", fig.align = "center", fig.cap = "Item Characteristic Curves and Information Curves: Item 53."}
itemplot(object = mod, item = "Item.53", "infotrace")
```
### Create Math Items {#createMathItems-cat}
```{r mathItems, cache = TRUE, cache.comments = FALSE, dependson = "irtParameters"}
questions <- answers <- character(nitems)
choices <- matrix("a", nitems, 5)
spacing <- floor(d - min(d)) + 1 #easier items have more variation
for(i in 1:nitems){
  n1 <- sample(1:100, 1)
  n2 <- sample(101:200, 1)
  ans <- n1 + n2
  questions[i] <- paste(n1, " + ", n2, " = ?", sep = "")
  answers[i] <- as.character(ans)
  ch <- ans + sample(c(-5:-1, 1:5) * spacing[i,], 5) #five response options spaced around the correct answer
  ch[sample(1:5, 1)] <- ans #place the correct answer in a random position
  choices[i,] <- as.character(ch)
}
df <- data.frame(
  Questions = questions,
  Answer = answers,
  Option = choices,
  Type = "radio")
```
### Run Computerized Adaptive Test (CAT) {#runCAT}
Set the minimum [standard error of measurement](#standardErrorOfMeasurement) for the latent trait (theta; $\theta$) that must be reached before stopping the CAT.\index{adaptive testing!computerized}\index{item response theory!standard error of measurement}
You can lengthen the CAT by lowering the minimum [standard error of measurement](#standardErrorOfMeasurement), or you can shorten the CAT by raising the minimum [standard error of measurement](#standardErrorOfMeasurement).\index{adaptive testing!computerized}\index{item response theory!standard error of measurement}
```{r minimumSEM, cache = TRUE, cache.comments = FALSE}
minimum_SEM <- .3
```
Run the CAT and stop once the [standard error of measurement](#standardErrorOfMeasurement) for the latent trait (theta; $\theta$) becomes $`r minimum_SEM`$ or lower.\index{adaptive testing!computerized}\index{item response theory!standard error of measurement}
```{r, eval = FALSE}
result <- mirtCAT(
  df,
  mod,
  start_item = "MI",
  method = "EAP",
  criteria = "MI",
  design = list(min_SEM = minimum_SEM))
```
```{r catModel, include = FALSE}
load(here("Data", "cat.rdata"))
```
```{r}
RNGkind("default", "default", "default")
```
### CAT Results {#catResults}
```{r}
print(result)
summary(result)
```
```{r, include = FALSE}
resultTheta <- as.numeric(result$thetas)
```
#### CAT Standard Errors {#CATstandardErrors}
[Standard errors of measurement](#standardErrorOfMeasurement) of a person's estimated construct level (theta; $\theta$) as a function of the item (presented in the order that items were administered as part of the CAT) are in Figure \@ref(fig:standardError-cat).\index{adaptive testing!computerized}\index{item response theory!standard error of measurement}
Initially, the respondent got the first items correct, which raised the estimate of their construct level.\index{adaptive testing!computerized}
However, as the respondent got more items incorrect, the CAT converged on the estimate that the person has a construct level around $\theta = `r apa(resultTheta, decimals = 2, leading = TRUE)`$, which means that the person scored slightly above average (i.e., `r apa(resultTheta, decimals = 2, leading = TRUE)` standard deviations above the mean).\index{adaptive testing!computerized}
```{r standardError-cat, out.width = "100%", fig.align = "center", fig.cap = "Standard Errors of Measurement Around Theta in a Computerized Adaptive Test."}
plot(result, SE = 1)
```
#### CAT 95% Confidence Interval {#cat95CI}
95% confidence intervals of a person's estimated construct level (theta; $\theta$) as a function of the item (presented in the order that items were administered as part of the CAT) are in Figure \@ref(fig:cat95PctCI).\index{adaptive testing!computerized}
(ref:cat95PctCICaption) 95% Confidence Interval of Theta in a Computerized Adaptive Test.
```{r cat95PctCI, out.width = "100%", fig.align = "center", fig.cap = "(ref:cat95PctCICaption)"}
plot(result, SE = qnorm(.975))
```
## Creating a Computerized Adaptive Test From an Item Response Theory Model {#cat-preexisting}
You can create a computerized adaptive test from any [item response theory](#irt) model.\index{adaptive testing!computerized}\index{item response theory}
### Create Items {#cat-createItems}
For instance, below we create a matrix with the questions and response options from the [graded response model](#irt-gradedResponseModel) that we fit in Section \@ref(irt-gradedResponseModel) in Chapter \@ref(irt) on [item response theory](#irt).\index{adaptive testing!computerized}\index{item response theory!graded response model}
```{r}
numItemsGRM <- nrow(coef(
  gradedResponseModel,
  simplify = TRUE,
  IRTpars = TRUE)$items)
names(Science)
questionsGRM <- c(
  "Science and technology are making our lives healthier, easier and more comfortable.",
  "The application of science and new technology will make work more interesting.",
  "Thanks to science and technology, there will be more opportunities for the future generations.",
  "The benefits of science are greater than any harmful effect it may have."
)
responseOptionsGRM <- c(
  "strongly disagree",
  "disagree to some extent",
  "agree to some extent",
  "strongly agree"
)
numResponseOptionsGRM <- length(responseOptionsGRM)
choicesGRM <- matrix("a", numItemsGRM, numResponseOptionsGRM)
for(i in 1:numItemsGRM){
  choicesGRM[i,] <- responseOptionsGRM
}
dfGRM <- data.frame(
  Questions = questionsGRM,
  Option = choicesGRM,
  Type = "radio")
```
### Run Computerized Adaptive Test {#cat-runModel}
Then you can create and run the computerized adaptive model:\index{adaptive testing!computerized}
```{r, eval = FALSE}
resultGRM <- mirtCAT(
  dfGRM,
  gradedResponseModel,
  start_item = "MI",
  method = "EAP",
  criteria = "MI",
  design = list(min_SEM = minimum_SEM))
```
## Conclusion {#conclusion-cat}
Computer-administered and online assessments have the potential to be both desirable and dangerous.\index{computerized assessment}
They have key [advantages](#computerAdvantages); at the same time, they have both [validity](#computerValidity) and [ethical](#computerEthics) challenges.\index{computerized assessment}
[Best practices](#computerBestPractices) for computer-administered and online assessments are provided.\index{computerized assessment}
[Adaptive testing](#adaptiveTesting) involves having the respondent complete only those items that are needed to answer an assessment question, which can save immense time without sacrificing [validity](#validity) (if done well).\index{adaptive testing}\index{validity}
There are many approaches to [adaptive testing](#adaptiveTesting), including [manual administration](#adaptiveManual)—such as [skip rules](#skipRules), [basal and ceiling criteria](#basalCeiling), and the [countdown approach](#adaptiveCountdown)—and [computerized adaptive testing (CAT) using item response theory](#catIRT).\index{adaptive testing!manual administration}\index{adaptive testing!computerized}
A [CAT](#catIRT) is designed to locate a person's level on the construct with as few items as possible.\index{adaptive testing!computerized}
A [CAT](#catIRT) administers the items that will provide the most [information](#irtReliability) based on participants' previous responses.\index{adaptive testing!computerized}\index{item response theory!information}
[CAT](#catIRT) typically results in the greatest item savings—around 50% item savings or more.\index{adaptive testing!computerized}
## Suggested Readings {#readings-cat}
@Buchanan2002; @Gibbons2016
## Exercises {#exercises-cat}
```{r, include = FALSE}
library("MOTE")
```
### Questions {#exercisesQuestions-cat}
### Answers {#exercisesAnswers-cat}