# Generalizability Theory {#gTheory}
Up to this point, we have discussed [reliability](#reliability) from the perspective of classical test theory (CTT).\index{reliability}\index{classical test theory}
However, as we discussed, CTT makes several assumptions that are unrealistic.\index{classical test theory}
For example, CTT assumes that all error is [random](#randomError).\index{classical test theory}\index{measurement error!random error}
In CTT, there is an assumption that there exists a true score that is an accurate measure of the trait under a *specific* set of conditions.\index{classical test theory}\index{true score}
Other measurement theories conceptualize [reliability](#reliability) differently than CTT does.\index{classical test theory}\index{reliability}
One such measurement theory is generalizability theory [@Brennan1992], also known as G-theory and domain sampling theory.\index{generalizability theory}
G-theory is also discussed in the chapters on [reliability](#reliability) (Chapter \@ref(reliability), Section \@ref(gTheoryReliability)), [validity](#validity) (Chapter \@ref(validity), Section \@ref(gTheoryValidity)) and [structural equation modeling](#sem) (Chapter \@ref(sem), Section \@ref(generalizability-SEM)).\index{generalizability theory}\index{reliability}\index{validity}\index{structural equation modeling}
## Overview of Generalizability Theory (G-Theory) {#overview-gTheory}
G-theory is an alternative measurement theory to CTT that does not treat all measurement differences across time, rater, or situation as "error" but rather as a phenomenon of interest [@Wiggins1973].\index{generalizability theory}\index{classical test theory}\index{measurement error}
G-theory is a measurement theory that is used to examine the extent to which scores are consistent across a specific set of conditions.\index{generalizability theory}
In G-theory, the true score is conceived of as a person's *universe score*—the mean of all observations for a person over all conditions in the universe—this allows us to estimate and recognize the magnitude of multiple influences on test performance.\index{generalizability theory}\index{true score}\index{generalizability theory!universe score}
These multiple influences on test performance are called *facets*.\index{generalizability theory!facet}
### The Universe of Generalizability {#universeGeneralizability}
Instead of conceiving of all variability in a person's scores as error, G-theory argues that we should describe the details of the particular test situation (universe) that lead to a specific test score.\index{generalizability theory}\index{generalizability theory!universe}
The universe is described in terms of its facets:\index{generalizability theory}\index{generalizability theory!universe}\index{generalizability theory!facet}
- settings\index{generalizability theory}\index{generalizability theory!universe}\index{generalizability theory!facet}
- observers (e.g., amount of training they had)\index{generalizability theory}\index{generalizability theory!universe}\index{generalizability theory!facet}
- instruments (e.g., number of items in test)\index{generalizability theory}\index{generalizability theory!universe}\index{generalizability theory!facet}
- occasions (time points)\index{generalizability theory}\index{generalizability theory!universe}\index{generalizability theory!facet}
- attributes (i.e., what we are assessing; the purpose of test administration)\index{generalizability theory}\index{generalizability theory!universe}\index{generalizability theory!facet}
Measures with strong [reliability](#reliability) show a high ratio of variance as a function of the person relative to the variance as a function of other facets or factors.\index{generalizability theory}\index{reliability}
To the extent that variance in scores is attributable to different settings, observers, instruments, occasions, attributes, or other facets, the [reliability](#reliability) of the measure is weakened.\index{generalizability theory}\index{reliability}
### Universe Score {#universeScore}
A person's universe score is the average of a person's scores across all conditions in the universe.\index{generalizability theory}\index{true score}\index{generalizability theory!universe score}
According to G-theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained.\index{generalizability theory}\index{true score}\index{generalizability theory!universe score}\index{generalizability theory!facet}
This is the universe score, which is analogous to the true score in CTT.\index{generalizability theory}\index{true score}\index{generalizability theory!universe score}
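For a universe defined by, say, item and occasion facets, the universe score for person $p$ can be written (a sketch in standard G-theory notation) as the expected value of the person's observed scores, $X_{pio}$, taken over all items $i$ and occasions $o$ in the universe:\index{generalizability theory!universe score}
$$\mu_p \equiv E_i E_o X_{pio}$$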
### G-Theory Perspective on Reliability {#reliability-gTheory}
G-theory asserts that the [reliability](#reliability) of a test does not reside within the test itself; a test's [reliability](#reliability) depends on the circumstances under which it is developed, administered, and interpreted.\index{generalizability theory}\index{reliability}
A person's test scores vary from testing to testing because of (many) variables in the testing situation.\index{generalizability theory}\index{generalizability theory!facet}
By assessing a person in multiple facets of the universe, this allows us to estimate and recognize the magnitude of multiple sources of [measurement error](#measurementError).\index{generalizability theory}\index{generalizability theory!facet}\index{measurement error}
Such [measurement error](#measurementError) includes:\index{measurement error}
- day-to-day variation in performance (stability of the construct, [test–retest reliability](#testRetest-reliability))\index{generalizability theory}\index{generalizability theory!facet}\index{measurement error}\index{reliability!test–retest}
- variance in the item sampling (coefficient of [internal consistency](#internalConsistency-reliability))\index{generalizability theory}\index{generalizability theory!facet}\index{measurement error}\index{reliability!internal consistency}
- variance due to both day-to-day and item sampling (coefficient of equivalence from [parallel-forms reliability](#parallelForms-reliability), or [convergent validity](#convergentValidity), as discussed in Section \@ref(gTheoryValidity))\index{generalizability theory}\index{generalizability theory!facet}\index{measurement error}\index{reliability!parallel forms}\index{validity!convergent}
In G-theory, all sources of [measurement error](#measurementError) (facets) are considered simultaneously—something CTT cannot achieve [@Shavelson1989].\index{generalizability theory}\index{generalizability theory!facet}\index{measurement error}\index{measurement error}
This occurs through specifying many different variance facets in the estimation of the true score, rather than just one source of error variance as in CTT.\index{generalizability theory}\index{generalizability theory!facet}\index{measurement error}\index{true score}\index{classical test theory}
This specification allows us to take into consideration variance due to occasion effects, item effects, and occasion $\times$ item effects (i.e., main effects of the facets in addition to their interaction), as in Table \@ref(tab:gTheoryTable).\index{generalizability theory}\index{measurement error}\index{measurement error}\index{generalizability theory!facet}
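For a fully crossed person $\times$ item $\times$ occasion design, a sketch of this decomposition in standard G-theory notation (with the three-way interaction confounded with the residual, denoted $pio,e$) is:\index{generalizability theory}\index{generalizability theory!facet}
$$X_{pio} = \mu + \nu_p + \nu_i + \nu_o + \nu_{pi} + \nu_{po} + \nu_{io} + \nu_{pio,e}$$
Each effect ($\nu$) has an associated variance component, so the observed-score variance decomposes as:
$$\sigma^2(X_{pio}) = \sigma^2_p + \sigma^2_i + \sigma^2_o + \sigma^2_{pi} + \sigma^2_{po} + \sigma^2_{io} + \sigma^2_{pio,e}$$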
```{r, include = FALSE}
gTheoryTable <- data.frame(
"source" = c("Person (p)", "Item (i)", "Occasion (o)", "p x i", "p x o", "i x o", "p x i x o", "residual"),
"percentVariance" = c(30, 5, 3, 25, 5, 2, 10, 20))
```
```{r gTheoryTable, echo = FALSE}
kable(
gTheoryTable,
col.names = c("Source","Variance Accounted For (%)"),
caption = "Percent of Variance from Different Sources in Generalizability Theory Model With Three Facets: Person, Item, and Occasion (and Their Interactions).
(Adapted from @Webb2005, Table 1, p. 2. Webb, N. M., \\& Shavelson, R. J. (2005). Generalizability theory: Overview. In B. S. Everitt \\& D. C. Howell (Eds.), *Encyclopedia of statistics in behavioral science* (Vol. 2, pp. 717–719). John Wiley \\& Sons, Ltd. https://doi.org/10.1002/0470013192.bsa703)",
caption.short = "Percent of Variance from Different Sources in Generalizability Theory Model With Three Facets: Person, Item, and Occasion (and Their Interactions).",
booktabs = TRUE)
```
A score's usefulness largely depends on the extent to which it allows us to generalize accurately to behavior in a wider set of situations—i.e., a universe of generalization.\index{generalizability theory}\index{generalizability theory!universe}
The G-Theory equivalent of the CTT [reliability](#reliability) coefficient of a measure is the generalizability coefficient or dependability coefficient.\index{generalizability theory}\index{generalizability theory!generalizability coefficient}\index{generalizability theory!dependability coefficient}
### G-Theory Perspective on Validity {#validity-gTheory}
G-theory can simultaneously consider multiple aspects of [reliability](#reliability) and [validity](#validity) in the same model.\index{generalizability theory}\index{reliability}\index{validity}
For instance, [internal consistency reliability](#internalConsistency-reliability), [test–retest reliability](#testRetest-reliability), [inter-rater reliability](#interrater-reliability), [parallel-forms reliability](#parallelForms-reliability), and [convergent validity](#convergentValidity) [in the @Campbell1959 sense of the same construct assessed by a different method] can all be incorporated into a G-theory model.\index{generalizability theory}\index{reliability!internal consistency}\index{reliability!test–retest}\index{reliability!parallel forms}\index{validity!convergent}
For example, a G-theory model could assess each participant across the following facets:\index{generalizability theory}\index{generalizability theory!facet}
- time: e.g., T1 and T2 ([test–retest reliability](#testRetest-reliability))\index{generalizability theory}\index{generalizability theory!facet}\index{reliability!test–retest}
- items: e.g., questions within the same instrument ([internal consistency reliability](#internalConsistency-reliability)) and questions across different instruments ([parallel-forms reliability](#parallelForms-reliability))\index{generalizability theory}\index{generalizability theory!facet}\index{reliability!internal consistency}\index{reliability!parallel forms}
- rater: e.g., self-report and other-report ([inter-rater reliability](#interrater-reliability))\index{generalizability theory}\index{generalizability theory!facet}\index{reliability!inter-rater}
- method: e.g., questionnaire and observation ([convergent validity](#convergentValidity))\index{generalizability theory}\index{generalizability theory!facet}\index{validity!convergent}
Using such a G-theory model, we can determine the extent to which scores on a measure generalize to other conditions, measures, etc.\index{generalizability theory}
Measures with strong [convergent validity](#convergentValidity) show a high ratio of variance as a function of the person relative to the variance as a function of measurement method.\index{generalizability theory}\index{validity!convergent}
To the extent that variance in scores is attributable to different measurement methods, [convergent validity](#convergentValidity) is weakened.\index{generalizability theory}\index{validity!convergent}
An example data structure that could leverage G-theory to partition the variance in scores as a function of different facets (person, time, item, rater, method) and their interactions is in Table \@ref(tab:gtheoryDataStructure); a sketch of a corresponding model specification follows the table.\index{generalizability theory}\index{generalizability theory!facet}
Table: (\#tab:gtheoryDataStructure) Example Data Structure for Generalizability Theory With the Following Facets: Person, Time, Item, Rater, Method.
| Person | Time | Item | Rater | Method | Score |
|---------------|-------------|----------------------|--------------|----------------------|--------------|
| 1 | 1 | "hits others" | 1 | questionnaire | 10 |
| 1 | 1 | "hits others" | 1 | observation | 15 |
| 1 | 1 | "hits others" | 2 | questionnaire | 8 |
| 1 | 1 | "hits others" | 2 | observation | 13 |
| 1 | 1 | "argues" | 1 | questionnaire | 4 |
| 1 | 1 | "argues" | 1 | observation | 2 |
| 1 | 1 | "argues" | 2 | questionnaire | 5 |
| 1 | 1 | "argues" | 2 | observation | 7 |
| 1 | 2 | "hits others" | 1 | questionnaire | 8 |
| 1 | 2 | "hits others" | 1 | observation | 10 |
| 1 | 2 | "hits others" | 2 | questionnaire | 6 |
| 1 | 2 | "hits others" | 2 | observation | 7 |
| 1 | 2 | "argues" | 1 | questionnaire | 2 |
| 1 | 2 | "argues" | 1 | observation | 2 |
| 1 | 2 | "argues" | 2 | questionnaire | 4 |
| 1 | 2 | "argues" | 2 | observation | 6 |
| 2 | 1 | "hits others" | 1 | questionnaire | 5 |
| ... | ... | ... | ... | ... | ... |
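Below is a sketch (not run) of how the facets in Table \@ref(tab:gtheoryDataStructure) could be specified in a [generalizability study](#generalizabilityStudy) using the `gtheory` package; `exampleData` is a hypothetical data frame laid out like the table above, and only a subset of the possible interaction terms is shown.\index{generalizability theory}\index{generalizability theory!facet}
```{r, eval = FALSE}
#sketch of a G-study specification for a person x time x item x rater x method design;
#exampleData is a hypothetical data frame structured like the table above
gstudy(
  data = exampleData,
  Score ~ (1|Person) + (1|Time) + (1|Item) + (1|Rater) + (1|Method) +
    (1|Person:Time) + (1|Person:Item) + (1|Person:Rater) + (1|Person:Method))
```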
In sum, G-theory can be a useful way of estimating the degree of [reliability](#reliability) and [validity](#validity) of a measure's scores in the same model.\index{generalizability theory}\index{reliability}\index{validity}
### Generalizability Study {#generalizabilityStudy}
In G-theory, the goal is to conduct a *generalizability study* (G study) and a [*decision study*](#decisionStudy) (D study).\index{generalizability theory}\index{generalizability theory!generalizability study}\index{generalizability theory!decision study}
A generalizability study examines the extent of variance in the scores that is attributable to various facets.\index{generalizability theory}\index{generalizability theory!generalizability study}
The researcher must specify and define the universe (set of conditions) to which they would like to generalize their observations and in which they would like to study the [reliability](#reliability) of the measure.\index{generalizability theory}\index{generalizability theory!generalizability study}\index{generalizability theory!universe}\index{reliability}
For instance, it might involve randomly sampling from within that universe in terms of people, items, observers, conditions, timepoints, measurement methods, etc.\index{generalizability theory}\index{generalizability theory!generalizability study}\index{generalizability theory!universe}
### Decision Study {#decisionStudy}
After conducting a [generalizability study](#generalizabilityStudy), one can then use the estimates of the extent of variance in scores that are attributable to various facets (estimated from the [generalizability study](#generalizabilityStudy)) to conduct a decision study (D study).\index{generalizability theory}\index{generalizability theory!generalizability study}\index{generalizability theory!decision study}
A decision study examines how generalizable scores from a particular test are if the test is administered in different situations.\index{generalizability theory}\index{generalizability theory!generalizability study}
In G-theory, [reliability](#reliability) is estimated with the [*generalizability coefficient*](#generalizabilityCoefficient) and the [*dependability coefficient*](#dependabilityCoefficient).\index{generalizability theory}\index{generalizability theory!generalizability coefficient}\index{generalizability theory!dependability coefficient}\index{reliability}
### Analysis Approach {#analysis-gTheory}
Traditionally, the generalizability study and decision study would be conducted using a factorial analysis of variance [ANOVA; @Brennan1992], as exemplified in Section \@ref(gTheoryANOVA), as opposed to the simple ANOVA used in CTT.\index{generalizability theory}\index{generalizability theory!generalizability study}\index{generalizability theory!decision study}\index{ANOVA}\index{classical test theory}
However, ANOVA is limiting: it works best with balanced designs (i.e., the same number of observations in each condition of each facet), whereas in most real-world applications the data are unbalanced.\index{generalizability theory}\index{ANOVA}
So, it is better to fit G-theory models in a mixed model framework, as exemplified in Section \@ref(gTheoryMixedModel).\index{generalizability theory}\index{mixed model}
### Practical Challenges {#challenges-gTheory}
G-theory is strong theoretically, but it has not been widely implemented.\index{generalizability theory}
G-theory can be challenging because the researcher must specify, define, and assess the universe to which they would like to generalize their observations and in which they would like to understand the [reliability](#reliability) of the measure.\index{generalizability theory}\index{reliability}
## Getting Started {#gettingStarted-gTheory}
### Load Libraries {#loadLibraries-gTheory}
```{r}
library("petersenlab") #to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("gtheory")
library("MOTE")
library("tidyverse")
library("tinytex")
library("knitr")
library("rmarkdown")
library("bookdown")
```
### Prepare Data {#prepareData-gTheory}
#### Generate Data {#generateData-gTheory}
For reproducibility, we set the seed below.\index{simulate data}
Using the same seed will yield the same answer every time.
There is nothing special about this particular seed.
```{r}
set.seed(52242)
Person <- as.factor(rep(1:6, each = 8))
Occasion <- Rater <- as.factor(rep(1:2, each = 4, times = 6))
Item <- as.factor(rep(1:4, times = 12))
Score <- c(
9,9,7,4,9,8,5,5,9,8,4,6,
6,5,3,3,8,8,6,2,8,7,3,2,
9,8,6,3,9,6,6,2,10,9,8,7,
8,8,9,7,6,4,5,1,3,2,3,2)
```
#### Add Missing Data {#addMissingData-gTheory}
Adding missing data to dataframes helps make examples more representative of real-life data and helps you get in the habit of programming to account for missing data.
```{r}
Score[30] <- NA
```
#### Combine Data into Dataframe {#combineData-gTheory}
```{r}
pio_cross_dat <- data.frame(Person, Item, Score, Occasion)
```
Below are examples implementing G-theory.
The `pio_cross_dat` data file for these examples comes from the `gtheory` package [@R-gtheory].
The examples are adapted from @Huebner2019.
### Universe Score for Each Person {#universeScore-example}
Universe scores for each person are computed using the following syntax and are presented in Table \@ref(tab:universeScores).\index{generalizability theory!universe score}
```{r universeScores}
pio_cross_dat %>%
  group_by(Person) %>%
  summarise(
    universeScore = mean(Score, na.rm = TRUE),
    .groups = "drop") %>%
  kable(
    col.names = c("Person", "Universe Score"),
    caption = "Universe Score for Each Person.",
    booktabs = TRUE)
```
### Generalizability (G) Study {#gStudy-example}
[Generalizability studies](#generalizabilityStudy) can be conducted in an [ANOVA](#gTheoryANOVA) or [mixed model](#gTheoryMixedModel) framework.\index{generalizability theory}\index{generalizability theory!generalizability study}
Below, we fit a [generalizability study](#generalizabilityStudy) model in each framework.\index{generalizability theory}\index{generalizability theory!generalizability study}
In these models, the item, person, and their interaction appear to be the three facets that account for the most variance in scores.\index{generalizability theory}\index{generalizability theory!facet}
Thus, when designing future studies, it would be important to assess and evaluate these facets.\index{generalizability theory}\index{generalizability theory!generalizability study}
#### ANOVA Framework {#gTheoryANOVA}
\index{generalizability theory}\index{ANOVA}
```{r}
summary(aov(
Score ~ Person*Item*Occasion,
data = pio_cross_dat))
```
#### Mixed Model Framework {#gTheoryMixedModel}
The mixed model framework for estimating generalizability is described by @Jiang2018a.\index{generalizability theory}\index{mixed model}
```{r}
#estimate variance components using a linear mixed model (lme4)
summary(lmer(
  Score ~ 1 + (1|Person) + (1|Item) + (1|Occasion) +
    (1|Person:Occasion) + (1|Person:Item) + (1|Occasion:Item),
  data = pio_cross_dat))

#conduct the generalizability (G) study using the gtheory package
PxIxO <- gstudy(
  data = pio_cross_dat,
  Score ~ (1|Person) + (1|Item) + (1|Occasion) + (1|Person:Item) +
    (1|Person:Occasion) + (1|Occasion:Item))

PxIxO
```
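The estimated variance components, including the percentage of variance attributable to each source, can be accessed directly from the `components` element of the G study object (this element is also used in the exercise code at the end of this chapter):\index{generalizability theory}\index{generalizability theory!generalizability study}
```{r}
PxIxO$components
```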
### Decision (D) Study {#dStudy-example}
The [decision (D) study](#decisionStudy) and [generalizability (G) study](#generalizabilityStudy), from which the [generalizability](#generalizabilityCoefficient) and [dependability](#dependabilityCoefficient) coefficients can be estimated, were analyzed using the `gtheory` package [@R-gtheory].\index{generalizability theory}\index{generalizability theory!generalizability study}\index{generalizability theory!decision study}\index{generalizability theory!generalizability coefficient}\index{generalizability theory!dependability coefficient}
```{r}
decisionStudy <- dstudy(
PxIxO,
colname.objects = "Person",
data = pio_cross_dat,
colname.scores = "Score")
```
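The full D study object can also be printed (as is done in the exercise code later in this chapter) before extracting the individual coefficients below:\index{generalizability theory}\index{generalizability theory!decision study}
```{r}
decisionStudy
```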
### Generalizability Coefficient {#generalizabilityCoefficient}
The generalizability coefficient is analogous to the [reliability](#reliability) coefficient in CTT.\index{generalizability theory}\index{generalizability theory!generalizability coefficient}\index{reliability}
It divides the estimated person variance component (the universe score variance) by the estimated observed-score variance (with some adjustment for the number of observations).\index{generalizability theory}\index{generalizability theory!generalizability coefficient}
In other words, variance in a [reliable](#reliability) measure should mostly be due to person variance rather than variance as a function of items, occasion, raters, methods, or other factors.\index{generalizability theory}\index{generalizability theory!facet}\index{reliability}
The generalizability coefficient uses relative error variance, so it characterizes the similarity in the relative standing of individuals, similar to CTT-based estimates of relative reliability, such as [Cronbach's alpha](#cronbachAlpha).\index{generalizability theory}\index{generalizability theory!generalizability coefficient}
Thus, the generalizability coefficient is an index of relative reliability.\index{generalizability theory}\index{generalizability theory!generalizability coefficient}\index{reliability!relative}
The generalizability coefficient ranges from 0–1, and higher scores reflect better reliability.\index{generalizability theory}\index{generalizability theory!generalizability coefficient}
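For reference, under a fully crossed person $\times$ item $\times$ occasion design, the generalizability coefficient can be sketched in standard G-theory notation as the ratio of universe score (person) variance to itself plus relative error variance, where $n_i$ and $n_o$ are the numbers of items and occasions:\index{generalizability theory!generalizability coefficient}
$$E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{\text{Rel}}}, \qquad \sigma^2_{\text{Rel}} = \frac{\sigma^2_{pi}}{n_i} + \frac{\sigma^2_{po}}{n_o} + \frac{\sigma^2_{pio,e}}{n_i n_o}$$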
```{r}
decisionStudy$generalizability
```
### Dependability Coefficient {#dependabilityCoefficient}
The dependability coefficient is similar to the [generalizability coefficient](#generalizabilityCoefficient); however, it uses absolute error variance rather than relative error variance in the estimation.\index{generalizability theory}\index{generalizability theory!dependability coefficient}\index{reliability}
The dependability coefficient characterizes the absolute magnitude of differences across scores, not (just) the relative standing of individuals.
Thus, the dependability coefficient is an index of absolute reliability.\index{generalizability theory}\index{generalizability theory!dependability coefficient}\index{reliability!absolute}
The dependability coefficient ranges from 0–1, and higher scores reflect better reliability.\index{generalizability theory}\index{generalizability theory!dependability coefficient}
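Analogously, a sketch of the dependability coefficient for the same crossed design uses absolute error variance, which additionally includes the main effects of the facets:\index{generalizability theory!dependability coefficient}
$$\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{\text{Abs}}}, \qquad \sigma^2_{\text{Abs}} = \frac{\sigma^2_i}{n_i} + \frac{\sigma^2_o}{n_o} + \frac{\sigma^2_{pi}}{n_i} + \frac{\sigma^2_{po}}{n_o} + \frac{\sigma^2_{io}}{n_i n_o} + \frac{\sigma^2_{pio,e}}{n_i n_o}$$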
```{r}
decisionStudy$dependability
```
## Conclusion {#conclusion-gTheory}
G-theory provides an important reminder that [reliability](#reliability) is not one thing.\index{generalizability theory}\index{reliability}
You cannot just say that a test "is reliable"; it is important to specify the facets across which the [reliability](#reliability) and [validity](#validity) of a measure have been established (e.g., times, raters, items, groups, instruments).\index{generalizability theory}\index{reliability}\index{validity}\index{generalizability theory!facet}
Generalizability theory can be a useful way of estimating multiple aspects of [reliability](#reliability) and [validity](#validity) of measures in the same model.\index{generalizability theory}\index{reliability}\index{validity}
## Suggested Readings {#readings-gTheory}
@Brennan2001
## Exercises {#exercises-gTheory}
```{r, include = FALSE}
library("tidyverse")
library("here")
library("MOTE")
```
```{r, include = FALSE}
# Generalizability Theory -------------------------------------------------
participant_ex <- rep(c(1,2,3), times = 6)
occasion_ex <- rep(rep(c(1,2), each = 3), times = 3)
rater_ex <- rep(c(1,2,3), each = 6)
score_ex <- c(7,2,0,10,5,1,8,3,1,11,6,1,6,2,4,12,6,5)
por_data_ex <- data.frame(participant_ex, occasion_ex, rater_ex, score_ex)
universeScores_ex <- por_data_ex %>%
group_by(participant_ex) %>%
summarise(universeScore = mean(score_ex, na.rm = TRUE), .groups = "drop") #universe score for each participant
person1_ex <- universeScores_ex$universeScore[which(universeScores_ex$participant_ex == 1)]
person2_ex <- universeScores_ex$universeScore[which(universeScores_ex$participant_ex == 2)]
person3_ex <- universeScores_ex$universeScore[which(universeScores_ex$participant_ex == 3)]
PxOxR_ex <- gstudy(data = por_data_ex,
score_ex ~ (1|participant_ex) + (1|occasion_ex) + (1|rater_ex) +
(1|participant_ex:occasion_ex) + (1|participant_ex:rater_ex) + (1|rater_ex:occasion_ex))
varParticipant_ex <- PxOxR_ex$components$percent[which(PxOxR_ex$components$source == "participant_ex")]
varRater_ex <- PxOxR_ex$components$percent[which(PxOxR_ex$components$source == "rater_ex")]
varOccasion_ex <- PxOxR_ex$components$percent[which(PxOxR_ex$components$source == "occasion_ex")]
varRaterXOccasion_ex <- PxOxR_ex$components$percent[which(PxOxR_ex$components$source == "rater_ex:occasion_ex")]
#Calculate generalizability coefficients
decisionStudy_ex <- dstudy(PxOxR_ex, colname.objects = "participant_ex", data = por_data_ex, colname.scores = "score_ex")
decisionStudy_ex
generalizabilityCoefficient_ex <- decisionStudy_ex$generalizability #analogous to reliability coefficient in classical test theory; uses relative error variance: relevant when relative (norm-referenced) decisions are made (e.g., percentiles)
dependabilityCoefficient_ex <- decisionStudy_ex$dependability #analogous to reliability coefficient in classical test theory; uses absolute error variance: relevant when absolute decisions are made
```
### Questions {#exercisesQuestions-gTheory}
1. You want to see how generalizable the Antisocial Behavior subscale of the BPI is.
You conduct a generalizability study ("$G$ study") to see how generalizable the scores are across participants ($N = 3$), two measurement occasions, and three raters.
What is the universe score (estimate of true score) for each participant?
What percent of variance is attributable to: (a) individual differences among participants, (b) different raters, (c) the measurement occasions, and (d) the interactive effect of raters and measurement occasions?
Using a decision study ("$D$ study"), what are the generalizability and dependability coefficients?
Is the measure reliable across the universe (range of factors) we examined it in?
Interpret the results from the G study and D study.
The data from your study are in Table \@ref(tab:evaluatingGeneralizabilityTable) below:
```{r, include = FALSE}
por_data_ex2 <- por_data_ex
names(por_data_ex2) <- c("Participant","Occasion","Rater","Score")
```
```{r evaluatingGeneralizabilityTable, echo = FALSE}
kable(
por_data_ex2,
caption = "Exercise 12: Table of Example Data for Evaluating Generalizability of Scores Across Participants, Occasions, and Raters.",
booktabs = TRUE)
```
### Answers {#exercisesAnswers-gTheory}
1. The universe scores are $`r person1_ex`$, $`r person2_ex`$, and $`r person3_ex`$ for participants 1, 2, and 3, respectively.
The percent of variance attributable to each of those factors is: (a) participant: $`r apa(varParticipant_ex, decimals = 1)`\%$, (b) rater: $`r apa(varRater_ex, decimals = 1)`\%$, (c) the measurement occasion: $`r apa(varOccasion_ex, decimals = 1)`\%$, and (d) the interactive effect of rater and measurement occasion: $`r apa(varRaterXOccasion_ex, decimals = 1)`\%$.
The generalizability coefficient is $`r apa(generalizabilityCoefficient_ex, decimals = 2, leading = FALSE)`$, and the dependability coefficient is $`r apa(dependabilityCoefficient_ex, decimals = 2, leading = FALSE)`$.
The measure is fairly reliable across the universe examined, both in terms of relative differences (generalizability coefficient) and absolute differences (dependability coefficient).
However, when using the measure in future work, it would be important to assess many participants (due to the strong presence of individual differences) across multiple occasions (due to the strong effect of occasion) and multiple raters (because the effect of rater differed as a function of participant).
It will be important to account for the considerable sources of variance in scores on the measure in future work including: participant, occasion, participant $\times$ rater, and participant $\times$ occasion.
By accounting for these sources of variance, we can get a purer estimate of each participant's true score and the population's true score.