Abstract
Research topic/aim
How to adjust the sum score for Differential Item Functioning in educational measurement.
Theoretical framework
Differential Item Functioning (DIF) is a common source of item bias in educational measurement scales. DIF occurs when responses to an item reflect not only the measured construct but also non-construct characteristics, e.g. sex or age. Put differently, DIF occurs when an item is not conditionally independent of background variables given the latent construct. If not accounted for, DIF can lead to differences in scores between subgroups that do not reflect true differences, and in some cases this can lead to Type I or Type II errors in the results. DIF is most often dealt with by the classical approach of eliminating any items that function differentially (Holland & Wainer, 1993) in order to satisfy the requirement of measurement invariance for validity (Meredith, 1993). This is only practical if there are items to spare, which is often not the case in non-ability, diagnostic or screening tests.
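The conditional independence requirement can be written out as a brief sketch. It assumes dichotomous items for simplicity, and the symbols $\beta_i$ (item difficulty), $\delta_{ig}$ (group-specific shift) and $G$ (background variable) are illustrative notation that does not appear in the abstract itself.

```latex
% Absence of DIF: item X_i is conditionally independent of the
% background variable G given the latent construct theta.
P(X_i = x \mid \theta, G = g) = P(X_i = x \mid \theta)
\quad \text{for all } g.

% Dichotomous Rasch model with uniform DIF on item i:
% the item difficulty is shifted by delta_{ig} in group g,
% and the shift does not depend on theta.
P(X_i = 1 \mid \theta, G = g)
  = \frac{\exp(\theta - \beta_i - \delta_{ig})}
         {1 + \exp(\theta - \beta_i - \delta_{ig})}
```

Under this notation an item is free of DIF when $\delta_{ig} = 0$ for every group $g$.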
Methodological design
For Rasch models with evidence of uniform DIF (Hanson, 1998), the results indicate that item responses fit a Rasch model within each of the subgroups for which one or more items function differentially. It is thus possible to “split for DIF”, which effectively means that person parameters are estimated separately for each of these subgroups. Since these estimates are on the same logit scale (theta), they remain comparable and can be used in subsequent statistical analyses and for individual assessment, and essential validity is maintained (Kreiner & Christensen, 2007). However, many practitioners prefer to work with sum scores rather than the person parameter estimates resulting from the Rasch models. One of the properties of the Rasch model, and only the Rasch model, is that the sum score is sufficient for the person parameter, so there is a one-to-one relationship between sum scores and person parameter estimates (Rasch, 1960). We utilize this to equate the sum scores for DIF (Kreiner & Nielsen, 2013).
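The one-to-one relationship between sum scores and person parameter estimates can be illustrated with a minimal sketch. It assumes dichotomous items, known item difficulties and plain maximum likelihood estimation. The item parameters, the function names and the choice of reference group are hypothetical, and this is not necessarily the procedure implemented in DIGRAM. Each sum score in the group affected by DIF is mapped to its person parameter estimate using that group's item parameters, and then mapped back to the expected sum score under the reference group's item parameters.

```python
import math

def expected_score(theta, betas):
    """Expected sum score at theta under a dichotomous Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in betas)

def theta_for_score(r, betas, lo=-30.0, hi=30.0, tol=1e-10):
    """ML person estimate for an interior sum score r.

    Solves expected_score(theta) = r by bisection; the expected
    score is strictly increasing in theta, so the root is unique.
    """
    if not 0 < r < len(betas):
        raise ValueError("ML estimate is finite only for 0 < r < number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, betas) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate_scores(betas_focal, betas_ref):
    """Map each focal-group sum score to the reference-group score scale."""
    k = len(betas_focal)
    # Extreme scores have no finite ML estimate; they are left unchanged here.
    table = {0: 0.0, k: float(k)}
    for r in range(1, k):
        theta = theta_for_score(r, betas_focal)
        table[r] = expected_score(theta, betas_ref)
    return table

# Hypothetical item difficulties: the last item shows uniform DIF
# and is one logit harder in the focal group.
betas_ref = [-1.0, -0.5, 0.0, 0.5, 1.0]
betas_focal = [-1.0, -0.5, 0.0, 0.5, 2.0]

equated = equate_scores(betas_focal, betas_ref)
for r in sorted(equated):
    print(r, round(equated[r], 2))
```

In this illustration the interior focal-group scores are adjusted upwards, because the same sum score obtained on a harder item set corresponds to a higher person parameter. Without DIF the table reduces to the identity mapping.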
Expected conclusions/findings
The presentation will briefly describe how DIF equating is done, using real data from a well-being survey among Danish school children to illustrate the effect of equating versus not equating for DIF at both the individual person level and the group level. Data are courtesy of the non-profit organization “Børns Vilkår”.
Relevance to Nordic educational research
Adjusting for DIF is relevant for all measurement in education.
References
Hanson, B. (1998). Uniform DIF and DIF defined by differences in item response functions. Journal of Educational and Behavioral Statistics, 23(3), 244-253.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Routledge, London.
Kreiner, S., & Christensen, K. B. (2007). Validity and objectivity in health-related scales: Analysis by graphical loglinear Rasch models. In von Davier & Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 329–346). New York: Springer. DOI: 10.1007/978-0-387-49839-3_21
Kreiner, S., & Nielsen, T. (2013). Item analysis in DIGRAM 3.04. Part I: Guided tours. Research report 13/6. Department of Biostatistics, University of Copenhagen.
Meredith, W. (1993). Measurement invariance, factorial analysis and factorial invariance. Psychometrika, 58(4), 525-543. DOI: 10.1007/BF02294825
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research, Copenhagen.
| Original language | English |
|---|---|
| Title of host publication | Book of Abstracts, NERA 2024 |
| Number of pages | 1 |
| Publisher | Nordic Educational Research Association (NERA) |
| Publication date | 6 Mar 2024 |
| Pages | 36 |
| Publication status | Published - 6 Mar 2024 |