
SDP FELLOWSHIP CAPSTONE REPORT 2016

Is There a PARCC Mode Effect?

Matthew Duque, Baltimore County Public Schools

Executive Summary

While the majority of state-mandated testing in K–8 was administered online in the 2015–2016 school year, there is some evidence that results from online tests are not comparable to those from traditional paper-and-pencil tests. Using data from a large school district, the present study examined the extent to which student performance on the first year of the Partnership for Assessment of Readiness for College and Careers (PARCC) exam was related to test mode. Controlling for student and school characteristics, results indicate that students who tested on paper scored substantially higher than students who tested online, suggesting that the online format tests more than content knowledge.

Strategic Data Project Fellowship Capstone Reports

Strategic Data Project (SDP) Fellows author capstone reports to reflect the work that they led in their education agencies during the two-year program. The reports demonstrate both the impact fellows make and the role of the SDP network in supporting their growth as data strategists. Additionally, they provide recommendations to their host agency and may serve as guides to other agencies, future fellows, and researchers seeking to do similar work. The views or opinions expressed in this report are those of the authors and do not necessarily reflect the views or position of the Center for Education Policy Research at Harvard University.

In the 2015–2016 school year, the majority of state-mandated testing in K–8 was administered online (EdTech Strategies, 2015). Online testing offers several advantages over traditional paper-and-pencil tests, including new ways to assess student understanding, time and cost savings in scoring and score reporting, and increased security. However, some studies have found online exam scores to be incomparable to paper versions of the same test (Choi & Tinkler, 2002; Coon, McLeod, & Thissen, 2002). Potential mode differences in mandated state K–8 test results would bias the plethora of schooling decisions that rely on summative test scores, including student grade promotion and retention, course placement, and school accountability measures.

There are three categories of reasons why test scores might not be comparable between test modes: presentation characteristics, response requirements, and general administration characteristics (Bennett, 2003). Presentation characteristics include the number of items that fit on a screen versus a page of paper, as well as differences in the size of font used between the modes. The second category, response requirements, includes any differences in requirements for navigating the test and recording answers between a traditional paper-and-pencil test and a computer exam. The third category of mode differences comprises general administration characteristics, such as whether the test is adaptive or fixed form and whether the timing of each section and the overall test are similar. Research generally indicates that “the more complicated it is to present or take the test on computer, the greater the possibility of mode effects” (Pommerich, 2004, p. 3). For example, tests that do not require any navigation and ask only multiple-choice questions are more likely to be comparable across modes, while those that require scrolling through long reading passages and written responses are less likely to be comparable across modes.

The Present Study

The present study examined the extent to which student performance on the Partnership for Assessment of Readiness for College and Careers (PARCC) was related to test mode in the first year of administration. In the 2015–2016 school year, seven states and the District of Columbia were members of the PARCC consortium. Although PARCC found no test mode differences at the item level in the 2013–2014 pilot year, a student-level analysis of score differences in the first year of testing may indicate different results. While the majority of research on test mode effects focuses on pilot tests or low- or no-stakes tests, the present study is one of the first to examine differences in test mode results of a state-mandated K–8 assessment in its first year of implementation.

This study uses data from Baltimore County Public Schools (BCPS), a large school district in which not all schools were equipped to test online. Over 46,000 students in Grades 3–8 in 106 elementary and 28 middle schools in BCPS were given the PARCC exam in two test administrations during the 2014–2015 school year. Test mode was determined on a school-by-school basis, according to each school’s ratio of students to computers. In math, 53% of students tested online; in English/language arts (ELA), 29% of students tested online.[1] Table 1 shows that students who tested online were more likely to be Black or Hispanic and to qualify for free and reduced meals (FARMS); on average, students who tested online also had lower prior achievement.[2]

[1] Thresholds for testing online were lower in math than in ELA because PARCC’s online math calculator allowed the district to forego the purchase of handheld calculators in schools that tested online.
[2] These unexpected differences are likely a result of the recent infusion of technology into the district’s historically under-resourced Title I schools.

Table 1
Student and School Characteristics of Paper and Online Test Takers

[Table values are not recoverable from the source file. The table reported the composition of paper and online test takers on each characteristic — Asian, Black, Hispanic/Latino, Two or More Races, White, Female, FARMS, Special Education, ELLs, Gifted — and mean prior achievement, with N = 19,594 (math, paper), 21,788 (math, online), 32,854 (ELA, paper), and 13,377 (ELA, online).]

Note. Only includes students who took both tests (PBA and EOY) in the same mode.
*p < .05. **p < .01. ***p < .001.

Hierarchical linear modeling (HLM) was used to compare differences in students’ PARCC scale scores based on test mode, with students nested in schools. HLM accounts for two levels of variation in the outcome—variation within schools and variation between schools. (See appendix for more information on methods.) Student and average school demographics and same-subject prior achievement were included as covariates to control for the non-random assignment of schools to online testing.

Results

Results indicate that there was a statistically significant mode effect in all subject–grade combinations. Table 2 shows that, after controlling for student demographics and prior achievement, students who took PARCC on paper scored substantially higher than students who took the exam electronically. On average, students who tested online scored between 3 and 11 percentile points lower than their peers who tested on paper in math, and between 11 and 18 percentile points lower in ELA. Within ELA, the effect was larger on the writing portion of the test than on the reading portion. Further, the paper advantage was larger in middle grades than primary grades, potentially due to different test response requirements. No consistent interaction effects were found between test mode and prior student achievement or student demographics.

Table 2
Estimated Differences in PARCC Scores by Test Mode, in Standard Deviations

[Table values are only partially recoverable from the source file. Columns paired “Without Controls” and “With Controls” estimates by subject (with a separate Writing column) and grade; recoverable estimates include 0.83, 0.23, 0.77, 0.33, 0.66, 0.25, 0.85, and 0.45.]

Note. Positive differences indicate higher scores for paper test takers; estimates with controls are calculated using a multilevel model.
*p < .05. **p < .01. ***p < .001.

Discussion

The test mode effect found in this study suggests that the online PARCC exam measured more than subject matter. Two possible explanations for a mode effect are students’ lack of experience with computer tests and the response requirements of the electronic version of the test. Two elements of the present study favor the latter explanation. First, all BCPS students in Grades 3–8 were administered a non-PARCC formative assessment on computers prior to the PARCC test, suggesting that they had at least some experience with computer tests. Second, the variation in mode effects by subject and school level—Grades 3–5 versus Grades 6–8—aligns with differences in PARCC’s response requirements by subject and grade.

A mode effect on a multi-state, state-mandated summative assessment has substantial implications. In states and districts that tested entirely online, no mode effect can be observed; a mode effect may nonetheless exist and bias results away from students’ true scores. Biased assessment scores can effectively invalidate school accountability ratings as well as any schooling decisions that rely on these data, including students’ course placement. States are moving toward testing all students online, but not all schools are equipped to do so. In the meantime, there is a solution to address mode effects: test producers should examine mode comparability and, when a mode effect exists, adjust the scale scores of each mode to eliminate any mode disadvantage.
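The percentile-point gaps cited in the Results section can be reproduced from standardized (SD-unit) differences like those in Table 2, assuming approximately normal score distributions. The helper below is an illustrative sketch, not code from the study; the function name and the median baseline are my own choices.

```python
from scipy.stats import norm

def sd_gap_to_percentile_points(d, baseline_percentile=50):
    """Percentile points lost by a student at `baseline_percentile`
    whose score is shifted down by `d` standard deviations,
    assuming normally distributed scores."""
    z = norm.ppf(baseline_percentile / 100)       # baseline z-score
    return (norm.cdf(z) - norm.cdf(z - d)) * 100  # drop in percentile rank

# A 0.23 SD paper advantage moves a median student down about 9 percentile points:
print(round(sd_gap_to_percentile_points(0.23), 1))  # -> 9.1
```

Because the normal density is flattest in the tails, the same SD gap translates into fewer percentile points for very high- or very low-scoring students than for students near the median.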

References

Bennett, R. E. (2003). Online assessment and the comparability of score meaning (ETS Research Memorandum RM-03-05). Princeton, NJ: ETS.

Choi, S. W., & Tinkler, T. (2002). Evaluating comparability of paper-and-pencil and computer-based assessment in a K–12 setting. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Coon, C., McLeod, L., & Thissen, D. (2002). NCCATS update: Comparability results of paper and computer forms of the North Carolina end-of-grade tests (RTI Project No. 08486.001). Raleigh, NC: North Carolina Department of Public Instruction.

EdTech Strategies. (2015). Pencils down: The shift to online and computer-based testing. Retrieved from /2015/11/PencilsDownK-8 EdTech-StrategiesLLC.pdf

Pommerich, M. (2004). Developing computerized versions of paper-and-pencil tests: Mode effects for passage-based tests. Journal of Technology, Learning, and Assessment, 2(6).

Appendix: Methods

In 2014–2015, the PARCC exam was given in two administrations, a Performance-Based Assessment (PBA) in early spring and an End of Year (EOY) assessment in late spring. This analysis includes only students who took both administrations in the same mode, which was over 99.5% of students in both math and ELA.

Hierarchical linear modeling (HLM) was used to compare differences in students’ PARCC scale scores based on test mode, with students nested in schools. HLM accounts for two levels of variation in the outcome—variation within schools and variation between schools. The following equation was modeled:

PARCC_ij = γ_00 + γ_01·Online_j + γ_02·MeanPriorAch_j + γ_03·MeanX_j + γ_10·PriorAch_ij + γ_20·X_ij + r_0j + ε_ij

where PARCC_ij is the standardized PARCC scale score of student i in school j; Online_j indicates whether school j tested online; MeanPriorAch_j is the average prior achievement score of school j; MeanX_j is a factor of average school-level student characteristics, including race/ethnicity, free and reduced-price meal (FARMS) status, English language learner (ELL) status, special education status, and gifted status; PriorAch_ij is the same-subject, same-year winter MAP score of student i in school j; X_ij is a factor of student characteristics, including race/ethnicity, FARMS status, ELL status, special education status, and gifted status; and ε_ij and r_0j are the student-level and school-level error terms, respectively. Student- and school-level demographics and prior achievement are included to control for the non-random selection of schools into the online test mode.

Since PARCC scores were not vertically scaled across grades, scores were standardized by subject and grade within the district, and the above equation was estimated separately by grade and subject. Separate models that include interaction terms between test mode and student characteristics were also investigated to examine potential differential effects of test mode.

BCPS students in Grades 3–8 took the computer-adaptive MAP test in both the fall and winter. The winter test was utilized as the control for prior achievement due to its temporal proximity to the spring PARCC exam. This proximity minimizes any potential correlation between school value-added and PARCC test mode. It should also be noted that the use of a computer-based test as a control for prior achievement likely downwardly biased the estimates in the present study by essentially removing any potential effect of inexperience with computer testing.
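A random-intercept model of the kind described above can be sketched with statsmodels’ MixedLM. The data, column names, and effect sizes below are all simulated stand-ins, not the study’s; the point is only the structure: a school-level mode indicator, student- and school-level covariates, and a random intercept per school.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data standing in for the BCPS file; all names and magnitudes are hypothetical.
rng = np.random.default_rng(0)
n_schools, n_per = 30, 40
school = np.repeat(np.arange(n_schools), n_per)
online = np.repeat(rng.integers(0, 2, n_schools), n_per)   # mode assigned per school
prior = rng.normal(size=n_schools * n_per)                 # standardized winter MAP score
school_re = np.repeat(rng.normal(scale=0.3, size=n_schools), n_per)  # school intercepts
parcc_z = 0.5 * prior - 0.4 * online + school_re + rng.normal(scale=0.8, size=len(school))

df = pd.DataFrame({"school": school, "online": online,
                   "prior_ach": prior, "parcc_z": parcc_z})
df["mean_prior_ach"] = df.groupby("school")["prior_ach"].transform("mean")

# Students (level 1) nested in schools (level 2): random intercept for each school.
model = smf.mixedlm("parcc_z ~ online + prior_ach + mean_prior_ach",
                    data=df, groups=df["school"])
result = model.fit()
print(result.params["online"])  # estimated online-vs-paper gap in SD units
```

In the actual analysis this model would be fit separately for each grade-by-subject cell, with the full set of demographic covariates at both levels.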
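The within-district standardization by subject and grade described above amounts to a grouped z-score. A minimal pandas sketch, using made-up scale scores rather than real PARCC data:

```python
import pandas as pd

# Toy scale scores; PARCC is not vertically scaled, so each grade-by-subject
# cell is standardized on its own.
scores = pd.DataFrame({
    "grade":       [3, 3, 3, 4, 4, 4],
    "subject":     ["math"] * 6,
    "scale_score": [720, 745, 760, 705, 735, 765],
})
grp = scores.groupby(["grade", "subject"])["scale_score"]
scores["parcc_z"] = (scores["scale_score"] - grp.transform("mean")) / grp.transform("std")
# Each grade-subject cell now has mean 0 and (sample) SD 1.
```

Standardizing within cells is what makes the mode-effect estimates comparable across grades even though the raw scale scores are not.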
