Background: Mission US is a series of interactive, first-person role-playing history games and curricular materials that address a critical problem: students lack fundamental knowledge of our nation's history. According to the most recent National Assessment of Educational Progress (NAEP, 2018), only 13% of Grade 8 students were proficient on the U.S. history assessment. This randomized controlled trial (RCT) is the first large-scale, national study to evaluate the impact of Mission US on students' history knowledge and skills.

Purpose/Research Questions: This study investigates the impact of two Mission US games and accompanying curricular materials on the U.S. history outcomes of students in Grades 8-10. It is guided by three confirmatory research questions: relative to business-as-usual classrooms, does using Mission US affect students' (1) historical content knowledge, (2) ability to analyze historical documents, and (3) motivation to study history? The study also explores the potential mediating effect of historical empathy on these outcomes. We hypothesize that Mission US will improve students' historical knowledge and thinking skills, especially historical empathy, perspective-taking, and the analysis of historical documents.

Setting: Researchers recruited teachers at public, private, and charter schools in 25 states across the U.S. Orientation, professional development, and assessments were conducted virtually, but teachers implemented the intervention in their regular classrooms.

Participants: Participants included 65 Grade 8-10 social studies teachers at 60 schools and their 1,305 students, enrolled in two groups (Spring and Fall 2023). Schools were heterogeneous (e.g., traditional public, virtual, military leadership academy, and religious schools); see Table 1 for a description of participants by treatment status. Teachers' average age was 43, and 67.7% identified as female.
Teachers averaged 14 years of teaching experience, and 93.8% held a social studies teaching certificate. Of the students, 42.9% were in Grade 8, 20.3% in Grade 9, and 36.8% in Grade 10.

Intervention: Intervention teachers used two Mission US games and their accompanying materials (12 hours total) during a three-month implementation period. Teachers received curriculum guides for each era with suggested pacing. After baseline assessments and randomization, intervention teachers received 60 minutes of professional development introducing the Mission US games, curriculum, website, and dashboard. Control teachers used an analogous website with background documents to ensure equivalent historical content across all classrooms.

Research Design: The team conducted an RCT to evaluate the impact of Mission US on students' history outcomes. Random assignment to the intervention or control group took place after baseline assessments. Because of recruitment challenges, researchers shifted from the original model, which recruited two social studies teachers per school and randomized within schools, to randomizing at the school level, with students nested within schools. All measures and procedures were pilot-tested with 11 teachers and their 234 students. To mitigate possible biases, researchers pre-registered the study design in the Registry of Efficacy and Effectiveness Studies (REES) and collected baseline data prior to randomization; attrition was low and met What Works Clearinghouse standards.

Data Collection and Analysis: All surveys and assessments were administered digitally using Qualtrics. Teachers administered the baseline and post-assessments to students. Teachers completed demographic and curriculum surveys at the start of the study, and weekly activity logs and virtual classroom observations were used to monitor fidelity of implementation. Forty-three teachers provided de-identified student-level data, including demographics, prior-semester GPA, attendance, enrollment, English language learner (ELL) status, and IEP/504 plan status.
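Moving from within-school to school-level randomization matters for statistical power because students in the same school tend to resemble one another, so each additional student adds less independent information. A common way to quantify this is the Kish design effect, DEFF = 1 + (m - 1) * ICC, where m is the average number of students per school and ICC is the intraclass correlation. The minimal Python sketch below applies it to the reported sample (1,305 students in 60 schools). The ICC values are hypothetical placeholders chosen for illustration, not estimates from this study, and the function names are our own. Using the average school size is a simplification: unequal school sizes would push the design effect somewhat higher.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_students: int, n_schools: int, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly 'worth'.

    Approximates every school by the average cluster size; unequal school
    sizes would inflate the design effect further.
    """
    avg_cluster_size = n_students / n_schools
    return n_students / design_effect(avg_cluster_size, icc)


if __name__ == "__main__":
    # Sample sizes are the ones reported in the abstract; ICC values are hypothetical.
    for icc in (0.05, 0.10, 0.20):
        n_eff = effective_sample_size(1305, 60, icc)
        print(f"ICC = {icc:.2f}: effective n is about {n_eff:.0f} of 1305 students")
```

Under a hypothetical ICC of 0.10, for instance, the average school contributes about 21.75 students, the design effect is about 3.08, and the 1,305 students carry roughly the information of 424 independent students. This is one way to see why covariates that absorb between-school differences can matter so much in a school-randomized design.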
Student outcomes were measured as follows.

Research Question 1: Historical Content Knowledge. Using the NAEP Questions Tool, the team identified 24 Grade 8 multiple-choice questions (Cronbach's alpha = 0.76-0.81; Hedges' g = 0.15). See Table 2 for a breakdown of question types on the pre- and post-test.

Research Question 2: Analysis and Interpretation of Historical Documents. Using the NAEP Questions Tool, the team identified 15 Grade 8 document-based multiple-choice questions and distributed them evenly across the pre- and post-test (Cronbach's alpha = 0.64-0.69; Hedges' g = 0.13; see Table 2).

Research Question 3: Motivation to Study History. Students' interest in and motivation to study history were measured with two instruments: the Individual Interest Questionnaire (Cronbach's alpha = 0.86-0.88; Hedges' g = 0.005) and the Interest and Enjoyment Index from the 2018 NAEP U.S. History Student Questionnaire (Cronbach's alpha = 0.80-0.85; Hedges' g = 0.005).

Research Question 4 (exploratory): Historical Empathy. The research team drew on the approach of Hartmann and Hasselhorn's (2008) Historical Perspective Taking measure and the theoretical framework of Endacott and Brooks (2013, 2018) to develop a measure of historical empathy. The Historical Empathy Measure includes two alternate forms (Cronbach's alpha = 0.78; Hedges' g = 0.029) and contains 15 multiple-choice questions, 5 for each of its three core components (historical contextualization, perspective-taking, and affective connection).

Researchers established baseline equivalence between intervention and control students on all measures, and psychometric analyses indicate acceptable reliability for all instruments. After descriptive analyses, researchers fit a two-level multilevel regression model for each student outcome, with students nested within schools, regressing the outcome on the treatment indicator, the baseline score on that outcome, and an enrollment-group indicator (Spring/Fall).
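Each measure above is summarized by Cronbach's alpha (internal consistency of the items) and Hedges' g (a standardized mean difference between intervention and control students). As a reminder of what these statistics compute, here is a minimal standard-library Python sketch of the textbook formulas. It is illustrative only: the abstract does not state exactly how its reported values were computed (for example, whether g was adjusted for clustering), the toy data are invented, and the function names are our own. The small-sample correction shown, omega = 1 - 3 / (4N - 9), is the one used in What Works Clearinghouse procedures.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's score on item i."""
    k = len(items)
    n_respondents = len(items[0])
    # Each respondent's total score across all k items.
    totals = [sum(item[j] for item in items) for j in range(n_respondents)]
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


def hedges_g(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference (treatment minus control) with small-sample correction."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    omega = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample bias correction
    return omega * (mean(treatment) - mean(control)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Toy data for illustration only (not study data).
    # Three perfectly consistent items: alpha is 1 up to floating-point rounding.
    print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
    # Groups one pooled SD apart, with n = 3 each; the correction shrinks g below 1.
    print(hedges_g([2, 3, 4], [1, 2, 3]))
```

For dichotomously scored items such as the NAEP multiple-choice questions, the same alpha formula is equivalent to KR-20. With very small groups the correction factor is noticeable (0.8 for the toy example), but at the sample sizes in this study it is close to 1.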
We are currently adding student- and school-level covariates to the models, which will be particularly important given the heterogeneity of the sampled schools. Additional moderator and exploratory mediator analyses are underway.

Findings/Results: Preliminary analyses without covariates showed positive but non-significant differences between intervention and control students on all outcome measures. These null findings may result from the design changes described above. Using the current two-level cluster design, we plan to statistically control for differences in school characteristics to improve statistical power. Our team is refining the analyses and using publicly available federal data to fill in school-level variables and address missing student-level data; we will be able to report the full analyses by the conference. Qualitative data from teacher focus groups indicate that Mission US affected students' interest, and, given our theory of change, we plan to explore subgroup differences by baseline interest level and by student characteristics. Preliminary subgroup analyses suggest that treatment effects may be heterogeneous.

Conclusions: Our RCT provides a case study of the tension between sample heterogeneity and the ability to detect impact, as well as a novel measure of historical empathy. Additional moderator and subgroup analyses will allow us to specify for which students and under what conditions Mission US may have a positive impact.
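One way to write the covariate-adjusted analysis described above is as a two-level random-intercept model. The notation is ours and is an assumption, not the authors' reported specification: Y_ij is the post-test score of student i in school j, T_j is the school-level treatment indicator, Y^pre_ij is the baseline score on the same outcome, Cohort_ij is the Spring/Fall enrollment-group indicator, and X_ij and W_j are placeholder vectors of student-level covariates (e.g., prior-semester GPA, attendance, ELL and IEP/504 status) and school-level covariates (e.g., drawn from public federal data).

```latex
\begin{aligned}
Y_{ij} &= \gamma_{00} + \gamma_{01} T_j + \gamma_{10} Y^{\text{pre}}_{ij} + \gamma_{20}\,\text{Cohort}_{ij}
        + \boldsymbol{\beta}^{\top}\mathbf{X}_{ij} + \boldsymbol{\delta}^{\top}\mathbf{W}_j
        + u_{0j} + \varepsilon_{ij}, \\
u_{0j} &\sim \mathcal{N}(0, \tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \qquad
\rho = \frac{\tau^2}{\tau^2 + \sigma^2}.
\end{aligned}
```

In this sketch, gamma_01 is the treatment effect of interest, and dropping the beta and delta terms recovers the preliminary model without covariates. School-level covariates in W_j mainly explain between-school variance (tau squared), which is how they can tighten the standard error of gamma_01 when schools, rather than students, are the unit of randomization.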