Background & Context: Accurate, reliable, and scalable measurement of classroom quality is critical to ensuring that children benefit from early childhood programs. Early childhood classroom quality is most commonly measured using classroom observation tools that serve multiple purposes, including (1) teacher support (e.g., coaching, professional development), (2) accountability systems (e.g., quality rating and improvement systems [QRIS]), and (3) education research. Education policymakers have expressed interest in using video technology to improve the accuracy and usefulness of classroom observations and to reduce their costs (Curby et al., 2016; Kane et al., 2020). Video recordings might give classroom teachers greater agency in the timing of observations and opportunities to review the results (Teachstone & Intentional Futures, 2023). Video might also enable researchers to better understand how early educational experiences differ for racially, culturally, and linguistically minoritized learners (Meek et al., 2022). Although some prior research has documented the comparability of classroom observations in homogeneous contexts (Curby et al., 2016), little is known about the reliability of video-based observations across the wider range of publicly funded preschool classrooms, or about the extent to which they can feasibly be implemented to support teacher coaching, improve accountability systems, and provide data on classrooms.

Objective: We address this gap by examining the reliability of scores from video-based and live observations of early childhood classrooms. We do so with two observation tools commonly used in early childhood settings: the Classroom Assessment Scoring System PreK-3rd (CLASS 2nd edition; Pianta & Hamre, 2022) and the Early Childhood Environment Rating Scale (ECERS-3; Harms et al., 2014). We address three research questions: (1) Do live and video observations, respectively, have sufficient inter-rater reliability (IRR) within condition? (2) Do live and video observations have sufficient between-condition IRR (i.e., do observers assign similar ratings to the same observation period scored on video and live)? (3) Do scores on domains/subscales and dimensions/items vary systematically across live and video-based coders? Additionally, we will address exploratory questions about the feasibility of video-based classroom observations using data from teacher surveys and interviews with teachers and program leaders.

Setting: The study team conducted observations in preschool programs in Washington, DC, Maryland, Massachusetts, and Virginia, including public pre-K programs, public charter schools, community-based childcare centers, faith-based programs, and cooperative nursery schools.

Participants: The sample includes 100 CLASS and 60 ECERS-3 observations. Most participating lead teachers (88%) identified as women; 6% identified as Asian, 26% as Black, 16% as Hispanic/Latine, 42% as White, and 6% as multiracial. Most teachers (77%) held at least a B.A., and teaching experience ranged from 1 to 45 years.

Practice: This study focuses on identifying whether the practice of conducting classroom observations via video is reliable, feasible, and useful for supporting teacher professional development, accountability systems, and research. We include a tool focused on process quality (CLASS) and a tool that captures both process and structural quality (ECERS-3).
Research Design: This mixed-methods study (preregistered at https://osf.io/hy638) includes psychometric analyses of reliability and differential item functioning, descriptive analysis of teacher survey data, and thematic analysis using inductive coding to synthesize findings across qualitative teacher and program leader interviews.

Data Collection and Analysis: Certified CLASS 2nd edition and ECERS-3 observers conducted observations live while simultaneously recording the classroom; the recordings were later coded by a different certified observer. Videos were captured with a stationary recording device that rotated to track the lead teacher's movements, along with microphones worn by teachers and observers. Following the observation, all lead teachers were invited to complete a survey about the observation experience and the usefulness, burden, and perceived accuracy of the observations. Selected teachers and program directors were also invited to participate in interviews. To understand whether live and video observations produce reliable and equivalent ratings, we first calculated within-condition IRR separately for live and video observations. Second, we assessed between-condition IRR by calculating agreement while treating the live and video scorings of the same observation as ratings from different raters. Third, we used multivariate regression models to predict scores from condition, controlling for observer to account for the fact that observers were not randomly distributed across live and video observations. To provide insights on exploratory questions about feasibility, we will incorporate descriptive statistics from teacher surveys and use inductive coding to identify themes from the teacher and program leader interviews.

Preliminary Results: Preliminary results indicate that video-based scoring of the CLASS has sufficient within-condition IRR (88%, above the standard cut-off of 80%; Landis & Koch, 1977). Observers also show sufficient between-condition IRR, with percent-within-one agreement of 89% between live and video raters of the same observation. Yet despite these levels of reliability, live observations yielded scores reflecting somewhat higher quality teaching than video observations across several dimensions of the CLASS (Figure 1). Preliminary evidence therefore indicates that, despite meeting reliability standards, using video scores interchangeably with live scores, or comparing them to quality cut-offs normed on live scores, may disadvantage classrooms observed via video. Next steps for analysis include (1) re-estimating analyses on the final sample; (2) replicating the CLASS analyses with the ECERS-3; and (3) conducting descriptive and thematic analysis of the survey and interview data.
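To make the analytic steps above concrete, the following is a minimal sketch, assuming paired live and video scores are available in tabular form; the variable names, example values, and model specification are illustrative assumptions, not the study's actual code or data.

```python
# Illustrative sketch only: computes percent-within-one agreement between
# paired live and video scores, then regresses scores on condition with
# observer controls. All column names and values below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def percent_within_one(live: pd.Series, video: pd.Series) -> float:
    """Share of paired ratings that differ by no more than one scale point."""
    return (live.sub(video).abs() <= 1).mean()

# Paired scores: one row per observation cycle scored both live and on video.
pairs = pd.DataFrame({
    "live_score":  [5, 6, 4, 7, 5],
    "video_score": [5, 5, 4, 6, 3],
})
pwo = percent_within_one(pairs.live_score, pairs.video_score)
print(f"Percent-within-one agreement: {pwo:.0%}")

# Long format: one row per rating, with condition and observer identifiers,
# so the condition effect can be estimated while controlling for observer
# (observers were not randomly distributed across conditions).
long = pd.DataFrame({
    "score":     [5, 5, 6, 5, 4, 4, 7, 6, 5, 3],
    "condition": ["live", "video"] * 5,
    "observer":  ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C"],
})
model = smf.ols(
    "score ~ C(condition, Treatment(reference='live')) + C(observer)",
    data=long,
).fit()
print(model.summary())
```

In this simplified single-outcome form, the coefficient on the condition term estimates the video-versus-live score difference net of observer effects; the study's multivariate models extend this idea across domains and dimensions.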
Conclusions: Understanding the reliability, feasibility, and utility of video-based classroom observations is essential as the educational landscape integrates increasingly advanced technology into professional development, accountability, and research. Preliminary findings highlight how evidence on the effectiveness of new technologies in education should be weighed alongside insights from practitioners and in light of distinct use cases. For example, results suggest that video-based CLASS observations have sufficient reliability and, as such, may be useful for instructional coaching. However, the systematic differences in scores across live and video observations suggest that, if video is used in high-stakes accountability settings such as QRIS, programs or teachers scored on video may be disadvantaged compared to those scored live. While this study focuses on current use cases, the results have implications for scenarios that education researchers and practitioners may encounter in the future.