Background: The unanticipated difficult airway is a potentially life-threatening event during anaesthesia or acute conditions. An unsuccessfully managed upper airway is associated with serious morbidity and mortality. Several bedside screening tests are used in clinical practice to identify those at high risk of difficult airway. Their accuracy and benefit, however, remain unclear.

Objectives: The objective of this review was to characterize and compare the diagnostic accuracy of the Mallampati classification and other commonly used airway examination tests for assessing the physical status of the airway in adult patients with no apparent anatomical airway abnormalities. We performed this individually for each of the four descriptors of the difficult airway: difficult face mask ventilation, difficult laryngoscopy, difficult tracheal intubation, and failed intubation.

Search Methods: We searched major electronic databases including CENTRAL, MEDLINE, Embase, ISI Web of Science, CINAHL, as well as regional, subject-specific, and dissertation and theses databases from inception to 16 December 2016, without language restrictions. In addition, we searched the Science Citation Index and checked the references of all the relevant studies. We also handsearched selected journals, conference proceedings, and relevant guidelines. We updated this search in March 2018, but we have not yet incorporated these results.

Selection Criteria: We considered full-text diagnostic test accuracy studies of any individual index test, or a combination of tests, against a reference standard. Participants were adults without obvious airway abnormalities, who were having laryngoscopy performed with a standard laryngoscope and the trachea intubated with a standard tracheal tube. Index tests included the Mallampati test, modified Mallampati test, Wilson risk score, thyromental distance, sternomental distance, mouth opening test, upper lip bite test, or any combination of these.
The target condition was difficult airway, with one of the following reference standards: difficult face mask ventilation, difficult laryngoscopy, difficult tracheal intubation, or failed intubation.

Data Collection and Analysis: We performed screening and selection of the studies, data extraction and assessment of methodological quality (using QUADAS-2) independently and in duplicate. We designed a Microsoft Access database for data collection and used Review Manager 5 and R for data analysis. For each index test and each reference standard, we assessed sensitivity and specificity. We produced forest plots and summary receiver operating characteristic (ROC) plots to summarize the data. Where possible, we performed meta-analyses to calculate pooled estimates and compare test accuracy indirectly using bivariate models. We investigated heterogeneity and performed sensitivity analyses.

Main Results: We included 133 (127 cohort type and 6 case-control) studies involving 844,206 participants. We evaluated a total of seven different prespecified index tests in the 133 studies, as well as 69 non-prespecified, and 32 combinations. For the prespecified index tests, we found six studies for the Mallampati test, 105 for the modified Mallampati test, six for the Wilson risk score, 52 for thyromental distance, 18 for sternomental distance, 34 for the mouth opening test, and 30 for the upper lip bite test. Difficult face mask ventilation was the reference standard in seven studies, difficult laryngoscopy in 92 studies, difficult tracheal intubation in 50 studies, and failed intubation in two studies. Across all studies, we judged the risk of bias to be variable for the different domains; we mostly observed low risk of bias for patient selection, flow and timing, and unclear risk of bias for reference standard and index test. Applicability concerns were generally low for all domains.
For difficult laryngoscopy, the summary sensitivity ranged from 0.22 (95% confidence interval (CI) 0.13 to 0.33; mouth opening test) to 0.67 (95% CI 0.45 to 0.83; upper lip bite test) and the summary specificity ranged from 0.80 (95% CI 0.74 to 0.85; modified Mallampati test) to 0.95 (95% CI 0.88 to 0.98; Wilson risk score). The upper lip bite test for diagnosing difficult laryngoscopy provided the highest sensitivity compared to the other tests (P < 0.001). For difficult tracheal intubation, summary sensitivity ranged from 0.24 (95% CI 0.12 to 0.43; thyromental distance) to 0.51 (95% CI 0.40 to 0.61; modified Mallampati test) and the summary specificity ranged from 0.87 (95% CI 0.82 to 0.91; modified Mallampati test) to 0.93 (95% CI 0.87 to 0.96; mouth opening test). The modified Mallampati test had the highest sensitivity for diagnosing difficult tracheal intubation compared to the other tests (P < 0.001). For difficult face mask ventilation, we could only estimate summary sensitivity (0.17, 95% CI 0.06 to 0.39) and specificity (0.90, 95% CI 0.81 to 0.95) for the modified Mallampati test.

Authors' Conclusions: Bedside airway examination tests, for assessing the physical status of the airway in adults with no apparent anatomical airway abnormalities, are designed as screening tests. Screening tests are expected to have high sensitivities. We found that all investigated index tests had relatively low sensitivities with high variability. In contrast, specificities were consistently and markedly higher than sensitivities across all tests. The standard bedside airway examination tests should be interpreted with caution, as they do not appear to be good screening tests. Among the tests we examined, the upper lip bite test showed the most favourable diagnostic test accuracy properties.
Given the paucity of available data, future research is needed to develop tests with high sensitivities to make them useful, and to consider their use for screening difficult face mask ventilation and failed intubation. The 27 studies in 'Studies awaiting classification' may alter the conclusions of the review, once we have assessed them.
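
Illustration: The sensitivity and specificity figures reported above can be made concrete with a small calculation. The following is a minimal sketch, not the review's analysis (which used bivariate meta-analytic models in Review Manager 5 and R). It takes the summary estimates for the modified Mallampati test against difficult tracheal intubation (sensitivity 0.51, specificity 0.87) and applies them to a hypothetical cohort. The cohort size of 1000 and the prevalence of 10% are assumptions chosen purely for illustration and do not come from the review; the names `TwoByTwo` and `expected_table` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""
    tp: float  # test positive, airway difficult
    fp: float  # test positive, airway not difficult
    fn: float  # test negative, airway difficult (missed cases)
    tn: float  # test negative, airway not difficult

    def sensitivity(self) -> float:
        # Proportion of difficult airways that the test flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-difficult airways that the test clears.
        return self.tn / (self.tn + self.fp)


def expected_table(sensitivity: float, specificity: float,
                   prevalence: float, n: int) -> TwoByTwo:
    """Expected 2x2 counts for a cohort of size n (prevalence is an assumption)."""
    difficult = n * prevalence
    not_difficult = n - difficult
    tp = difficult * sensitivity
    tn = not_difficult * specificity
    return TwoByTwo(tp=tp, fp=not_difficult - tp * 0 - tn, fn=difficult - tp, tn=tn)


# Summary estimates from the abstract; cohort size and prevalence are illustrative.
table = expected_table(0.51, 0.87, prevalence=0.10, n=1000)
print(f"difficult intubations missed: {table.fn:.0f} of {table.tp + table.fn:.0f}")
print(f"false alarms: {table.fp:.0f} of {table.fp + table.tn:.0f}")
```

Under these assumed inputs, roughly half of the patients with a difficult tracheal intubation would pass the test unflagged, which is the sense in which a sensitivity of this size falls short of what a screening test is expected to deliver.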