Manual Pupil Evaluation in Clinical Practice
Pupillary assessment is a fundamental component of the neurological examination, particularly critical in patients with head injuries, stroke, and other neurological conditions. Traditional methods rely on subjective evaluation using a flashlight or penlight, a practice that remains widespread across emergency departments, intensive care units, and neurosurgical wards. However, mounting evidence reveals significant limitations in the reliability and accuracy of manual assessment methods, raising questions about their adequacy for detecting clinically meaningful neurological changes.
Systematic Measurement Errors in Manual Assessment
Research demonstrates that critical care and neurosurgical nurses consistently underestimate pupil diameter when using subjective assessment methods. The magnitude of error increases proportionally with pupil size, creating a pattern of systematic bias that becomes more pronounced as pupils dilate. While pupils measuring 1 mm showed minimal error with a mean difference of only 0.1 mm, pupils measuring 8 mm were underestimated by an average of 1.2 to 1.4 mm (Kerr et al., 2016). This systematic bias becomes particularly problematic in neurological emergencies where accurate detection of pupil dilation may indicate cerebral ischemia or herniation, conditions requiring immediate intervention.
The pattern of underestimation was consistent across multiple testing conditions, from simple black and white drawings to photographs of human eyes to actual bedside patient assessments. Notably, accuracy began to deteriorate significantly when the objective pupil measurement exceeded 4.0 mm, suggesting a critical threshold beyond which subjective assessment becomes increasingly unreliable (Kerr et al., 2016). When actual pupil size was 4.5 mm or greater, only 37 to 54 percent of nurses correctly identified the pupil as being in this larger range. This threshold effect has profound clinical implications, as larger and enlarging pupils often signal serious neurological complications that demand urgent recognition and treatment.
Poor Reliability Between and Within Observers
A comprehensive study examining 2,329 paired assessments across 222 practitioners revealed alarmingly low interrater reliability for manual pupillary examinations. Practitioner agreement was only moderate for pupil size with a kappa value of 0.54, moderate for shape with a kappa of 0.62, and fair for reactivity with a kappa of 0.40 (Olson et al., 2015). When considering the complete pupillary assessment encompassing size, shape, and reactivity together, overall agreement was poor with a kappa of only 0.26. These statistical measures reveal that even among trained professionals working in critical care and neurosurgical settings, there is substantial disagreement about fundamental pupillary characteristics.
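The kappa values quoted above are chance-corrected agreement statistics. As a minimal sketch of how Cohen's kappa is computed for two raters, the following uses invented reactivity ratings (the function name `cohens_kappa` and the sample data are illustrative assumptions, not material from Olson et al.):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical reactivity ratings for ten pupils (not study data).
rater_1 = ["brisk", "brisk", "sluggish", "fixed", "brisk",
           "sluggish", "brisk", "fixed", "sluggish", "brisk"]
rater_2 = ["brisk", "sluggish", "sluggish", "fixed", "brisk",
           "brisk", "brisk", "sluggish", "sluggish", "brisk"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters agree on 7 of 10 pupils, yet kappa is only about 0.5 because a large share of that agreement would be expected by chance alone.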
The problem extends beyond simple interrater disagreement to include concerning levels of intrarater inconsistency. When nurses were presented with duplicate images of the same pupils at different time points during the study, they measured the pupils consistently only 49 to 55 percent of the time (Kerr et al., 2016). Even more troubling, when nurses saw the exact same photographs at two different time points, they measured the pupils consistently and correctly only 11.7 percent of the time. This internal inconsistency raises serious concerns about serial monitoring, where a single nurse may conduct multiple assessments on the same patient during a shift and compare current findings to prior measurements to detect neurological changes.
Critical Failures in Detecting Abnormal Pupils
The most concerning findings relate to detection of clinically significant abnormalities, particularly non-reactive or "fixed" pupils. The presence of a non-reactive pupil in a patient with acute neurological disease is considered an event of vital importance and often triggers emergency diagnostic procedures such as stat brain CT imaging and therapeutic interventions such as mannitol infusion or hypertonic saline administration. However, manual assessment demonstrated severe limitations in this critical area.
Among the 2,329 paired practitioner assessments, only 49.7 percent of pupils scored as fixed by one practitioner were confirmed as fixed by a second practitioner (Olson et al., 2015). This means that in roughly half of the cases where one trained clinician identified a potentially life-threatening finding, a second equally trained clinician examining the same patient within minutes disagreed with that assessment. The comparison with automated pupillometry revealed even more concerning patterns. Of 189 practitioner observations of fixed pupils, only 63, or 33.3 percent, were confirmed as non-reactive by automated pupillometry. This suggests a high rate of false-positive findings where practitioners identified pupils as non-reactive when objective measurement demonstrated residual reactivity.
The converse problem also existed, with practitioners missing truly abnormal pupils. When the pupillometer identified 83 observations of non-reactive pupils, the first practitioner correctly identified non-reactivity in only 69.9 percent of cases, while the second practitioner achieved only 55.4 percent accuracy (Olson et al., 2015). These false-negative findings are particularly dangerous, as they represent missed opportunities to detect neurological deterioration and initiate timely intervention.
Detection of anisocoria, or unequal pupil sizes, was similarly problematic. In the photograph assessment phase, nurses correctly identified unequal pupils in only 33 percent of cases (Kerr et al., 2016). At the bedside with actual patients, when pupillometry determined that 31 of 242 paired pupil sets were unequal with diameters differing by 1.0 mm or greater, nurses correctly identified only 58.1 percent of these unequal sets. Conversely, nurses reported 40 sets as unequal, but only 17, or 42 percent, were confirmed as unequal by the pupillometer. In the trauma population, unequal pupils serve as one indicator of traumatic brain injury, making accurate detection clinically essential.
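The bedside comparison above treats a pair of pupils as unequal when the diameters differ by 1.0 mm or more. A minimal sketch of that rule, with the confirmation arithmetic for the nurse-reported sets, might look as follows (the function name and the paired readings are hypothetical; only the 1.0 mm criterion and the 17-of-40 counts come from the text):

```python
ANISOCORIA_THRESHOLD_MM = 1.0  # difference used in the bedside comparison above

def is_anisocoric(left_mm, right_mm, threshold_mm=ANISOCORIA_THRESHOLD_MM):
    """Return True when the two pupil diameters differ by at least the threshold."""
    return abs(left_mm - right_mm) >= threshold_mm

# Hypothetical paired diameters in millimeters (left, right); not study data.
paired_readings = [(3.2, 3.4), (4.1, 5.3), (2.8, 2.8), (6.0, 4.9)]
flags = [is_anisocoric(left, right) for left, right in paired_readings]

# Counts reported above: nurses flagged 40 sets, the pupillometer confirmed 17.
confirmed_fraction = 17 / 40
print(flags, f"{confirmed_fraction:.1%}")
```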
Inconsistent Techniques and Inadequate Reactivity Assessment
Despite the availability of measurement tools designed to improve accuracy, nearly all nurses in the study, specifically 93 percent, relied on subjective estimation (Kerr et al., 2016). This preference for unaided visual assessment over available tools reflects common clinical practice but perpetuates the measurement errors inherent in subjective evaluation. Additionally, although protocols typically specify the conditions under which pupillary examinations should occur, including recommendations for room lighting and angle of light shone in the eye, health care professionals are rarely compliant with these standardized conditions. This lack of compliance with established protocols further compounds the problems of inaccuracy and inconsistency in pupillary assessment.
Manual assessment showed high rates of both false-positive and false-negative findings for pupil reactivity, creating bidirectional errors that compromise clinical decision-making. In 33 assessments where the pupillometer provided a pupillary light reflex reading considered sluggish with PLR less than 3.0, nurses missed this abnormal sluggishness in 7 instances, representing a 21 percent false-negative rate (Olson et al., 2015). Conversely, in 444 assessments where the pupillometer provided a PLR reading considered normal, nurses reported a sluggish pupil in 77 of those instances, representing a 17 percent false-positive rate. This bidirectional error pattern suggests fundamental limitations in the human ability to accurately assess the speed and extent of pupillary constriction, particularly for subtle changes that fall in the borderline range between normal and abnormal.
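The two error rates in this paragraph follow directly from the reported counts. The short sketch below reproduces the arithmetic; the helper name `rate` is an illustrative choice, while the counts are those quoted above.

```python
def rate(events, total):
    """Fraction of assessments in which the event occurred."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total

# Counts as reported in the paragraph above (Olson et al., 2015).
missed_sluggish, pupillometer_sluggish = 7, 33
called_sluggish, pupillometer_normal = 77, 444

false_negative_rate = rate(missed_sluggish, pupillometer_sluggish)
false_positive_rate = rate(called_sluggish, pupillometer_normal)
print(f"false-negative rate: {false_negative_rate:.1%}")  # 21.2%
print(f"false-positive rate: {false_positive_rate:.1%}")  # 17.3%
```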
The Challenge of Ambient Light in Pupillary Measurement
The influence of ambient lighting on pupillary measurements presents an additional challenge in clinical practice, where lighting conditions can vary by up to five orders of magnitude, from dim overnight conditions to bright daytime environments (Szczęśniewski et al., 2025). Standard pupillary light reflex parameters demonstrate substantial light dependence, with initial pupil diameter showing approximately 41% variation between dim and bright conditions, constriction amplitude varying by as much as 133%, and maximum constriction velocity differing by approximately 82% (Szczęśniewski et al., 2025). These fluctuations complicate interpretation: changes in pupillary metrics may reflect variations in ambient lighting rather than neurological deterioration.
Conventional infrared pupillometers have improved objectivity over manual penlight examination, but remain significantly affected by ambient lighting conditions, with studies documenting clinically relevant variations in measurements depending on room illumination (Ong et al., 2019). This environmental sensitivity affects both manual and traditional automated pupillometry systems, creating a need for measurement approaches that can correct for ambient light effects.
Quantitative Pupillometry: Standardized Objective Measurement
Automated pupillometry provides critical advantages over manual assessment that address the fundamental sources of error and variability inherent in subjective evaluation. The device delivers a standardized light stimulus of fixed intensity and duration at a consistent distance from the patient, eliminating the substantial variability that occurs when different practitioners use different flashlights or penlights with varying brightness, hold them at different distances, and shine them for different durations (Kerr et al., 2016). High-speed video recording captures the complete pupillary response, enabling the device to generate multiple objective measurements including maximum pupil diameter, minimum pupil diameter, and constriction velocity.
Quantitative scoring systems replace the subjective and poorly defined categories of "brisk," "sluggish," and "fixed" with objective numerical values that can be trended over time and compared across different observers and time points. The elimination of subjective interpretation removes a major source of measurement error and inconsistency. The pupillometer provides consistent, reproducible measurements that do not vary based on who performs the assessment or when the assessment occurs, eliminating the substantial intrarater and interrater variability that plagues manual assessment.
The PuRe Score: Lighting-Invariant Pupillary Assessment
The Pupil Reactivity (PuRe) Score represents an advancement in quantitative pupillometry: it introduces a lighting-invariant metric that uses computational correction to address the ambient light sensitivity affecting both manual examination and traditional automated pupillometry (Bogucki et al., 2024). In validation studies conducted across ambient light levels ranging from 4 to 1,200 lux, a span of roughly two and a half orders of magnitude, the PuRe Score remained stable, with median values of 2.64 in dim light versus 2.42 in bright light showing no statistically significant difference (p=0.449) (Szczęśniewski et al., 2025).
This stability was achieved through a two-step computational process: first, models were fitted to predict each pupillary parameter based on lighting conditions in healthy individuals; second, these predictions were used to subtract the effect of lighting, leaving a lighting-corrected metric that reflects how the parameter deviates from expected values given the ambient illumination (Bogucki et al., 2024). The resulting PuRe Score provides a measure of pupillary reactivity that remains stable across the lighting variations encountered in clinical environments, from emergency departments to intensive care units to pre-hospital settings.
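The two-step correction described above can be sketched in a few lines. The sketch assumes, purely for illustration, that the expected value of a parameter in healthy eyes is linear in the base-10 logarithm of illuminance; the published models may take a different form, and the reference data, function names, and numbers below are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lighting_corrected(value, lux, slope, intercept):
    """Step 2: subtract the value expected in healthy eyes at this light level."""
    expected = slope * math.log10(lux) + intercept
    return value - expected

# Step 1: fit on hypothetical healthy-reference data (lux, initial diameter in mm).
healthy = [(4, 6.1), (40, 5.2), (400, 4.3), (1200, 3.9)]
slope, intercept = fit_line([math.log10(lux) for lux, _ in healthy],
                            [diameter for _, diameter in healthy])

# A 5.6 mm pupil measured at 4 lux, expressed as a deviation from expectation.
residual_dim = lighting_corrected(5.6, 4, slope, intercept)
print(f"slope = {slope:.2f} mm per decade of lux, residual = {residual_dim:.2f} mm")
```

Because the corrected value is a deviation from what healthy eyes show at the same illuminance, a measurement that sits exactly on the reference curve yields zero regardless of how bright the room is.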
The PuRe Score quantifies pupillary reactivity on a standardized scale from 0 to 5, where 0 indicates a non-reactive pupil, scores from 0-3 represent abnormal or "sluggish" responses, and scores from 3-5 indicate normal or "brisk" responses (Bogucki et al., 2024). The score is calculated through regularized multi-variable logistic regression that combines seven lighting-corrected pupillary parameters: initial pupil diameter, minimum pupil diameter, final pupil size, constriction amplitude, maximum constriction velocity, peak dilation velocity, and constriction latency. The formulae behind the score have been made openly available for the clinical research community, promoting transparency and enabling independent validation and refinement (Bogucki et al., 2024).
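To make the structure of such a score concrete, the sketch below combines seven lighting-corrected inputs through a logistic function and rescales the result to the 0 to 5 range. Everything numeric here is an assumption for illustration: the weights and bias are invented, the multiplication by 5 is an assumed rescaling, and the handling of a score of exactly 3.0 is a guess. The published formulae in Bogucki et al. (2024) should be used for any real calculation.

```python
import math

PARAMETERS = (
    "initial_diameter", "minimum_diameter", "final_size",
    "constriction_amplitude", "max_constriction_velocity",
    "peak_dilation_velocity", "constriction_latency",
)

# Hypothetical weights for illustration only; NOT the published coefficients.
WEIGHTS = {name: 0.5 for name in PARAMETERS}
WEIGHTS["constriction_latency"] = -0.5
BIAS = 0.0

def illustrative_score(corrected):
    """Logistic combination of lighting-corrected inputs, rescaled to 0-5 (assumed)."""
    z = BIAS + sum(WEIGHTS[name] * corrected[name] for name in PARAMETERS)
    probability = 1.0 / (1.0 + math.exp(-z))
    return 5.0 * probability

def interpret(score):
    """Bands described in the text; treating exactly 3.0 as brisk is an assumption."""
    if score == 0:
        return "non-reactive"
    return "sluggish" if score < 3.0 else "brisk"

# Inputs that sit exactly on the healthy expectation (all deviations zero).
neutral = {name: 0.0 for name in PARAMETERS}
print(illustrative_score(neutral), interpret(illustrative_score(neutral)))
```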
Smartphone-Based Implementation and Accessibility
The PuRe Pupillometer operates as smartphone-based software as a medical device (SaMD), using standard smartphone platforms rather than dedicated hardware (Szczęśniewski et al., 2025). This approach uses multi-core graphics processing units (GPUs) and neural processing units (NPUs), such as Apple's Neural Engine, to apply artificial intelligence algorithms in real time together with multi-frame integration techniques including multi-frame super-resolution, temporal averaging, and parallax-based artifact mitigation.
The smartphone-based implementation allows clinicians to perform neurological evaluations without specialized hardware beyond a device that many healthcare providers already carry (Bogucki et al., 2024). This has implications across the continuum of care: emergency medical technicians can deploy quantitative pupillometry at accident scenes; transport teams can monitor patients during ambulance or helicopter transport; emergency department physicians can perform immediate assessments upon patient arrival; and intensive care nurses can conduct frequent serial measurements without equipment availability constraints (John et al., 2024; Boulter et al., 2021). The portability eliminates barriers such as equipment procurement costs, maintenance requirements, limited device availability, and the need to transport patients to where specialized equipment is located.
Measurement Precision Through Artificial Intelligence
The PuRe Pupillometer achieves measurement precision through an on-device computational pipeline that combines artificial intelligence with deterministic mathematical algorithms (Szczęśniewski et al., 2025). The system uses a deep neural network trained on more than one million manually annotated eye images, which performs real-time pupil and iris segmentation across all iris colors: brown, hazel, blue, and green. This AI-driven segmentation runs on the smartphone's neural processing unit, while the graphics processing unit and central processing unit handle deterministic signal processing and analysis so that pupillometry parameters are displayed immediately.
Frame-level analysis demonstrates that the PuRe Pupillometer achieves pupil diameter accuracy of ±0.025 mm (25 micrometers), finer than the typical 30-100 micrometer accuracy reported for conventional infrared pupillometers (Szczęśniewski et al., 2025). This precision is achieved by pairing high-resolution smartphone cameras with AI-driven processing that exploits scene micro-movements and variable lighting as additional information during multi-frame integration. High-speed video recording at 60 Hz captures the pupillary response during a five-second measurement protocol that includes a one-second baseline period, a one-second flash stimulation from the device's integrated light source, and a three-second recovery phase.
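The protocol timing above translates directly into frame counts. The sketch below maps each phase to a range of frame indices at 60 Hz; the phase labels and function name are illustrative choices, while the durations and frame rate are those stated in the text.

```python
FRAME_RATE_HZ = 60
PHASES_S = (("baseline", 1.0), ("flash", 1.0), ("recovery", 3.0))

def phase_frame_ranges(frame_rate_hz=FRAME_RATE_HZ, phases=PHASES_S):
    """Map each protocol phase to its half-open frame index range [start, stop)."""
    ranges = {}
    start = 0
    for name, duration_s in phases:
        stop = start + round(duration_s * frame_rate_hz)
        ranges[name] = (start, stop)
        start = stop
    return ranges

ranges = phase_frame_ranges()
total_frames = ranges["recovery"][1]
print(ranges, total_frames)
```

A full recording therefore contains 300 frames, of which the first 60 establish the baseline diameter before the flash begins.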
Enhanced Detection of Subtle Clinical Changes
The precision of quantitative pupillometry enables detection of subtle pupillary changes that may precede clinically apparent neurological deterioration by hours. Research indicates that pupillometer-detected changes in pupil reactivity can occur hours before changes in intracranial pressure become evident through other monitoring methods (Kerr et al., 2016). This early warning capability has profound implications for timely intervention in neurocritical care, where rapid identification of secondary brain injury can guide therapeutic decisions and potentially improve outcomes.
While a change in pupil size of 0.5 to 1.0 mm may not be reliably detected by manual assessment, automated pupillometry can accurately measure and document such changes, providing objective evidence of evolving pathology. Similarly, subtle decreases in the speed of pupillary constriction or minor reductions in the extent of constriction can be quantified before they become apparent to the human observer. This enhanced sensitivity creates opportunities for earlier diagnostic evaluation and therapeutic intervention before patients progress to more severe stages of neurological compromise.
Clinical Validation in Neurocritical Care
The clinical utility of the PuRe Score has been validated in neurocritical care settings. In a prospective observational study of 12 neuro-ICU patients with diverse diagnoses including hemorrhagic stroke (33.3%), traumatic brain injury (25.0%), brain tumor (16.7%), hydrocephalus (16.7%), and intracranial hypertension (8.3%), the PuRe Score demonstrated correlation with neurological status as measured by the Glasgow Coma Scale (Spearman ρ=0.746, p<0.001) (Szczęśniewski et al., 2025). This correlation was maintained despite the presence of common neurocritical care medications including propofol and fentanyl, which are known to affect pupillary responses.
For detecting severe neurological impairment, defined as Glasgow Coma Scale scores of 8 or less, a PuRe Score threshold of ≤3.0 achieved an area under the receiver operating characteristic curve of 0.940 (95% confidence interval 0.92-0.96) (Szczęśniewski et al., 2025). The threshold yielded sensitivity of 84.3%, specificity of 90.2%, overall accuracy of 86.0%, positive predictive value of 95.4%, and negative predictive value of 70.3%. These performance metrics indicate that the PuRe Score provides discrimination of neurological severity with high accuracy, enabling objective identification of patients requiring intensive monitoring and intervention.
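The five reported metrics are linked by standard identities, which makes a quick consistency check possible. The sketch below solves for the fraction of recordings with GCS of 8 or less implied by the reported sensitivity, specificity, and accuracy, then derives the predictive values from it. The function names are illustrative, and small mismatches with the published figures are expected because the inputs are rounded.

```python
def implied_prevalence(sensitivity, specificity, accuracy):
    """Solve accuracy = sens * p + spec * (1 - p) for the prevalence p."""
    if sensitivity == specificity:
        raise ValueError("prevalence is undetermined when sens equals spec")
    return (accuracy - specificity) / (sensitivity - specificity)

def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from sens, spec, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Values reported above (Szczęśniewski et al., 2025).
sens, spec, acc = 0.843, 0.902, 0.860
prevalence = implied_prevalence(sens, spec, acc)
ppv, npv = predictive_values(sens, spec, prevalence)
print(f"prevalence = {prevalence:.3f}, PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The derived predictive values land within about half a percentage point of the reported 95.4% and 70.3%, and the implied prevalence of roughly 71% shows that most recordings came from severely impaired patients, which is worth keeping in mind when generalizing the predictive values to other populations.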
Prognostic Value and Risk Stratification
The prognostic value of the PuRe Score is notable. Non-survivors exhibited markedly lower median PuRe Scores of 0.00 (interquartile range 0.00-1.98) compared to 2.82 (interquartile range 1.61-3.83) in survivors (p<0.001), with 91.7% of non-survivor recordings falling below the critical threshold of 3.0 (Szczęśniewski et al., 2025). This separation between survivor and non-survivor groups suggests utility for early identification of patients at highest risk for poor outcomes, enabling more informed prognostic discussions with families and potentially guiding decisions regarding intensity of treatment.
Serial Monitoring and Longitudinal Tracking
One application of the PuRe Score lies in its support for longitudinal tracking of neurological status through serial measurements. Case examples demonstrate the score's capacity to capture progressive strengthening of pupillary responses concordant with improving neurological status, with median daily PuRe values rising from the "sluggish" range into the "brisk" range as patients recover, crossing the clinically relevant threshold of 3.0 that separates severe (GCS≤8) from non-severe status (Szczęśniewski et al., 2025).
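Trending of this kind amounts to summarizing each day's recordings and watching for a threshold crossing. The following sketch uses invented scores for a recovering patient; the data, function names, and the choice to flag the first day whose median exceeds 3.0 are assumptions for illustration.

```python
from statistics import median

THRESHOLD = 3.0  # boundary between the "sluggish" and "brisk" ranges

def daily_medians(recordings_by_day):
    """Median score per day, ordered by day number."""
    return {day: median(scores) for day, scores in sorted(recordings_by_day.items())}

def first_day_above(medians, threshold=THRESHOLD):
    """First day whose median exceeds the threshold, or None if none does."""
    for day, value in medians.items():
        if value > threshold:
            return day
    return None

# Hypothetical serial recordings (day number -> scores); not patient data.
recordings = {
    1: [1.2, 1.8, 1.5],
    2: [2.1, 2.6, 2.4],
    3: [2.9, 3.4, 3.1],
    4: [3.6, 3.9],
}
medians = daily_medians(recordings)
print(medians, first_day_above(medians))
```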
This capability for objective trending addresses limitations of manual assessment for serial monitoring. When a single practitioner performs multiple assessments over time, the intrarater consistency of only 49 to 55 percent documented with manual methods means that apparent changes may reflect measurement variability rather than true clinical evolution (Kerr et al., 2016). Similarly, when different practitioners assess the same patient across shift changes, interrater reliability as low as kappa 0.26 for the complete pupillary assessment means that documented changes between caregivers are often artifacts of measurement rather than genuine neurological deterioration or improvement (Olson et al., 2015).
Lighting invariance is relevant in this context, as ambient lighting conditions naturally vary throughout the diurnal cycle, from dim overnight conditions to bright daytime environments, and could otherwise introduce false trends in pupillary parameters. By correcting for these lighting effects, the PuRe Score helps ensure that tracked changes reflect alterations in neurological status rather than environmental variations. This reliability enables clinicians to identify subtle gradual deterioration that might warrant escalation of monitoring or intervention, as well as to recognize progressive improvement that might support de-escalation of care or inform prognostic discussions.
Deployment in Variable Clinical Environments
The robustness of the PuRe Pupillometer has been demonstrated in challenging environments. In the NEP2NE (Nautical Experiments in Physiology, Technology and Underwater Exploration) scientific mission, the system was deployed in a hyperbaric underwater habitat where three subjects performed pupillometry measurements at a depth of 22 feet under 1.6 atmospheres of pressure over 5 days (Bogucki et al., 2024). Despite illumination being over six times brighter at the surface (790±570 lux) compared to at depth (128±83 lux), the PuRe Score showed no significant difference between surface and depth conditions, demonstrating stability across dramatic lighting variations and under hyperbaric conditions.
The smartphone platform has been validated for use in dynamic, unstable conditions where handheld recording introduces motion artifacts. Advanced video stabilization methods utilizing deep feature matching with ConvNeXT-based neural networks, followed by implicit neural representation for pixel-wise refinement, correct for both lateral and axial motion artifacts that would otherwise distort measurements (John et al., 2024). This capability is relevant for pre-hospital emergency medicine applications in ambulances and helicopters, where vehicle movement combined with handheld device operation creates challenging recording conditions.
These validations in austere and dynamic environments demonstrate the practical deployment potential of smartphone-based quantitative pupillometry. Traditional hardware-based pupillometers, while effective in controlled clinical settings, face challenges in pre-hospital care due to their size, weight, power requirements, cost, and fragility. The smartphone-based PuRe Pupillometer, requiring no additional hardware beyond a device many clinicians already carry, offers a solution for extending quantitative pupillometry across the spectrum of care, from accident scenes to ambulance transport to emergency department triage to intensive care monitoring to ward-level surveillance.
Clinical Implications for Patient Care and Safety
The cumulative evidence demonstrates that manual pupillary assessment using flashlight or penlight is inadequate for detecting clinically meaningful changes in patients with neurological injury. The high rate of false-negative findings for non-reactive pupils, where practitioners miss abnormalities that are present, poses particular risk as these findings typically warrant urgent diagnostic evaluation and therapeutic intervention (Olson et al., 2015). When a practitioner fails to detect a non-reactive or sluggish pupil, opportunities for timely treatment may be lost, potentially allowing preventable secondary brain injury to progress.
The converse problem of false-positive findings also carries clinical consequences. When practitioners identify pupils as non-reactive when they actually retain reactivity, patients may undergo unnecessary diagnostic testing and receive treatments they do not need. These false alarms consume healthcare resources, expose patients to potential procedural risks, and may trigger prognostic discussions with families that are based on inaccurate information. While false-positive findings are generally preferable to false-negative ones in acute care settings where erring on the side of caution is appropriate, the high rate of false-positives documented in these studies suggests that many clinical decisions are being made based on unreliable assessment data.
The poor interrater reliability of manual assessment creates additional clinical problems beyond the immediate implications for individual patients. When different practitioners examining the same patient within minutes disagree about basic pupillary characteristics, the utility of serial assessments for detecting change over time is compromised. A documented "change" in pupil size or reactivity between assessments may represent actual clinical deterioration, but it may also represent nothing more than measurement variability between different observers. This ambiguity complicates clinical decision-making and may lead either to inappropriate escalation of care based on artifact or dangerous delays in intervention based on false reassurance.
Given that early detection of pupillary changes can alert the care team to increasing intracranial pressure and enable timely intervention, the use of standardized pupil assessment tools such as quantitative pupillometry is necessary to increase accuracy and consistency (Kerr et al., 2016). Regular assessments of pupils are vital for early identification of subtle changes in patients' neurologic status, and serial assessments must be accurate and reproducible to serve this monitoring function effectively. If pupillary changes are identified early through objective measurement, diagnostic and treatment interventions can be delivered in a more timely and effective manner, potentially improving patient outcomes and reducing the burden of preventable secondary brain injury.
Standardization and Open Science Approach
The standardization provided by the PuRe Score addresses a gap in neurological monitoring. While numerous pupillometry devices have been developed, the lack of standardized metrics across different systems has hindered comparison of results and limited the development of universal clinical guidelines. By providing an open-source, validated metric with clearly defined thresholds for clinical interpretation, the PuRe Score offers a foundation for standardization of pupillary assessment across different healthcare systems, research studies, and clinical trials (Bogucki et al., 2024).
The implications extend to clinical research and quality improvement efforts as well. Studies that rely on manually assessed pupillary findings as outcome measures or as components of prognostic models are building on an unreliable foundation. The substantial measurement error and poor interrater reliability documented in validation studies suggest that conclusions drawn from research using manual pupillary assessment should be interpreted with appropriate caution. Similarly, quality improvement initiatives aimed at reducing the time between detection of abnormal pupils and therapeutic intervention may be measuring random variation in assessment practice rather than true changes in clinical processes if they rely on manually documented pupillary findings.
Conclusion
The evidence overwhelmingly supports the superiority of quantitative pupillometry over traditional flashlight or penlight assessment for evaluating pupillary function in patients with neurological conditions. Manual assessment suffers from systematic measurement bias with consistent underestimation that worsens as pupils dilate, poor interrater reliability even among trained specialists, high rates of false-negative findings for critical abnormalities such as non-reactive pupils, concerning levels of intrarater inconsistency, and degraded performance precisely when accuracy is most clinically important. These fundamental limitations are not easily correctable through additional training or improved protocols, as they reflect inherent constraints in human perceptual abilities when performing subjective assessments of small, dynamic structures under variable conditions.
The PuRe Score and PuRe Pupillometer address these limitations by providing standardized, objective, and reproducible measurements that enable earlier detection of neurological deterioration. The system eliminates observer-dependent variability, removes the threshold effect where accuracy deteriorates at larger pupil sizes, corrects for ambient lighting variations that affect both manual and conventional automated methods, quantifies subtle changes in reactivity that precede clinically apparent deterioration, and generates numerical data that can be reliably trended over time. The smartphone-based implementation extends these capabilities across the continuum of care, from pre-hospital settings to intensive care units, without requiring dedicated hardware investment.
Healthcare institutions caring for patients with neurological injuries, including neurocritical care units, stroke units, trauma centers, and general neurology wards, should strongly consider implementing quantitative pupillometry as the standard of care for pupillary assessment. The transition from subjective to objective measurement of this fundamental neurological parameter has the potential to improve early detection of complications, enhance the reliability of serial monitoring, reduce unnecessary diagnostic testing and treatments triggered by false-positive findings, and ultimately contribute to better patient outcomes through more timely and appropriate clinical interventions. As the evidence base supporting quantitative pupillometry continues to grow, the practice of relying solely on manual flashlight assessment in high-risk neurological populations becomes increasingly difficult to justify.
Bogucki, A., Swiatek, M., Wlodarski, M., Chrost, H., Yan, Z., John, I., Chrapkiewicz, R., Ciuraszkiewicz, J., Kolodziejczak, P., Sokolowski, M., & Szczęśniewski, L. (2024). Lighting-invariant quantitative pupillometry: The Pupil Reactivity (PuRe) Score.
Boulter, J. H., Shields, M. M., Meister, M. R., Murtha, G., Curry, B. P., & Dengler, B. A. (2021). The expanding role of quantitative pupillometry in the evaluation and management of traumatic brain injury. Frontiers in Neurology, 12, 685313.
John, I., Yan, Z., Bogucki, A., Swiatek, M., Chrost, H., Wlodarski, M., Chrapkiewicz, R., & Li, J. (2024). Unsupervised deep learning-driven stabilization of smartphone-based quantitative pupillometry for mobile emergency medicine. 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 1-5.
Kerr, R. G., Bacon, A. M., Baker, L. L., Gehrke, J. S., Hahn, K. D., Lillegraven, C. L., Renner, C. H., & Spilman, S. K. (2016). Underestimation of pupil size by critical care and neurosurgical nurses. American Journal of Critical Care, 25(3), 213-219.
Olson, D. M., Stutzman, S., Saju, C., Wilson, M., Zhao, W., & Aiyagari, V. (2015). Interrater reliability of pupillary assessments. Neurocritical Care, 24(2), 251-257.
Ong, C., Hutch, M., & Smirnakis, S. (2019). The effect of ambient light conditions on quantitative pupillometry. Neurocritical Care, 30, 316-321.
Szczęśniewski, L., Swiatek, M., Wlodarski, M., Chrost, H., Yan, Z., John, I., Chrapkiewicz, R., Ciuraszkiewicz, J., Kolodziejczak, P., Sokolowski, M., & Bogucki, A. (2025). Clinical validation of smartphone-based quantitative pupillometry with lighting correction in neurocritical care.




