Project Summary / Abstract
We seek to investigate the agent-based participation of machine learning (ML) models in an existing
crowdsourcing system, which could substantially speed up biomedical image analysis without loss of data quality
for Aims 2-4 in our R01 research. Our prior R01-supported work, which seeks to reveal mechanisms that underlie
capillary stalling in the brain, encountered an analytic bottleneck: it requires quantifying stall rates from
two-photon excited fluorescence (2PEF) image stacks. To address this, we partnered with the Human Computation
Institute (HCI) to crowdsource the analysis using the online citizen science platform Stall Catchers, which has
reduced the time to analyze a typical dataset from many months to just a few weeks. Beyond enabling several
published results, 35,000 Stall Catchers volunteers have produced over 1.4 million high-quality “crowd”
annotations, which served as a rich training set in a recent machine learning competition that led to the creation
of fifty distinct ML models exhibiting a broad distribution of sensitivity and bias. None of these models, by itself,
meets our stringent analytic requirements. However, if we could endow these models with sufficient agency to
participate as bona fide Stall Catchers players, then we could test the hypothesis that hybrid (human/machine)
ensembles will achieve the same data quality as human-only ensembles when answers are combined using our
existing “wisdom of the crowd” algorithm. Developing an open-source toolkit for transforming ML models into
citizen science (CitSci) “bots” would enable a direct pathway for effectively integrating even substandard ML models into
an existing crowd-powered analytic pipeline without requiring intensive re-engineering. Accelerating biomedical
data analysis in this way could allow other biomedical researchers to derive immediate value from smaller training
sets and investigate more hypotheses using less time and fewer resources. This project could enable a low-overhead
pathway to semi-automation with imperfect ML models, putting ML to use sooner while reducing
reliance on human cognitive resources, and could lead toward fully automated analyses as
improved ML models are added to the crowd as CitSci bots. Success in this pursuit would allow us to incorporate
full-time CitSci bots into Stall Catchers, which could double the number of capillary stalling studies we can
conduct in a given year toward elucidating a more complete mechanistic model of capillary stalling. This would
accelerate the identification of a targeted intervention with reduced side effects that could alleviate cognitive
impairments in implicated dementias, such as Alzheimer’s disease, while contributing to the advancement of
hybrid intelligence methods with broad utility for biomedical data analysis.