CLEdgeSeq: Advanced sequence analysis and communication for edge applications - Developing effective and safe therapeutics is critical to our quality of life, and novel approaches are
needed to improve the speed and accuracy of detecting biological contamination and assessing other quality control
parameters. Computational capability for DNA and RNA sequence analysis at the edge will provide
point-of-production and point-of-care support for contamination analysis, which will be essential for
effective monitoring and gating of biological contaminants in water and therapeutic supplies. The main
objective of this SBIR is to create a neural network-accelerated DNA and RNA sequence processing node
suitable for field, clinical, and industrial use. Such a node would be of great utility to monitoring efforts
aimed at tracking and assessing potential hazardous exposures, and highly marketable to a wide range of
end-users. To demonstrate technical feasibility, QBI will pursue the following Specific Aims:
Specific Aim 1: Develop neural network NGS data processing routines for mapping, assembly, and
analysis in edge applications. DNA and RNA sequence analysis, which is core to a number of
biological contaminant monitoring techniques as well as to QBI's recent nanobody library technology,
has traditionally been reserved for servers and other high-performance computers, but neural network
approaches have enabled modern edge computational platforms to perform sophisticated analyses
outside of central servers. We seek to develop two proof-of-principle neural networks, each accelerating
a routine central to sequence analysis on edge platforms, which will permit distributed and
cost-effective execution of sequence analysis in remote contexts. The error tolerance of neural networks
will be especially useful, since nanopore sequencers, which have relatively high sequence error rates,
are becoming a common edge technology due to their ability to sequence samples with minimal
hardware and minimal sample pre-processing.
Specific Aim 2: Develop and demonstrate CLEdgeSeq node as a solution for distributed NGS data
processing. These neural network tools will be combined with data format specifications and network
communication routines to establish feasibility of a computational node, CLEdgeSeq. CLEdgeSeq will
pre-process NGS data at the point of sequencing, perform necessary analyses, assemble consensus
sequences, and then relay the results to a central server in a bandwidth-efficient manner. The resulting
platform will be a distributed computing approach to NGS data processing that should scale well with
the expected rapid rise of NGS data generation from point-of-production and point-of-care sequencers,
especially including nanopore sequencers.
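As an illustration of the data flow described in Specific Aim 2, the following minimal Python sketch assembles a consensus sequence at the edge and relays only a compact summary rather than the raw reads. It is not QBI's implementation: it assumes reads that are already aligned to equal length, substitutes a simple per-column majority vote for the proposed neural network routines, and uses JSON compressed with zlib as a stand-in for the data format specification. The function names and record fields (`sample_id`, `read_count`, `consensus`) are hypothetical.

```python
import json
import zlib
from collections import Counter


def majority_consensus(aligned_reads):
    """Per-column majority vote over equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        # Ignore gap characters when voting; emit 'N' if the column is all gaps.
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


def pack_result(sample_id, aligned_reads):
    """Build a compressed payload carrying the consensus instead of raw reads."""
    record = {
        "sample_id": sample_id,  # hypothetical field names, for illustration only
        "read_count": len(aligned_reads),
        "consensus": majority_consensus(aligned_reads),
    }
    return zlib.compress(json.dumps(record).encode("utf-8"))


def unpack_result(payload):
    """Inverse of pack_result, as a central server might apply it."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Toy reads with the kinds of errors common in nanopore data.
reads = [
    "ACGTACGTAC",
    "ACGTTCGTAC",  # one substitution error
    "ACGTACG-AC",  # one deletion, shown as a gap
    "ACGTACGTAC",
]
payload = pack_result("sample-001", reads)
result = unpack_result(payload)
```

The bandwidth saving in this sketch comes from sending one consensus per sample in place of every read, so the payload size depends on the consensus length and not on sequencing depth.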
Successful completion of these Aims will validate modern neural network approaches to sequence
analysis on edge hardware, which will be of great utility to monitoring efforts aimed at tracking and
assessing potential hazardous exposures. Creating this computational capability will be transformative,
since it will provide a cost-effective and scalable solution for ubiquitous DNA and RNA sequencing in
monitoring and other efforts. This will enable QBI to continually expand its customer base as it
continues to add sensing capabilities tailored to meet end-users' needs.