RAMBO speeds searches on huge DNA databases

Submitted by
Style Pass
2021-06-30 20:30:17

Rice University computer scientists are sending RAMBO to rescue genomic researchers who sometimes wait days or weeks for search results from enormous DNA databases.

DNA sequencing is so popular that genomic datasets are doubling in size every two years, and the tools to search the data haven’t kept pace. Researchers who compare DNA across genomes or study the evolution of organisms like the virus that causes COVID-19 often wait weeks for software to index large “metagenomic” databases, which get bigger every month and are now measured in petabytes.

RAMBO, which is short for “repeated and merged bloom filter,” is a new method that can cut indexing times for such databases from weeks to hours and search times from hours to seconds. Rice University computer scientists presented RAMBO last week at the Association for Computing Machinery data science conference SIGMOD 2021.
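The article does not describe RAMBO's internals, so the following is only a rough sketch of what the name suggests: several datasets are merged into a shared Bloom filter, and the merging is repeated with different groupings so that a query can intersect the candidate groups. The class names, parameters, hashing scheme, and toy k-mers below are all illustrative assumptions, not the authors' implementation.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


class MergedFilterIndex:
    """Hypothetical index: datasets share bucket filters, repeated several times."""

    def __init__(self, num_buckets=4, repetitions=3):
        self.num_buckets = num_buckets
        self.repetitions = repetitions
        self.filters = [
            [BloomFilter() for _ in range(num_buckets)] for _ in range(repetitions)
        ]
        self.members = [
            [set() for _ in range(num_buckets)] for _ in range(repetitions)
        ]

    def _bucket(self, rep, dataset_id):
        # Each repetition assigns datasets to buckets with a different salt.
        digest = hashlib.sha256(f"rep{rep}:{dataset_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def add(self, dataset_id, kmers):
        for rep in range(self.repetitions):
            b = self._bucket(rep, dataset_id)
            self.members[rep][b].add(dataset_id)
            for kmer in kmers:
                self.filters[rep][b].add(kmer)

    def query(self, kmer):
        """Return datasets that may contain kmer (a superset of the true answer)."""
        candidates = None
        for rep in range(self.repetitions):
            hits = set()
            for b in range(self.num_buckets):
                if kmer in self.filters[rep][b]:
                    hits |= self.members[rep][b]
            # Intersecting across repetitions prunes datasets that only
            # appeared because they shared a bucket with a true match.
            candidates = hits if candidates is None else candidates & hits
        return candidates or set()


index = MergedFilterIndex()
index.add("genome_a", ["ACGT", "CGTA"])
index.add("genome_b", ["TTTT", "TTTA"])
print(index.query("ACGT"))
```

In this sketch a query probes only `repetitions * num_buckets` filters, however many datasets are indexed. The returned set can still include extra datasets, either from Bloom filter false positives or from datasets that happened to share a bucket in every repetition.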

“Querying millions of DNA sequences against a large database with traditional approaches can take several hours on a large compute cluster and can take several weeks on a single server,” said RAMBO co-creator Todd Treangen, a Rice computer scientist whose lab specializes in metagenomics. “Reducing database indexing times, in addition to query times, is crucially important as the size of genomic databases are continuing to grow at an incredible pace.”
