The volume of literature produced on COVID-19 is daunting, so much so that scientists can’t keep up and need help finding relevant papers and drawing connections among them.
Enter COVIDScholar.com. The search engine uses natural language processing techniques to scan, search and synthesize that literature, draw insights and make connections.
A group of materials scientists at Lawrence Berkeley National Laboratory (Berkeley Lab), who usually spend their time researching high-performance materials for thermoelectrics or battery cathodes, built the text mining tool. Their quest to develop text and data mining techniques that can help answer high-priority questions related to COVID-19 stems from the White House’s March 16 call to action.
At the time, the COVID-19 Open Research Dataset (CORD-19), a collection of scholarly literature about COVID-19, SARS-CoV-2 and the coronavirus group, was the most extensive machine-readable coronavirus literature collection available for data and text mining, with more than 29,000 articles.
Once the Berkeley Lab team set to work, its prototype was up and running within a week; after a month the tool had collected more than 61,000 research papers. About 8,000 were specifically about COVID-19, and the rest covered related topics, such as other viruses and pandemics in general. The team estimates that about 200 new articles on the coronavirus are published every day. “Within 15 minutes of the paper appearing online, it will be on our website,” said Amalie Trewartha, a postdoctoral fellow who is one of the lead developers.
Ready for Public Use
The tool went live this week when the Berkeley Lab team released an upgraded version that allows users to search for “related papers” and sort articles using machine-learning-based relevance tuning. COVIDScholar will also recommend similar abstracts and automatically sort papers into subcategories, such as testing or transmission dynamics, allowing users to do specialized searches.
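For readers curious how a “related papers” feature can work in principle, here is a minimal sketch. It is not COVIDScholar's method, which the team describes only as machine-learning-based; it swaps in a much simpler, classic technique, TF-IDF weighting with cosine similarity, and the function names and sample abstracts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    vectors = []
    for counts in counts_per_doc:
        # Smoothed inverse document frequency: rarer terms weigh more.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_papers(abstracts, index, k=3):
    """Return indices of the k abstracts most similar to abstracts[index]."""
    vectors = tfidf_vectors(abstracts)
    scored = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    # Highest similarity first; ties broken by lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Hypothetical abstracts, for illustration only.
SAMPLE_ABSTRACTS = [
    "Rapid antigen testing for SARS-CoV-2 in clinics",
    "Accuracy of antigen testing compared with PCR testing",
    "Transmission dynamics of influenza in households",
]
```

Calling `related_papers(SAMPLE_ABSTRACTS, 0, k=1)` ranks the second abstract as the closest match to the first, because the two share the terms “antigen” and “testing.” A production system would likely rely on learned embeddings rather than raw word overlap.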
The developers built automated scripts to grab new papers (including preprint papers), clean them up and make them searchable. At the most basic level, COVIDScholar acts as a simple search engine, albeit a highly specialized one that its developers tout as the largest single-topic literature collection on COVID-19.
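The grab, clean and make-searchable steps can be illustrated with a small sketch. Everything here is an assumption made for illustration: the article does not describe the team's actual scripts, so the `PaperIndex` class, the `ingest` helper and the in-memory feed are hypothetical, and a simple inverted index stands in for whatever search backend the site really uses.

```python
import html
import re
from collections import defaultdict


def clean(raw):
    """Strip HTML tags, decode entities and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PaperIndex:
    """Hypothetical in-memory inverted index: token -> set of paper IDs."""

    def __init__(self):
        self.papers = {}
        self.postings = defaultdict(set)

    def add(self, paper_id, raw_text):
        """Clean and index one paper; skip IDs that were already seen."""
        if paper_id in self.papers:
            return False
        text = clean(raw_text)
        self.papers[paper_id] = text
        for token in set(tokenize(text)):
            self.postings[token].add(paper_id)
        return True

    def search(self, query):
        """Return sorted IDs of papers containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(
            *(self.postings.get(token, set()) for token in tokens)
        )
        return sorted(matches)


def ingest(index, feed):
    """Add every (paper_id, raw_text) pair from a feed; count new papers."""
    return sum(1 for paper_id, raw in feed if index.add(paper_id, raw))
```

In this sketch, `feed` would be a list of `(paper_id, raw_text)` pairs that a scheduled job pulls from journal and preprint sources. Running `ingest` twice on the same feed adds nothing the second time, which is what lets a job like this be rerun every few minutes without creating duplicates.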