Automated screening tools
Automated screening of scientific manuscripts can help authors identify and fix common problems, such as failing to state whether experiments were blinded or randomized, using potentially misleading bar graphs to present continuous data, or failing to acknowledge study limitations. Tools can screen a manuscript and provide authors with customized feedback in seconds. This makes automated screening a valuable strategy for improving transparency and reproducibility on a large scale, across many fields.
At QUEST, we have developed several new screening tools and are founding members of an international working group that combines many different tools into a powerful screening pipeline (ScreenIT).
ODDPub is a text-mining algorithm that parses a set of publications and detects which publications disseminated Open Data or Open Code along with the paper. This tool is tailored towards biomedical science.
GitHub link: https://github.com/quest-bih/oddpub
Paper link: https://doi.org/10.5334/dsj-2020-042
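To illustrate the general idea of keyword-based text mining for open data and open code statements, here is a minimal Python sketch. It is not ODDPub's implementation: the patterns, the repository names and the function names (split_sentences, screen) are assumptions made up for this example, and a real tool would need far more rules and validation.

```python
import re

# Hypothetical keyword patterns, chosen for illustration only.
# They are NOT the rules used by ODDPub.
OPEN_DATA_PATTERNS = [
    r"\b(data|datasets?)\b.{0,80}\b(deposited|available|archived)\b"
    r".{0,80}\b(zenodo|figshare|dryad|osf)\b",
    r"\baccession (number|code)s?\b",
]
OPEN_CODE_PATTERNS = [
    r"\b(code|scripts?|software)\b.{0,80}\b(available|deposited)\b"
    r".{0,80}\b(github|gitlab|bitbucket|zenodo)\b",
]


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def screen(text):
    """Return detection flags plus the sentences that triggered them."""
    sentences = split_sentences(text)

    def hits(patterns):
        return [
            s for s in sentences
            if any(re.search(p, s, re.IGNORECASE) for p in patterns)
        ]

    data_hits = hits(OPEN_DATA_PATTERNS)
    code_hits = hits(OPEN_CODE_PATTERNS)
    return {
        "open_data": bool(data_hits),
        "open_code": bool(code_hits),
        "data_statements": data_hits,
        "code_statements": code_hits,
    }


example = (
    "All datasets were deposited in Zenodo. "
    "Analysis scripts are available on GitHub. "
    "We thank our colleagues."
)
result = screen(example)
print(result["open_data"], result["open_code"])
```

Returning the matching sentences alongside the flags lets a human check each detection, which matters because simple patterns like these will miss many statements and can also produce false positives.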
Barzooka is a deep convolutional neural network that screens publication PDFs and checks for bar graphs of continuous data and other common graphing issues. Many different data distributions can lead to the same bar graph, and the actual data may suggest different conclusions than the summary statistics alone. Barzooka also detects more informative alternatives to bar graphs, like dot plots, box plots and histograms.
Tool page: https://quest-barzooka.bihealth.org/
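The point that different data distributions can produce the same bar graph can be checked with a small worked example. The Python sketch below uses two made-up groups of five values (invented for illustration, not taken from any study) that share the same mean and sample standard deviation, so a bar graph showing mean and SD would look identical for both, even though one group is roughly symmetric and the other consists of four identical values plus one outlier.

```python
import statistics

# Made-up example data, for illustration only.
group_a = [3.0, 5.0, 6.0, 7.0, 9.0]    # roughly symmetric around 6
group_b = [5.0, 5.0, 5.0, 5.0, 10.0]   # four identical values plus one outlier


def bar_graph_summary(values):
    """What a typical bar graph shows: the mean and the sample SD."""
    return statistics.mean(values), statistics.stdev(values)


def distribution_summary(values):
    """What a dot plot or box plot would additionally reveal."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


for name, values in [("A", group_a), ("B", group_b)]:
    mean, sd = bar_graph_summary(values)
    print(f"Group {name}: mean={mean:.2f}, sd={sd:.2f}, "
          f"{distribution_summary(values)}")
```

Both groups have a mean of 6 and a sample variance of 5, yet their medians and ranges differ. A dot plot that shows every data point would make this difference visible at a glance.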
Why you shouldn’t use bar graphs of continuous data, and what to use instead:
The Automated Screening Working Group is an international group of tool creators working to improve scientific manuscripts. This group was co-founded by QUEST members. Group members have combined their tools into the ScreenIT pipeline, which screens for common problems that can affect transparency or reporting and provides feedback to authors. Throughout the COVID-19 pandemic, we have been using our automated ScreenIT pipeline to screen COVID-19 preprints on medRxiv and bioRxiv. Public reports are automatically posted via hypothes.is (https://hypothes.is/users/sciscore) and tweeted out via @SciScoreReports.
For more details on the international working group and the ScreenIT pipeline, see:
Automated Screening Working Group: https://scicrunch.org/ASWG
Correspondence article on COVID-19 preprint screening: https://www.nature.com/articles/s41591-020-01203-7
The ScreenIT pipeline includes tools that screen for the following:
Blinding, randomization, sample-size calculations, sex/gender, ethics and consent statements, resources, RRIDs
Open data, open code (ODDPub)
Bar graphs of continuous data (Barzooka)
Rainbow color maps (these color maps create visual artifacts and aren’t colorblind safe)
Correct identification of nucleotide sequences (Seek and Blastn)
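As a rough illustration of how independent screening tools can be combined into one pipeline that returns a single feedback report, consider the Python sketch below. It is a toy example under stated assumptions, not the ScreenIT implementation: the Finding class, the three keyword checks and the function names are all hypothetical, and the real tools use far more sophisticated methods than a single regular expression.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    check: str
    passed: bool
    message: str


def keyword_check(name: str, pattern: str) -> Callable[[str], Finding]:
    """Build a hypothetical screener that looks for one keyword pattern."""
    def screener(text: str) -> Finding:
        found = re.search(pattern, text, re.IGNORECASE) is not None
        message = (
            f"A statement on {name} was found."
            if found
            else f"No statement on {name} was found."
        )
        return Finding(name, found, message)
    return screener


# Toy screeners standing in for the real tools (assumption for illustration).
SCREENERS: List[Callable[[str], Finding]] = [
    keyword_check("randomization", r"\brandom(ly|i[sz]ed|i[sz]ation)\b"),
    keyword_check("blinding", r"\bblind(ed|ing)\b"),
    keyword_check("limitations", r"\blimitations?\b"),
]


def run_pipeline(text: str, screeners=SCREENERS) -> List[Finding]:
    """Run every screener on the same manuscript text."""
    return [screener(text) for screener in screeners]


def format_report(findings: List[Finding]) -> str:
    """Turn the findings into plain-text feedback for authors."""
    lines = []
    for f in findings:
        status = "OK" if f.passed else "CHECK"
        lines.append(f"[{status}] {f.check}: {f.message}")
    return "\n".join(lines)


manuscript = (
    "Mice were randomized to treatment groups. "
    "A limitation of this study is the small sample size."
)
print(format_report(run_pipeline(manuscript)))
```

Giving every screener the same input and output types means new tools can be added to the list without changing the code that runs the pipeline or builds the report.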