Lab Manager | Run Your Lab Like a Business

Improvements for Man and Machine in Scientific Publishing

Frictionless Data improves machine readability of articles, enables humans to directly interact with the data

by GigaScience

The need for information from research outputs to be more findable, accessible, interoperable, and reusable (FAIR) has spurred researchers, database managers, and publishers to keep looking for new and better ways to make information machine-readable. An equally important goal is creating articles that readers can actively engage with, rather than passively absorbing information from a published text. One tool that readily improves the machine readability of data is Frictionless Data, a data standard developed by the Open Knowledge Foundation. A study published in the open-science journal GigaByte showed not only that Frictionless Data drastically improves machine readability, but also that it can turn normally static figures into dynamic entities that let readers interact directly with the data within the article. Frictionless Data can therefore tackle two important goals at once: allowing both man and machine to use, and engage directly with, scientific outputs in a dynamic fashion.

Frictionless Data was integrated into an article by a team of researchers from the University of Melbourne in Australia, led by Professor Anthony Papenfuss, whose lab has long advocated open and reproducible research and makes sure that the data, source code, and every other shareable component of its research are openly available to the community. This makes their work especially amenable to applying new tools on top of their articles to make the published work dynamic and actively usable. The article presents two new open-source tools, svaRetro and svaNUMT, for interpreting structural variation that is difficult to analyze in genome data. They help annotate novel genomic events that most genome assembly pipelines miss, such as retrotransposition events and the insertion of mitochondrial DNA fragments into nuclear DNA. Such events add to the complexity of genome sequences and are relevant to understanding gene function and genome evolution.


The openness and availability of all the research components behind these tools and analyses created an ideal opportunity to implement Frictionless Data and make the article far more machine-readable. While adding it to the article, Raniere Silva from City University of Hong Kong, working as part of a FAIR data internship, made the fortuitous discovery that Frictionless Data could also improve human interaction with the article. For the first time, the figures were regenerated in interactive form. In these figures, readers can not only view the summary information presented, but also hover over data points to see the exact numbers and information behind them, and manipulate the figure itself to focus on specific components of interest.

Silva says: “My biggest surprise was that the Frictionless Data Package specifications in conjunction with the popular Plotly tool has functions to convert a static visualization into a dynamic one. This massively reduces the barrier for many researchers to produce dynamic data visualization as they only need to add a line or two to their code. GigaByte made a huge leap by publishing the dynamic data visualization and I hope it inspires other journals to publish dynamic data visualization.”

When asked what they found most useful from this process, the authors stated: “The interactive figures are a great addition to the paper. We found the interactive functions made reading labels easier, especially for label-rich figures, and liked that the figures were accessible in SVG format, allowing viewing and editing without losing information from the figures.”

To promote the use of Frictionless Data in more published articles, Silva wrote a detailed handbook. It includes an introduction to Frictionless Data, an introduction to the specifications, short working examples for creating an author’s own data package, and longer examples, based on articles published in the GigaScience and GigaByte journals, that illustrate the creation and use of Frictionless Data. The goal is for the handbook to start a conversation within the scientific community about how to embrace Frictionless Data. The handbook also offers resources and guidance that make it easier for data producers to submit articles with these packages to data publishers such as GigaScience Press.
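To give a rough idea of what "creating a data package" involves, the sketch below builds a minimal Data Package descriptor (the JSON document conventionally saved as `datapackage.json`) for a single CSV table, using only the Python standard library. It is not taken from Silva's handbook or from the official Frictionless tooling, which can infer descriptors on its own. The package name `sv-calls`, the file path, the column names, and the helper functions `infer_type` and `describe_csv` are all hypothetical, and the type inference is deliberately naive: it only distinguishes integers, decimal numbers, and strings.

```python
import csv
import io
import json


def infer_type(values):
    """Naively guess a Table Schema type ("integer", "number" or "string")."""
    def all_parse(cast):
        # True if every value in the column can be converted with `cast`.
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    return "string"


def describe_csv(name, path, csv_text):
    """Build a minimal Data Package descriptor for one CSV resource (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # One Table Schema field per CSV column, with a naively inferred type.
    fields = [
        {"name": col, "type": infer_type([row[col] for row in rows])}
        for col in reader.fieldnames
    ]
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": path,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }


# Hypothetical example table: a few structural-variant calls.
csv_text = (
    "sample,chrom,start,score\n"
    "S1,chr1,10500,0.93\n"
    "S2,chrMT,3200,0.71\n"
)
descriptor = describe_csv("sv-calls", "data/sv-calls.csv", csv_text)

# In practice this JSON would be saved next to the data as datapackage.json.
print(json.dumps(descriptor, indent=2))
```

The descriptor travels alongside the raw CSV rather than replacing it, so software can read column names and types directly while people can still open the original file. That separation is what makes the data machine-readable without changing how authors store it.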

Also of note, beyond the inclusion of Frictionless Data, is that this is the first time the regeneration of figures in interactive form has been combined with a CODECHECK certificate of reproducible computation.

The use of Frictionless Data, and all the downstream elements it enables, is a transformative step in scientific publishing: it improves machine readability and reproducibility, and turns scientific articles from their old-fashioned static format into 21st-century living documents. These types of novel, data-literate additions to the publication process are part of the reason GigaByte won the 2022 ALPSP Innovation in Publishing Award, presented this month.

- This press release was originally published on the GigaScience website