Expert report calls for action on materials data

Modern materials science research is generating ever more data – but what happens to it all? While some data make it into publications, much of the data produced during research and development is never reused, according to a report from The Minerals, Metals & Materials Society (TMS). The main reason, say the report’s authors, is the effort and resources needed to preserve, maintain, and share data.

The diverse group of ten technical experts behind the report, led by Justin Scott of TMS with sponsorship from the National Science Foundation (NSF), hope that Building a Materials Data Infrastructure: Opening New Pathways to Discovery and Innovation in Science and Engineering will spur the development of a common infrastructure for materials data that underpins the discovery, design, development, and implementation of innovative materials and processes.

Despite efforts by federal funding agencies in the US to drive the dissemination and sharing of data, the vast majority of data is still stored locally, out of reach of other users and researchers. But the materials science community increasingly recognizes that an open, collaborative approach to data is the best way to answer new or existing research questions and to speed up innovation and development.


A researcher studying grain boundaries, for example, typically undertakes atomistic modeling of their properties, which vary with five geometric degrees of freedom as well as with temperature, composition, impurity concentration, and so on. The volume of data generated is vast, so sharing datasets in this context could save considerable time and effort. Initially, researchers shared their data and metadata individually, in their own formats, but the community is now developing standard file formats and local data repositories.

The past decade has seen the first steps taken toward a materials data infrastructure that could enable the capture, storage, and sharing of this and other kinds of data in the future. In the US, the profile of data management was raised significantly when President Barack Obama launched the Materials Genome Initiative in 2011, an effort focused on an accessible, extensible, scalable, and sustainable data infrastructure for materials discovery and development. Over the same period, the European Commission has funded several projects on data sharing and reuse.

Further progress, believe the experts brought together for the TMS report, will require three core components to create a robust materials data infrastructure: repositories – which could be hardware or software – to store and make data available; software tools for handling and analyzing datasets; and e-collaboration platforms to enable the community to share and reuse data. As well as these three core elements, a common or compatible set of technologies, policies, incentives, standards, and protocols for handling data will be needed.


But there are still significant issues to overcome, warns the report, not least a lack of e-collaboration tools and platforms, the difficulties associated with implementing open or common standards and formats for data, the absence of standardized components and workflows for data extraction and reuse, and little integration of resources by infrastructure providers. Materials research also has some particular challenges – notably the widespread use of microstructural characterization tools that generate two-dimensional surface scans or three-dimensional tomographic images, which can be difficult to convert into digitally searchable data.

The research community itself also needs clarity on the benefits of a materials data infrastructure. Clear ‘career’ incentives for sharing data and a definitive means of crediting data contributors could go a long way towards encouraging active researchers to participate, suggests the report. Legal issues, such as copyright and the privacy of data relating to individuals (from a clinical trial, for example), must also be resolved.

So what should be done? The report makes eight ‘tactical’ recommendations, ranging from developing the repositories, tools, and platforms needed to build a materials data infrastructure to establishing funding, training, and incentive programs that encourage researchers to make data management a key element of their work.

As for who should lay the much-needed groundwork, the report turns the spotlight on federal agencies such as the National Institute of Standards and Technology (NIST), as well as universities and technology companies, which should be supported financially by the NSF, the Department of Energy (DOE), and the Department of Defense (DoD). Some initiatives are already underway, such as the NIST-CHiMaD Proto Data Repository for structural phase-based data (https://phasedata.nist.gov) and Materials Commons, hosted by the University of Michigan, but they are still at an early stage of development and cover only a tiny fraction of the data generated by the research community.

One of the specialists interviewed for the report was Elsevier’s VP of Research Data Collaborations, Anita de Waard, who identified three main barriers to sharing science and engineering data:

  • Incentives to submit and publish data are unclear. Funding agencies and research institutions alike support the sharing of data; however, from a data producer’s perspective there are few well-articulated incentives to publish detailed datasets.
  • Standardized methods of sharing are not available. Much work needs to be done to standardize the storage and sharing of data; there is currently too much freedom in approaches, which can inhibit the reuse of data.
  • Very few examples currently exist to help illustrate the vision for published data. Success stories of great science being enabled through the storage and sharing of data are not prominent.

Within Elsevier’s Research Data Management program, a number of initiatives aim to ensure that data are shared, stored, cited, and used (see also “Ten Habits of Highly Effective Research Data”, https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data). The Elsevier journal submission system directly supports and encourages the deposition of data in open, sustained data repositories (https://www.elsevier.com/authors/author-services/research-data/data-base-linking). A suite of journals devoted to data, software, and methods aims to make all aspects of the research workflow discoverable and credited (https://www.elsevier.com/authors/author-services/research-elements). Our Electronic Lab Notebook, Hivebench, enables detailed data capture at the moment of research (https://www.elsevier.com/solutions/hivebench). And Mendeley Data is a free, robust data repository that supports open standards and long-term preservation through a series of collaborations with key repository partners (https://blog.mendeley.com/2015/11/09/put-your-research-data-online-with-mendeley-data/).

Christiane Barranguet, Publishing Director for Materials Today, the family of materials science information solutions at Elsevier, endorses the findings of the report. “We endeavor to play an active role in educating researchers on data sharing, as well as encouraging and facilitating the sharing of research data through various open data initiatives. These include an easy drag-and-drop deposit of data into Mendeley Data when submitting a research paper, and the option of co-submitting a Data in Brief paper alongside the original research data, to give a more detailed, curated, and citable overview of the research data and how they were collected.”

Over the coming months, the Materials Today family will be looking to engage in a number of further projects to facilitate a culture of sharing and enable researchers to put their data in the spotlight.

But more than just a collection of technical platforms and tools, a functional ‘ecosystem’ of users and providers – who will need to be trained and educated – will have to be established to broaden the adoption of a materials data infrastructure and sustain its use.

Collaborative, cross-company data initiatives at Elsevier endorse the report’s conclusion that “In today’s highly digital and integrated world, a coordinated materials data infrastructure will be a key enabler for accelerating materials-related science and engineering breakthroughs in the 21st century and beyond.” Within and across Elsevier, various teams are gearing up for collaborations to make this vision a reality.

For further information:

http://www.tms.org/Publications/Studies/Materials_Data_Infrastructure/Materials_Data_Infrastructure.aspx?hkey=d228f86c-e269-49a2-a638-395285b760e4

https://data.mendeley.com/

https://www.hivebench.com/

http://www.prisms-center.org/#/mcommons/overview