A low-dimensional uniform manifold approximation projection showing symmetry-aware image similarity from a database of more than 25,000 piezoresponse force microscopy images. Image: Joshua Agar/Lehigh University.

Understanding structure-property relations is a key goal of materials research, according to Joshua Agar, a faculty member in Lehigh University’s Department of Materials Science and Engineering. Yet no metric currently exists that captures the structure of materials, because that structure is complex and multidimensional.

Artificial neural networks, a type of machine learning, can be trained to identify similarities, and even to correlate parameters such as structure and properties, but there are two major challenges, says Agar. The first is that most of the vast amount of data generated by materials experiments is never analyzed. This is largely because the images that scientists produce in laboratories all over the world are rarely stored in a usable manner and are not usually shared with other research teams. The second challenge is that neural networks are not very effective at learning symmetry and periodicity (how regularly a material’s structure repeats), two features of utmost importance to materials researchers.

Now, a team led by researchers at Lehigh University has developed a novel approach that uses machine learning to create similarity projections, allowing researchers, for the first time, to search an unstructured image database and identify trends. Agar and his collaborators developed and trained a neural network model to include symmetry-aware features, and then applied their method to a set of 25,133 piezoresponse force microscopy images collected on diverse materials systems over five years at the University of California, Berkeley. The trained model grouped similar classes of material together and revealed trends, providing a basis for understanding structure-property relationships.

“One of the novelties of our work is that we built a special neural network to understand symmetry and we use that as a feature extractor to make it much better at understanding images,” says Agar, a lead author of a paper on this work in npj Computational Materials.
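The article does not describe how the team’s network encodes symmetry, so the sketch below is only a generic illustration of the underlying idea, not their architecture. It assumes square images stored as nested Python lists and uses a hypothetical toy descriptor, `horizontal_roughness`, whose value changes when an image is rotated. Averaging that descriptor over all eight rotations and reflections of a square (the dihedral group D4) yields a descriptor that returns the same value for an image and its rotated copy.

```python
from typing import Callable, List

Image = List[List[float]]  # a square 2D image stored as rows of pixel values


def rotate90(img: Image) -> Image:
    """Rotate a square image 90 degrees clockwise."""
    n = len(img)
    return [[img[n - 1 - c][r] for c in range(n)] for r in range(n)]


def mirror(img: Image) -> Image:
    """Reflect an image left to right."""
    return [list(reversed(row)) for row in img]


def d4_orbit(img: Image) -> List[Image]:
    """Return the 8 images produced by the rotations and reflections of a square (group D4)."""
    images = []
    current = img
    for _ in range(4):
        images.append(current)          # rotation by k * 90 degrees
        images.append(mirror(current))  # the same rotation followed by a reflection
        current = rotate90(current)
    return images


def horizontal_roughness(img: Image) -> float:
    """Hypothetical, orientation-dependent descriptor: mean squared difference of horizontal neighbours."""
    diffs = [(row[c + 1] - row[c]) ** 2 for row in img for c in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def symmetrize(feature: Callable[[Image], float]) -> Callable[[Image], float]:
    """Make a descriptor D4-invariant by averaging it over all 8 symmetry operations."""
    def invariant(img: Image) -> float:
        orbit = d4_orbit(img)
        return sum(feature(g) for g in orbit) / len(orbit)
    return invariant


stripes = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]  # vertical stripes
turned = rotate90(stripes)                            # the same pattern rotated into horizontal stripes
invariant_roughness = symmetrize(horizontal_roughness)
print(horizontal_roughness(stripes), horizontal_roughness(turned))  # 1.0 0.0
print(invariant_roughness(stripes), invariant_roughness(turned))    # 0.5 0.5
```

Group averaging is simple but costs eight feature evaluations per image; networks designed to be equivariant build the symmetry into their layers instead, which is one reason a purpose-built architecture can be more efficient.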

The team produced these projections using Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction technique. This approach, says Agar, allows researchers to learn “...in a fuzzy way, the topology and the higher-level structure of the data and compress it down into 2D”.

“If you train a neural network, the result is a vector, or a set of numbers that is a compact descriptor of the features. Those features help classify things so that some similarity is learned. What’s produced is still rather large in space, though, because you might have 512 or more different features. So, then you want to compress it into a space that a human can comprehend such as 2D, or 3D – or, maybe, 4D.”
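To make the compression step concrete, here is a minimal sketch that maps 512-dimensional feature vectors down to 2D. It deliberately swaps in principal component analysis (PCA), a linear method, for UMAP so that it runs with only the Python standard library. The synthetic “features” and all function names are hypothetical stand-ins for a real network’s output.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def center(rows: List[Vector]) -> List[Vector]:
    """Subtract each feature's mean so the projection captures variation, not offset."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_direction(rows: List[Vector], iterations: int = 100, seed: int = 0) -> Vector:
    """Power iteration on X^T X: returns an approximate unit-length top eigenvector."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in rows[0]]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [0.0] * len(v)
        for row in rows:  # accumulate X^T (X v) without forming a 512 x 512 matrix
            s = dot(row, v)
            for j, x in enumerate(row):
                w[j] += s * x
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    return v


def deflate(row: Vector, v: Vector) -> Vector:
    """Remove the component of `row` along the unit vector `v`."""
    s = dot(row, v)
    return [x - s * vj for x, vj in zip(row, v)]


def project_2d(features: List[Vector]) -> List[Vector]:
    """Map each feature vector to its coordinates on the first two principal components."""
    data = center(features)
    residual, components = data, []
    for k in range(2):
        v = leading_direction(residual, seed=k)
        components.append(v)
        residual = [deflate(row, v) for row in residual]  # so the next pass finds the next component
    return [[dot(row, c) for c in components] for row in data]


# Synthetic stand-in for network output: two hypothetical image classes that differ
# along the first 32 of 512 feature dimensions, with unit Gaussian noise everywhere.
rng = random.Random(42)
DIM = 512


def fake_features(offset: float) -> Vector:
    return [(offset if j < 32 else 0.0) + rng.gauss(0.0, 1.0) for j in range(DIM)]


class_a = [fake_features(+5.0) for _ in range(15)]
class_b = [fake_features(-5.0) for _ in range(15)]
coords = project_2d(class_a + class_b)
print(len(coords), len(coords[0]))  # 30 2
```

On this synthetic data the first coordinate should separate the two classes. With the umap-learn package installed, the non-linear counterpart is `umap.UMAP(n_components=2).fit_transform(features)`. Unlike PCA, which keeps the directions of greatest global variance, UMAP aims to preserve local neighbourhood structure, the “fuzzy” topology Agar describes.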

By doing this, Agar and his team were able to take the 25,000-plus images and group very similar classes of material together.

“Similar types of structures in a material are semantically close together and also certain trends can be observed, particularly if you apply some metadata filters. If you start filtering by who did the deposition, who made the material, what were they trying to do, what is the material system...you can really start to refine and get more and more similarity. That similarity can then be linked to other parameters like properties.”
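The filter-then-compare workflow Agar describes can be sketched as a metadata filter followed by a cosine-similarity ranking of feature vectors. This is an illustrative assumption, not the team’s or DataFed’s interface: the `ImageRecord` fields, metadata keys such as `grower` and `system`, and the toy three-dimensional vectors are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ImageRecord:
    image_id: str
    metadata: Dict[str, str]  # hypothetical keys, e.g. who grew the film, which material system
    features: List[float]     # embedding produced by a feature extractor


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(records: List[ImageRecord], query: List[float], k: int = 3,
           **filters: str) -> List[Tuple[str, float]]:
    """Keep records whose metadata matches every filter, then rank them by similarity to the query."""
    candidates = [r for r in records
                  if all(r.metadata.get(key) == value for key, value in filters.items())]
    ranked = sorted(((r.image_id, cosine(query, r.features)) for r in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


db = [
    ImageRecord("img-001", {"grower": "lab-1", "system": "X"}, [1.0, 0.0, 0.2]),
    ImageRecord("img-002", {"grower": "lab-2", "system": "X"}, [0.9, 0.1, 0.0]),
    ImageRecord("img-003", {"grower": "lab-1", "system": "Y"}, [0.0, 1.0, 0.0]),
    ImageRecord("img-004", {"grower": "lab-1", "system": "X"}, [0.1, 0.9, 0.3]),
]
query = [1.0, 0.0, 0.0]
print(search(db, query, k=2))                               # best two matches across all records
print(search(db, query, k=2, grower="lab-1", system="X"))   # only records passing both filters
```

Filtering before ranking narrows the comparison to a relevant subset, which mirrors the refinement Agar describes; in a real database the ranking would typically use an approximate nearest-neighbour index rather than a full sort.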

This work demonstrates how improved data storage and management could greatly accelerate materials discovery. Of particular value, according to Agar, are images and data generated by failed experiments.

“No one publishes failed results and that’s a big loss because then a few years later someone repeats the same line of experiments,” he says. “So, you waste really good resources on an experiment that likely won’t work.” Instead of losing all of that information, researchers could mine data that has already been collected to reveal trends that have not been seen before, dramatically speeding up discovery.

This study is the first ‘use case’ of a new data-storage system housed at Oak Ridge National Laboratory called DataFed. DataFed, according to its website, is “...a federated, big-data storage, collaboration and full-life-cycle management system for computational science and/or data analytics within distributed high-performance computing (HPC) and/or cloud-computing environments.”

“My team at Lehigh has been part of the design and development of DataFed in terms of making it relevant for scientific use cases,” says Agar. “Lehigh is the first live implementation of this fully-scalable system. It’s a federated database so anyone can pop up their own server and be tied to the central facility.”

Agar is the machine-learning expert on Lehigh University’s Presidential Nano-Human Interface Initiative team. This interdisciplinary initiative, integrating the social sciences and engineering, seeks to transform the ways that humans interact with instruments of scientific discovery to accelerate innovations.

“One of the key goals of Lehigh’s Nano/Human Interface Initiative is to put relevant information at the fingertips of experimentalists to provide actionable information that allows more informed decision-making and accelerates scientific discovery,” says Agar. “Humans have limited capacity for memory and recollection. DataFed is a modern-day Memex; it provides a memory of scientific information that can easily be found and recalled.”

DataFed is an especially valuable tool for researchers engaged in interdisciplinary team science, because it allows collaborators working at different, often remote, locations to access each other’s raw data. “This is one of the key components of our Lehigh Presidential Nano/Human Interface (NHI) Initiative for accelerating scientific discovery,” says Martin Harmer, a professor in Lehigh’s Department of Materials Science and Engineering and director of the Nano/Human Interface Initiative.

This story is adapted from material from Lehigh University, with editorial changes made by Materials Today. The views expressed in this article do not necessarily represent those of Elsevier.