Technology: Sharing data in materials science

The US Materials Genome Initiative (MGI), launched by President Barack Obama in June 2011, aims to halve the time and cost of developing advanced materials for applications such as energy, transportation and security. Over two years, hundreds of millions of dollars have been invested in academia, industry and federal-agency projects.

Sharing data and developing computational tools are critical to MGI’s success. Advanced materials have complex physical and chemical properties that can be manipulated for a variety of applications, and these properties can change during synthesis, manufacturing, and use. Tracking these assets is a daunting task, and MGI involves efforts to standardize terminology, data-collection formats, and reporting guidelines.

Fortunately, there is much to be learned from existing collaborations in nanotechnology. The National Nanotechnology Initiative (NNI), established a decade ago for materials in the 1-100-nanometer range, is a ready partner for MGI, which covers scales from nanometers to micrometers.

MGI may consider joining NNI’s Nanotechnology Knowledge Infrastructure Initiative, which was launched in May 2012 to develop digital data and information infrastructure and strengthen collaboration between the science and modeling communities. The initiative has already defined a set of data preparedness levels, modeled on NASA’s Technology Readiness Levels, to provide a basis for communicating the quality and maturity of material data.

The MGI may also join a partnership between the NNI and the European Commission to support a transatlantic dialogue on the nuts and bolts of data sharing: informatics, consensus-derived ontology, data representation and collection.

Data sharing is an inherently collaborative activity that has the potential to advance materials science much more quickly. MGI can strengthen existing efforts and act as a hub for information sharing on content at all levels.

David L. McDowell: Encourage Sharing

Executive Director of the Institute for Materials, Georgia Institute of Technology, Atlanta

MGI should avoid a ‘build it and they will come’ attitude. Scientists and engineers need incentives to collaborate and share their data and skills. There should be something in it for everyone.

The data-sharing environment should invite collaboration as well as facilitate it. Stakeholders have broad interests that go beyond simply retrieving existing data – they want to discover materials and predict advanced products. A seamless and robust online environment, and development of cyber-infrastructure that is distributed and organic rather than centralized, will encourage contributions from diverse users.

Social-networking strategies can connect users with diverse expertise to advance common interests. A win-win approach should be encouraged. For example, users might upload experimental data sets in exchange for access to modeling tools, which would in turn help modeling to advance. Explicit agreements should govern the ethics of credit attribution and data use.

Maximizing the usefulness of information is a major attraction for investors in MGI’s infrastructure. For example, expensive data sets obtained from national synchrotron and neutron-diffraction facilities should be stored and leveraged to the greatest extent for discovery and citation, as should data from large-scale supercomputer simulations.

Open-access rules are desirable. Examples include the National Science Foundation-funded nanoHUB for nanometer-scale modeling and simulation tools, the LAMMPS molecular-dynamics code, and the DREAM.3D software for meshing three-dimensional microstructures.

Amanda Barnard: Embrace uncertainty

Head of Virtual Nanoscience Laboratory, Commonwealth Scientific and Industrial Research Organization, Parkville, Australia

MGI is opening up styles of collaborative working that give rise to technical and personal challenges. Materials scientists should be more comfortable with uncertainty. They must relinquish control, trust their fellow scientists, and resist the urge to redo everything ‘just to be sure’.

Delivering new science from existing data requires the pooling of resources. Some insights and breakthroughs cannot be achieved any other way. One method can calibrate scales or achieve resolutions that others cannot. Electron microscopy can resolve atomic-scale features on surfaces, but optical microscopy shows how light reflects off them.

It is difficult to combine results from different sources. Errors arise from the specifics of experimental or computational techniques. Many experimentalists know the frustration of trying to reproduce results that vary with laboratory conditions. Even theory-based computational methods can give different answers.

Combining data of different origins often introduces more uncertainty than a simple summation of measurement or statistical errors stemming from pure data sets.