Combining Transfer Learning and Representation Learning to Improve Predictive Analytics on Small Materials Data | IEEE Conference Publication | IEEE Xplore

Abstract:

Modern data mining methods have seen widespread and growing application in materials science for regression-based predictive modeling, owing to their effectiveness in extracting and utilizing hidden information from materials datasets. However, because experimental and computational data are costly and time-consuming to obtain, the majority of materials datasets are small. Moreover, the limited hand-engineered representations available from raw materials data make it harder to improve the accuracy of predictive models on such small and specialized training datasets. In this paper, we introduce a novel technique that combines transfer learning (TL) and representation learning (RL) using a pre-trained deep neural network to maximize predictive accuracy for inorganic material properties without additional computational cost. The performance of the proposed method is compared against traditional machine learning (ML) models and deep neural network models trained from scratch (SC), with either elemental fraction (EF) or, for a more stringent comparison, more informative physical attributes (PA) as input, as well as against conventional TL and RL techniques using deep neural networks. The results demonstrate that the proposed method can improve accuracy compared with SC models and conventional TL and RL techniques.
Date of Conference: 18-20 December 2024
Date Added to IEEE Xplore: 04 March 2025
Conference Location: Miami, FL, USA

I. Introduction

Modern data mining methods have seen widespread and growing application in materials science for regression-based predictive modeling, owing to their effectiveness in extracting and utilizing hidden information from materials datasets and in aiding materials discovery [1]–[7]. This has been made possible by the availability of large, computationally derived materials databases [8], [9], easy-to-use data mining tools, and advances in machine learning (ML) and deep learning (DL) algorithms that extract hidden information from raw inputs and build accurate and robust models for various material properties [10]–[15]. Because materials property prediction is a regression task and the model input is usually a one-dimensional numerical vector obtained by pre-processing the raw materials input, traditional ML algorithms [10], [11] and DL models composed of fully connected layers [16]–[21] are used extensively. However, because experimental data, and in some cases even computational data, are costly and time-consuming to obtain, the majority of materials datasets are small, which limits highly accurate models to the select few materials properties for which a large amount of data is available [22], [23]. Moreover, the limited set of generalized hand-engineered representations available from raw materials data [24], [25] makes it harder to improve the accuracy of predictive models built on such small and specialized training datasets. Therefore, advanced data mining techniques such as transfer learning (TL) [26]–[29] and representation learning (RL) [30]–[35] are often applied to tackle the small-data bottleneck by reusing existing knowledge to boost the predictive performance of the model.
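
To make the idea of a one-dimensional numerical input vector concrete, the following is a minimal sketch of how an elemental fraction (EF) representation might be computed from a chemical formula. It is an illustration under stated assumptions, not the pre-processing used in the paper: the element vocabulary `ELEMENTS` is a deliberately short, hypothetical list, and the helper names `parse_formula` and `elemental_fractions` are introduced here for illustration only. The parser handles flat formulas such as `Fe2O3` and does not support parentheses.

```python
import re

# Hypothetical, deliberately short element vocabulary (an assumption for
# illustration); a real model would use a much longer fixed list.
ELEMENTS = ["H", "Li", "O", "Na", "Al", "Si", "Cl", "Ti", "Fe"]

# One element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Parse a flat formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}."""
    counts = {}
    for symbol, amount in _TOKEN.findall(formula):
        # A missing amount means a single atom of that element.
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def elemental_fractions(formula, elements=ELEMENTS):
    """Return a fixed-length vector of atomic fractions, ordered as `elements`."""
    counts = parse_formula(formula)
    if not counts:
        raise ValueError(f"no elements found in formula: {formula!r}")
    unknown = sorted(set(counts) - set(elements))
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {unknown}")
    total = sum(counts.values())
    # Elements absent from the formula get a fraction of zero.
    return [counts.get(el, 0.0) / total for el in elements]


if __name__ == "__main__":
    print(elemental_fractions("Fe2O3"))
```

Every composition maps to a vector of the same length whose entries sum to one, which is what allows such vectors to be fed directly to traditional ML algorithms or to fully connected networks.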
