I. Introduction
In recent years, large-scale Deep Learning (DL) models have demonstrated impressive results across a variety of domains ([1]–[3]). With the growing popularity of distributed training (e.g., [4], [5]), the cost of training such models has risen accordingly. Despite these high costs, however, the data generated during the training process, e.g., weights, performance metrics, and loss landscapes, produced by the heavy iterative process are discarded once training is finished; often only the final model and its weights are saved for inference applications.