I. Introduction
Many data science techniques require large amounts of data, and collecting or manually creating datasets of that size is arduous and resource-intensive. SynGen is a platform that generates synthetic data from user inputs and predefined fields. Using machine learning techniques, SynGen can generate synthetic data from scratch. It can also identify the attributes of a provided dataset and generate additional synthetic data from them. In addition, it can compare the performance of different machine learning algorithms on a dataset and recommend the best-performing one. The main features of the platform are:
Generation of synthetic data based on user-defined schemas.
Generation of synthetic data from datasets the user already has.
A framework in which students and other users can upload datasets, apply various machine learning algorithms, and compare the results.
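To make the first feature concrete, schema-driven generation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SynGen's implementation: the schema format, the field types, and the function names `generate_value` and `generate_rows` are hypothetical. Each value is drawn independently at random from the range or set the schema specifies. Generating data from an existing dataset, the second feature, would instead require learning the distributions and relationships in that data.

```python
import random
import string

# Hypothetical schema format (an assumption, not SynGen's actual API):
# each field name maps to a (type, options) pair.
SCHEMA = {
    "id": ("int", {"min": 1, "max": 10_000}),
    "age": ("int", {"min": 18, "max": 90}),
    "income": ("float", {"min": 20_000.0, "max": 150_000.0}),
    "region": ("choice", {"values": ["North", "South", "East", "West"]}),
    "code": ("str", {"length": 6}),
}


def generate_value(field_type, opts, rng):
    """Draw one random value for a field according to its (hypothetical) spec."""
    if field_type == "int":
        # Uniform integer in the inclusive range [min, max].
        return rng.randint(opts["min"], opts["max"])
    if field_type == "float":
        # Uniform float in [min, max], rounded to two decimal places.
        return round(rng.uniform(opts["min"], opts["max"]), 2)
    if field_type == "choice":
        # One category picked uniformly from the allowed values.
        return rng.choice(opts["values"])
    if field_type == "str":
        # Random uppercase string of the requested length.
        return "".join(rng.choices(string.ascii_uppercase, k=opts["length"]))
    raise ValueError(f"unsupported field type: {field_type}")


def generate_rows(schema, n, seed=None):
    """Generate n synthetic records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [
        {name: generate_value(ftype, opts, rng) for name, (ftype, opts) in schema.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    for row in generate_rows(SCHEMA, 3, seed=42):
        print(row)
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps generation reproducible for a given seed without affecting other code that uses the global random state.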