I. Introduction
Bioinformatics is an important application area of machine learning, in which learning methods help extract useful information from biological data. Among its many research directions, the analysis of protein amino acid sequences is one of the most important problems [1]. Advances in sequencing technology generate genome data in ever larger volumes, comprising large amounts of protein, DNA, and RNA sequences. The collected data differ in geometric structure and distribution: each protein sequence has a variable length and belongs to one of several superfamilies. Classifying protein sequences into superfamilies helps in inferring the structure, function, or hidden characteristics of an unknown protein sequence. One of the most important applications of protein sequence classification is identifying drugs to treat a particular disease. Suppose a sequence S is obtained from disease D and a classification method assigns S to superfamily F. Then the combination of drugs associated with superfamily F can be used to treat disease D. Proteins are organic compounds made of amino acids arranged in a linear chain and folded into a globular form.