ECCB 2024
Integration of multi-view features of protein sequences for improved protein function prediction
The advent of next-generation sequencing technology has produced vast amounts of protein data. Proteins play a pivotal role in numerous biological processes, including catalysis of metabolic reactions, DNA replication, and transport of molecules, so the study of protein function is of paramount importance. In recent years, many studies have applied machine learning to protein function prediction. However, previous studies have struggled to analyze the structure and function of complex proteins because they capture and embed only a single view of the protein sequence to be predicted. To address this limitation, we propose multi-view protein function prediction. The proposed method extracts feature vectors from several perspectives of a protein sequence to improve prediction performance. The first view is an amino acid composition feature vector that contains sequence information. The second view is a k-mer-based feature vector that records the frequency of each k-mer in the protein sequence and contains statistical information. The last view is a vector produced by a feature extraction model that encompasses both graphical representation and statistical features. Together, the views in the proposed approach represent a comprehensive integration of sequence, statistical, physicochemical, and evolutionary information. In conclusion, the proposed method overcomes the limitations of single-view approaches by using features from different information sources. Furthermore, the extracted multi-view features can be applied to other protein function prediction methods with relative ease.
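As an illustration of the first two views only, the following is a minimal sketch, not the authors' implementation. It assumes the 20 standard amino acids, k = 2, normalized counts, and simple concatenation of the views; none of these choices is stated in the abstract, and the function names are hypothetical. The third, model-based view is omitted because the abstract does not specify the feature extraction model.

```python
from itertools import product

# Assumption: restrict features to the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """View 1 (sketch): fraction of each standard amino acid in the sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def kmer_frequencies(seq, k=2):
    """View 2 (sketch): normalized frequency of every possible k-mer.

    The vector has 20**k entries; k-mers containing non-standard
    residues are skipped.
    """
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = [0] * len(index)
    n_valid = 0
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
            n_valid += 1
    if n_valid == 0:
        return [0.0] * len(index)
    return [c / n_valid for c in counts]


def multi_view_features(seq, k=2):
    """Concatenate the two views (assumed fusion strategy) into one vector."""
    return aa_composition(seq) + kmer_frequencies(seq, k)


features = multi_view_features("MKTAYIAKQR")
print(len(features))  # 20 composition entries + 20**2 k-mer entries
```

A vector built this way could be passed to any downstream classifier. Larger k grows the k-mer view exponentially (20**k entries), so small values of k are the practical choice for a dense representation like this one.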