Prediction of protein-protein interaction sites from weakly homologous template structures using meta-threading and machine learning

Surabhi Maheshwari, Louisiana State University
Michal Brylinski, Louisiana State University

Abstract

© 2014 John Wiley & Sons, Ltd.

We developed eFindSitePPI, which predicts interfacial residues in protein structures using evolutionarily weakly related dimer templates, structure alignments, and machine learning. eFindSitePPI performs well not only on experimental structures but also tolerates structural imperfections in computer-generated models. In addition, it detects specific molecular interactions at the interface, such as hydrogen bonds, aromatic interactions, salt bridges, and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSitePPI outperforms other methods for protein-binding residue prediction, particularly when using protein models.

The identification of protein-protein interactions is vital for understanding protein function and elucidating interaction mechanisms, as well as for practical applications in drug discovery. With the exponential growth of protein sequence data, fully automated computational methods that predict interactions between proteins are becoming essential components of system-level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations, as well as the interfacial geometry, are highly conserved across evolutionarily related proteins. Because the conformational space of protein-protein interactions is well covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSitePPI, an algorithm that uses the three-dimensional structure of a target protein, evolutionarily remotely related templates, and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSitePPI in interfacial residue prediction is 0.46 (0.92).
For weakly homologous protein models, these values decrease only slightly, to 0.40-0.43 (0.91-0.92), demonstrating that eFindSitePPI performs well not only on experimental data but also tolerates structural imperfections in computer-generated structures. In addition, eFindSitePPI detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSitePPI outperforms other methods for protein-binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large-scale applications using raw genomic data. eFindSitePPI is freely available to the academic community at http://www.brylinski.org/efindsiteppi.
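To make the reported figures concrete, the Python sketch below computes per-residue sensitivity (the fraction of truly interfacial residues that are predicted as interfacial) and specificity (the fraction of non-interfacial residues correctly left unpredicted) for one target, averages both over a set of targets, and computes, per interaction type, the fraction of observed interface interactions that are also predicted. It is a generic illustration under stated assumptions, not eFindSitePPI's evaluation code: averaging per target rather than pooling all residues, and representing interactions as hypothetical (target residue, partner residue, kind) tuples that must match exactly, are assumptions made here.

```python
from statistics import mean
from typing import Sequence


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> tuple[float, float]:
    """Per-residue sensitivity and specificity for a single target protein.

    True marks a residue as interfacial (predicted or observed).
    Returns 0.0 for a metric whose denominator is empty.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have equal length")
    tp = fn = fp = tn = 0
    for p, a in zip(predicted, actual):
        if a and p:
            tp += 1  # interfacial residue correctly predicted
        elif a:
            fn += 1  # interfacial residue missed
        elif p:
            fp += 1  # non-interfacial residue predicted as interfacial
        else:
            tn += 1  # non-interfacial residue correctly rejected
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def average_over_targets(targets) -> tuple[float, float]:
    """Mean sensitivity and specificity over (predicted, actual) pairs.

    Assumption: metrics are averaged per target, not pooled over residues.
    """
    scores = [sensitivity_specificity(p, a) for p, a in targets]
    return mean(s for s, _ in scores), mean(s for _, s in scores)


def interaction_recall(predicted: set[tuple[int, int, str]],
                       observed: set[tuple[int, int, str]]) -> dict[str, float]:
    """Fraction of observed interactions of each kind that were predicted.

    Hypothetical representation: (target_residue, partner_residue, kind),
    e.g. kind = "hbond" or "salt_bridge"; matching is exact.
    """
    recall = {}
    for kind in {k for _, _, k in observed}:
        of_kind = {x for x in observed if x[2] == kind}
        recall[kind] = len(of_kind & predicted) / len(of_kind)
    return recall
```

For example, predicting residues `[True, True, False, False, False, False]` against observed labels `[True, False, True, False, False, False]` gives one true positive, one false negative, one false positive, and three true negatives, so sensitivity is 0.5 and specificity is 0.75. Because interfaces are usually a small fraction of the surface, specificity tends to be high even when sensitivity is moderate, which is consistent with the pattern of values reported above.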