A tabular approach to the sequence-to-structure relation in proteins (tetrapeptide representation) for de novo protein design

Jan Meus, Uniwersytet Jagielloński Collegium Medicum
Michał Brylinski, Uniwersytet Jagielloński Collegium Medicum
Monika Piwowar, Uniwersytet Jagielloński Collegium Medicum
Piotr Piwowar, AGH University of Science and Technology
Zdzisław Wiśniowski, Uniwersytet Jagielloński Collegium Medicum
Justyna Stefaniak, Uniwersytet Jagielloński w Krakowie
Leszek Konieczny, Uniwersytet Jagielloński Collegium Medicum
Grzegorz Surówka, Uniwersytet Jagielloński w Krakowie
Irena Roterman, Uniwersytet Jagielloński Collegium Medicum

Abstract

Background: Experimental observations classify the protein-folding process as a multi-step event. The backbone conformation has been experimentally recognized as responsible for the early-stage structural forms of a polypeptide. The sequence-to-structure and structure-to-sequence relations are critical for predicting protein structure. A contingency table representing this relation for tetrapeptides in their early-stage structural form is presented. The correlation between sequence and structure appears to be essential in protein-folding simulation.

Material/Methods: The polypeptide chains of all the proteins in the Protein Data Bank were transformed into their early-stage structural forms. The tetrapeptide was selected as the structural unit. Tetrapeptide sequences and structures were expressed by letter codes. A contingency table of any size (here: 160,000 × 2401) was transformed into a 2 × 2 table for each non-zero cell of the original table, which allowed calculation of the r-coefficient measuring the strength of the relation.

Results: High values of the r-coefficient identified sequences of strong structural determinability and structures of high sequence selectivity. A web-based program that calculates the r-coefficient ranking list was constructed so that the method can be applied to any problem of contingency table analysis.

Conclusions: The results revealed sequence-to-structure (and vice versa) correlation in early-stage folding. Surprisingly, the irregular structural forms of loops and bends appeared to be highly determined. Comparison of these results with another method based on information entropy revealed high agreement. The method, oriented toward the interpretation of large contingency tables, seems particularly useful for large-scale microarray analysis, a very popular technique in the post-genomic era. © Med Sci Monit, 2006.
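The reduction described under Material/Methods can be illustrated with a minimal sketch. It collapses a contingency table around one cell into a 2 × 2 table (this cell, rest of the row, rest of the column, everything else) and ranks all non-zero cells by a strength-of-association coefficient. The abstract does not give the formula for the r-coefficient, so the sketch assumes the standard phi coefficient for a 2 × 2 table; the function names and the toy table are hypothetical and are not taken from the paper or its web program. The dimensions quoted in the abstract are consistent with 20^4 = 160,000 tetrapeptide sequences and 7^4 = 2401 structural codes, although the alphabet sizes are an inference here and are not stated in the abstract.

```python
import math


def collapse_to_2x2(table, i, j):
    """Collapse an R x C contingency table around cell (i, j).

    Returns (a, b, c, d):
      a = count in the cell itself
      b = rest of row i
      c = rest of column j
      d = all remaining cells
    """
    total = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    a = table[i][j]
    b = row_total - a
    c = col_total - a
    d = total - row_total - col_total + a
    return a, b, c, d


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2 x 2 table (ASSUMED stand-in for the r-coefficient)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # degenerate table: a marginal total is zero
    return (a * d - b * c) / denom


def rank_cells(table):
    """Return (coefficient, row, column) for every non-zero cell, strongest first."""
    ranked = []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                ranked.append((phi_coefficient(*collapse_to_2x2(table, i, j)), i, j))
    ranked.sort(reverse=True)
    return ranked


if __name__ == "__main__":
    # Hypothetical toy table: rows = sequence classes, columns = structure classes.
    toy = [
        [8, 1, 1],
        [1, 8, 1],
        [0, 2, 8],
    ]
    for coeff, i, j in rank_cells(toy):
        print(f"row {i}, column {j}: {coeff:+.3f}")
```

For a table of the size reported in the abstract, the row, column, and grand totals would need to be computed once and reused, and a sparse representation of the non-zero cells would be preferable; the sketch recomputes them per cell only for readability.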