19 Structural bioinformatics
Structural bioinformatics is a multidisciplinary area enriched by chemistry, physics, computer science and many others. Although, it could be focused on different biological macro-molecules, here the emphasis will be focused on proteins.
One of the first protein structure elucidated was myoglobin and it triggered the study of the role of the structure of proteins and its biological functions
Identify a protein related to your study that could be further analyzed.
19.1 Protein structures
Difference between the levels of protein structure primary structure is the basic linear representation of aminoacids. Natural aminoacids and modified or rare aminoacids display physico-chemical rich information and could be represented by letters. Therefore in the genetic code we could find the one-letter code.
Secondary structures result from the spatial arrangement of aminoacids that interact with each closer neighbors. There are some remarkable secondary structures such as \(\beta\)-sheets, \(\alpha\)-helix, coils (flexible) and others.
Tertiary structure informs about the structural disposition of the secondary structures that fold between each other due to hydrophobic interactions, disulfure bonds, and other chemical interactions forming a globular and dynamic structure. Thus, proteins could display multiple structural states depending on the physical and energy stability (see the :Levinthal’s paradox).
Quaternary structures result from interaction of multiple tertiary structures. The structure, therefore dictates the protein function. This basic concept have triggered more recently a boom on the analysis of the structure of proteins.
19.2 Identifying or predicting protein structures
Xray crystallography, nuclear magnetic resonance (NMR) allows to encapsulated dynamic information of the protein in time Electron micrography (EM) and Cryo-EM. These experiment rely on highly specialized set ups and there are other drawbacks.
Modelling the structure whether ab initio or by homology also allow structure prediction. However these strategies
Recently AlphaFold
The protein database (PDB)
Protein topologies resulting from the folding: horshoe, beta-barrel and other could be identified
Structural classification of proteins (SCOP) when analyzing a new protein classification by class, architecture, topology (fold-family), homologous superfamily and sequence family
Importance of Gene Ontology
19.2.1 Secondary structure prediction
Secondary structures could also be represented in one letter (e.g.)
Functional domains could be predicted by sequence alignment and allow structural inference. Main predictions are based heavily on machine learning and are frequently accepted under a consensus of multiple tools.
Submit 5lWM (JAK3) from the PDB on FASTA format on the JPRED and PSIPRED and compared with the experimentally predicted version of the protein. Analyze the predictions and tell are there differences between predictions? Which one is more accurate?
19.3 PDB database introduction
The PDB database is one of the most important and ancient open biological database where all new protein structures are submitted. It is an international consortium where several regions work together to curate information.
The PDB in Europe (PDBe) is not only for proteins but form many other experimentally predicted macro-molecules (protein-protein interactions, peptides, RNA and so on).
Protein structures are registered using a unique code of four characters.
X-ray crystallography: is a chemical state of the macro-molecule where it is immobilized, therefore information correspond to one state of the structure, then crystal protein is submitted to an x-ray beam to generate a diffraction pattern.
NMR spectroscopy: captures dynamic information of the protein, but generally resolves small proteins. The principle?
Electron microscopy adapted to cryo-preservation allow proteins visualization.