Database approaches are widely used in structural bioinformatics, since ab initio techniques are often computationally prohibitive and the structures of biological macromolecules are typically built from a limited set of motifs. Several challenges arise when developing methods for efficient database retrieval. For example, how can complex data be represented efficiently, and what should the size and composition of the database be? In this work, we discuss some of these challenges in the context of TEXTAL, a crystallographic protein model-building program. In particular, we discuss how structural information on amino acids is represented (as numeric features), how difficult it is to recognize amino acids (based on 3D electron density patterns), and what types of examples (and how many of them) need to be stored in the database. These insights are potentially useful in many related applications, such as structure-based drug design, protein-protein interaction analysis, and the discrimination of nucleic acids from proteins in hybrid complexes. ©2007 IEEE.
- Structural Bioinformatics; X-ray Crystallography; Electron Density Map Interpretation; Features