How much sequence is enough?
We're often asked to build 3D homology models of proteins from the amino acid sequence alone. The basic idea is to predict the 3D structure of a protein (the target) by comparing its amino acid sequence with the sequences of similar proteins whose 3D structures are already known (the templates). The perennial question is: "How similar do the target and template sequences have to be before you can trust the 3D model?"
Chakravarty et al. (Nucleic Acids Research, 33(1):244-259, 2005) have some interesting results on this point. They show that 40% sequence identity is probably good enough for many homology models. In fact, at 40% identity the homology models are about as accurate as NMR structures, which are usually considered good enough for modeling drug interactions with proteins. (However, homology models are not as good as structures derived from X-ray crystallography, which are considered the gold standard.)
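To make "percent sequence identity" concrete, here is a minimal Python sketch that computes it for two sequences that have already been aligned. Several things here are assumptions rather than anything taken from the paper: the function name percent_identity, the toy sequences, and the choice to count only columns where both sequences have a residue (other definitions divide by the alignment length or by the length of the shorter sequence, and they give different numbers). The 40% figure is used only as the rule of thumb discussed above, not as a hard cutoff.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment, so they
    must have equal length. The denominator convention is an assumption;
    other conventions exist.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Rule-of-thumb threshold from the discussion above (not a hard cutoff).
RULE_OF_THUMB_IDENTITY = 40.0

# Hypothetical, already-aligned toy sequences, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"

identity = percent_identity(target, template)
verdict = "above" if identity >= RULE_OF_THUMB_IDENTITY else "below"
print(f"{identity:.1f}% identity ({verdict} the 40% rule of thumb)")
```

In practice the alignment itself would come from an alignment tool, and the identity value depends on how that alignment was made, so treat the number as a rough guide to template quality rather than a guarantee.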
This figure shows the 3D X-ray structure of a Streptomyces griseus protease and several homology models. The match is pretty good at roughly 60% and 30% sequence identity.

Surface and backbone structures of S. griseus protease B. The left-hand panel shows the X-ray structure; the remaining four show homology models, with sequence identities of 61%, 32%, 22%, and 13%.
