biopython Bio.PDB FAQ

#1 blackjimmy, posted 2007-03-29 21:24

Biopython
Bio.PDB module FAQ
1, Focuses on working with crystal structures of biological macromolecules.
2, Well tested. Nearly 5500 structures from the PDB all seemed to be parsed correctly.
3, Really fast!
4, Molecular graphics are not directly supported, but there are quite a few Python-based solutions, PyMOL among them.
http://pymol.sourceforge.net
5, USAGE:
a, importing Bio.PDB:
>>> from Bio.PDB import *
b, input/output
Create a structure object from a PDB file.
First create a PDBParser object:
>>> parser = PDBParser()
Then create a structure object from the PDB file:
>>> structure = parser.get_structure('DOG', '1LKE.pdb')

Create a structure object from an mmCIF file.
First create an MMCIFParser object:
>>> parser = MMCIFParser()
Then create a structure object from the mmCIF file:
>>> structure = parser.get_structure('DOG', '1LKE.cif')
Lower-level access to an mmCIF file:
You can create a Python dict that maps every mmCIF tag to its value. If a tag has multiple values, it is mapped to a list of values.
>>> mmcif_dict = MMCIF2Dict('1LKE.cif')
e.g. get the solvent content from an mmCIF file:
>>> sc = mmcif_dict['_exptl_crystal.density_percent_sol']
e.g. get the list of the y coordinates of all atoms:
>>> y_list = mmcif_dict['_atom_site.Cartn_y']
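Because a tag may map to either a single value or a list of values, code that reads the dict often normalizes the value first. Below is a minimal sketch in plain Python. It uses a hand-written dict in place of a real MMCIF2Dict result, the sample values are made up for illustration, and `as_list` is a hypothetical helper, not part of Bio.PDB. It also assumes the values arrive as text.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]


# Stand-in for the result of MMCIF2Dict('1LKE.cif'); values are illustrative only.
mmcif_dict = {
    '_exptl_crystal.density_percent_sol': '45.0',
    '_atom_site.Cartn_y': ['1.50', '2.25', '-0.75'],
}

# Assumption: values are strings, so convert them before doing arithmetic.
solvent = [float(v) for v in as_list(mmcif_dict['_exptl_crystal.density_percent_sol'])]
y_list = [float(v) for v in as_list(mmcif_dict['_atom_site.Cartn_y'])]
print(solvent, y_list)
```

With this helper the same code path works whether a tag holds one value or many.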
Back to PDB files: parsing the PDB header
>>> resolution = structure.header['resolution']
>>> keywords = structure.header['keywords']
The available keys are: name, head, deposition_date, release_date, structure_method, resolution, structure_reference, journal_reference, author, compound
The dict can also be created without creating a Structure object:
>>> handle = open(filename, 'r')
>>> header_dict = parse_pdb_header(handle)
>>> handle.close()
Download a structure from the PDB:
>>> pdb1 = PDBList()
>>> pdb1.retrieve_pdb_file('1LKE')
PDBList.py can also be run as a script from the shell (not from the Python prompt):
$ python PDBList.py 1LKE
You must be in the directory that contains PDBList.py, of course!
You can also download the entire PDB if necessary:
$ python PDBList.py all /data/pdb
$ python PDBList.py all /data/pdb -d
Adding the -d option stores all files in the same directory, which is not a good choice! Without it, they are sorted into PDB-style subdirectories according to their PDB IDs.
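To make the PDB-style directory layout concrete, here is a small sketch of how a file path could be derived from a PDB ID. It assumes the usual PDB convention of naming the subdirectory after the two middle characters of the ID and naming the file pdbXXXX.ent. `divided_path` is a hypothetical helper written for this post, not a Bio.PDB function.

```python
import os


def divided_path(root, pdb_id):
    """Build root/<middle two characters>/pdb<id>.ent for a 4-character PDB ID."""
    code = pdb_id.lower()
    if len(code) != 4:
        raise ValueError("expected a 4-character PDB ID, got %r" % pdb_id)
    # Assumed layout: the subdirectory is named after characters 2 and 3 of the ID.
    return os.path.join(root, code[1:3], "pdb%s.ent" % code)


print(divided_path("/data/pdb", "1LKE"))
```

Spreading files over many small subdirectories keeps any single directory from holding the whole archive, which is why storing everything flat is discouraged.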
Keep a local copy of the PDB up to date using a PDBList object:
>>> p1 = PDBList(pdb='/data/pdb')
>>> p1.update_pdb()
Use the PDBIO class for writing PDB files
e.g. saving a structure:
>>> io = PDBIO()
>>> io.set_structure(structure)
>>> io.save('out.pdb')
Writing mmCIF files was not supported when this FAQ was written; later Biopython releases added an MMCIFIO class for that.
The overall layout of a structure object:
SMCRA (structure/model/chain/residue/atom)
A structure consists of models / A model consists of chains
A chain consists of residues / A residue consists of atoms
Navigate through a structure object:
>>> p = PDBParser()
>>> structure = p.get_structure('X', 'pdb1fat.ent')
>>> for model in structure:
...     for chain in model:
...         for residue in chain:
...             for atom in residue:
...                 print(atom)
Some other shortcuts:
>>> # iterate over all atoms in a structure
>>> for atom in structure.get_atoms():
...     print(atom)
>>> # iterate over all residues in a model
>>> for residue in model.get_residues():
...     print(residue)
Structures, models, chains, residues and atoms are called Entities in Biopython.
You can always get a parent Entity from a child Entity, e.g.:
>>> residue = atom.get_parent()
>>> chain = residue.get_parent()
You can also test whether an Entity has a certain child using the has_id method.
Getting all atoms or residues of an Entity can be done a bit more conveniently:
>>> atoms = structure.get_atoms()
>>> residues = structure.get_residues()
>>> atoms = chain.get_atoms()
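Simply put, these levels (structure, model, chain, residue, atom) are a matter of scope. You can extract lower-level content from a higher level, and you can also go from lower levels back up to the higher level (the parent-child relationship).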
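The parent-child idea can be illustrated with a toy class in plain Python. This is only a sketch of the concept under my own assumptions, not how Bio.PDB implements its Entity class. The class and its `add` method are made up for illustration; only the names `get_parent` and `has_id` are borrowed from Bio.PDB.

```python
class Entity:
    """Toy stand-in for a structure/model/chain/residue: a node with children."""

    def __init__(self, entity_id):
        self.id = entity_id
        self.parent = None
        self.children = {}  # child id -> child object

    def add(self, child):
        """Attach a child and remember this node as its parent."""
        child.parent = self
        self.children[child.id] = child
        return child

    def get_parent(self):
        return self.parent

    def has_id(self, child_id):
        return child_id in self.children

    def __getitem__(self, child_id):
        return self.children[child_id]

    def __iter__(self):
        return iter(self.children.values())


chain = Entity('A')
residue = chain.add(Entity(100))
atom = residue.add(Entity('CA'))

print(atom.get_parent().id)                   # id of the residue that owns the atom
print(chain.has_id(100), chain.has_id(101))   # membership test on children
print([child.id for child in chain[100]])     # iterate from parent down to children
```

Each node only knows its own parent and its own children, which is enough to walk the hierarchy in either direction.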
>>> # get all residues from a structure
>>> res_list = Selection.unfold_entities(structure, 'R')
>>> # get all atoms from a chain
>>> atom_list = Selection.unfold_entities(chain, 'A')
A=atom, R=residue, C=chain, M=model, S=structure
You can also go across levels, from children up to their parents:
>>> residue_list = Selection.unfold_entities(atom_list, 'R')
>>> chain_list = Selection.unfold_entities(atom_list, 'C')
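Going from a list of atoms up to their residues or chains amounts to collecting each atom's ancestor at the requested level and dropping duplicates. Here is a toy sketch of that idea in plain Python. Each atom is represented by a made-up path tuple, and `unfold_up` is a hypothetical function that only mimics the upward direction. It is not the Bio.PDB implementation and makes no claim about the order Bio.PDB returns.

```python
LEVELS = ['S', 'M', 'C', 'R', 'A']  # structure, model, chain, residue, atom


def unfold_up(atom_paths, target_level):
    """Collect the unique ancestors of the given atoms at target_level.

    Each atom is a path tuple (structure, model, chain, residue, atom);
    an ancestor is the prefix of that path down to target_level.
    """
    depth = LEVELS.index(target_level) + 1
    seen = []
    for path in atom_paths:
        prefix = path[:depth]
        if prefix not in seen:
            seen.append(prefix)
    return seen


# Illustrative atoms only: two in residue 100, one in 101, one in chain B.
atoms = [
    ('X', 0, 'A', 100, 'N'),
    ('X', 0, 'A', 100, 'CA'),
    ('X', 0, 'A', 101, 'N'),
    ('X', 0, 'B', 5, 'CA'),
]
print(unfold_up(atoms, 'R'))
print(unfold_up(atoms, 'C'))
```

Four atoms collapse to three residues and two chains, since atoms that share a parent contribute it only once.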
Extract a specific Atom/Residue/Chain/Model from a structure:
Just index the nested structure as you would a dict:
>>> model = structure[0]
>>> chain = model['A']
>>> residue = chain[100]
>>> atom = residue['CA']
>>> atom = structure[0]['A'][100]['CA']
Model id: an integer that denotes the rank of the model in the PDB/mmCIF file.
Model ids start at 0. Crystal structures generally have only one model (id 0), while NMR files usually have several.
Chain id: specified in the file, a single character (typically a letter).
Residue id: complicated, due to the clumsy PDB format. A residue id is a tuple with three elements:
1, the hetero-flag: 'H_' plus the name of the hetero-residue, e.g. 'H_GLC', or 'W' in the case of a water molecule.
2, the sequence identifier in the chain, e.g. 100
3, the insertion code, e.g. 'A'. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme.
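The three-element tuple can be unpacked directly. As a small sketch, the function below classifies a residue from its id alone, following the hetero-flag rules listed above. `residue_kind` is a hypothetical helper and the ids are made-up examples.

```python
def residue_kind(residue_id):
    """Classify a (hetero_flag, sequence_number, insertion_code) tuple."""
    hetero_flag, seq_number, insertion_code = residue_id
    if hetero_flag == 'W':
        return 'water'
    if hetero_flag.startswith('H_'):
        # Everything after 'H_' is the name of the hetero-residue.
        return 'hetero:' + hetero_flag[2:]
    return 'standard'


print(residue_kind((' ', 100, ' ')))
print(residue_kind(('H_GLC', 100, 'A')))
print(residue_kind(('W', 301, ' ')))
```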
The hetero-flag and insertion code can be blank, in which case the sequence identifier alone can be used as a shortcut:
>>> # full id
>>> residue = chain[(' ', 100, ' ')]
>>> # shortcut id
>>> residue = chain[100]
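The shortcut can be thought of as filling in blanks for the hetero-flag and the insertion code. Below is a minimal sketch of that expansion, using a plain dict as a stand-in for a chain. `full_residue_id` is a hypothetical helper and the residue names are made up.

```python
def full_residue_id(key):
    """Expand an integer shortcut to (' ', key, ' '); pass full tuples through."""
    if isinstance(key, int):
        return (' ', key, ' ')
    return key


# Toy chain keyed by full residue ids; two residues share sequence number 100.
chain = {
    (' ', 100, ' '): 'GLY',
    ('H_GLC', 100, ' '): 'GLC',
}
print(chain[full_residue_id(100)])
print(chain[full_residue_id(('H_GLC', 100, ' '))])
```

The second lookup shows why the full tuple is still needed: a hetero-residue with the same sequence number cannot be reached through the integer shortcut.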
Atom id: the atom name, e.g. 'CA'.
In PDB files, a space can be part of an atom name: calcium is 'CA..', to distinguish it from the C alpha atom '.CA.' (a dot stands for a space here).
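To see where those spaces come from, the sketch below slices the atom name field out of a PDB record. It assumes the standard fixed-column layout, where the atom name occupies columns 13 to 16. The two records are built by hand for illustration, and `guess_element` is a rough heuristic of my own that fails for some names (for example four-character hydrogen names), so treat it as a demonstration only.

```python
def atom_name_field(pdb_line):
    """Return the raw 4-character atom name (columns 13-16 of the record)."""
    return pdb_line[12:16]


def guess_element(name_field):
    """Heuristic: a leading space means a one-letter element symbol."""
    if name_field.startswith(' '):
        return name_field[1]
    return name_field[:2].strip()


# Build the start of two records with explicit column widths (illustrative values).
template = "%-6s%5d %-4s%1s%3s %1s%4d"
c_alpha = template % ("ATOM", 2, " CA ", " ", "ALA", "A", 1)
calcium = template % ("HETATM", 900, "CA  ", " ", " CA", "A", 201)

for line in (c_alpha, calcium):
    name = atom_name_field(line)
    print(repr(name), guess_element(name))
```

Stripping the spaces would turn both names into 'CA', which is exactly the ambiguity the padded name avoids.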
Handling disorder
There are two views: the atom point of view and the residue point of view.
Disordered atoms and residues are stored in special objects that behave as if there is no disorder.
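One way to picture an object that "behaves as if there is no disorder" is a wrapper that holds all alternate positions of an atom and forwards attribute access to the currently selected one. The sketch below is a toy illustration in plain Python, not the Bio.PDB implementation. The class names, the `select` method and the rule of defaulting to the highest occupancy are all assumptions made for this example.

```python
class Atom:
    """Plain record for one position of an atom."""

    def __init__(self, name, altloc, occupancy, coord):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord


class DisorderedAtomSketch:
    """Holds alternate positions of one atom and forwards to the selected one."""

    def __init__(self, name):
        self.name = name
        self.children = {}    # altloc -> Atom
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Assumed default: keep the highest-occupancy position selected.
        if self.selected is None or atom.occupancy > self.selected.occupancy:
            self.selected = atom

    def select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, attr):
        # Only called for attributes not found on the wrapper itself.
        if attr in ('children', 'selected'):
            raise AttributeError(attr)
        return getattr(self.selected, attr)


ca = DisorderedAtomSketch('CA')
ca.add(Atom('CA', 'A', 0.4, (1.0, 2.0, 3.0)))
ca.add(Atom('CA', 'B', 0.6, (1.5, 2.0, 3.0)))
print(ca.altloc, ca.coord)   # forwarded to the higher-occupancy position
ca.select('A')
print(ca.altloc, ca.coord)   # forwarded to the explicitly selected position
```

Code that only reads `ca.coord` never needs to know that more than one position exists, which is the point of the design.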