University of the Witwatersrand, Johannesburg | Wits Bioinformatics
Abdelkrim Rachedi
In this tutorial you will learn how to use the Protein Data Bank (PDB), the international repository for processing and distributing 3-D macromolecular structure data determined by X-ray crystallography, Nuclear Magnetic Resonance (NMR) and Electron Microscopy. You will explore primary, secondary and tertiary structure, ligand examples and the ligand environment.
The primary goals of this tutorial are:
1. Learn to access the PDB, explore the data in its entries and understand what they mean.
2. Find and explore entries bound to ligands (inhibitors).
3. Explore the 3D binding environment of ligands.
Tutorial Part 1: PDB access & structural data
- The PDB is a flat-file database: an archive of files containing structural data for macromolecules (nucleic acids and proteins) determined by X-ray crystallography, Nuclear Magnetic Resonance (NMR) and Electron Microscopy.
- Data for each entry is stored in a separate text file with a defined format; see the PDB file format.
- Each entry has a four-character PDB id such as 3DFR (it can contain numbers and letters). The full name of each file takes the form pdbxxxx.ent, where xxxx is the PDB id, for example pdb3dfr.ent.
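The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function names `normalize_pdb_id` and `archive_filename` are made up for this example, and the check assumes nothing beyond what is stated above (four characters, numbers and letters).

```python
import re


def normalize_pdb_id(pdb_id: str) -> str:
    """Validate a four-character PDB id and return it in lower case."""
    pdb_id = pdb_id.strip()
    # Assumption: an id is exactly four characters, each a letter or a digit.
    if not re.fullmatch(r"[A-Za-z0-9]{4}", pdb_id):
        raise ValueError(f"not a four-character PDB id: {pdb_id!r}")
    return pdb_id.lower()


def archive_filename(pdb_id: str) -> str:
    """Return the file name in the pdbxxxx.ent form described above."""
    return f"pdb{normalize_pdb_id(pdb_id)}.ent"


print(archive_filename("3DFR"))  # pdb3dfr.ent
```

Lower-casing the id follows the pdb3dfr.ent example given above.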
A. Go to the PDB WWW page; see Fig.1.
Fig.1: Main PDB page. Note the total number of entries in the PDB to date, 71794.
The top search bar supports a variety of search types, such as PDB id code, keywords, authors and titles. To get help on what is offered, click the help icon, highlighted in red in Fig.2.
Fig.2: Top bar search help. It describes the types of queries that can be run.
Since you most likely do not have a particular PDB id in mind, type the keywords “dihydrofolate reductase” in the Search box, see Fig.3, and then click the Search button.
Fig.3: Top of the main PDB page. Search box with the query text “dihydrofolate reductase”.
The Query Result Browser displays a results panel on the left side of the page reporting 253 structure hits; see Fig.4.
Fig.4: Typical Query Results page. Take note of the Query Refinements information, Fig.5. See, for example, Fig.6, which shows related hits in the SCOP database.
Fig.5: Query Refinements information & options.
Fig.6: Dihydrofolate Reductase hits in the SCOP database.
Explore a PDB entry:
View the content of one of the entries by clicking on the icon highlighted in red in Fig.7. This will open a page with the content of the entry.
Fig.7: The icon highlighted in red displays the associated PDB entry when clicked. In this case the PDB entry id is 3FQ0.
F. The first two record types should be:

    HEADER    OXIDOREDUCTASE                          06-JAN-09   3FQ0
    TITLE     STAPHYLOCOCCUS AUREUS DIHYDROFOLATE REDUCTASE COMPLEXED
    TITLE    2 WITH NADPH AND 2,4-DIAMINO-5-(3-(2,5-DIMETHOXYPHENYL)PROP-
    TITLE    3 1-YNYL)-6-ETHYLPYRIMIDINE (UCP120B)

Note that the second record, "TITLE", is continued over several lines because of the length of the title.
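PDB records are fixed-width, so a HEADER line can be split by column position rather than by whitespace. The sketch below assumes the column layout of the wwPDB format description (classification in columns 11-50, deposition date in 51-59, id code in 63-66). The function name `parse_header` is hypothetical, and the sample line is rebuilt with explicit padding so the columns are certain to line up.

```python
def parse_header(line: str) -> dict:
    """Split a HEADER record into its fixed-width fields."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    line = line.ljust(80)  # pad so that slicing never runs short
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "id_code": line[62:66].strip(),          # columns 63-66
    }


# HEADER record of entry 3FQ0, padded to the assumed column positions.
sample = "HEADER    OXIDOREDUCTASE".ljust(50) + "06-JAN-09" + "   " + "3FQ0"
header = parse_header(sample)
print(header)
```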
Further down you should find the UniProt cross-reference id and the structure sequence:

    DBREF  3FQ0 A    1   157  UNP    Q2YY41   Q2YY41_STAAB     2    158
    SEQRES   1 A  157  THR LEU SER ILE LEU VAL ALA HIS ASP LEU GLN ARG VAL
    SEQRES   2 A  157  ILE GLY PHE GLU ASN GLN LEU PRO TRP HIS LEU PRO ASN
    SEQRES   3 A  157  ASP LEU LYS HIS VAL LYS LYS LEU SER THR GLY HIS THR
    SEQRES   4 A  157  LEU VAL MET GLY ARG LYS THR PHE GLU SER ILE GLY LYS
    SEQRES   5 A  157  PRO LEU PRO ASN ARG ARG ASN VAL VAL LEU THR SER ASP
    SEQRES   6 A  157  THR SER PHE ASN VAL GLU GLY VAL ASP VAL ILE HIS SER
    SEQRES   7 A  157  ILE GLU ASP ILE TYR GLN LEU PRO GLY HIS VAL PHE ILE
    SEQRES   8 A  157  PHE GLY GLY GLN THR LEU PHE GLU GLU MET ILE ASP LYS
    SEQRES   9 A  157  VAL ASP ASP MET TYR ILE THR VAL ILE GLU GLY LYS PHE
    SEQRES  10 A  157  ARG GLY ASP THR PHE PHE PRO PRO TYR THR PHE GLU ASP
    SEQRES  11 A  157  TRP GLU VAL ALA SER SER VAL GLU GLY LYS LEU ASP GLU
    SEQRES  12 A  157  LYS ASN THR ILE PRO HIS THR PHE LEU HIS LEU ILE ARG
    SEQRES  13 A  157  LYS

- Can you think of why the sequence record is called "SEQRES"?
Tip: check the Uniprot entry http://www.uniprot.org/uniprot/Q2YY41 against the SEQRES.
The UniProt sequence is:

            10         20         30         40         50         60
    MTLSILVAHD LQRVIGFENQ LPWHLPNDLK HVKKLSTGHT LVMGRKTFES IGKPLPNRRN
            70         80         90        100        110        120
    VVLTSDTSFN VEGVDVIHSI EDIYQLPGHV FIFGGQTLFE EMIDKVDDMY ITVIEGKFRG
           130        140        150
    DTFFPPYTFE DWEVASSVEG KLDEKNTIPH TFLHLIRKK

- What do you see?
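One way to make the comparison less error-prone is to convert the three-letter SEQRES residues to one-letter codes and look for them in the UniProt sequence. The sketch below uses only the first two SEQRES lines and the first 30 UniProt residues shown above. The helper name `seqres_to_one_letter` is made up for this example, and the parsing assumes a non-blank chain identifier so that whitespace splitting is safe.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_one_letter(records: str) -> str:
    """Concatenate the residues of all SEQRES lines as one-letter codes."""
    residues = []
    for line in records.splitlines():
        fields = line.split()
        if fields and fields[0] == "SEQRES":
            # fields: SEQRES, line number, chain id, residue count, residues...
            residues.extend(fields[4:])
    return "".join(THREE_TO_ONE[name] for name in residues)


seqres = """\
SEQRES   1 A  157  THR LEU SER ILE LEU VAL ALA HIS ASP LEU GLN ARG VAL
SEQRES   2 A  157  ILE GLY PHE GLU ASN GLN LEU PRO TRP HIS LEU PRO ASN
"""
uniprot_start = "MTLSILVAHDLQRVIGFENQLPWHLPNDLK"  # first 30 UniProt residues

structure_seq = seqres_to_one_letter(seqres)
offset = uniprot_start.find(structure_seq)  # 0-based position of the match
print(structure_seq)
print(offset)
```

Running the same function on all thirteen SEQRES lines and the full UniProt sequence is a quick way to check your answer to the question above.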
Further down you should see the following (ignore the first two lines, which only number the columns; they are shaded green on the original page):

             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    ATOM     76  N   GLN A  11     -16.377  -5.868  50.479  1.00 13.87           N
    ATOM     77  CA  GLN A  11     -15.458  -6.990  50.659  1.00 14.10           C
    ATOM     78  C   GLN A  11     -15.736  -8.147  49.692  1.00 13.37           C
    ATOM     79  O   GLN A  11     -15.240  -9.270  49.878  1.00 13.26           O
    ATOM     80  CB  GLN A  11     -15.480  -7.438  52.121  1.00 14.28           C
    ATOM     81  CG  GLN A  11     -15.032  -6.290  53.031  1.00 17.76           C
    ATOM     82  CD  GLN A  11     -14.888  -6.667  54.485  1.00 21.55           C
    ATOM     83  OE1 GLN A  11     -15.815  -7.196  55.113  1.00 24.16           O
    ATOM     84  NE2 GLN A  11     -13.726  -6.354  55.047  1.00 23.67           N
    ATOM     85  N   ARG A  12     -16.499  -7.838  48.638  1.00 12.04           N
    ATOM     86  CA  ARG A  12     -16.901  -8.789  47.595  1.00 11.79           C
    ATOM     87  C   ARG A  12     -17.879  -9.881  48.043  1.00 11.15           C
    ATOM     88  O   ARG A  12     -18.064 -10.863  47.338  1.00 10.81           O
    ATOM     89  CB  ARG A  12     -15.698  -9.385  46.857  1.00 11.69           C
    ATOM     90  CG  ARG A  12     -15.090  -8.455  45.797  1.00 12.28           C
    ATOM     91  CD  ARG A  12     -13.804  -9.032  45.203  1.00 12.87           C
    ATOM     92  NE  ARG A  12     -12.757  -9.049  46.224  1.00 15.08           N
    ATOM     93  CZ  ARG A  12     -12.419 -10.108  46.962  1.00 16.96           C
    ATOM     94  NH1 ARG A  12     -13.012 -11.291  46.789  1.00 15.70           N
    ATOM     95  NH2 ARG A  12     -11.468  -9.971  47.884  1.00 18.43           N

- What do the columns above mean?
These lines give, for every atom in the protein, the atom serial number, atom name, residue name, polypeptide chain identifier, residue number, x, y and z coordinates, occupancy, temperature factor (B-factor) and element symbol.
The occupancy indicates the fraction of molecules in the crystal in which the atom occupies the given 3D position; a value of 1.0 represents full occupancy.
The B-factor indicates the relative mobility of the atoms; higher values mean greater mobility or disorder.
CA is an alpha carbon. Every amino acid has an alpha carbon; glycine is the exception in having no beta carbon (CB), since its side chain is a single hydrogen atom. The atoms N, CA, C and O form the polypeptide backbone, also called the main chain. Atoms belonging to the amino acid's R-group, also called the side chain, start from CB, CG, etc. Hydrogen atoms are missing because they are generally not observed by X-ray diffraction of large molecules such as proteins. NMR PDB entries contain hydrogen atoms because they are detected by the NMR technique.
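The main-chain and side-chain distinction described above can be expressed as a simple lookup on the atom name. This is a minimal sketch: `split_atoms` is a hypothetical helper, and it deliberately ignores special cases such as the terminal oxygen OXT and hydrogen atoms.

```python
# Backbone (main-chain) atom names as listed in the text above.
BACKBONE = {"N", "CA", "C", "O"}


def split_atoms(atom_names):
    """Separate atom names into main-chain and side-chain lists."""
    main_chain = [name for name in atom_names if name in BACKBONE]
    side_chain = [name for name in atom_names if name not in BACKBONE]
    return main_chain, side_chain


# Atom names of residue GLN 11, taken from the ATOM records above.
gln = ["N", "CA", "C", "O", "CB", "CG", "CD", "OE1", "NE2"]
main_chain, side_chain = split_atoms(gln)
print(main_chain)  # ['N', 'CA', 'C', 'O']
print(side_chain)  # ['CB', 'CG', 'CD', 'OE1', 'NE2']
```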
For details, see the ATOM record format below:

    COLUMNS   DATA TYPE     FIELD        DEFINITION
    ---------------------------------------------------------------------------------
     1 -  6   Record name   "ATOM  "
     7 - 11   Integer       serial       Atom serial number.
    13 - 16   Atom          name         Atom name.
    17        Character     altLoc       Alternate location indicator.
    18 - 20   Residue name  resName      Residue name.
    22        Character     chainID      Chain identifier.
    23 - 26   Integer       resSeq       Residue sequence number.
    27        AChar         iCode        Code for insertion of residues.
    31 - 38   Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
    39 - 46   Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
    47 - 54   Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
    55 - 60   Real(6.2)     occupancy    Occupancy.
    61 - 66   Real(6.2)     tempFactor   Temperature factor.
    73 - 76   LString(4)    segID        Segment identifier, left-justified.
    77 - 78   LString(2)    element      Element symbol, right-justified.
    79 - 80   LString(2)    charge       Charge on the atom.

- How many amino acids are in the list above? Give their names.
- In both cases, identify the main-chain and side-chain atoms.
- What type of secondary structure do they belong to? (Check the Secondary Structure Format.)
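The column table above maps directly onto string slicing. The sketch below reads one ATOM record into a dictionary; `parse_atom` is a hypothetical name, and the slice bounds are the table's column numbers converted to Python's 0-based, end-exclusive form (columns 31-38 become `line[30:38]`).

```python
def parse_atom(line: str) -> dict:
    """Read the fixed-width fields of one ATOM record."""
    line = line.ljust(80)  # pad short lines so every slice is in range
    return {
        "serial": int(line[6:11]),             # columns 7-11
        "name": line[12:16].strip(),           # columns 13-16
        "alt_loc": line[16].strip(),           # column 17
        "res_name": line[17:20].strip(),       # columns 18-20
        "chain_id": line[21],                  # column 22
        "res_seq": int(line[22:26]),           # columns 23-26
        "x": float(line[30:38]),               # columns 31-38
        "y": float(line[38:46]),               # columns 39-46
        "z": float(line[46:54]),               # columns 47-54
        "occupancy": float(line[54:60]),       # columns 55-60
        "temp_factor": float(line[60:66]),     # columns 61-66
        "element": line[76:78].strip(),        # columns 77-78
    }


# First ATOM record of the listing above (atom 76, backbone N of GLN 11).
record = "ATOM     76  N   GLN A  11     -16.377  -5.868  50.479  1.00 13.87           N"
atom = parse_atom(record)
print(atom["res_name"], atom["res_seq"], atom["name"], atom["temp_factor"])
```

Slicing by column is preferred over splitting on whitespace because neighbouring fields can touch, for example a long atom name next to the residue name or two wide coordinates with no space between them.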
NMR entries, hydrogen atoms and models:
The PDB entry 2hm9 is an example of an NMR structure. Click pdb2hm9.pdb and explore its content. In particular, note the presence of hydrogen atoms and find out how NMR models are described. See also the graphical display of the 2hm9 entry below:
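When exploring an NMR file it can help to count the models and the hydrogen atoms programmatically. The sketch below assumes that each model is introduced by a MODEL record and that the element symbol sits in columns 77-78 of ATOM lines, as in the PDB format description. The input is a tiny made-up fragment, not data from 2hm9, and `summarize_models` and `fake_atom` are hypothetical helpers.

```python
def summarize_models(pdb_text: str) -> dict:
    """Count MODEL records and hydrogen atoms in PDB-format text."""
    models = 0
    hydrogens = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            models += 1
        elif record in ("ATOM", "HETATM"):
            # Element symbol, right-justified in columns 77-78.
            if line[76:78].strip().upper() == "H":
                hydrogens += 1
    return {"models": models, "hydrogen_atoms": hydrogens}


def fake_atom(name: str, element: str) -> str:
    """Build a made-up ATOM line with the element in columns 77-78."""
    return f"ATOM      1 {name:<4} MET A   1".ljust(76) + f"{element:>2}"


# Toy input: two models, each with one nitrogen and one hydrogen atom.
toy = "\n".join([
    "MODEL        1",
    fake_atom(" N", "N"),
    fake_atom(" H1", "H"),
    "ENDMDL",
    "MODEL        2",
    fake_atom(" N", "N"),
    fake_atom(" H1", "H"),
    "ENDMDL",
])
summary = summarize_models(toy)
print(summary)  # {'models': 2, 'hydrogen_atoms': 2}
```

Note that the hydrogen count is summed over all models, so dividing it by the number of models gives the count per model.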
Tutorial Part 2: Protein Entries & Ligands
A. Click on the tab entitled "144 Ligands Hits" and explore the list of ligands; see Fig.9.
Fig.9: Ligands result page showing DHFR entries (complexes) with bound ligands. Note that each hit has one or more PDB entries associated with it.
Click on one of the associated-entries links, highlighted in Fig.10.a, to see which entries bind this particular ligand.
Fig.10.a: An example of a ligand bound in 2 DHFR complex structures. Highlighted is the link to information about the DHFR entries binding the ligand.
Tutorial Part 3: Ligand's 3D-environment & Chemistry
To explore a ligand's 3D-environment and chemistry, select a PDB id from the list above and proceed as follows:
Suppose you selected the PDB id 3FQ0.
Load the web tool Ligands Sites Explorer; see Fig.11.
Type 3FQ0 in the "PDB id:" box and press ENTER or click "Go".
Fig.11: Ligand Site Explorer page showing data for the PDB entry 3FQ0.
Go to the Ligands table, then explore the links found in the columns entitled "Explore Site" and "Ligand Chemistry".
- Explore Site links (Wits Bioinformatics tools):
- Environment: gives a detailed list of the bonds, and their types, made between the ligand and the protein and any water molecules; see Fig.12.
Fig.12: Ligand Site Explorer: environment details of the ligand NAP in the binding site of protein 3FQ0.
- 3D-view: gives a 3D view of the ligand's site and allows visual exploration; see Fig.13.
Fig.13: Ligand Site Explorer: 3D viewer showing the ligand NAP in the binding site of protein 3FQ0.
- Click on the link and explore.
- Take note of the residues involved in the binding, the bonds and their types.
- Make sure you verify these in the 3D viewer.
- Ligand Chemistry link (EBI's tool):
- The links allow exploration of the chemistry of ligands; see Fig.14.
Fig.14: Ligand Site Explorer: EBI's Chem tool showing the chemistry details of the ligand NAP.
Glycine SMILES string: [C@2H2]([C](=[O])[O-])[NH3+]
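The idea behind the Environment listing in Part 3 can be approximated with a plain distance cutoff between ligand and protein atoms. How the Ligands Sites Explorer actually derives bonds and their types is not described here, so the sketch below is only a rough stand-in. The 3.5 Angstrom cutoff, the atom labels and the coordinates are all made up for illustration, and `contacts` is a hypothetical helper.

```python
import math


def contacts(ligand_atoms, protein_atoms, cutoff=3.5):
    """Return (ligand, protein, distance) for atom pairs within the cutoff.

    Each atom is a (label, (x, y, z)) tuple with coordinates in Angstroms.
    The cutoff is an assumed value, not one taken from the tool.
    """
    pairs = []
    for lig_label, lig_xyz in ligand_atoms:
        for prot_label, prot_xyz in protein_atoms:
            distance = math.dist(lig_xyz, prot_xyz)
            if distance <= cutoff:
                pairs.append((lig_label, prot_label, round(distance, 2)))
    # Closest contacts first.
    return sorted(pairs, key=lambda pair: pair[2])


# Made-up labels and coordinates, not taken from entry 3FQ0.
ligand = [("LIG:O1", (0.0, 0.0, 0.0)), ("LIG:N2", (8.0, 0.0, 0.0))]
protein = [("GLN11:NE2", (2.9, 0.0, 0.0)), ("ARG12:NH1", (0.0, 10.0, 0.0))]

found = contacts(ligand, protein)
print(found)  # [('LIG:O1', 'GLN11:NE2', 2.9)]
```

A distance list like this says nothing about bond type, which is why the contacts reported by the tool should still be verified in the 3D viewer.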
Resource: