PDB and Protein Structure Tutorial

		University of the Witwatersrand, Johannesburg

		Wits Bioinformatics

Protein Three Dimensional Structure
and Function using PDB

Summer University 2011

Abdelkrim Rachedi

In this tutorial you will learn how to use the Protein Data Bank (PDB), the international repository for processing and distributing 3-D macromolecular structure data determined by X-ray crystallography, Nuclear Magnetic Resonance (NMR) and Electron Microscopy. You will explore primary, secondary and tertiary structure, ligand examples and the ligand environment.

The primary goals of this tutorial are:

1. Learn to access the PDB, explore the data in its entries and understand what they mean.

2. Find and explore entries bound to ligands (inhibitors).

3. Explore the 3D environment in which ligands bind.


Tutorial Part 1: PDB access & structural data

- The PDB is a flat-file database: it is in fact an archive of files containing structural data for macromolecules (nucleic acids and proteins) determined by X-ray crystallography, Nuclear Magnetic Resonance (NMR) and Electron Microscopy.

- The data for each entry in the database is stored in a separate text file with a defined format, see pdb file format.

- Each entry has a four-character PDB id such as 3DFR (it can contain numbers and letters). The full name of each file takes the form pdbxxxx.ent (xxxx = the pdb id), for example pdb3dfr.ent.
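The file naming rule above can be sketched in a few lines of Python. The function name `pdb_entry_filename` is hypothetical, and the check that an id is exactly four letters or digits is a simplifying assumption based on the description above.

```python
def pdb_entry_filename(pdb_id: str) -> str:
    """Build the archive file name (pdbxxxx.ent) for a four-character PDB id."""
    pdb_id = pdb_id.strip()
    # Assumption: an id is exactly four letters or digits, e.g. 3DFR.
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB id: {pdb_id!r}")
    # File names use the lower-case form of the id.
    return f"pdb{pdb_id.lower()}.ent"


print(pdb_entry_filename("3DFR"))  # pdb3dfr.ent
```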

Go to the PDB WWW page, see Fig.1.

Fig.1: Main pdb page. Pay attention to the total number of entries in the PDB to date, 71794.

The top search bar supports a variety of search types, such as PDB id code, keywords, authors, titles, etc. To get help on what is offered, click the help icon highlighted in red, see Fig.2.

Fig.2: Top Bar Search Help. Gives help about the types of queries that can be done

Since you most likely do not have a particular pdb id in mind, type the keywords “dihydrofolate reductase” in the Search box, see Fig.3, and then click the Search button.

Fig.3: Top of the main pdb page. Search box with the query text dihydrofolate reductase.

The Query Result Browser will display a results panel on the left side of the page, reporting 253 structure hits, see Fig.4.

Fig.4: Typical Query Results page. Take note of the Query Refinements information, Fig.5. See for example Fig.6, which shows related hits in the SCOP database.

Fig.5: Query Refinements information & options.

Fig.6: Dihydrofolate Reductase hits in the SCOP database.

Explore a PDB entry:

View the content of one of the entries by clicking on the icon highlighted in red in Fig.7. This will open a page with the content of the entry.

Fig.7: Red-highlighted icon which, when clicked, displays the associated pdb entry.
In this case the pdb entry id is 3FQ0.

The first two record types should be:

HEADER    OXIDOREDUCTASE                          06-JAN-09   3FQ0              
TITLE     STAPHYLOCOCCUS AUREUS DIHYDROFOLATE REDUCTASE COMPLEXED               
TITLE    2 WITH NADPH AND 2,4-DIAMINO-5-(3-(2,5-DIMETHOXYPHENYL)PROP-           
TITLE    3 1-YNYL)-6-ETHYLPYRIMIDINE (UCP120B)

Note that the second line, "TITLE", is further divided because of the length of the title.
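As a minimal sketch of how such continuation lines can be merged, the Python below collects the text of every TITLE record (columns 11-80) into one string. The function name `join_title` is hypothetical, and joining without a space after a trailing hyphen is a heuristic assumed here so that the split chemical name reads correctly.

```python
def join_title(pdb_lines):
    """Merge TITLE records (first line plus numbered continuations) into one string."""
    title = ""
    for line in pdb_lines:
        if not line.startswith("TITLE"):
            continue
        # Columns 9-10 hold the continuation number, columns 11-80 the text.
        piece = line[10:80].strip()
        # Heuristic (assumption): no space after a line that ends with a hyphen.
        if title and not title.endswith("-"):
            title += " "
        title += piece
    return title


records = [
    "HEADER    OXIDOREDUCTASE                          06-JAN-09   3FQ0",
    "TITLE     STAPHYLOCOCCUS AUREUS DIHYDROFOLATE REDUCTASE COMPLEXED",
    "TITLE    2 WITH NADPH AND 2,4-DIAMINO-5-(3-(2,5-DIMETHOXYPHENYL)PROP-",
    "TITLE    3 1-YNYL)-6-ETHYLPYRIMIDINE (UCP120B)",
]
print(join_title(records))
```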

Further down you should find the UniProt cross-reference id and the structure sequence:

DBREF  3FQ0 A    1   157  UNP    Q2YY41   Q2YY41_STAAB     2    158             
SEQRES   1 A  157  THR LEU SER ILE LEU VAL ALA HIS ASP LEU GLN ARG VAL          
SEQRES   2 A  157  ILE GLY PHE GLU ASN GLN LEU PRO TRP HIS LEU PRO ASN          
SEQRES   3 A  157  ASP LEU LYS HIS VAL LYS LYS LEU SER THR GLY HIS THR          
SEQRES   4 A  157  LEU VAL MET GLY ARG LYS THR PHE GLU SER ILE GLY LYS          
SEQRES   5 A  157  PRO LEU PRO ASN ARG ARG ASN VAL VAL LEU THR SER ASP          
SEQRES   6 A  157  THR SER PHE ASN VAL GLU GLY VAL ASP VAL ILE HIS SER          
SEQRES   7 A  157  ILE GLU ASP ILE TYR GLN LEU PRO GLY HIS VAL PHE ILE          
SEQRES   8 A  157  PHE GLY GLY GLN THR LEU PHE GLU GLU MET ILE ASP LYS          
SEQRES   9 A  157  VAL ASP ASP MET TYR ILE THR VAL ILE GLU GLY LYS PHE          
SEQRES  10 A  157  ARG GLY ASP THR PHE PHE PRO PRO TYR THR PHE GLU ASP          
SEQRES  11 A  157  TRP GLU VAL ALA SER SER VAL GLU GLY LYS LEU ASP GLU          
SEQRES  12 A  157  LYS ASN THR ILE PRO HIS THR PHE LEU HIS LEU ILE ARG          
SEQRES  13 A  157  LYS

- Can you think why the sequence data is called "SEQRES"?

Tip: check the UniProt entry http://www.uniprot.org/uniprot/Q2YY41 against the SEQRES records.

The UniProt sequence is:

        10         20         30         40         50         60 
MTLSILVAHD LQRVIGFENQ LPWHLPNDLK HVKKLSTGHT LVMGRKTFES IGKPLPNRRN 

        70         80         90        100        110        120 
VVLTSDTSFN VEGVDVIHSI EDIYQLPGHV FIFGGQTLFE EMIDKVDDMY ITVIEGKFRG 

       130        140        150 
DTFFPPYTFE DWEVASSVEG KLDEKNTIPH TFLHLIRKK

- What do you see?
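One way to make the comparison less error-prone is to convert the three-letter SEQRES residues to one-letter codes and search for them in the UniProt sequence. The sketch below uses only the first two SEQRES lines and the start of the UniProt sequence shown above; the function name `seqres_to_one_letter` is hypothetical, and mapping unknown residue names to "X" is an assumption.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_one_letter(seqres_lines):
    """Concatenate the residues of SEQRES records as one-letter codes."""
    letters = []
    for line in seqres_lines:
        if not line.startswith("SEQRES"):
            continue
        # Residue names start at column 20 (index 19) of a SEQRES record.
        for name in line[19:].split():
            letters.append(THREE_TO_ONE.get(name, "X"))  # X = not a standard residue
    return "".join(letters)


seqres = [
    "SEQRES   1 A  157  THR LEU SER ILE LEU VAL ALA HIS ASP LEU GLN ARG VAL",
    "SEQRES   2 A  157  ILE GLY PHE GLU ASN GLN LEU PRO TRP HIS LEU PRO ASN",
]
uniprot_start = "MTLSILVAHDLQRVIGFENQLPWHLPNDLKHVKKLSTGHT"

fragment = seqres_to_one_letter(seqres)
print(fragment)
# 0-based position of the SEQRES fragment within the UniProt sequence (-1 if absent).
print(uniprot_start.find(fragment))
```

Compare the printed position with the residue ranges given in the DBREF record.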

Further down you should see the ATOM records below (ignore the first two lines, which are only a column ruler):

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890

ATOM     76  N   GLN A  11     -16.377  -5.868  50.479  1.00 13.87           N  
ATOM     77  CA  GLN A  11     -15.458  -6.990  50.659  1.00 14.10           C  
ATOM     78  C   GLN A  11     -15.736  -8.147  49.692  1.00 13.37           C  
ATOM     79  O   GLN A  11     -15.240  -9.270  49.878  1.00 13.26           O  
ATOM     80  CB  GLN A  11     -15.480  -7.438  52.121  1.00 14.28           C  
ATOM     81  CG  GLN A  11     -15.032  -6.290  53.031  1.00 17.76           C  
ATOM     82  CD  GLN A  11     -14.888  -6.667  54.485  1.00 21.55           C  
ATOM     83  OE1 GLN A  11     -15.815  -7.196  55.113  1.00 24.16           O  
ATOM     84  NE2 GLN A  11     -13.726  -6.354  55.047  1.00 23.67           N  
ATOM     85  N   ARG A  12     -16.499  -7.838  48.638  1.00 12.04           N  
ATOM     86  CA  ARG A  12     -16.901  -8.789  47.595  1.00 11.79           C  
ATOM     87  C   ARG A  12     -17.879  -9.881  48.043  1.00 11.15           C  
ATOM     88  O   ARG A  12     -18.064 -10.863  47.338  1.00 10.81           O  
ATOM     89  CB  ARG A  12     -15.698  -9.385  46.857  1.00 11.69           C  
ATOM     90  CG  ARG A  12     -15.090  -8.455  45.797  1.00 12.28           C  
ATOM     91  CD  ARG A  12     -13.804  -9.032  45.203  1.00 12.87           C  
ATOM     92  NE  ARG A  12     -12.757  -9.049  46.224  1.00 15.08           N  
ATOM     93  CZ  ARG A  12     -12.419 -10.108  46.962  1.00 16.96           C  
ATOM     94  NH1 ARG A  12     -13.012 -11.291  46.789  1.00 15.70           N  
ATOM     95  NH2 ARG A  12     -11.468  -9.971  47.884  1.00 18.43           N

- What do the columns above mean?

For every atom in the protein, these lines give the atom number, atom name, residue name, polypeptide chain identifier, residue number, x, y and z coordinates, occupancy, temperature factor (B-factor) and element symbol.

The occupancy indicates the fraction of molecules in the crystal in which the atom occupies the given 3D position; a value of 1.0 represents full occupancy.

The B-factor tells you about the relative mobility of the atoms.

CA is an alpha carbon. Every amino acid has an alpha carbon; Glycine is special only in that its side chain is a single hydrogen atom, so it has no CB. The atoms N, CA, C and O are part of the polypeptide backbone, also called the Main-Chain. Atoms belonging to the amino acid's side group (R-group), also called the Side-Chain, start from CB, CG, etc. Hydrogen atoms are missing because they are usually not observed by X-ray diffraction of large molecules like proteins. NMR PDB entries contain hydrogen atoms because they are detected by the NMR technique.
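The main-chain / side-chain distinction can be illustrated with a short Python sketch that groups atom names per residue. The function name `split_main_and_side_chain` is hypothetical, and treating exactly N, CA, C and O as main-chain atoms follows the description above (a terminal OXT oxygen, for instance, is not handled).

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def split_main_and_side_chain(atom_lines):
    """Group atom names per residue into (main-chain, side-chain) lists."""
    residues = {}
    for line in atom_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()                        # atom name, columns 13-16
        key = (line[21], int(line[22:26]), line[17:20])   # chain, residue number, residue name
        main, side = residues.setdefault(key, ([], []))
        (main if name in BACKBONE_ATOMS else side).append(name)
    return residues


lines = [
    "ATOM     76  N   GLN A  11     -16.377  -5.868  50.479  1.00 13.87           N  ",
    "ATOM     77  CA  GLN A  11     -15.458  -6.990  50.659  1.00 14.10           C  ",
    "ATOM     78  C   GLN A  11     -15.736  -8.147  49.692  1.00 13.37           C  ",
    "ATOM     79  O   GLN A  11     -15.240  -9.270  49.878  1.00 13.26           O  ",
    "ATOM     80  CB  GLN A  11     -15.480  -7.438  52.121  1.00 14.28           C  ",
    "ATOM     81  CG  GLN A  11     -15.032  -6.290  53.031  1.00 17.76           C  ",
]
for (chain, number, res_name), (main, side) in split_main_and_side_chain(lines).items():
    print(res_name, chain, number, "main:", main, "side:", side)
```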

For details see below the ATOM record format:

COLUMNS   DATA TYPE      FIELD        DEFINITION
---------------------------------------------------------------------------------
 1 -  6   Record name    "ATOM  "
 7 - 11   Integer        serial       Atom serial number.
13 - 16   Atom           name         Atom name.
17        Character      altLoc       Alternate location indicator.
18 - 20   Residue name   resName      Residue name.
22        Character      chainID      Chain identifier.
23 - 26   Integer        resSeq       Residue sequence number.
27        AChar          iCode        Code for insertion of residues.
31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.
39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.
47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.
55 - 60   Real(6.2)      occupancy    Occupancy.
61 - 66   Real(6.2)      tempFactor   Temperature factor.
73 - 76   LString(4)     segID        Segment identifier, left-justified.
77 - 78   LString(2)     element      Element symbol, right-justified.
79 - 80   LString(2)     charge       Charge on the atom.
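Because the ATOM record is a fixed-column format, it is safer to slice by column than to split on whitespace. The following is a minimal sketch of that idea in Python; the function name `parse_atom_line` and the dictionary layout are illustrative choices, and only the fields present in the example lines of this tutorial are extracted.

```python
def parse_atom_line(line: str) -> dict:
    """Split one ATOM record into named fields using the fixed column ranges."""
    # Python slices are 0-based and end-exclusive, so columns 7-11 become [6:11].
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "altLoc": line[16].strip(),
        "resName": line[17:20].strip(),
        "chainID": line[21],
        "resSeq": int(line[22:26]),
        "iCode": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "tempFactor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


line = "ATOM     77  CA  GLN A  11     -15.458  -6.990  50.659  1.00 14.10           C  "
atom = parse_atom_line(line)
print(atom["name"], atom["resName"], atom["chainID"], atom["resSeq"])
print(atom["x"], atom["y"], atom["z"], atom["occupancy"], atom["tempFactor"], atom["element"])
```

Slicing by column keeps working when neighbouring fields touch each other (for example a four-digit residue number directly after the chain identifier), which is where whitespace splitting would fail.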

- How many amino acids are there in the list above? Give their names.

- In both cases, highlight the main-chain and side-chain atoms.

- What type of secondary structure do they belong to? (check this Secondary Structure Format)

NMR entries, hydrogen atoms and models:

The PDB entry 2hm9 is an example of an NMR structure. Click pdb2hm9.pdb and explore its content. In particular, note the presence of hydrogen atoms and find out how NMR models are described. See also the graphical display of the 2hm9 entry below:

Note in particular the multiple conformations of the ligand TRR (2,4-DIAMINO-5-(3,4,5-TRIMETHOXY-BENZYL)-PYRIMIDIN-1-IUM). See also the figures below:
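When exploring an NMR file by hand, it can help to count its models and hydrogen atoms programmatically. The sketch below counts MODEL records and atoms whose element symbol (columns 77-78) is H. The function names `summarize_models` and `fake_atom` are hypothetical, and the tiny two-model input is invented for illustration; it is not taken from 2hm9.

```python
def summarize_models(pdb_text: str) -> dict:
    """Count MODEL records and hydrogen atoms in the text of a PDB file."""
    models = 0
    hydrogens = 0
    for line in pdb_text.splitlines():
        if line.startswith("MODEL "):
            models += 1
        elif line.startswith(("ATOM  ", "HETATM")):
            # Element symbol is right-justified in columns 77-78.
            if line[76:78].strip().upper() == "H":
                hydrogens += 1
    return {"models": models, "hydrogens": hydrogens}


def fake_atom(serial: int, name: str, element: str) -> str:
    """Format an invented glycine atom into the fixed ATOM columns."""
    return (
        f"ATOM  {serial:5d} {name:<4s} GLY A   1    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}"
        f"          {element:>2s}"
    )


example = "\n".join([
    "MODEL        1",
    fake_atom(1, " N", "N"),
    fake_atom(2, " H", "H"),
    "ENDMDL",
    "MODEL        2",
    fake_atom(1, " N", "N"),
    fake_atom(2, " H", "H"),
    "ENDMDL",
])
print(summarize_models(example))
```

To apply it to a real entry, read the downloaded file into a string and pass that string to `summarize_models`.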

Tutorial Part 2: Protein Entries & Ligands

Click on the tab entitled "144 Ligands Hits" and explore the list of ligands, see Fig.9.

Fig.9: Ligands result page showing DHFR entries (complexes) with bound ligands.

Note that each hit has some PDB entries associated with it.

Click on one of the associated-entries links, highlighted in Fig.10.a, to see which entries bind this particular ligand.

Fig.10.a: An example of a ligand bound in 2 DHFR complex structures. Highlighted is the link to information about the DHFR entries binding the ligand.

Tutorial Part 3: Ligand's 3D-environment & Chemistry

To explore the ligand's 3D-environment & chemistry, select a pdb id from the list above, as shown below:

Suppose you selected the pdb id 3FQ0.

Load the web tool Ligands Sites Explorer, see Fig.11.

Type 3FQ0 in the "PDB id:" box, then press ENTER or click "Go".

Fig.11 Ligand Site Explorer page showing data for the pdb entry 3FQ0.

Go to the Ligands table, then explore the links found in the columns entitled "Explore Site" and "Ligand Chemistry".

Explore Site links: (Wits Bioinformatics tools)

Environment: gives a detailed list of the bonds, and their types, made between the ligand and the protein and any water molecules, see Fig.12

Fig.12 Ligand Site Explorer: Environment details of the ligand NAP in the binding site of protein 3FQ0.
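How the Ligand Site Explorer derives its list of bonds is not described here. As a rough illustration of the general idea only, a simple distance cutoff is one common way to shortlist protein atoms that sit close to a ligand. In the sketch below the function name `atoms_near_ligand`, the 4.0 Å cutoff, the atom labels and the coordinates are all assumptions invented for the example; no bond types are assigned.

```python
import math


def atoms_near_ligand(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return (protein_label, ligand_label, distance) for atom pairs within cutoff (in Angstroms)."""
    contacts = []
    for p_label, p_xyz in protein_atoms:
        for l_label, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)  # Euclidean distance between the two positions
            if d <= cutoff:
                contacts.append((p_label, l_label, round(d, 2)))
    # Closest contacts first.
    return sorted(contacts, key=lambda c: c[2])


# Invented labels and coordinates, for illustration only (not taken from 3FQ0).
protein = [("RES 1 O", (0.0, 0.0, 0.0)), ("RES 2 CB", (10.0, 0.0, 0.0))]
ligand = [("LIG N1", (2.8, 0.0, 0.0))]
print(atoms_near_ligand(protein, ligand))
```

Real coordinates for such a check would come from the ATOM records of the protein and the HETATM records of the ligand in the same entry.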

3D-view: gives a 3D view of the ligand's site and allows for visual exploration, see Fig.13

Fig.13 Ligand Site Explorer: 3D-viewer showing the ligand NAP in the binding site of protein 3FQ0.

- Click on the link and explore.

- Take note of residues involved in the binding, bonds and their types.

- Make sure you verify these in the 3D viewer.

Ligand Chemistry link: (EBI's tool)

The links allow for the exploration of the chemistry of the ligands, see Fig.14

Fig.14 Ligand Site Explorer: EBI's Chem tool showing the chemistry details of the ligand NAP.

Glycine SMILES string: [C@2H2]([C](=[O])[O-])[NH3+]

Resource:

Visual Biochemistry

Jmol Wiki
