Welcome to an introduction to Bioinformatics! Here we'll do some searches to get a feel for the kinds of biological information available on the web.

In today's lab, you will explore information about one type of human colon cancer, hereditary non-polyposis colon cancer (HNPCC), and a mismatch repair gene associated with it. This is one of the "spellchecker" genes that correct errors made during DNA replication. You will learn about the gene's relevance to yeast and bacteria, and see how tools available on the web can help keep researchers and the public informed.

To start, you will search the Online Mendelian Inheritance in Man (OMIM) database. This database is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere. You will then follow some links to explore other relevant information available to you. Finally, you will see how similar the gene responsible for HNPCC is across a variety of organisms.




  1. First, let's search OMIM. Enter the words "mismatch repair" in the Search text box at the top of the page, then click the Go button.

  2. You should see a page of results with many links. Click on the following link, which should be at or near the top of the list:
    *609309 MutS,E.coli,HOMOLOG of,2;MSH2..
  3. Read the paragraph under the subheading "Description" to get a quick summary.

  4. Scroll down the page and read additional information. One section of particular interest is ALLELIC VARIANTS. Take a look at some of the more than 20 different mutations in this gene that cause this hereditary form of colon cancer. When you are through, return to the top of the OMIM article.

  5. Notice the various links on the left side of the page. Although you can click on any of these links, for easier navigation click the RefSeq link. (RefSeq is a database of genetic sequences and has links to many resources.) Click on NM_000251. Scroll down to the bottom of the document to see the DNA sequence of the MSH2 gene. Note that along the way, you will also see the translation of the DNA sequence represented in single-letter amino acid codes.

  6. Now let's look at the alignment for the human, mouse, and rat copies of the gene. The amino acids (identified by their single-letter codes) that are colored are identical in human and at least one other species. Notice how similar the sequences from these related organisms are. Now take a look at the alignment for more distantly related species known to have this gene. Notice that the only amino acids colored now are those that are common to at least 5 organisms. To magnify the alignment, click on it. Scroll through the alignment to find an area of the gene that is very similar in all organisms. This so-called "conserved" region may be important in the three-dimensional structure of the protein.

  7. There's one more step before completing this lab. You will use a tool that scientists use on a daily basis: BLAST. This is a database search tool that takes a DNA or protein sequence as input and searches it against a database of known sequences. You can use the protein sequence encoded by the human MSH2 gene to search any of the available protein databases and see which sequences are similar to the human one. The human protein sequence is listed below:
    MAVQPKETLQLESAAEVGFVRFFQGMPEKPTTTVR
    LFDRGDFYTAHGEDALLAAREVFKTQGVIKYMGPA
    GAKNLQSVVLSKMNFESFVKDLLLVRQYRVEVYKN
    RAGNKASKENDWYLAYKASPGNLSQFEDILFGNND
    MSASIGVVGVKMSAVDGQRQVGVGYVDSIQRKLGL
    CEFPDNDQFSNLEALLIQIGPKECVLPGGETAGDM
    GKLRQIIQRGGILITERKKADFSTKDIYQDLNRLL
    KGKKGEQMNSAVLPEMENQVAVSSLSAVIKFLELL
    SDDSNFGQFELTTFDFSQYMKLDIAAVRALNLFQG
    SVEDTTGSQSLAALLNKCKTPQGQRLVNQWIKQPL
    MDKNRIEERLNLVEAFVEDAELRQTLQEDLLRRFP
    DLNRLAKKFQRQAANLQDCYRLYQGINQLPNVIQA
    LEKHEGKHQKLLLAVFVTPLTDLRSDFSKFQEMIE
    TTLDMDQVENHEFLVKPSFDPNLSELREIMNDLEK
    KMQSTLISAARDLGLDPGKQIKLDSSAQFGYYFRV
    TCKEEKVLRNNKNFSTVDIQKNGVKFTNSKLTSLN
    EEYTKNKTEYEEAQDAIVKEIVNISSGYVEPMQTL
    NDVLAQLDAVVSFAHVSNGAPVPYVRPAILEKGQG
    RIILKASRHACVEVQDEIAFIPNDVYFEKDKQMFH
    IITGPNMGGKSTYIRQTGVIVLMAQIGCFVPCESA
    EVSIVDCILARVGAGDSQLKGVSTFMAEMLETASI
    LRSATKDSLIIIDELGRGTSTYDGFGLAWAISEYI
    ATKIGAFCMFATHFHELTALANQIPTVNNLHVTAL
    TTEETLTMLYQVKKGVCDQSFGIHVAELANFPKHV
    IECAKQKALELEEFQYIGESQGYDIMEPAAKKCYL
    EREQGEKIIQEFLSKVKQMPFTEMSEENITIKLKQ
    LKAEVIAKNNSFVNEIISRIKVTT

    BLAST searching can be slow, so we have saved results from two previous searches for you to review: one against proteins from all organisms, and another against only proteins from various species of insects. Scroll through each document to see the sequences that were found, and take a look at some of the alignments. See below if you would like to try BLAST yourself.

  8. OPTIONAL: To use BLAST, you will first need to choose the database to search. You can select any of the organisms in the GENOMES box. Each page might be slightly different, but make sure you select BLASTP from the pull-down Program menu and choose the genomes you'd like to search. Then copy the sequence above and paste it into the big text box on the BLAST page. Click the Begin Search or BLAST button to start your search. On the next page, click on Format or Format Results to see the results of your database search. Sometimes there is a wait before results appear; if so, you should see a message on your screen saying this. Once the results come back, take a look at the alignments and notice which parts of the sequence are conserved.
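
If you would like to see how "conserved" positions can be picked out of an alignment, here is a minimal Python sketch. It is only an illustration, not the method used to color the alignments in this lab. The function name conserved_columns and the three short aligned sequences are made up for this example (they are not real alignment data), and "-" is assumed to mark a gap. The rule mimics the one described in step 6: a column counts as conserved when the same amino acid appears in at least a chosen number of sequences.

```python
from collections import Counter


def conserved_columns(aligned, min_count):
    """Return (column index, residue) pairs for columns in which a single
    residue occurs in at least min_count of the aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for index, column in enumerate(zip(*aligned)):
        # Ignore gap characters when counting residues in this column.
        counts = Counter(residue for residue in column if residue != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count >= min_count:
            hits.append((index, residue))
    return hits


# Hypothetical toy alignment, invented for illustration only.
toy_alignment = [
    "MAVQPKET",
    "MAVQPKEN",
    "MSV-PKDT",
]

for index, residue in conserved_columns(toy_alignment, min_count=3):
    print(f"column {index}: {residue} is shared by all three sequences")
```

Lowering min_count relaxes the rule, in the same way that the first alignment in step 6 colors more positions than the second one does.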

 

A brief summary of what we have done to investigate the genetics of a disease:

  1. We searched the OMIM database for "mismatch repair" and examined the results for the MSH2 gene.
  2. We looked at mutations in the mismatch repair gene.
  3. We looked at the complete sequence of the mismatch repair gene.
  4. We looked at alignments of sequences in a variety of organisms.
  5. We did a database search to find sequences in other organisms that are similar to the sequence in humans.
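
If you repeat the database search on your own, it can help to tidy the sequence before pasting it in. The Python sketch below joins wrapped, indented sequence lines into a single FASTA record, a plain-text format that BLAST pages generally accept alongside a bare sequence. The function name to_fasta and the header text are made up for this example, and only the first two lines of the human sequence from step 7 are used to keep it short.

```python
def to_fasta(header, wrapped_lines, width=60):
    """Join wrapped sequence lines into one FASTA record.

    header is free text placed after the leading '>' character;
    width is the number of residues written per output line.
    """
    sequence = "".join(line.strip() for line in wrapped_lines).upper()
    if not sequence.isalpha():
        raise ValueError("sequence should contain letters only")
    # Re-wrap the joined sequence at a fixed width.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(chunks) + "\n"


# First two lines of the human sequence from step 7 (a fragment, not the
# whole protein). The header name is a placeholder chosen for this sketch.
fragment_lines = [
    "    MAVQPKETLQLESAAEVGFVRFFQGMPEKPTTTVR",
    "    LFDRGDFYTAHGEDALLAAREVFKTQGVIKYMGPA",
]

print(to_fasta("human_MSH2_fragment", fragment_lines))
```

For a real search you would pass all of the sequence lines from step 7 rather than this fragment.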