Congratulations, you have successfully completed a "wet" laboratory on DNA forensics. Now it's time to explore the world of Biocomputing to get a feel for the relevant information available on the web.

In the lab you isolated and amplified DNA from the β-globin gene, or so you were told. On the web you will search a sequence database to see whether the primers you used are unique to the β-globin gene. You will also explore multiple sequence alignments and phylogenetic trees, and view information about population variation at the DNA level.

The DNA sequence of the left primer you used for PCR was agggttggccaatctactcc and the right primer was gaaactgggcatgtggagac. You will first search a database with one primer and then the other.
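
At its simplest, a primer search means looking for a short string inside a longer one, on either strand. The sketch below is a toy illustration only, not the algorithm the database search uses: it finds exact matches only, the `template` sequence is made up for the example, and the function names are hypothetical. Only the two primer sequences come from this exercise. By the usual PCR convention, a left primer matches the strand as written, while a right primer matches the opposite strand, so it is the right primer's reverse complement that appears in the written sequence.

```python
# Toy exact-match primer search on both strands (an illustration, not BLAST).

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

LEFT_PRIMER = "agggttggccaatctactcc"   # from this exercise
RIGHT_PRIMER = "gaaactgggcatgtggagac"  # from this exercise


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(primer: str, template: str):
    """Return (strand, position) of the first exact match, or None.

    strand is "+" if the primer matches the template as written and "-" if
    its reverse complement does. position is a 0-based index into the
    template as written.
    """
    primer, template = primer.lower(), template.lower()
    pos = template.find(primer)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        return ("-", pos)
    return None


# Made-up template: the left primer, 20 bases of filler, then the reverse
# complement of the right primer (how a PCR target reads on one strand).
template = LEFT_PRIMER + "ttttacgtacgtacgttttt" + reverse_complement(RIGHT_PRIMER)

print(find_primer(LEFT_PRIMER, template))
print(find_primer(RIGHT_PRIMER, template))
```

A real database search also tolerates mismatches and scores each alignment, which is why it reports near-identical sequences from other organisms as well as exact hits.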

(Anything in blue or purple and underlined is a hotlink: click on it to follow it.)

  1. Copy the first sequence and search it against known nucleotide sequences to see if you can identify the gene it comes from. Do this using the BLAST algorithm developed at the National Center for Biotechnology Information. BLAST is used by thousands of scientists around the world to compare a sequence of interest with all sequences already known. It is a powerful and fast tool that can provide you with useful information. The sequence databases are updated daily. The database you will be searching currently contains more than a billion nucleotides from nearly 50,000 different organisms.

  2. Copy and paste the left primer sequence into the form. Search the nr (non-redundant nucleotide) database using the BLASTN program. When you have entered your sequence, submit your query.

    In a database query, results are presented as sequence alignments between the input (query) sequence and database sequences. The strength of an alignment, measured by the alignment score, indicates which sequences are most closely related. The statistical significance of an alignment is reported as an E-value: the number of matches with a given score that are expected purely by chance in a search of a database of this size. An E-value of two, for example, means that two matches with that score are expected purely by chance. A smaller E-value indicates a more significant match.

  3. What did you get? How many hits are displayed? What gene is this segment of DNA from? Look at the sequence alignments (to do this scroll down the page) to see what sequences are similar to your query sequence. The alignments show what bases in the input sequence match sequences in the database. For each alignment listed, a definition of the entry is displayed as well as the number of bases in the entry. Notice the variation reported in these entries as well as the different organisms with sequences that are identical or nearly identical to the left primer.

    On the results page, click on the links to get additional information about this sequence. Clicking on the entry name will link you to the Entrez page. For example, locate the entry named HUMHBB and click on it. This will show you the entire genomic sequence (all 73308 bp) for the human β-globin region. If you scroll through this page, you will find all sorts of interesting information about the sequence of this region. When you have finished looking at this entry, click on the Back button at the top of the page to get back to the BLAST results.

  4. Repeat the BLASTN search with the right primer. Are the results different? Why?

  5. Are you convinced that these primers were specific to β-globin? Why?

  6. While you were doing that, your assistant copied the sequences from the nucleotide sequence database and constructed an alignment of β-globin genes from various animals. You can view that alignment (yellow boxes indicate identical bases; click on the upper right corner to enlarge the figure, and when you are done, close the window by clicking the little box at the top of the window). Based on the alignment, we can also construct a phylogenetic tree, which shows how the sequences are related evolutionarily. Branches signify points of evolutionary divergence; i.e., a sequence that branches off from another sequence very early (farther to the left) is more distant evolutionarily than a sequence with a much more recent branch. The numbers provide an estimate of the number of nucleotide substitutions going from one sequence to another.

    What does this tell you about the relationship among different animals with respect to the β-globin gene?

    (Phylogenetic trees based on alignments of HIV sequences have shown that a dentist transmitted the virus to his patients.)

  7. Now that we've observed the variation of the β-globin gene across organisms, let's take a look at the variation that is observed in human populations. First we'll look at an example of variation within a human population. The display you see shows the 19 different alleles at a position in the human genome near the β-globin gene. Any given person has only two alleles, one inherited from each parent. Two primers were used to amplify this region of genomic DNA, named D11S922. This particular polymorphism (i.e., a difference in DNA sequence among individuals, with the least frequent variant occurring in at least 1% of the population) is measured in terms of the size of the amplified DNA. The column named Values gives the size of the DNA in kilobases. For example, 0.1 kilobases equals 100 bases; 0.138 kilobases equals 138 bases. For this location in the genome, the size of the amplified DNA ranges from 88 bases to 138 bases in different individuals. These differences are considered to be normal variation and have no effect on the individuals. There are many thousands of such differences in the human genome. How easy do you think it would be to identify someone based on this type of variation?

  8. Now let us look at an example of variation at the DNA level that has been observed among many populations. Scroll down the page to see the number of different populations that have been tested for this variation. In this example only two alleles are observed: one is 8.3 kilobases and the other is 22 kilobases. The variation is detected using a method known as RFLP, or restriction fragment length polymorphism (i.e., variation between individuals in the sizes of DNA fragments cut by specific restriction enzymes). How easy do you think it would be to identify someone based on this type of variation?
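
One rough way to approach the question asked in steps 7 and 8 is to count the genotypes each marker allows and estimate how often two unrelated people would share one. The sketch below is an illustration under loudly stated assumptions: every allele is given the same frequency (the real frequencies are not given here and are uneven, which generally makes chance matches more likely), and genotypes are assumed to occur in Hardy-Weinberg proportions. The function names are hypothetical. Only the allele counts, 19 for the marker in step 7 and 2 for the RFLP in step 8, come from the text.

```python
# How distinguishable are people at one marker? A sketch that ASSUMES equal
# allele frequencies and Hardy-Weinberg proportions; neither is given in the text.


def genotype_count(n_alleles: int) -> int:
    """Number of distinct unordered pairs of alleles (genotypes)."""
    return n_alleles * (n_alleles + 1) // 2


def match_probability(freqs) -> float:
    """Chance that two unrelated people have the same genotype.

    Sums the squared frequency of every genotype: p_i**2 for each homozygote
    and 2 * p_i * p_j for each heterozygote.
    """
    total = 0.0
    n = len(freqs)
    for i in range(n):
        total += (freqs[i] ** 2) ** 2
        for j in range(i + 1, n):
            total += (2 * freqs[i] * freqs[j]) ** 2
    return total


for label, n_alleles in (("19-allele marker (step 7)", 19),
                         ("2-allele RFLP (step 8)", 2)):
    equal_freqs = [1.0 / n_alleles] * n_alleles  # assumption, not real data
    print(label,
          "genotypes:", genotype_count(n_alleles),
          "match probability:", round(match_probability(equal_freqs), 4))
```

Under these assumptions the 19-allele marker allows 190 genotypes and the 2-allele marker only 3, so a single marker of either kind cannot single out one person; the contrast between the two is the point of the comparison.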

 

A Brief Summary of what we have done working with DNA sequences

  1. We started with two primers and searched a sequence database.
  2. We compared the β-globin gene from many different organisms as an alignment and as a phylogenetic tree.
  3. We looked at different kinds of variation at the DNA level both within and between populations.
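
To make the E-value from the database search more concrete, the sketch below uses the standard approximation E = m × n × 2^(−S'), where m is the query length, n is the total database length, and S' is the normalized (bit) score of an alignment. This is a simplified illustration: the search program itself uses adjusted "effective" lengths, so its reported values will differ somewhat. The function name and the bit scores in the loop are hypothetical; the query length (20 bases) and the database size (about a billion nucleotides) come from the text.

```python
# Approximate E-value from a bit score: E = m * n * 2**(-S').
# Simplified: real searches use "effective" lengths, so values will differ.


def expected_hits(bit_score: float, query_len: int, db_len: float) -> float:
    """Expected number of chance matches scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 20   # each primer is 20 bases long
DB_LEN = 1e9     # "more than a billion nucleotides", taken as a round figure

# Illustrative bit scores only; they are not taken from a real search.
for bit_score in (30, 35, 40, 45):
    print(bit_score, expected_hits(bit_score, QUERY_LEN, DB_LEN))
```

Each extra bit of score halves the E-value, and doubling the database size doubles it, which is why the same alignment becomes less significant as the databases grow.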