Congratulations, you have successfully
completed a "wet" laboratory on DNA forensics. Now it's time to
explore the world of Biocomputing to get a feel for the relevant
information available on the web.
In the lab you isolated and amplified DNA from
the β-globin gene, or so you were told. Using the web, you
will perform a database search to see whether the primers you used are
unique to the β-globin gene. In addition, you will explore
multiple sequence alignments and phylogenetic trees, and
view information about population variation at the DNA
level.
The DNA sequence of the left primer you used
for PCR was
agggttggccaatctactcc
and the right primer was
gaaactgggcatgtggagac.
You will first search a
database with one primer and then the other.
- Copy the first sequence and search it against
known nucleotide sequences to see whether you can identify the
gene it comes from. Do this using the
BLAST (Basic Local Alignment Search Tool)
algorithm developed at the National Center for Biotechnology
Information (NCBI). BLAST is used by thousands of scientists around the
world to compare a sequence of interest with all sequences already
known. It is an extremely powerful and fast tool that can
provide you with useful information. These sequence databases are
updated daily. The database you will be searching currently
contains more than a billion nucleotides from nearly 50,000
different organisms.
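BLAST's speed comes from a heuristic: instead of fully aligning the query against every database sequence, it first looks for short, exactly matching "words" shared by the query and a database sequence, and only then tries to extend those seeds into longer alignments. The Python sketch below illustrates just the seeding step. It assumes exact word matches, an 11-letter word size (an assumption chosen to resemble classic BLASTN settings) and a made-up database sequence; it is not NCBI's implementation and omits extension and scoring.

```python
def find_seeds(query: str, subject: str, word_size: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    query, subject = query.lower(), subject.lower()
    # Index every word of the query by its starting position(s).
    index: dict[str, list[int]] = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    # Slide along the subject and record every exact word hit.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


left_primer = "agggttggccaatctactcc"
# Hypothetical "database" sequence: the primer embedded in made-up flanking DNA.
subject = "tttttttttt" + left_primer + "aaaaaaaaaa"
seeds = find_seeds(left_primer, subject)
print(len(seeds), seeds[0])  # 10 (0, 10)
```

Every seed lies on the same diagonal (subject position minus query position is 10), which is the kind of signal the extension step would then grow into a full alignment.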
- Copy and paste the left primer sequence into
the form. Search the nr (non-redundant nucleotide) database
using the BLASTN program. When you have entered your
sequence, submit your query.
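The same search can also be submitted without the web form, through NCBI's BLAST URL API. The sketch below only builds the submission URL and does not send it. The parameter names `CMD`, `PROGRAM`, `DATABASE` and `QUERY` come from that API; using `nt` as the name of the nucleotide collection that the form labels nr is an assumption to check against NCBI's current documentation.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_url(sequence: str, program: str = "blastn", database: str = "nt") -> str:
    """Build (but do not send) a CMD=Put request for NCBI's BLAST URL API.

    The default database name 'nt' is an assumption for the nucleotide
    collection that the web form calls nr.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    return BLAST_URL + "?" + urlencode(params)


url = build_blast_put_url("agggttggccaatctactcc")
print(url)
```

Sending this URL returns a request ID (RID), which is then passed back with `CMD=Get` to retrieve the results once the search has finished.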
When you run a database query, the results are
presented as sequence alignments between the input
sequence and each database sequence. The strength of an alignment,
measured by its alignment score, indicates which sequences are most
closely related. The statistical significance of each alignment is
reported as an E-value. The E-value is the number of matches with a
score at least this high that would be expected purely by chance in
a search of a database of this size. For example, an E-value of 2
means that two matches scoring at least this well are expected
purely by chance. The smaller the E-value, the more significant the
match.
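For ungapped alignments, the E-value follows the Karlin-Altschul formula E = K·m·n·e^(−λS), where S is the raw alignment score, m the query length, n the total database length, and λ and K are statistical parameters that depend on the scoring system. The sketch below plugs in placeholder values of λ and K (assumptions for illustration only, not the values NCBI uses for your search) to show two things described above: E shrinks rapidly as the score grows, and it grows in proportion to database size. Real BLAST also corrects m and n for edge effects, which this sketch ignores.

```python
import math


def evalue(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Karlin-Altschul estimate of the number of chance hits scoring >= raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' such that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder parameters: assumed for illustration, not taken from a real search.
LAMBDA, K = 1.28, 0.46
QUERY_LEN = 20           # length of the left primer
DB_LEN = 1_000_000_000   # roughly "more than a billion nucleotides"

for score in (15, 20, 25):
    e = evalue(score, QUERY_LEN, DB_LEN, LAMBDA, K)
    print(f"score {score}: E = {e:.3g}, bits = {bit_score(score, LAMBDA, K):.1f}")
```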
- What did you get? How many hits are displayed?
What gene is this segment of DNA from? Scroll down the page to the
sequence alignments to see which sequences are similar to your query
sequence. The alignments show which bases
in the input sequence match sequences in the database. For each
alignment listed, a definition line describing the entry is displayed,
along with the number of bases in the entry. Notice the variation reported
in these entries, as well as the different organisms whose sequences
are identical or nearly identical to the left primer.
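BLAST summarizes each alignment with counts such as the number of identical positions out of the alignment length. A minimal Python sketch of that bookkeeping, given two already-aligned strings ('-' marks a gap), is below. The "subject" sequence is a hypothetical variant of the left primer with one substitution, not a real database entry; the midline uses '|' for identical bases, as in BLAST's text output.

```python
def compare_alignment(query_aln: str, subject_aln: str) -> tuple[int, int, float, str]:
    """Return (identities, alignment length, percent identity, midline)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    # '|' where both strings have the same base, blank for mismatches and gaps.
    midline = "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query_aln.lower(), subject_aln.lower())
    )
    identities = midline.count("|")
    length = len(query_aln)
    return identities, length, 100.0 * identities / length, midline


query = "agggttggccaatctactcc"    # left primer
subject = "agggttggccaatctgctcc"  # hypothetical variant: a -> g at position 16
ident, length, pct, mid = compare_alignment(query, subject)
print(query)
print(mid)
print(subject)
print(f"Identities = {ident}/{length} ({pct:.0f}%)")  # Identities = 19/20 (95%)
```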
On the results page, click on the links to get
additional information about these sequences. Clicking on an entry
name takes you to its Entrez page. For example, locate
the entry named HUMHBB and click on it. This will give you a look
at the entire genomic sequence (all 73,308 bp) of the human
β-globin region. If you scroll through this page, you will
find all sorts of interesting information about the sequence of
this region. When you have finished looking at this entry, click
the Back button at the top of the page to return to the
BLAST results.
- Repeat the BLASTN
search with the right primer. Are the results different?
Why?
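In PCR the two primers anneal to opposite strands of the template, so the right primer matches the reverse complement of the top-strand sequence. BLASTN searches both strands by default, which is why hits can be reported as Plus/Minus. This short Python sketch computes a reverse complement so you can compare it with the alignments BLAST reports.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the other strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


right_primer = "gaaactgggcatgtggagac"
print(reverse_complement(right_primer))  # gtctccacatgcccagtttc
```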
- Are you convinced that these primers were
specific to β-globin? Why?
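One way to reason about specificity is to ask how often a given 20-base sequence would occur by chance. Assuming (unrealistically) that the genome is a random string in which A, C, G and T are equally likely, the expected number of exact matches of a k-base primer in a genome of length L, counting both strands, is about 2(L − k + 1)/4^k. The sketch below uses a rounded 3 × 10^9 bp for the human genome size (an assumption); real genomes contain repeats and related genes, which is why a database search is still worth doing.

```python
def expected_random_hits(primer_len: int, genome_len: int, both_strands: bool = True) -> float:
    """Expected exact chance matches of one primer in a random, equiprobable genome."""
    positions = genome_len - primer_len + 1
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** primer_len


HUMAN_GENOME_BP = 3_000_000_000  # rounded haploid size (assumption)
for k in (10, 15, 20):
    print(k, expected_random_hits(k, HUMAN_GENOME_BP))
```

Under these assumptions a 10-base primer would be expected to match thousands of times, while a 20-base primer is expected to match by chance far less than once.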
- While you were doing that, your assistant
copied the sequences from the nucleotide sequence database and
constructed an alignment of β-globin genes from various
animals. You can view that alignment
(yellow boxes indicate identical bases). Based on the alignment,
we can also construct a phylogenetic tree, which shows how the
sequences are related evolutionarily. Branches signify points of
evolutionary divergence: a sequence that branches off from
another sequence very early (farther to the left) is more distantly
related than a sequence with a much more recent branch point. The
numbers provide an estimate of the number of nucleotide
substitutions going from one sequence to another.
What does this tell you about the relationship
among different animals with respect to the β-globin gene?
(Phylogenetic trees based on alignments of HIV
sequences have shown that a dentist
transmitted the virus to his
patients.)
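The alignment's yellow boxes (columns where every sequence has the same base) and the tree's substitution estimates can both be computed from an alignment. The Python sketch below uses a made-up three-sequence alignment (not real β-globin data). It finds fully conserved columns, computes the p-distance (fraction of differing sites, ignoring gaps), and applies the Jukes-Cantor correction d = −(3/4)·ln(1 − 4p/3), which accounts for multiple substitutions at the same site. Which distance and tree-building method produced the tree in this lab is not stated; this is one standard choice, and a method such as neighbor joining could then build a tree from these distances.

```python
import math
from itertools import combinations


def conserved_columns(alignment: dict[str, str]) -> list[int]:
    """Indices of columns where every sequence has the same base (no gaps)."""
    seqs = list(alignment.values())
    return [
        i for i in range(len(seqs[0]))
        if seqs[0][i] != "-" and all(s[i] == seqs[0][i] for s in seqs)
    ]


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site; undefined for p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Made-up toy alignment for illustration (NOT real beta-globin sequences).
alignment = {
    "seq_A": "acgtacgtacgt",
    "seq_B": "acgtacctacgt",
    "seq_C": "acgaacctacgc",
}

print("conserved columns:", conserved_columns(alignment))
for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
    p = p_distance(s1, s2)
    print(f"{n1} vs {n2}: p = {p:.3f}, JC distance = {jukes_cantor(p):.3f}")
```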
- Now that we've observed the variation of the
β-globin gene across organisms, let's take a look at the
variation that is observed in human populations. First we'll look
at an example of variation
within a human population. The display
you see shows the 19 different alleles at a position in the human
genome near the β-globin gene. Any given person carries only two
alleles at this position (which may be identical), one inherited
from each parent. This region of genomic DNA, named D11S922, was
amplified using two primers. This particular
polymorphism (i.e., a difference in DNA sequence among individuals,
with a variant allele occurring in at least 1% of the
population) is measured in terms of the size of the amplified DNA. The
column named Values gives the size of the DNA in
kilobases (kb). For example, 0.1 kb equals 100 bases and 0.138 kb
equals 138 bases. At this location in the genome, the amplified
DNA ranges in size from 88 bases to 138 bases in different individuals.
These differences are considered normal variation and have
no known effect on the individual. There are many thousands of such
differences in the human genome. How easy do you think it would be
to identify someone based on this type of variation?
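One way to answer this question quantitatively is to estimate how likely two unrelated people are to share the same genotype at a locus. The Python sketch below assumes Hardy-Weinberg proportions (homozygotes p², heterozygotes 2pq) and, purely hypothetically, that all 19 D11S922 alleles are equally common; real allele frequencies would come from the population data on the page. It also converts the page's kilobase values to bases and combines several loci by multiplying their match probabilities, which assumes the loci are inherited independently.

```python
from itertools import combinations_with_replacement


def kb_to_bases(kb: float) -> int:
    """Convert a size in kilobases (as in the Values column) to bases."""
    return round(kb * 1000)


def genotype_frequencies(allele_freqs: list[float]) -> list[float]:
    """Hardy-Weinberg frequencies of every unordered genotype."""
    freqs = []
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[i], allele_freqs[j]
        freqs.append(p * p if i == j else 2 * p * q)
    return freqs


def match_probability(allele_freqs: list[float]) -> float:
    """Chance that two unrelated people share the same genotype at this locus."""
    return sum(g * g for g in genotype_frequencies(allele_freqs))


print(kb_to_bases(0.088), kb_to_bases(0.138))  # 88 138

# Hypothetical: 19 equally frequent alleles (real frequencies will differ).
uniform_19 = [1 / 19] * 19
one_locus = match_probability(uniform_19)
print(f"one locus: {one_locus:.4f}")
# Several independent loci of this kind (independence is an assumption).
for n_loci in (1, 5, 10):
    print(n_loci, one_locus ** n_loci)
```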
- Now let us look at an example of variation at
the DNA level that has been observed among
many populations. Scroll down the page
to see the number of different populations that have been tested
for this variation. In this example only two alleles are observed:
one is 8.3 kilobases and the other is 22 kilobases. The variation
is detected using a method known as RFLP, restriction fragment
length polymorphism (i.e., variation between individuals in the sizes
of DNA fragments produced by cutting with specific restriction
enzymes). How easy do you think it would be to identify someone
based on this type of variation?
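RFLP alleles differ in the lengths of the fragments produced when a restriction enzyme cuts DNA at its recognition sequence. A single-base change that destroys one recognition site merges two fragments into one longer fragment; insertions or deletions between sites can also change fragment sizes. The Python sketch below digests two made-up linear sequences with EcoRI, which recognizes GAATTC and cuts between the G and the first A. The sequences are hypothetical and are not the 8.3 kb and 22 kb fragments described above.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Start positions of every occurrence of a recognition site."""
    seq, site = seq.lower(), site.lower()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def digest(seq: str, site: str = "gaattc", cut_offset: int = 1) -> list[int]:
    """Fragment lengths (counted on the top strand) after cutting linear DNA at every site.

    The defaults model EcoRI (G^AATTC): the cut falls 1 base into the site.
    """
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_2 has a single base change (gaattc -> gagttc)
# that destroys the second EcoRI site.
allele_1 = "ccccc" + "gaattc" + "cccccccccc" + "gaattc" + "ccccc"
allele_2 = "ccccc" + "gaattc" + "cccccccccc" + "gagttc" + "ccccc"

print("allele 1 fragments:", digest(allele_1))  # [6, 16, 10]
print("allele 2 fragments:", digest(allele_2))  # [6, 26]
```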