You'll be using a sample of expression data from a study using Affymetrix (one-color) U95A arrays that were hybridized with samples from fetal and adult human liver and brain tissue. Each hybridization was performed in duplicate. Many other tissues were also profiled but won't be used for these exercises.
What we'll be doing to analyze these data:
You'll be using R and Bioconductor (a set of packages that run in R) to do most of the mathematical analyses.
R is a free, very powerful statistics environment but it requires commands to perform every step of an analysis pipeline.
These commands can be pasted into the program. Type '?myCommand' to get a help page about the command 'myCommand'.
Preliminary information: Image analysis and calculation of expression value
Class 1 exercises
Part 0. Preprocessing and normalization of Affymetrix expression data
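# biocLite has been retired; current Bioconductor releases install packages with BiocManager
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
BiocManager::install(c("affy", "gcrma", "limma"))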
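# Load the affy package
library(affy)
# Read all CEL files in the current working directory
affy.data = ReadAffy()
# Normalize and summarize with the MAS5 algorithm
eset.mas5 = mas5(affy.data)
# Extract the matrix of expression values (MAS5 output is not log-transformed)
exprSet.nologs = exprs(eset.mas5)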
# List the column (chip) names
colnames(exprSet.nologs)
# Rename the column names if we want
colnames(exprSet.nologs) = c("brain.1", "brain.2",
    "fetal.brain.1", "fetal.brain.2",
    "fetal.liver.1", "fetal.liver.2",
    "liver.1", "liver.2")
exprSet = log(exprSet.nologs, 2)
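As a quick check of what the log transformation does, the sketch below applies it to a made-up matrix (the names and values are illustrative, not taken from the study). In base R, log(x, 2) and log2(x) give the same result, and a two-fold difference in signal becomes a difference of 1.

```r
# Toy MAS5-style signal values (hypothetical numbers, not from the study)
toy.nologs = matrix(c(100, 200, 400, 800), nrow=2,
    dimnames=list(c("gene1", "gene2"), c("chip.A", "chip.B")))
# Base-2 log, written two equivalent ways
toy.log2 = log(toy.nologs, 2)
toy.log2.alt = log2(toy.nologs)
# On the log2 scale, a two-fold difference is a difference of 1
toy.log2["gene2", "chip.A"] - toy.log2["gene1", "chip.A"]
```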
Alternatively, preprocess with RMA (which outputs log2-transformed expression values)
eset.rma = justRMA()
or GCRMA (which also outputs log2-transformed expression values)
library(gcrma)
eset.gcrma = justGCRMA()
or dChip (also known as MBEI; not log-transformed)
eset.dChip = expresso(affy.data, normalize.method="invariantset",
    bg.correct=FALSE, pmcorrect.method="pmonly", summary.method="liwong")
write.table(exprSet, file="Su_mas5_matrix.txt", quote=FALSE, sep="\t")
to get a tab-delimited file that we could view in Excel or a text editor.
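One practical detail when opening such a file in a spreadsheet: with the default settings, write.table() writes the row names but no header entry for them, so the column headings end up shifted one cell to the left. A minimal sketch of one workaround, using a made-up matrix and a temporary file (both are illustrative assumptions), is to pass col.names=NA, which adds a blank header cell above the row names.

```r
# Toy matrix standing in for exprSet (hypothetical values)
toy = matrix(c(7.5, 8.25, 6.125, 9.5), nrow=2,
    dimnames=list(c("gene1", "gene2"), c("brain.1", "liver.1")))
# Temporary file so nothing in the working directory is overwritten
toy.file = tempfile(fileext=".txt")
# col.names=NA adds an empty header cell above the row names
write.table(toy, file=toy.file, quote=FALSE, sep="\t", col.names=NA)
# Look at the raw lines, then read the table back with row names in column 1
readLines(toy.file)
toy.back = as.matrix(read.delim(toy.file, row.names=1))
```

Reading the file back with row.names=1 restores the probeset IDs as row names rather than as a data column.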
# Run the Affy A/P call algorithm on the CEL files we processed above
data.mas5calls = mas5calls(affy.data)
# Get the actual A/P calls
data.mas5calls.calls = exprs(data.mas5calls)
# Write the calls to a tab-delimited file
write.table(data.mas5calls.calls, file="Su_mas5calls.txt", quote=FALSE, sep="\t")
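The calls matrix holds one of the letters P (present), M (marginal) or A (absent) for each probeset on each chip. As a sketch of how such calls might be summarized, the example below counts present calls in a made-up matrix; the matrix and the cutoff of two chips are illustrative assumptions, not part of the exercise.

```r
# Toy call matrix, one row per probeset (hypothetical calls, not from the Su data)
toy.calls = matrix(c("P", "A", "M",
                     "P", "A", "P",
                     "P", "P", "A"), nrow=3, byrow=TRUE,
    dimnames=list(c("gene1", "gene2", "gene3"), c("chip.A", "chip.B", "chip.C")))
# Number of present calls on each chip
present.per.chip = colSums(toy.calls == "P")
# Number of chips on which each probeset is called present
present.per.gene = rowSums(toy.calls == "P")
# Probesets called present on at least 2 chips (arbitrary cutoff for illustration)
names(present.per.gene)[present.per.gene >= 2]
```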
Part I. Normalization of expression data [Optional]
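# Read the raw (un-normalized) expression matrix
exprSetRaw = read.delim("Su_raw_matrix.txt")
# Trimmed mean of the first chip, dropping the lowest and highest 2% of values
trmean.col.1 = mean(exprSetRaw[,1], trim=0.02)
# Trimmed mean of every chip (column)
trmean = apply(exprSetRaw, 2, mean, trim=0.02)
trmean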
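# Store the results under names that do not mask the built-in sd() and median() functions
chip.sd = apply(exprSetRaw, 2, sd)
chip.sd
chip.median = apply(exprSetRaw, 2, median)
chip.median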
mean.of.trmeans = mean(trmean)
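# Divide each column by its own trimmed mean; sweep() is needed because plain
# division by a vector recycles it down the rows rather than across the columns
exprSet.trmean = sweep(exprSetRaw, 2, trmean, "/") * mean.of.trmeans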
write.table(exprSet.trmean, file="Su_mas5_trmean_norm.txt", quote=FALSE, sep="\t")
exprSet = exprSet.trmean
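The scaling step deserves a closer look, because dividing a genes-by-chips table by a vector with one value per chip does not divide column by column: R recycles the vector down the rows. A small sketch on made-up numbers (all names and values here are illustrative) shows column-wise scaling with sweep() and checks that every chip ends up with the same trimmed mean.

```r
# Toy raw matrix: 5 probesets on 2 chips; chip.B is uniformly twice as bright
toy.raw = matrix(c(10, 20, 30, 40, 50,
                   20, 40, 60, 80, 100), ncol=2,
    dimnames=list(paste0("gene", 1:5), c("chip.A", "chip.B")))
# Trimmed mean of each chip (column)
toy.trmean = apply(toy.raw, 2, mean, trim=0.02)
# Divide each column by its own trimmed mean, then rescale to the common mean
toy.norm = sweep(toy.raw, 2, toy.trmean, "/") * mean(toy.trmean)
# After normalization both chips have the same trimmed mean
apply(toy.norm, 2, mean, trim=0.02)
# Plain division recycles the vector down the rows, which gives a different result
toy.wrong = toy.raw / toy.trmean * mean(toy.trmean)
```

In this toy case the two chips differ only by a constant factor, so the normalized columns come out identical; with real data only the trimmed means are forced to agree.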
library(limma)
exprSet.quantile = normalizeQuantiles(exprSet)
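Quantile normalization forces every chip to have the same distribution of values. To make the idea concrete, here is a simplified base-R sketch on a made-up matrix. It is not limma's implementation (normalizeQuantiles also deals with ties and missing values), and the function name qnormSketch is hypothetical.

```r
# Simplified quantile normalization (assumes no ties and no missing values)
qnormSketch = function(x) {
    # Sort each column, then average across columns at each rank
    rank.means = rowMeans(apply(x, 2, sort))
    # Replace each value by the mean for its rank within its column
    x.norm = apply(x, 2, function(col) rank.means[rank(col)])
    dimnames(x.norm) = dimnames(x)
    x.norm
}
# Toy matrix: 3 probesets on 2 chips (hypothetical values)
toy = matrix(c(5, 2, 3,
               4, 1, 6), ncol=2,
    dimnames=list(c("gene1", "gene2", "gene3"), c("chip.A", "chip.B")))
toy.qnorm = qnormSketch(toy)
# Every column now contains the same set of values
apply(toy.qnorm, 2, sort)
```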
brain.fetalbrain.2color = read.maimages("brain.fetalbrain.2color.data.txt",
    columns=list(G="brain.1", R="fetal.brain.1", Gb="bg1", Rb="bg2"))
brain.fetalbrain.2color.loess =
    normalizeWithinArrays(brain.fetalbrain.2color, method="loess")
# Set up a page with two figures next to each other
par(mfrow=c(1,2))
# Print the figures
plotMA(brain.fetalbrain.2color)
plotMA(brain.fetalbrain.2color.loess)
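An MA plot shows, for each spot, the log ratio of the two channels (M) against their average log intensity (A). As a sketch of what is being drawn, the two quantities can be computed by hand in base R; the intensities below are made up, and background correction is ignored for simplicity.

```r
# Toy red and green intensities for four spots (hypothetical values)
R = c(1000, 400, 250, 64)
G = c(500, 400, 1000, 64)
# M: log2 ratio of red to green; A: mean log2 intensity
M = log2(R) - log2(G)
A = (log2(R) + log2(G)) / 2
# Scatter plot of M against A, with a reference line at M = 0 (no change)
plot(A, M, xlab="A (mean log2 intensity)", ylab="M (log2 ratio)")
abline(h=0)
```

Spots that are equally bright in both channels fall on the M = 0 line; loess normalization aims to remove any intensity-dependent trend away from that line.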
Part II. Calculating log2 ratios
brain.mean = apply(exprSet[, c("brain.1", "brain.2")], 1, mean)
fetal.brain.mean = apply(exprSet[, c("fetal.brain.1", "fetal.brain.2")], 1, mean)
liver.mean = apply(exprSet[, c("liver.1", "liver.2")], 1, mean)
fetal.liver.mean = apply(exprSet[, c("fetal.liver.1", "fetal.liver.2")], 1, mean)
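# These differences are log2 ratios (fetal relative to adult) only if exprSet holds log2-transformed values
brain.fetal.to.adult = fetal.brain.mean - brain.mean
liver.fetal.to.adult = fetal.liver.mean - liver.mean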
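When the values are on a log2 scale, subtracting one mean from another gives a log2 ratio, which can be converted back to a fold change with 2^x. A short sketch with made-up log2 values (the names and numbers are illustrative):

```r
# Toy log2 expression values for two probesets (hypothetical values);
# each line below is one column (chip)
toy = matrix(c(9.5, 7.25,
               10.5, 6.75,
               12.5, 5.5,
               11.5, 6.5), nrow=2,
    dimnames=list(c("gene1", "gene2"),
                  c("adult.1", "adult.2", "fetal.1", "fetal.2")))
# Mean of the duplicates for each condition
adult.mean = rowMeans(toy[, c("adult.1", "adult.2")])
fetal.mean = rowMeans(toy[, c("fetal.1", "fetal.2")])
# Log2 ratio of fetal to adult
fetal.to.adult = fetal.mean - adult.mean
# Convert back: a log2 ratio of 2 is 4-fold up; -1 is 2-fold down (0.5)
fold.change = 2^fetal.to.adult
```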
Part III. Put all the data together
all.data = cbind(exprSet, brain.mean, fetal.brain.mean, liver.mean, fetal.liver.mean,
    brain.fetal.to.adult, liver.fetal.to.adult)
# Check what data we have here
colnames(all.data)
write.table(all.data, file="Microarray_Analysis_data_1_SOLUTION.txt", quote=FALSE, sep="\t")