Relational Databases for Biologists

Session 2: Building and modifying a database with SQL

Solutions to Exercise 3: Building and modifying a database with SQL

Goals: Generate SQL to create tables and input data. Use SQL to query, modify, and delete data in your database.

1. To check that the SQL worked, try

DESCRIBE Targets;
Is it consistent with your desired design?

2. The SQL file looks like this

CREATE TABLE Data
(
	affyId varchar(30) NOT NULL,
	exptId varchar(10) NOT NULL,
	level int(5),
	PRIMARY KEY (affyId, exptId)
);

CREATE TABLE Sources
(
	exptId varchar(10) NOT NULL,
	source varchar(100),
	PRIMARY KEY (exptId)
);

CREATE TABLE Targets
(
	affyId varchar(20) NOT NULL,
	description varchar(100) NOT NULL,
	PRIMARY KEY (affyId)
);

4. After the second line of the query, you should get output like this:

Query OK, 1 row affected, 1 warning (0.00 sec)
showing that one row of data was affected (the data was loaded) but that there's a warning.

The output of "SHOW warnings;" is

+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'exptId' at row 1 |
+---------+------+---------------------------------------------+
1 row in set (0.00 sec) 
We tried to put text ("my_experiment") into a field that's designed to hold a "varchar" of 10 characters. So either you have to change the data you're importing or change the design of the table to hold a longer "varchar".

7. The last LOAD command should have given you a warning:

+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'exptId' at row 3 |
+---------+------+---------------------------------------------+
1 row in set (0.00 sec) 
In the next step we'll look at some ways to correct the problem.

8. You could react to a warning like this in one of several ways:

a. If truncation is not much of a problem, you could ignore the warning.

b. You can shorten the data to fit the field:

UPDATE Sources
SET exptId = "hs-gzrd-1"
WHERE exptId = "hs-gizzard-1";

c. If the data line is wrong, you could delete it:

DELETE FROM Sources
WHERE exptId = "hs-gizzard";

d. You could modify the data type of the field:

ALTER TABLE Sources
CHANGE exptId exptId varchar(12);
The "exptId exptId" shows that you want to keep the name ("exptId") of the field.

After changing the table structure, however, the mistake will still be there, so you'll have to alter it or delete all the data and reload it.

9.

UPDATE Data
SET level = 10000
WHERE level > 10000;
and check it
SELECT * from Data
WHERE level >= 10000;

10. As an example, let's imagine I'd like to get a list of high-confidence differentially expressed genes, which I'll define as those genes that have an expression level 2^5 (32-fold) higher in one tissue that in another tissue. So here's a query to get these

SELECT Data.*, DataCopy.exptId, DataCopy.level,
FORMAT((LOG2(Data.level) - LOG2(DataCopy.level)), 4) AS log2Ratio
FROM Data, Data DataCopy
WHERE Data.affyId=DataCopy.affyId
AND Data.exptId != DataCopy.exptId
AND
( (LOG2(Data.level) - LOG2(DataCopy.level)) > 5
OR
(LOG2(Data.level) - LOG2(DataCopy.level)) < -5 )
ORDER by LOG2(Data.level) - LOG2(DataCopy.level) DESC;
Note the nested combination of AND and OR conditions, and also note the use of FORMAT(x, 4) which rounds off number "x" to 4 decimal places.

After placing this query into a file ("getDiffExp.sql") into the current directory, we want to issue the query and save the output into another file:

mysql -u username -h hebrides.wi.mit.edu -D user_db -p < getDiffExp.sql > DiffExp.txt
which should put the data for the differentially expressed genes into the file "DiffExp.txt". Look at the file and see if it worked.

11. To design a table to hold the data from (10), the following is possible

CREATE TABLE Interesting
(
    affyId varchar(30) NOT NULL,
    exptId_1 varchar(10) NOT NULL,
    level_1 int(5),
    exptId_2 varchar(10) NOT NULL,
    level_2 int(5),
    log2ratio float(10, 5),
    PRIMARY KEY (affyId, exptId_1, exptId_2)
); 
And finally, to load the data, log in to MySQL, go to my database, and use the command
LOAD DATA LOCAL INFILE "DiffExp.txt" INTO TABLE Interesting;
Check that the data loaded okay.


Questions or comments?   gbell@wi.mit.edu
Bioinformatics and Research Computing at Whitehead Institute