Relational Databases for Biologists

Session 3: Building and modifying a database with SQL

Exercise: Building and modifying a database with SQL

Goals: Generate SQL to create tables and input data. Use SQL to query, modify, and delete data in your database.

See /education/bioinfo2006/db4bio/ for the course page.

See the solutions to this exercise to check your work.

To connect to MySQL on hebrides, you have two choices:

a. You can connect directly from your laptop or desktop computer to MySQL on hebrides without logging on to hebrides. To do this, you need MySQL on your computer, which is the case for the class laptops. If you need to do this on your own computer, go to the most recent MySQL download page, select your operating system, download, and install. This will get you the MySQL database management system (DBMS), which has the 'mysql' command to connect you to another computer (or you can create a database on your own computer).

b. You can first log on to hebrides and then connect to MySQL. To do this, you need to have or install a ssh (secure shell) application. See Connecting to tak or hebrides or (if using Windows) install Cygwin as described on Using Cygwin/X for X Windows on Tak or Hebrides.

Note that you can install and use MySQL on your own computer. We have chosen to use MySQL on a Linux computer so everyone in the class can access the same database. If you ever use MySQL on your own computer, the MySQL queries would work just the same (but be perhaps somewhat slower).


Connect to MySQL on hebrides using either step 1a (recommended) or 1b, depending upon if you're doing step (a) or (b) above.

1a. Get to the Unix prompt (Mac OS X) or Unix-like prompt (Windows) on your laptop. If you're using Mac OS X, open the X11 terminal (Applications > Utilities > X11), which is also on the dock. If you're using Windows, open Cygwin, which is a "Unix emulator", meaning that it makes your Windows computer act as if it's a Unix computer.

In either case, you now have a terminal, essentially a place to send commands to your computer, including the possibility of connnecting to a remote computer.

Change directories ("cd", a Unix command) to go to the desktop:

cd "$USERPROFILE/Desktop"   (for Windows)
cd Desktop   (for Macintosh)

1b. From the terminal on your laptop, connect to Hebrides using the secure shell (ssh) command:

ssh -l username hebrides.wi.mit.edu

and you'll be prompted for your password.

3. To access MySQL on Hebrides, use this command to connect to the user_db database:

mysql -u username -h hebrides.wi.mit.edu -D user_db -p
with the following options:

-u username (replace with your MySQL username [not necessarily your hebrides username])

-h host (where the "host" is hebrides.wi.mit.edu; not necessary if you're already connected to hebrides)

-D database (which takes you directly to the database we'll be using ('user_db'), the same as your username)

-p password (for which you will be prompted)

ex:   mysql -u guest99 -h hebrides.wi.mit.edu -D guest99 -p

This 'user_db'has been configured so you can create and modify tables in it.

Note that usually, the username and the user's database don't both have the same name.


Try some SQL TO create tables and load data into them.

This can be done any one of several ways (as with other SQL queries):

  1. Type the SQL at the MySQL prompt.
  2. Copy the SQL and paste it into the MySQL prompt.
  3. Create a text file (called, for example, "create_Targets.sql") in your working directory with the SQL. From the MySQL prompt, type
    mysql> source create_Targets.sql;
  4. Send the SQL to MySQL from the Unix prompt (without being logged in to MySQL):
    mysql -u username -h hebrides.wi.mit.edu -D user_db -p < create_Targets.sql

You may wish to consult the solutions.

1. Manually create a new table in your database.

CREATE TABLE Targets
(
	affyId VARCHAR(20) NOT NULL,
	gbId VARCHAR(20) NOT NULL,
	PRIMARY KEY (affyId)
);

You don't have permission to do this in 'db4bio' but you should have permission to do it the database with the same name as your username.

Notice that "affyId" has been selected as the primary key.

If you make a mistake with your table design, you can modify it, but it may be easier to delete it

DROP TABLE Targets;
and re-create it.

2. Download (right click and save to your desktop) a SQL script to create two other tables (Data, Sources). Look at the file to figure out what it's going to do.

Run that SQL to create the tables in your database, using one of the four methods above.

(For those logged into hebrides) This file is also on hebrides: use the command "cp /home/george/db4bio/*.sql ." (without the quotes) to copy it to your directory.

3. Query your database to view the tables and their structure. What data do they contain?

SHOW tables;

describe Targets;
describe Data;
describe Sources;

SELECT * FROM Targets LIMIT 5;
SELECT * FROM Data LIMIT 5;
SELECT * FROM Sources LIMIT 5;
The tables should be empty.

4. Try loading one row of data at a time into the table Data. What do the three fields refer to?


INSERT INTO Data
VALUES("1001_at", "expt_1", 125);
INSERT INTO Data
VALUES("1000_at", "my_experiment", 156);
Do a SELECT and check that the SQL worked and the data loaded as expected. If not, why not? How do you know? Try
SHOW warnings;
if that would be helpful.

5. Correct data in the Data table by performing UPDATE on the table:


UPDATE Data
SET exptId = "expt_1"
WHERE exptId = "my_experim";
Do a SELECT and check that the SQL worked as expected.

6. Delete selected rows from Data:


DELETE FROM Data
WHERE exptId = "expt_1";
Do a SELECT and check that the SQL worked as expected.

7. Download (right click and save) some data into your current directory (the desktop), and load it into the three tables you've created. The data files must be in the same directory from which you logged into MySQL. The data:

  • Data_blood_Hs.txt
  • Data_ovary_Hs.txt
  • Hs_targets.txt
  • Hs_sources.txt

    (For those logged into hebrides) These files are also on hebrides: use the command "cp /home/george/db4bio/*.txt ." (without the quotes) to copy it to your directory.

    The SQL:

    LOAD DATA LOCAL INFILE "Data_blood_Hs.txt" INTO TABLE Data;
    LOAD DATA LOCAL INFILE "Data_ovary_Hs.txt" INTO TABLE Data;
    LOAD DATA LOCAL INFILE "Hs_targets.txt" INTO TABLE Targets;
    LOAD DATA LOCAL INFILE "Hs_sources.txt" INTO TABLE Sources;
    

    Check if you got any warnings. If so, you can view them by

    SHOW warnings;
    

    If you make a mistake with your table loading, you can modify the data, but it may be easier to delete it

    DELETE FROM Targets;
    and re-load it.
  • 8. How should you react to the warning?

    9. You realize that your expression system isn't accurate for very high levels, so you want to set any expression level greater than 10000 to 10000. How would you do that? Execute the query, and check that it worked as expected.

    10. Optional extra credit (review): Query at least two tables in your database any way you want, and print the output to a file on your laptop.

    11. Optional extra credit: Design a table to hold the output data from (10), create it, and load the data.



    Questions or comments?   gbell@wi.mit.edu
    Bioinformatics and Research Computing at Whitehead Institute