Entrez Gene Database Description and Files

Entrez Gene is NCBI's repository for gene-specific information. Accessing this information through the Entrez Gene website or as flat files from NCBI's FTP site can be time-consuming, and it limits both the number and the kinds of questions you can ask of the data. A better solution for intensive data mining is to load the data into a relational database.
We offer our MySQL-based database and data loading script as an easy-to-implement solution to this problem. The ER diagram describes the database we created, and we also provide the SQL statements that create the tables and indexes. The data loading script, written in Perl, automatically downloads the Entrez Gene data files, parses the data, and loads it into the MySQL database.
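
The parse-and-load step can be illustrated with a minimal sketch. This is not the Perl script offered here. It is a Python stand-in that uses the built-in sqlite3 module in place of MySQL so that it runs without a server. The table names (gene_info, gene_synonym), the index name, and the truncated sample lines are assumptions made for illustration. The column order follows the tab-delimited gene_info layout, where "-" marks a missing value and synonyms are separated by "|".

```python
import sqlite3

# Illustrative, truncated lines in the tab-delimited gene_info layout.
# Assumed columns: tax_id, GeneID, Symbol, LocusTag, Synonyms.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\n"
    "9606\t672\tBRCA1\t-\tBRCC1|RNF53\n"
    "9606\t7157\tTP53\t-\tLFS1|P53\n"
)


def parse_gene_info(text):
    """Yield (tax_id, gene_id, symbol, synonyms) for each data line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split("\t")
        tax_id, gene_id, symbol = int(fields[0]), int(fields[1]), fields[2]
        # "-" marks a missing value; synonyms are pipe-delimited.
        synonyms = [] if fields[4] == "-" else fields[4].split("|")
        yield tax_id, gene_id, symbol, synonyms


def load(conn, records):
    """Create two hypothetical tables plus an index, then insert the records."""
    conn.execute(
        "CREATE TABLE gene_info ("
        "gene_id INTEGER PRIMARY KEY, tax_id INTEGER, symbol TEXT)"
    )
    conn.execute("CREATE TABLE gene_synonym (gene_id INTEGER, synonym TEXT)")
    conn.execute("CREATE INDEX idx_synonym ON gene_synonym (synonym)")
    for tax_id, gene_id, symbol, synonyms in records:
        conn.execute(
            "INSERT INTO gene_info VALUES (?, ?, ?)", (gene_id, tax_id, symbol)
        )
        conn.executemany(
            "INSERT INTO gene_synonym VALUES (?, ?)",
            [(gene_id, s) for s in synonyms],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
load(conn, parse_gene_info(SAMPLE))

# A question that is awkward to ask of a flat file: which gene has this synonym?
rows = conn.execute(
    "SELECT g.symbol FROM gene_info g "
    "JOIN gene_synonym s ON s.gene_id = g.gene_id "
    "WHERE s.synonym = ?",
    ("P53",),
).fetchall()
print(rows)
```

Splitting the synonyms into their own indexed table is one possible design choice. It turns a substring search over a delimited field into an indexed join, which is the kind of query a relational layout makes cheap.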

Requirements
Unix: Including wget, tar and gzip.
Perl: Basic installation, no special modules needed.
MySQL: Free, see mysql.com to download.


Files to Download
ER diagram: Simple diagram of the database.
Instructions: Getting started.
Tables and Indexes: SQL script to create tables and indexes.
Parsing Script: Perl script to parse database. See script for details.
Script to load new data: Shell script to load new data into database. See script for details.
Entrez Gene Sample Queries: Some example queries you can do with this database.


Descriptions of the Tables
For a detailed description of each table and the data within, see NCBI's README file for Gene.