Unix Tips and Tricks - Part 1
January 29, 2008

Exercises

Perform the following tasks and answer the questions.
Everything can be done with a Unix command or commands.

Exercises are at http://iona.wi.mit.edu/bio/education/hot_topics/unix1/unix_tips_tricks_1_ex.txt
These solutions are at http://iona.wi.mit.edu/bio/education/hot_topics/unix1/solutions_1.txt

0 - Open a Unix terminal on your computer
1 - mkdir unix1
    cd unix1
2 - wget http://iona.wi.mit.edu/bio/education/hot_topics/unix1/refSeq_hg18.zip
3 - unzip refSeq_hg18.zip
4 - head refSeq_hg18.txt
    How many fields wide is it? 6
    wc -l refSeq_hg18.txt
    How many lines long is it? 25990
5 -
6 - cat -A refSeq_hg18.txt | head
    ^I refers to tabs, so it's a tab-delimited file
7 - cut -f2 refSeq_hg18.txt | sort -u | wc -l
    cut -f2 refSeq_hg18.txt | sort | uniq | wc -l
    25031
8 - cut -f1 refSeq_hg18.txt | sort -u | wc -l
    cut -f1 refSeq_hg18.txt | sort | uniq | wc -l
    18593
9 - cut -f2 refSeq_hg18.txt | sort | uniq -d | wc -l
    579 appear more than once, largely because mRNA sequence maps to multiple places in the genome
    cut -f2 refSeq_hg18.txt | sort | uniq -d > mult_transcripts.txt
10- cut -f3 refSeq_hg18.txt | sort -u | wc -l
    42 (Note that there are some random bits of the genome that can't be placed in context.)
11- grep chrY refSeq_hg18.txt | cut -f1 > chrY_genes.txt
12- sort -k3,3 -k4,4 -k 5,5n refSeq_hg18.txt > refSeq_hg18_sorted.txt
13- awk -F"\t" '$4 == "+" {print $0}' refSeq_hg18.txt > refSeq_hg18_plus.txt
    awk -F"\t" '$4 == "-" {print $0}' refSeq_hg18.txt > refSeq_hg18_neg.txt
14- wc -l refSeq_hg18.txt
    [25990 lines / 4 ==> ~ 6498]
    split -l 6498 -d refSeq_hg18.txt Part_
    mv Part_00 Part_1.txt
    mv Part_01 Part_2.txt
    mv Part_02 Part_3.txt
    mv Part_03 Part_4.txt
15- grep chr19 refSeq_hg18.txt | sort -k6nr | cut -f1 | head -7
    CHMP2A
    MZF1
    TRIM28
    UBE2M
    ZBTB45
16- sort -k6nr refSeq_hg18.txt | head -1
    chr1 is at least 247179967 nt
17- grep BMP refSeq_hg18.txt > BMP_etc.txt
18- cut -f1 BMP_etc.txt | sort | uniq -d
19- awk -F"\t" '{print $0 "\t" $6 - $5}' refSeq_hg18.txt | sort -k7nr | head -1
    CNTNAP2 (NM_014141)
    awk -F"\t" '{print $0 "\t" $6 - $5}' refSeq_hg18.txt | sort -k7n | head -1
    KRTAP22-1 (NM_181620)
20- awk -F"\t" '{ print $2 "\t" $3 ":" $5 "-" $6}' refSeq_hg18.txt > refSeq_hg18_coords.txt
=================
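
The patterns used in solutions 8, 9 and 19 can be tried on a small made-up file before running them on the full data set. The sketch below is only an illustration: the file toy_refSeq.txt and every gene name, accession and coordinate in it are invented, and the column order (gene, accession, chromosome, strand, start, end) is assumed from the commands above rather than taken from the real refSeq_hg18.txt.

```shell
# Toy stand-in for refSeq_hg18.txt. All rows below are invented for illustration.
# Assumed columns: gene, accession, chromosome, strand, start, end (tab-delimited).
tab=$(printf '\t')
tmpdir=$(mktemp -d)
toy="$tmpdir/toy_refSeq.txt"
printf 'GENEA\tNM_000001\tchr1\t+\t100\t500\n'   >  "$toy"
printf 'GENEB\tNM_000002\tchr1\t-\t200\t250\n'   >> "$toy"
printf 'GENEA\tNM_000003\tchr2\t+\t1000\t4000\n' >> "$toy"
printf 'GENEC\tNM_000002\tchrY\t-\t50\t80\n'     >> "$toy"

# Pattern from solution 8: count distinct values in column 1.
# tr strips the padding that some versions of wc add.
n_genes=$(cut -f1 "$toy" | sort -u | wc -l | tr -d ' ')

# Pattern from solution 9: list values in column 2 that occur more than once.
# uniq -d only compares adjacent lines, so the input must be sorted first.
dup_acc=$(cut -f2 "$toy" | sort | uniq -d)

# Pattern from solution 19: append end - start as column 7, then sort on it.
# -t "$tab" makes sort split on tabs only; -k7,7 limits the key to column 7.
longest=$(awk -F"\t" '{print $0 "\t" $6 - $5}' "$toy" |
          sort -t "$tab" -k7,7nr | head -n 1 | cut -f1)
shortest=$(awk -F"\t" '{print $0 "\t" $6 - $5}' "$toy" |
           sort -t "$tab" -k7,7n | head -n 1 | cut -f1)

echo "distinct genes:       $n_genes"
echo "repeated accessions:  $dup_acc"
echo "longest / shortest:   $longest / $shortest"

rm -rf "$tmpdir"
```

Setting the field separator for sort explicitly is a design choice here, not something the solutions above require: by default sort splits fields on runs of blanks, which gives the same result as long as no field contains a space.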