Exercises in Unix
Given the tutorial/overview, try to complete the following exercises.
exercise 1
Get onto the system and look around
- Log into a server using your ssh client
- The shared servers available are:
- maple
- hickory
- cypress
- cedar
- franklin.stowers.org (this is the machine we will be using)
Normally, if you were on the internal network, you could log into any of the servers above by its short name, for instance username@maple to log into maple. However, since none of us are on the internal network, these machines are not accessible. Instead, we have an outside machine with a fully qualified domain name: franklin.stowers.org
- Find out: who's on the system?
- Type
w
to list who else is logged on.
- Find out if the server is busy.
- Type
top
or
htop
to see a list of what's happening on the system. Type
q
to quit. (You can google "linux top" to see how to interpret top.)
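As a rough sketch of the login step and a non-interactive look at the machine: the account name "username" below is a placeholder (an assumption, replace it with your own), and the ssh command is only printed rather than run so the snippet is safe to paste anywhere.

```shell
# Assumption: "username" is a placeholder for your own account name.
user="username"
host="franklin.stowers.org"
target="${user}@${host}"

# Print the login command instead of running it; drop the echo to connect.
echo "ssh ${target}"

# Once logged in, these give a quick, non-interactive view of the machine.
# `who` prints one line per login session; `uptime` reports load averages.
sessions=$(who | wc -l | tr -d ' ')
echo "login sessions: ${sessions}"
if command -v uptime >/dev/null 2>&1; then
    uptime
fi
```

The interactive tools above (w, top, htop) show the same information in more detail; the commands here are just convenient when you want output you can paste into your notes.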
exercise 2
Look around and explore a file
- Step through the examples from the introduction and accomplish the following:
- How many chromosomes have genes as represented by the gene data file: dm6.Ens_98.gene_data.txt?
- How many chromosomes are represented by the fly genome fasta file: dm6.fa?
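One possible approach to both questions, shown on toy files so it runs anywhere. The toy file names and the table layout (tab-delimited, one header row, chromosome in column 2) are assumptions; check the real gene data file with head before choosing a column.

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)

# Toy FASTA: every sequence record starts with a ">" header line.
printf '>chr2L\nACGT\nACGT\n>chr2R\nGGCC\n>chrX\nTTAA\n' > "$tmp/toy.fa"
# Counting header lines gives the number of sequences in the file.
n_fasta=$(grep -c '^>' "$tmp/toy.fa")
echo "sequences in FASTA: $n_fasta"

# Toy gene table: tab-delimited, header row, chromosome in column 2 (assumed layout).
printf 'gene_id\tchr\tstart\ng1\tchr2L\t100\ng2\tchr2L\t500\ng3\tchrX\t42\n' > "$tmp/toy_genes.txt"
# Drop the header, keep column 2, collapse duplicates, count what is left.
n_chr=$(tail -n +2 "$tmp/toy_genes.txt" | cut -f2 | sort -u | wc -l | tr -d ' ')
echo "chromosomes with genes: $n_chr"

rm -r "$tmp"
```

On the real files, replace the toy paths with dm6.fa and dm6.Ens_98.gene_data.txt and adjust the column number to wherever the chromosome name actually lives.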
exercise 3
Now that you know how to get on the system and look around, it's time to learn how to create content.
A. Create a file to document your work
- Learn to use a text editor to manipulate files. Use emacs or another text editor to create a file to keep track of your work. You're free to use whatever editor you want. If you have an editor you know and like (vi, nano, etc.) you can skip this part.
B. Learn Markdown
- Learn a little markdown to format your writing and share your document.
- Create a markdown file to document your work in the following exercise. Record your UNIX commands.
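If you would rather start from a skeleton than a blank buffer, a here-document can write one for you. This is only a sketch: the file name analysis.md and the headings are made up for illustration, and the file is created in a scratch directory.

```shell
# Hypothetical notes file in a scratch directory.
notes="$(mktemp -d)/analysis.md"

# The quoted 'EOF' keeps backticks and $ signs literal inside the here-document.
cat > "$notes" <<'EOF'
# Unix exercises

## Exercise 4

- Count lines in a file: `wc -l file.txt`
- **Result:** to be filled in
EOF

# Show what was written.
cat "$notes"
```

Lines starting with # become headings, lines starting with - become list items, and text in backticks is rendered as code, which is a convenient way to record the commands you ran.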
exercise 4
Analyze two gene lists
Twist and Snail are two transcription factors involved in Drosophila development. We have ChIP-Seq results for each of these proteins where each peak of binding has been assigned to the nearest gene. There is one result file for each factor. They are located in the Data directory:
/home/cws/CompGenomics/Data/ChIP-Seq/dm3/twist_peaks_genes_1kb.txt
/home/cws/CompGenomics/Data/ChIP-Seq/dm3/snail_peaks_genes_1kb.txt
Your job is to answer the following questions about these two factors and document your work in a markdown document.
- How many peaks were found for each factor?
- How many unique genes are associated with each factor?
- How many genes are bound by both factors?
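The three questions map onto three small pipelines, sketched here on toy data. The column layout (chr, start, end, gene, with no header row) is an assumption; inspect the real files with head first and adjust the column number and header handling to match.

```shell
tmp=$(mktemp -d)

# Toy peak files: one peak per line; columns chr, start, end, gene (assumed layout).
printf 'chr2L\t10\t20\tgeneA\nchr2L\t30\t40\tgeneA\nchr3R\t5\t9\tgeneB\n' > "$tmp/twist.txt"
printf 'chr2L\t12\t18\tgeneA\nchrX\t1\t7\tgeneC\n' > "$tmp/snail.txt"

# Peaks per factor: one line per peak (subtract 1 if the real file has a header).
twist_peaks=$(wc -l < "$tmp/twist.txt" | tr -d ' ')

# Unique genes per factor: pull the gene column, sort, and drop duplicates.
cut -f4 "$tmp/twist.txt" | sort -u > "$tmp/twist_genes.txt"
cut -f4 "$tmp/snail.txt" | sort -u > "$tmp/snail_genes.txt"
twist_genes=$(wc -l < "$tmp/twist_genes.txt" | tr -d ' ')

# Genes bound by both: comm -12 prints only lines common to two sorted files.
both=$(comm -12 "$tmp/twist_genes.txt" "$tmp/snail_genes.txt" | wc -l | tr -d ' ')

echo "twist peaks=$twist_peaks twist genes=$twist_genes shared=$both"
rm -r "$tmp"
```

comm needs both inputs sorted the same way, which is why the gene lists are written out with sort -u before being compared.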
exercise 5
Here is a file of gene expression results from another lab (Rembold et al. 2014):
http://furlonglab.embl.de/labData/publications/2014/Rembold-et-al-2014_GenesDev/limma_result_table_stage7.txt.gz
- How would you retrieve it from the command line? Grab a copy to your local directory.
- How big is this file (how many lines does it contain)?
- Without decompressing the file to disk, how would you count how many lines are in the file?
- What kinds of fields are in the file?
- This result is from 2014. Could you figure out how to count how many gene ids are no longer used?
- Now that you no longer need the file, remove it. How do you do this?
- Document your answers in your analysis markdown file.
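A sketch of the mechanics, using a tiny stand-in file so it runs offline. The toy column names are invented and say nothing about the real table; the download commands appear only as comments and are not executed.

```shell
tmp=$(mktemp -d)

# To fetch the real file, either of these would work (not run here):
#   wget http://furlonglab.embl.de/labData/publications/2014/Rembold-et-al-2014_GenesDev/limma_result_table_stage7.txt.gz
#   curl -O http://furlonglab.embl.de/labData/publications/2014/Rembold-et-al-2014_GenesDev/limma_result_table_stage7.txt.gz

# Toy stand-in: a small tab-delimited table, gzip-compressed (column names invented).
printf 'gene_id\tlogFC\tadj.P.Val\nFBgn0000001\t1.5\t0.01\nFBgn0000002\t-0.3\t0.70\n' | gzip > "$tmp/toy.txt.gz"

# Size on disk in human-readable units.
ls -lh "$tmp/toy.txt.gz"

# Count lines by streaming the decompressed text into wc; nothing is written to disk.
n_lines=$(gzip -dc "$tmp/toy.txt.gz" | wc -l | tr -d ' ')
echo "lines: $n_lines"

# List the field names: take the header row and print one name per line.
gzip -dc "$tmp/toy.txt.gz" | head -n 1 | tr '\t' '\n'

# Remove the file once it is no longer needed.
rm "$tmp/toy.txt.gz"
```

For the question about retired gene ids, the same comm technique used for comparing two sorted lists applies: extract the id column from this table and compare it against the ids in a current annotation file.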
Bonus Exercise
Publish your results
- Render your markdown document as html using pandoc.
- See the markdown guide for an example.
- Figure out the URL for your document and send it to a friend.
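A minimal sketch of the rendering step, assuming pandoc is installed. The file names and title are placeholders, and the snippet checks for pandoc first so it fails gracefully where it is missing.

```shell
tmp=$(mktemp -d)

# Placeholder markdown document; single quotes keep the backticks literal.
printf '# My analysis\n\nPeaks counted with `wc -l`.\n' > "$tmp/analysis.md"

if command -v pandoc >/dev/null 2>&1; then
    # -s writes a standalone page (with <head> and <body>) rather than a fragment;
    # the title metadata fills in the page's <title> element.
    pandoc -s --metadata title="My analysis" "$tmp/analysis.md" -o "$tmp/analysis.html"
    status="rendered"
else
    status="pandoc not installed"
fi
echo "$status"
```

Where the resulting html file has to live to be reachable by URL depends on how the server is set up, so check the markdown guide for the directory to use.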