RNA-Seq Project

Genomics Course, Fall 2020

Examination of gene expression differences upon modulation of the Twist transcription factor during Drosophila development.

Objective: Perform RNA-seq analysis comparing Toll mutants with different levels of Twist expression (Toll10b and Tollrm9/10) to gd7 mutants, which do not express Twist, to identify potential targets of the Twist transcription factor.

Background: The RNA-seq data come from 2-4 h old embryos, which matches the time point of the Twist binding data. We use mutants with different levels of Dorsal activity.

In Toll10b, Dorsal is highly activated and induces Twist expression throughout the embryo. As a result, cells throughout the embryo become mesodermal precursors (which give rise to muscle).

In gd7, there is no Dorsal activity, so Twist is not expressed. As a result, cells become dorsal ectodermal precursors (which give rise to extraembryonic tissues and epidermis).

The third genotype, Tollrm9/10, has intermediate Dorsal activity and possibly low levels of Twist; its cells become neurectodermal precursors (which give rise to the nervous system and epidermis).

Sample Table

No.  Genotype    Dorsal activity  Replicate  FASTQ file
1    gd7         none             1          gd7_1.fastq.gz
2    gd7         none             2          gd7_2.fastq.gz
3    gd7         none             3          gd7_3.fastq.gz
4    Tollrm9/10  medium           1          rm9rm10_1.fastq.gz
5    Tollrm9/10  medium           2          rm9rm10_2.fastq.gz
6    Tollrm9/10  medium           3          rm9rm10_3.fastq.gz
7    Toll10b     high             1          toll10b_rna_1.fastq.gz
8    Toll10b     high             2          toll10b_rna_2.fastq.gz

Overall Plan:

  1. Align the three gd7 FASTQ files with STAR to produce spliced (gapped) alignments in BAM format, along with per-gene read counts.
  2. Use edgeR to identify genes that are differentially expressed between the mutants.

Note: You only need to align the three gd7 FASTQ files. All other FASTQ files have already been aligned, and their alignments are in a directory called “BAM”.

Optional: Create a coverage track and visualize the reads in a genome browser.

Resources

FASTQ files: /home/cws/CompGenomics/Data/RNA-Seq/

Alignment Index: /home/cws/CompGenomics/dm6/Ens98_STAR_51

Transcript Annotation (GTF): /home/cws/CompGenomics/dm6/dm6.Ens_98.gtf

BAM files (existing alignments): /home/cws/CompGenomics/Data/RNA-Seq/BAM

Run STAR

Use STAR, together with the genome index and the GTF annotation listed under Resources, to align the FASTQ reads to the Drosophila genome. STAR can write the alignments as a BAM file and also count reads per gene. The STAR tutorial page has details.
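A minimal bash sketch of the alignment step, assuming a STAR version that accepts the options below (all of them are documented in the STAR manual). The paths are the ones listed under Resources; the output directory `STAR_out`, the thread count, and the function names `align_sample` and `align_gd7_samples` are hypothetical choices. With `--quantMode GeneCounts`, STAR writes per-gene counts next to the BAM file.

```shell
#!/usr/bin/env bash
# Sketch: align the three gd7 FASTQ files with STAR and count reads per gene.
# Paths come from the Resources list; OUT_DIR, THREADS and the function
# names are hypothetical, adjust them for your machine.
FASTQ_DIR=/home/cws/CompGenomics/Data/RNA-Seq
GENOME_DIR=/home/cws/CompGenomics/dm6/Ens98_STAR_51
GTF=/home/cws/CompGenomics/dm6/dm6.Ens_98.gtf
OUT_DIR=${OUT_DIR:-STAR_out}   # hypothetical output directory
THREADS=${THREADS:-4}          # assumed number of CPU cores

# Align one gzipped single-end FASTQ file; outputs are prefixed "<sample>_".
align_sample() {
  local sample=$1
  mkdir -p "$OUT_DIR"
  STAR --runThreadN "$THREADS" \
       --genomeDir "$GENOME_DIR" \
       --sjdbGTFfile "$GTF" \
       --readFilesIn "$FASTQ_DIR/${sample}.fastq.gz" \
       --readFilesCommand zcat \
       --outSAMtype BAM SortedByCoordinate \
       --quantMode GeneCounts \
       --outFileNamePrefix "$OUT_DIR/${sample}_"
}

# Run the three gd7 replicates one after another; stop at the first failure.
align_gd7_samples() {
  local s
  for s in gd7_1 gd7_2 gd7_3; do
    align_sample "$s" || return 1
  done
}
```

After sourcing the script, call `align_gd7_samples`. For `gd7_1` this should produce `STAR_out/gd7_1_Aligned.sortedByCoord.out.bam` and `STAR_out/gd7_1_ReadsPerGene.out.tab`.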

Run edgeR

Consolidate the read counts for all samples into a single table (one row per gene, one column per sample), then use edgeR in R to test which genes are differentially expressed between conditions.
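One way to build the count table, sketched in bash and awk. It assumes every sample has a STAR `ReadsPerGene.out.tab` file produced with the same index and GTF; whether the pre-aligned samples in the BAM directory come with such files is an assumption to check. The function names `strand_totals` and `merge_counts` are hypothetical. STAR's file begins with four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), followed by one row per gene: column 2 holds unstranded counts, and columns 3 and 4 hold strand-specific counts.

```shell
#!/usr/bin/env bash
# Sketch: merge STAR ReadsPerGene.out.tab files into one gene-by-sample table.
# Assumes all files share the same gene order (same index and GTF); stops if
# gene IDs or row counts disagree. Function names are hypothetical.

# strand_totals FILE: print the summed gene counts of columns 2, 3 and 4.
# If 3 and 4 are similar, the library is likely unstranded (use column 2);
# if one of them holds nearly all counts, use that strand-specific column.
strand_totals() {
  awk 'NR > 4 { c2 += $2; c3 += $3; c4 += $4 } END { print c2, c3, c4 }' "$1"
}

# merge_counts COLUMN FILE... : print gene_id plus one count column per file.
merge_counts() {
  local col=$1
  shift
  awk -v col="$col" '
    BEGIN { OFS = "\t" }
    FNR == 1 {
      nfile++
      name = FILENAME
      sub(/.*\//, "", name)                        # drop the directory
      sub(/_?ReadsPerGene\.out\.tab$/, "", name)   # drop the STAR suffix
      header = header OFS name
    }
    FNR <= 4 { next }                              # skip the summary rows
    {
      row = FNR - 4
      if (nfile == 1) {
        id[row] = $1
        n = row
      } else if (id[row] != $1) {
        printf "gene order differs in %s at %s\n", FILENAME, $1 > "/dev/stderr"
        bad = 1
        exit 1
      }
      rows[nfile] = row
      line[row] = line[row] OFS $col
    }
    END {
      if (bad) exit 1
      for (f = 2; f <= nfile; f++)
        if (rows[f] != n) {
          print "row count differs in file number " f > "/dev/stderr"
          exit 1
        }
      print "gene_id" header
      for (r = 1; r <= n; r++) print id[r] line[r]
    }' "$@"
}
```

For example, `strand_totals STAR_out/gd7_1_ReadsPerGene.out.tab` prints the three totals; if they point to an unstranded library, `merge_counts 2 STAR_out/*_ReadsPerGene.out.tab > counts.tsv` writes a table that R can load with `read.delim("counts.tsv", row.names = 1)` before building the edgeR object.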

Software Documentation

STAR Manual - from the GitHub repository

edgeR - from Bioconductor

edgeR User's Guide - Clearly written guide. Very helpful!

Addenda

STAR can also write signal (track) files for a genome browser. Look up the --outWig options in the manual.
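A sketch of two ways to get bedGraph tracks from STAR, assuming bash and the options described in the STAR manual (`--outWigType`, `--outWigStrand`, `--outWigNorm`, and `--runMode inputAlignmentsFromBAM` with `--inputBAMfile`). The array and function names are hypothetical, and `Unstranded` is an assumption about the library. Adding the options to an alignment run requires `--outSAMtype BAM SortedByCoordinate`; the second route works on an existing coordinate-sorted BAM, such as those in the BAM directory, without realigning.

```shell
#!/usr/bin/env bash
# Sketch: STAR bedGraph signal output. Names here are hypothetical;
# Unstranded assumes an unstranded library (use Stranded otherwise).

# Route 1: append these options to an alignment run that already uses
# --outSAMtype BAM SortedByCoordinate.
STAR_WIG_OPTS=(--outWigType bedGraph --outWigStrand Unstranded --outWigNorm RPM)

# Route 2: bam_to_bedgraph BAM PREFIX
# Generate tracks from an existing coordinate-sorted BAM file.
bam_to_bedgraph() {
  local bam=$1 prefix=$2
  mkdir -p "$(dirname "$prefix")"   # STAR expects the output directory to exist
  STAR --runMode inputAlignmentsFromBAM \
       --inputBAMfile "$bam" \
       "${STAR_WIG_OPTS[@]}" \
       --outFileNamePrefix "$prefix"
}
```

For unstranded output, STAR names the files `<prefix>Signal.Unique.str1.out.bg` (uniquely mapped reads) and `<prefix>Signal.UniqueMultiple.str1.out.bg` (unique plus multimapping reads). These can be converted to bigWig with the same wigToBigWig step used for bedtools output.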

Other aligners produce only BAM files, so you need separate tools to create track files. For instance, you can convert a BAM file into a genome coverage file using bedtools:

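# generate coverage from a coordinate-sorted BAM file (-split keeps introns of spliced reads out of the coverage)
genomeCoverageBed -split -bg -ibam aligned.sorted.bam -g dm6.chrom.sizes > aligned.bedgraph
# sort by chromosome and start position, as the UCSC bigWig tools expect
LC_ALL=C sort -k1,1 -k2,2n aligned.bedgraph > aligned.sorted.bedgraph
# convert to bigWig file
wigToBigWig aligned.sorted.bedgraph dm6.chrom.sizes myfile.bw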

bedtools genomecov (Genome Coverage) documentation

Can you figure out how to scale the coverage to reads per million (RPM)?
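One possible answer, sketched in bash under stated assumptions: samtools is installed, the reads are single-end, and "reads per million" means per million primary mapped reads. The scale factor is 10^6 divided by that read count, passed to the `-scale` option of genomeCoverageBed. The function names `rpm_scale` and `rpm_bedgraph` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: RPM-scaled coverage with bedtools. Assumes samtools is installed
# and single-end reads; function names are hypothetical.

# rpm_scale N: print 1e6 / N (fails if N is not positive)
rpm_scale() {
  awk -v n="$1" 'BEGIN { if (n <= 0) exit 1; printf "%.10g\n", 1e6 / n }'
}

# rpm_bedgraph BAM CHROM_SIZES OUT: write a sorted, RPM-scaled bedGraph
rpm_bedgraph() {
  local bam=$1 sizes=$2 out=$3 mapped scale
  # count primary mapped alignments: -F 260 drops unmapped (4) and secondary (256)
  mapped=$(samtools view -c -F 260 "$bam") || return 1
  scale=$(rpm_scale "$mapped") || return 1
  genomeCoverageBed -split -bg -scale "$scale" -ibam "$bam" -g "$sizes" \
    | LC_ALL=C sort -k1,1 -k2,2n > "$out"
}
```

For a BAM file with 2,000,000 mapped reads the factor is 0.5, so every coverage value is halved. The resulting bedGraph can then go through wigToBigWig as above.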