Affy GeneChip MOE430A: Probe Sequences in a ProbeSet

Affy illustrates a simplified view of the probe set selection process with a diagram in
www.affymetrix.com/support/technical/manual/allother_comparision_manual.pdf

For each probe set, Affy defines either an exemplar sequence or a consensus sequence, but not both.

Affy provides FASTA sequence files for MOE430A (see www.affymetrix.com/analysis/download_center.affx), including the MOE430A_consensus and MOE430A_target files used below.

Probe sequence information is available in two files, MOE430A_probe_fasta and MOE430A_probe_tab.  The tab-delimited "tab" file is the convenient one for loading the information into a database table.
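
As a minimal sketch of that loading step, the Python below reads a tab-delimited file into an SQLite table, taking the column names from the file's first line. It assumes the file starts with a header row, which is not confirmed here. The table name `probe`, the function name `load_tab_file`, and the database path are hypothetical choices, not part of Affy's distribution.

```python
import csv
import sqlite3


def load_tab_file(tab_path, db_path=":memory:", table="probe"):
    """Load a tab-delimited file with a header row into an SQLite table.

    Column names come from the header line rather than being hard-coded,
    because the exact column layout is an assumption here.
    """
    conn = sqlite3.connect(db_path)
    with open(tab_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Quote identifiers so headers containing spaces still work.
        columns = ", ".join(
            '"{}" TEXT'.format(name.replace('"', '""')) for name in header
        )
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        placeholders = ", ".join("?" for _ in header)
        # Skip malformed rows whose field count differs from the header.
        rows = (row for row in reader if len(row) == len(header))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
        )
    conn.commit()
    return conn
```

Every column is stored as TEXT to keep the sketch independent of the real column types; numeric columns could be cast in later queries.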

Here's what the first consensus sequence looks like:

head -71 MOE430A_consensus | columnize 80
>consensus:MOE430A:1415770_at; gb|NM_031392; gb:NM_031392.1 /DB_XREF=gi:13878226
/GEN=Wdr6 /FEA=FLmRNA /CNT=712 /TID=Mm.29493.1 /TIER=FL+Stack /STK=570 /UG=Mm.29493
/LL=83669 /DEF=Mus musculus WD repeat domain 6 (Wdr6), mRNA.
/PROD=WD repeat domain 6 /FL=gb:AB041854.1 gb:NM_031392.1 gb:AF348591.1
ttggacgtccgcagcccagcgccagggctttgctacctgtagctaagggcttcggagctctgcagcgcggctctccagna
cggntggttctcgcgagaacgcagctctcttcttgtccgtggnagccacagcagctccaggaacgtcatggacgctttcg
gggantatgtctggccgcgggcganttccgagctcatactcctcccggttacgggtctggagtgtgttgggganaggctg
ctggcgggcgaggggcctgatttactggtgtacaacttggaccttggtgggcatctccgaatggtgaagagagtccagaa
cctgcttggtcactttctcatccatgggttccgagtgcgaccagagcctaaaggagacctggactctgaggccatgatag
ctgtgtttgggagcaagggcctcaaagttgtgaaagtcagctggggtcaaagccatcttcgggagctctggcgctctggc
ctgtggaacatgtccgactggatctgggatgtccgctggatcgagggtaacgtagccgtggccttgggccacaactcggt
ggtactgtatgacccagtgatagggtgcatgctgcaggacgtcccctgcacagacaggtgtaccctgtcctcagcctgcc
tggttggtgacacctggaaggaactgaccatcgtggctggcgcggtttccaatgagctcctgatctggtacccagccact
gctttaacagacaacaaacccgtggcccctgaccggcgggttagtggccatgtgggtgtcatctttagcatgtcatacct
ggaaagcaagggcctgctggcaactgcttcagaagaccgaagtgttcgtctctggaaggtgggggacctccgggtgcctg
ggggtcgggttcagaatattggccactgctttgggcacagtgcccgagtgtggcaggtgaagctcttagagaactatctc
atcagtgcaggagaagactgtgtctgcttggtgtggagccacgaaggcgagatccttcaagcctttcggggccaccgggg
ccgaggtatccgggccatagccactcacgagaggcaggcctgggtggtcactgggggagacgactcaggcattcgactct
ggcacctggcaggccgagggtacccaggcttgggcgtctcatccctgtccttcaaatctcctagccggccaggtgccctc
aaggctgtgactctggctggttcctggcgagtcctggcagtgactgatgtggggtccctgtacctctacgaccttgaggt
caagtcctgggagcagctgctggaggacaatcgctttcggtcttactgcctgctagaggcagctcctgggcccgagggct
ttggactctgtgccttggccaacggggagggtcttgttaaggtggttcccatcaacacccccaccgctgccgtcgagcag
aaactgttccaggggaaggtgcacagcctgagctgggcccttcgtggttacgaggagctgcttttgttagcatcgggccc
tggtggggtgatagcttgtttggagatctcagctgcacccactggcaaggctatctttgtcaaggaacgttgccggtacc
tccttcccccaagcaagcaacgatggcacacatgtagtgctttcctgcccccgggtgacttcctcgtctgtggggaccgc
cgtggctctgtgatgctattccctgtcagaccatgtctattcanaaaagcctggnggccggaagcaanggctattactgc
agctgaggcacctggagctggtagtgggagcggtgggtctgagagtgtcccaacaggaataggccccgtctctacactcc
attctctgcatgggaaacagggtgtgacctcagttacctgccatggtggctacctatacagcacagggcgggatagctcc
tacttccagctctttgtacatggcggccacctccagccggtcctaaggcagaaagcctgtcgaggcatgaactgggtagc
tgggcttcggatggtgcctgatggaagtatggtcatcttgggtttccatgccaacgagtttgtagtgtggagcccccggt
cccatgagaagctgcatatcgtcaactgtgggggagggcaccgctcctgggccttttctgatactgaggcagccatggcc
tttacctacctgaaggatggtgaggtcatgctctatcgggctctaggtggctgcatccggcccaatgtgattctccggga
gggtctgcatggccgtgaaatcacatgtgtaaagcgtgtgggcacagtgaccctgggccctgaatttgaggtacccaact
tggagcatcctgactccctggagcctggcagtgaggggcctggtctgattgacatagtgataacaggcagtgaggacact
actgtctgtgtcctagcacttcccaccaccacaggctcagcccacgccctcacttctgtctgtaaccatatctcctctgt
gcgagccctggcagtgtgggctgttggcaccccaggtggcccacaggatactcgaccagggctcactgctcaggtagtgt
ctgcagggggccgagcggagattcactgcttcagcgtcatggtcactccggacgctagcacccccagccgccttgcctgt
catgtcatgcacctttcatcccaccggctggatgaagtactgggaccggcagcggaacaagcacaagatgatcaaggtgg
accctgagaccaggtacatgtctcttgctatttgtgagcttgacaacgataggcctggcctcggccctnggcccccttgt
ggctgcagcctgtagtgatggagcagtgaggctctttctcctgcaggactctgggcgaattctgcatctcctagctgagt
ctttccaccacaagcggtgtgtcctcaaagtccactccttcacacatgaggcacctaaccagcgtcggaggctgatcctg
tgcagtgcagctacagatggcagcctagccttctgggatctcaccacggcaatggacaaaggctctactaccctggagct
tccagcacaccctgggcttccctaccagatgggcaccccctccatgaccgtgcaagcccatagctgtggcgtcaatagcc
tgcacactttgcctacacctgagggccaccaccttgtggccagtggcagtgaggatgggtccctgcatgtcttcacactt
gctgtgaagatgccagagccggaagaagctgatggggaggctgagctggtgccccagttatgtgtcctagaggaatattc
cgtcccctgcgcacatgctgcccatgtgacaggcgtcaagatcctaagtcccaagctcatggtctcagcctccatagacc
agcggctgaccttctggcgtctgggacagggtgagcccaccttcatgaatagcactgtgtaccatgtgccagatgtggcc
gacatggactgctggcctgtgagccctgagtttggccaccgctgtgctctggcaggtcagggacttgaggtttacaactg
gtatgactgagttatcccagtgggtggagaactgagcacagggcctgactgcagacagtagagcagggatcagctgtctg
tgtcacgctcagtgtgntctnnggggaggcgaggcagtaccatagttctcgcaagtattccagtagggtgttgcatagga
ggaccaagaacacgcctcactctcacaataggatgaaactgtatttattctgactttaagtgcccaacatctgtgaggtc
ttgtttctttcccagttgatgcttttataaacattcccagttattgggcccttagatgtggctcagcggagggaggccca
gcatagccaagcctgtgtggaacacctcacncactgccctcaaaagcntgtaggcgagcaaancatctganccaaagagg
tgtggccngaggttcctgaaagaaaagcagccaggcgcatcctcatttcccgtgtgctcagccnttgcccnacatttccc
ngcagaccccccttgctgtatgctcacccctagaatatgtactcggttatagtaggagctgaaatccatgctgagctgca
ccaggaacttgcatacctagagacagacgttgantcgttgagctgttntcttttttcttgtgttacaacccagaataaag
aataatgtgtgaaatgncnnnnnnnnnnnnnnnnn

[Note:  "columnize" is a UNIX command from the SEALS package that re-wraps sequences to a given width.]
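
For readers without SEALS, the re-wrapping itself is simple to reproduce. The sketch below is a hypothetical Python stand-in, not the SEALS implementation, and the function name `rewrap_fasta` is an assumption. It joins the sequence lines of each FASTA record and re-splits them at a fixed width, passing deflines through unchanged.

```python
def rewrap_fasta(lines, width=80):
    """Yield FASTA lines with each record's sequence re-wrapped to `width`."""
    chunks = []  # sequence pieces of the record currently being read

    def flush():
        # Join the collected pieces and re-split them at the requested width.
        sequence = "".join(chunks)
        chunks.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            yield from flush()  # emit the previous record's sequence
            yield line          # deflines are passed through unchanged
        elif line:
            chunks.append(line)
    yield from flush()          # emit the last record's sequence
```

It can be applied to an open file, for example `for line in rewrap_fasta(open("MOE430A_consensus")): print(line)`.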

The target sequence for probe set 1415770_at is a substring of the consensus sequence above, running from ggaggcgagg... through ...aacacctca:

head -6 MOE430A_target | columnize 80
>target:MOE430A:1415770_at; gb|NM_031392; gb:NM_031392.1 /DB_XREF=gi:13878226
/GEN=Wdr6 /FEA=FLmRNA /CNT=712 /TID=Mm.29493.1 /TIER=FL+Stack /STK=570 /UG=Mm.29493
/LL=83669 /DEF=Mus musculus WD repeat domain 6 (Wdr6), mRNA.
/PROD=WD repeat domain 6 /FL=gb:AB041854.1 gb:NM_031392.1 gb:AF348591.1
ggaggcgaggcagtaccatagttctcgcaagtattccagtagggtgttgcataggaggaccaagaacacgcctcactctc
acaataggatgaaactgtatttattctgactttaagtgcccaacatctgtgaggtcttgtttctttcccagttgatgctt
ttataaacattcccagttattgggcccttagatgtggctcagcggagggaggcccagcatagccaagcctgtgtggaaca
cctca
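
The relationship between the two files can be checked directly: the target should occur as an exact substring of the consensus. The sketch below assumes both sequences have already been read into strings with line breaks removed. `locate_target` is a hypothetical helper name.

```python
def locate_target(consensus, target):
    """Return the 1-based (start, end) of `target` within `consensus`.

    The comparison is case-insensitive. Returns None when there is no
    exact match.
    """
    index = consensus.lower().find(target.lower())
    if index < 0:
        return None
    return index + 1, index + len(target)
```

Only the first occurrence is reported, which is enough for a target that appears once in its consensus sequence.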

The probes for 1415770_at, listed in MOE430A_probe_fasta, are 25-base substrings of the target sequence (and therefore of the consensus sequence above):

tail -n +2201 MOE430A_probe_fasta | head -22
>probe:MOE430A:1415770_at:223:513; Interrogation_Position=3637; Antisense;
GGAGGCGAGGCAGTACCATAGTTCT
>probe:MOE430A:1415770_at:22:57; Interrogation_Position=3644; Antisense;
AGGCAGTACCATAGTTCTCGCAAGT
>probe:MOE430A:1415770_at:255:431; Interrogation_Position=3657; Antisense;
GTTCTCGCAAGTATTCCAGTAGGGT
>probe:MOE430A:1415770_at:178:191; Interrogation_Position=3665; Antisense;
AAGTATTCCAGTAGGGTGTTGCATA
>probe:MOE430A:1415770_at:258:125; Interrogation_Position=3704; Antisense;
ACGCCTCACTCTCACAATAGGATGA
>probe:MOE430A:1415770_at:581:557; Interrogation_Position=3764; Antisense;
TGTGAGGTCTTGTTTCTTTCCCAGT
>probe:MOE430A:1415770_at:157:439; Interrogation_Position=3775; Antisense;
GTTTCTTTCCCAGTTGATGCTTTTA
>probe:MOE430A:1415770_at:417:583; Interrogation_Position=3792; Antisense;
TGCTTTTATAAACATTCCCAGTTAT
>probe:MOE430A:1415770_at:120:139; Interrogation_Position=3803; Antisense;
ACATTCCCAGTTATTGGGCCCTTAG
>probe:MOE430A:1415770_at:222:513; Interrogation_Position=3844; Antisense;
GGAGGCCCAGCATAGCCAAGCCTGT
>probe:MOE430A:1415770_at:600:113; Interrogation_Position=3857; Antisense;
AGCCAAGCCTGTGTGGAACACCTCA
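
The deflines carry enough information to place each probe on the consensus sequence. The sketch below parses them under two assumptions that this page does not state explicitly. First, the two numbers after the probe set ID are taken to be the probe's x and y coordinates on the array. Second, Interrogation_Position is taken to be the 1-based consensus coordinate of the probe's middle (13th) base, so a 25-mer starts 12 bases earlier. The second assumption is at least consistent with the listing above: positions 3637 and 3644 belong to probes that start 7 bases apart in the target. `parse_probe_defline` and its field names are hypothetical.

```python
import re

# Assumed layout: >probe:ARRAY:PROBESET:X:Y; Interrogation_Position=N; STRAND;
DEFLINE = re.compile(
    r"^>probe:(?P<array>[^:]+):(?P<probe_set>[^:]+):(?P<x>\d+):(?P<y>\d+);"
    r"\s*Interrogation_Position=(?P<position>\d+);\s*(?P<strand>\w+);"
)


def parse_probe_defline(defline, probe_length=25):
    """Split a probe defline into fields and derive the probe's span."""
    match = DEFLINE.match(defline)
    if match is None:
        raise ValueError("unrecognized probe defline: " + defline)
    fields = match.groupdict()
    for key in ("x", "y", "position"):
        fields[key] = int(fields[key])
    # Assumption: the interrogation position is the probe's middle base.
    fields["start"] = fields["position"] - probe_length // 2
    fields["end"] = fields["start"] + probe_length - 1
    return fields
```

Under these assumptions the first probe above (position 3637) would span consensus bases 3625 to 3649.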
 

Alternatively, the same information can be viewed in the Probe table loaded from the MOE430A_probe_tab file.

Vector NTI's ContigExpress can be used to display the probes relative to the target sequence.  The file 1415770_at.fa was constructed to have the target sequence and the probes shown above (with slightly shorter deflines):

cat 1415770_at.fa
>target:1415770_at
ggaggcgaggcagtaccatagttctcgcaagtattccagtagggtgttgcataggaggaccaagaacacgcctcactctc
acaataggatgaaactgtatttattctgactttaagtgcccaacatctgtgaggtcttgtttctttcccagttgatgctt
ttataaacattcccagttattgggcccttagatgtggctcagcggagggaggcccagcatagccaagcctgtgtggaaca
cctca
>probe:3637
GGAGGCGAGGCAGTACCATAGTTCT
>probe:3644
AGGCAGTACCATAGTTCTCGCAAGT
>probe:3657
GTTCTCGCAAGTATTCCAGTAGGGT
>probe:3665
AAGTATTCCAGTAGGGTGTTGCATA
>probe:3704
ACGCCTCACTCTCACAATAGGATGA
>probe:3764
TGTGAGGTCTTGTTTCTTTCCCAGT
>probe:3775
GTTTCTTTCCCAGTTGATGCTTTTA
>probe:3792
TGCTTTTATAAACATTCCCAGTTAT
>probe:3803
ACATTCCCAGTTATTGGGCCCTTAG
>probe:3844
GGAGGCCCAGCATAGCCAAGCCTGT
>probe:3857
AGCCAAGCCTGTGTGGAACACCTCA
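
A file like this can be put together by hand or with a few lines of script. The sketch below writes one from a target string and a list of (interrogation position, sequence) pairs, using the same shortened deflines. The function name `write_probe_set_fasta` and its arguments are hypothetical.

```python
def write_probe_set_fasta(path, probe_set, target, probes, width=80):
    """Write the target, then each probe, as FASTA records.

    `probes` is a list of (interrogation_position, sequence) pairs.
    """
    with open(path, "w") as out:
        out.write(">target:{}\n".format(probe_set))
        # Wrap the target at `width` columns, as in the listing above.
        for start in range(0, len(target), width):
            out.write(target[start:start + width] + "\n")
        for position, sequence in probes:
            out.write(">probe:{}\n{}\n".format(position, sequence))
```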

 

  1. Start | Programs | Informax 2003 | Vector NTI Suite 9 | ContigExpress
  2. Drag and drop the file 1415770_at.fa onto Vector NTI (I dropped the file on the right of the screen) | OK
  3. With all the fragments selected, choose Assemble | Assemble Selected Fragments from the pull-down menu | OK
  4. Double-click on Contig 1 to view the assembly, which shows the Affy target sequence and the corresponding Affy probes.

[See http://www.sci.uidaho.edu/biosci/lecture/Wichman/210/Genetics_Tutorial1.pdf for additional Vector NTI Contig Express tutorial information.]
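
Because these probes match the target exactly, a rough text version of the same picture can be produced without an assembler. The sketch below is not what ContigExpress does, since that tool performs a real assembly. It only finds each probe in the target by exact, case-insensitive string search and indents it to its offset. `layout_probes` is a hypothetical helper, and probes without an exact match are skipped.

```python
def layout_probes(target, probes):
    """Return text lines: the target, then each probe indented to its offset."""
    lines = [target]
    lowered = target.lower()
    for probe in probes:
        offset = lowered.find(probe.lower())
        if offset >= 0:  # skip probes with no exact match
            lines.append(" " * offset + probe)
    return lines


# First 56 bases of the 1415770_at target and the first four probes.
target = "ggaggcgaggcagtaccatagttctcgcaagtattccagtagggtgttgcatagga"
probes = [
    "GGAGGCGAGGCAGTACCATAGTTCT",
    "AGGCAGTACCATAGTTCTCGCAAGT",
    "GTTCTCGCAAGTATTCCAGTAGGGT",
    "AAGTATTCCAGTAGGGTGTTGCATA",
]
print("\n".join(layout_probes(target, probes)))
```

Viewed in a fixed-width font, the output shows how neighbouring probes overlap along the target.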