Understanding CAMDA '06 Data

1 Dec 2005

"... this year's dataset provides a level of complexity in the dataset previously unseen within CAMDA."

Understanding how the CAMDA '06 datasets relate to one another is needed before the real "fun" of analyzing them can begin, and before anyone can hope to contribute to the understanding of the Chronic Fatigue Syndrome being studied.  The purpose of these pages is to

  • shed some light on the diverse datasets
  • describe how the data can be accessed and manipulated
  • reduce the drudgery of everyone discovering and re-discovering the same problems in the data, and perhaps
  • show how datasets can be connected to each other and to other sources of useful information. 

I'm hoping this might be a collaborative effort and others may be willing to share their insights about the CAMDA data.  Please E-mail me any information about this data that you're willing to share, or any corrections that you find in my descriptions here.


6 Dec 2005

To better understand the information on the CAMDA '06 web page, I have included information from E-mails exchanged with Suzanne Vernon, Centers for Disease Control and Prevention.  Such quotations are tagged with her initials, "sdv".

Note that the articles listed on the Publications page do NOT describe any of the Chronic Fatigue Syndrome data on the CAMDA site, except for some of the clinical health assessment questions.  The CAMDA '06 publications provide background information about Chronic Fatigue Syndrome, "but there are none (yet) that have come out with the microarray, genetic, proteomic data" [sdv, 4 Dec 2005].

Each data table identifies its subjects by an ABTID.  There were 227 people/subjects eligible for the study.  Of the samples from these 227 people, only 177 gave satisfactory microarray results.  For the SNP data, there are several SNP results for 10 genes on 225 people.  Finally, the proteomic data cover only a subset of 60 of the 227 people (again linked or traced by ABTID).   [sdv, 4 Dec 2005]
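
Because every table carries an ABTID, listing the subjects that two tables have in common is a natural first check when connecting the datasets.  The following is only a sketch.  It assumes each table has been exported to a comma-separated text file with one header row and the ABTID in the first column, which may not match the actual CAMDA file layouts.  The function names abtids and shared_abtids, and the file names in the usage comment, are hypothetical.

```shell
# Sketch only. Assumed layout (not confirmed by the CAMDA site):
# comma-separated text, one header row, ABTID in column 1.

# Print the unique ABTIDs found in one table, sorted.
abtids() {
    tail -n +2 "$1" | cut -d, -f1 | LC_ALL=C sort -u
}

# Print the ABTIDs that appear in both tables, one per line.
shared_abtids() {
    ids_a=$(mktemp) || return 1
    ids_b=$(mktemp) || return 1
    abtids "$1" > "$ids_a"
    abtids "$2" > "$ids_b"
    # comm -12 keeps only the lines common to both sorted inputs
    LC_ALL=C comm -12 "$ids_a" "$ids_b"
    rm -f "$ids_a" "$ids_b"
}

# Hypothetical usage:
#   shared_abtids clinical.csv proteomic.csv | wc -l
```

Piping the result through wc -l gives the number of shared subjects, which can be compared against the counts quoted above.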


11 Jan 2006

See the Publications, Clinical Data and Gene Expression pages for information about a recently published paper that sheds light on patients that were excluded from the study.



Download CAMDA datasets

I downloaded all of the CAMDA06 data to a directory using wget:

wget -t 50 -o wget.log -r ftp://ftp.camda.duke.edu/CAMDA06_DATASETS

This transfer took 13 hours, 16 minutes (on 4 Oct 2005).

In our environment these data can be analyzed with Windows tools in a U:\camda\2006 directory, which can also be accessed as /n/projects/camda/2006 for analysis with Linux tools. 

The files in directory U:\camda\2006\ftp.camda.duke.edu were made read-only, so local changes to these original files should not be possible.  This directory is intended to be an exact mirror of the CAMDA '06 raw data.
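
On the Linux side, write protection of this kind can be sketched with standard tools.  This is an illustration under assumptions, not a record of what was actually done here: the function name make_mirror_readonly is hypothetical, and the path in the usage comment assumes the mirror appears under the Linux mount point as /n/projects/camda/2006/ftp.camda.duke.edu.  On a shared Windows volume, the Windows-side attributes may need to be set separately.

```shell
# Sketch only: remove write permission from every regular file under a
# directory tree. Directory permissions are not changed, so files could
# still be added or deleted unless the directories are protected too.
make_mirror_readonly() {
    find "$1" -type f -exec chmod a-w {} +
}

# Hypothetical usage (the path is an assumption):
#   make_mirror_readonly /n/projects/camda/2006/ftp.camda.duke.edu
```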

ZIP files were uncompressed into directories with the same name as the ZIPs.
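
The unpacking step might look like the sketch below.  It assumes that "the same name" means the ZIP file name without its .zip extension, that each directory is created next to its ZIP file, and that the unzip utility is installed.  The function names and the path in the usage comment are hypothetical.

```shell
# Sketch only. Assumption: archive "x.zip" is unpacked into a directory
# "x" created beside it.

# Print the target directory for one ZIP file (strips a trailing ".zip").
zip_target_dir() {
    printf '%s\n' "${1%.zip}"
}

# Unpack every *.zip under a directory tree into its own directory.
# The name match is case-sensitive, so "X.ZIP" would be skipped.
extract_all_zips() {
    find "$1" -type f -name '*.zip' | while IFS= read -r zip_file; do
        # -q: quiet, -o: overwrite without prompting, -d: extraction directory
        unzip -q -o "$zip_file" -d "$(zip_target_dir "$zip_file")"
    done
}

# Hypothetical usage (the path is an assumption):
#   extract_all_zips /n/projects/camda/2006
```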

Note that a 29-page Word document, wichita_clinical_irb_protocol.doc, is included in this directory.  This was the proposal for "Clinical Assessment of Subjects with Chronic Fatigue Syndrome and Other Fatiguing Illnesses in Wichita" and provides some background information (p. 7), Design and Methods (p. 10), and other information about the planned study.

Note the "Self-Administered Questionnaires" consist of two parts:

  • Multidimensional Fatigue Inventory (20-items designed to measure fatigue, ...)
  • Medical Outcomes Study (36-item short form health survey)

A description of blood work for neuroendocrine and immune measures is on pp. 12-14.

A section called "Gene Expression Studies" is on p. 16, but the word "microarray" is not used anywhere.  "Genotyping Studies" are described on p. 16.

