efg's Research Notes
Embarrassingly Parallel Computations Using the Sun Grid Engine


The following are practical suggestions on how to submit jobs to a Sun Grid Engine (SGE) to speed up computations using an embarrassingly parallel approach on a Linux cluster.  These notes assume you have a cluster administrator to set up and configure your cluster and the Sun Grid Engine.  Your installation may be configured differently from what is described here.

Background.  A cluster job is a Linux script (I will only use bash scripts here) that performs a computation with most inputs and outputs being read from and written to files.  A cluster job "runs headless", i.e., without any display monitor to see the results.  You'll need to design your cluster job to run in this environment, which is similar to the normal Linux command-line environment, but there may be some differences.  Examples of cluster jobs will be discussed later.
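To make the headless idea concrete, here is a minimal sketch of a job that writes everything worth seeing to a file instead of a screen.  The report path and the particular items recorded are my own assumptions for illustration, not part of any SGE convention.

```shell
#!/bin/bash
# Sketch of a headless job: no display is attached, so the job
# records what it finds in a file that can be inspected afterwards.

# Hypothetical report location; a real job would choose its own path.
report="${TMPDIR:-/tmp}/env_report.$$.txt"

{
    echo "host=$(uname -n)"     # node the job actually ran on
    echo "user=$(id -un)"       # account the job ran under
    echo "shell=$SHELL"         # login shell, may differ between machines
    echo "cwd=$(pwd)"           # working directory at start-up
    echo "path=$PATH"           # search path seen by the job
} > "$report"

echo "environment written to $report"
```

Running the same script on a cluster node, on the head node, and on your own Linux box, and then comparing the reports with diff, is one simple way to spot the environment differences mentioned above.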

I usually use a submit.bash script to submit cluster jobs in a repeatable and documented way using the Sun Grid Engine qsub command.  The submit.bash script uses qsub to schedule another script, e.g., job.bash in Fig. 1, to execute on one of the cluster nodes as soon as possible.

Fig. 1.  Script submit.bash calls the SGE qsub command to submit jobs to cluster nodes for execution
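The arrangement in Fig. 1 might look roughly like the sketch below.  This is not my actual submit.bash: the DRY_RUN switch, the job name "demo", and the log file name are assumptions added for illustration.  The qsub options shown (-N, -cwd, -j, -o, -S) are standard SGE options, but your installation may require others.

```shell
#!/bin/bash
# submit.bash: sketch of a repeatable wrapper around the SGE qsub command.
# DRY_RUN and the names "demo" and "demo.log" are illustrative assumptions.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = really submit

# submit SCRIPT NAME: schedule SCRIPT as a cluster job called NAME.
submit() {
    local script="$1" name="$2"
    # -N job name, -cwd run in the current directory, -j y merge stderr
    # into stdout, -o output file, -S shell that interprets the script.
    set -- qsub -N "$name" -cwd -j y -o "$name.log" -S /bin/bash "$script"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

result=$(submit job.bash demo)
echo "$result"
```

Defaulting to a dry run means the script can be tried on a machine without SGE installed; setting DRY_RUN=0 on the head node would perform the real submission.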

Examples

Example: Description
Getting started: Compare/contrast Linux environment differences among the cluster nodes, the cluster "head node", and a developer's Linux box.  Understanding these differences can be critical to submitting cluster jobs successfully.
Array job: Submit many cluster jobs with a single SGE qsub command.  This simple example shows the array job mechanics.
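As a rough preview of the array job mechanics, the sketch below shows one task of an array job.  With the qsub -t option, SGE runs the same script once per task and sets the SGE_TASK_ID environment variable to a different value each time.  The script name and the input.N/output.N naming scheme are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# array_job.bash: sketch of one task of an SGE array job.
# Submit with, for example:
#   qsub -t 1-10 -cwd -S /bin/bash array_job.bash
# SGE then starts ten tasks, with SGE_TASK_ID set to 1, 2, ..., 10.

# Fall back to task 1 so the script can be tried outside the cluster.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical naming scheme: task N reads input.N and writes output.N.
input="input.$task_id"
output="output.$task_id"

echo "task $task_id: $input -> $output"
```

Because each task works only on its own files, the tasks are independent of one another, which is what makes the computation embarrassingly parallel.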

Helpful Links


Earl F. Glynn
efg@stowers-institute.org

Updated 17 Dec 2007