We have a license for Allinea DDT, a nice and easy-to-use parallel debugger for MPI and OpenMP on our supercomputers at SDSC. Although it's installed and mostly ready to go, there are a few initial configuration parameters you have to specify in order to get it working.

SSH with X11 forwarding

Since DDT is a graphical debugger, you need to SSH into Gordon with X11 forwarding enabled. This is easiest on Linux and Mac OS X, where you typically just open a terminal and run:

$ ssh -X

If you are using Windows or a newer version of OS X that does not ship with X11, you will have to install an X server. XQuartz is the standard option on the Mac, and I recommend Xming for Windows.

Once you've logged into Gordon with X11 forwarding enabled (you can verify this by running xeyes), load the ddt module and start DDT:

$ module load ddt
$ ddt

You should see an Allinea splash screen for a moment, then be presented with the DDT Configuration Wizard.
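If no window appears, check that X11 forwarding is actually active. Besides running xeyes, a quick check is the DISPLAY environment variable, which ssh -X normally sets on the remote side. The function below is a minimal sketch of that check; the name x11_forwarding_active is hypothetical, and a non-empty DISPLAY only suggests forwarding is set up, it does not prove an X server is reachable.

```shell
#!/bin/bash
# Minimal sketch: report whether X11 forwarding looks active in this SSH session.
# Assumption: with `ssh -X`, sshd sets DISPLAY on the remote host
# (often something like localhost:10.0). An empty DISPLAY means no forwarding.
x11_forwarding_active() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
    return 0
  fi
  echo "DISPLAY is not set; log out and reconnect with ssh -X" >&2
  return 1
}
```

Run x11_forwarding_active right after logging in; if it reports that DISPLAY is unset, reconnect with ssh -X before loading the ddt module.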

Set up job submission

Once greeted with this wizard, create a new configuration profile:

[Screenshot: Step 2]

For the rest of this guide, I assume you are using the Gordon default compiler (Intel) and MPI stack (MVAPICH2). Select MVAPICH 2 from the pull-down list.

[Screenshot: Step 3]

Then confirm that jobs must be submitted through a job scheduler. If you skip this step, you will not be able to use DDT.

[Screenshot: Step 4]

We provide a bunch of template submit scripts that DDT can use to submit jobs to the batch system on your behalf. Assuming you are using Intel and MVAPICH2, the best one to use is the appropriately named pbs_intel_mvapich2_native.qtf, located in /opt/ddt/templates. If you are doing this on Trestles, use pbs_pgi.qtf instead.

[Screenshot: Step 5]
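If you want to confirm the template is where you expect before browsing to it in the wizard, a small shell helper like the one below can do it. This is a sketch: the function names are hypothetical, and the Trestles path assumes pbs_pgi.qtf sits in the same /opt/ddt/templates directory, which this guide does not state outright.

```shell
#!/bin/bash
# Sketch: map a system name to the DDT queue template suggested in this guide.
# Assumption: on Trestles, pbs_pgi.qtf lives in /opt/ddt/templates as well.
ddt_template_for() {
  case "$1" in
    gordon)   echo /opt/ddt/templates/pbs_intel_mvapich2_native.qtf ;;
    trestles) echo /opt/ddt/templates/pbs_pgi.qtf ;;
    *)        echo "unknown system: $1" >&2; return 1 ;;
  esac
}

# Print the template path only if the file is readable on this machine.
check_template() {
  local t
  t=$(ddt_template_for "$1") || return 1
  if [ -r "$t" ]; then
    echo "$t"
  else
    echo "template not found: $t" >&2
    return 1
  fi
}
```

For example, check_template gordon prints the Gordon template path if the file is readable on the login node and returns a nonzero status otherwise.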

On the next screen, you will have to fill in some values. Namely,

[Screenshot: Step 6]

Then be sure that

Finally, click the "Edit Queue Submission Parameters" button (red arrow) to get to the next screen:

[Screenshot: Step 7]

This is where you set the default job parameters. Every time you run a debugging job through DDT you will have the chance to tweak these parameters, but it doesn't hurt to pick sensible defaults here. Most of them should already be set; just enter the six-character account name your jobs are charged to (run show_accounts if you aren't sure what it is) and move on.

[Screenshot: Step 8]

You aren't an administrator, so there's no point in setting up a site-wide configuration. Click Next to finish the setup wizard and be presented with the regular DDT startup screen.

Launching a job through the debugger

Now that the initial configuration is complete, you will see the following screen every time you launch DDT. The most common task is launching a job through the debugger via the batch system; the option to do that is highlighted by the red arrow below.

[Screenshot: Step 9]

Once you click that option, you will have to enter, piece by piece, all of the values you would normally string together in your mpirun_rsh command. For the purposes of this example, let's say I want to debug LAMMPS. If I were running LAMMPS normally, my job script would look something like this:

#PBS -l nodes=1:ppn=16:native
#PBS -l walltime=00:30:00
#PBS -q normal
mpirun_rsh -np 16 -hostfile $PBS_NODEFILE /opt/lammps/bin/lammps -log output.log < /opt/lammps/examples/melt/in.melt

This is the same information that we need to give DDT. Here is a screenshot of the same job entered into DDT:

[Screenshot: Step 10]

So let's break that job script down into sections, following the same color coding as the arrows in the picture above.

#PBS -l nodes=1:ppn=16:native
#PBS -l walltime=00:30:00
#PBS -q normal
#PBS -A use300
mpirun_rsh -np 16 -hostfile $PBS_NODEFILE /opt/lammps/bin/lammps -log output.log < /opt/lammps/examples/melt/in.melt
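Since the color coding doesn't survive in plain text, here is the same breakdown written as shell variables, one per piece of information you give DDT. Treat it as a cross-check under stated assumptions: the variable and function names are hypothetical, the comments describe the pieces rather than exact DDT field labels, and DDT builds the actual submission script from the template you chose, so this script is never submitted. Note that the file after "<" is redirected to standard input, so it is not one of the program's arguments.

```shell
#!/bin/bash
# Cross-check sketch: the LAMMPS job above split into the pieces DDT asks for.
# Variable and function names are hypothetical; DDT generates the real job
# script from its template, so this file is never submitted.

# Queue submission parameters (the #PBS lines)
NODES=1                 # nodes=1
PPN=16                  # ppn=16 (cores per node)
WALLTIME=00:30:00       # walltime
QUEUE=normal            # -q
ACCOUNT=use300          # -A: substitute your own six-character account

# Run parameters (the mpirun_rsh line)
NP=$((NODES * PPN))                           # total MPI processes for -np
APP=/opt/lammps/bin/lammps                    # the program being debugged
APP_ARGS="-log output.log"                    # arguments passed to the program
STDIN_FILE=/opt/lammps/examples/melt/in.melt  # file redirected to stdin with "<"

# Rebuild the #PBS header from the queue parameters.
pbs_header() {
  printf '#PBS -l nodes=%s:ppn=%s:native\n' "$NODES" "$PPN"
  printf '#PBS -l walltime=%s\n' "$WALLTIME"
  printf '#PBS -q %s\n' "$QUEUE"
  printf '#PBS -A %s\n' "$ACCOUNT"
}

# Rebuild the launch line; \$PBS_NODEFILE stays literal because PBS fills it in.
launch_line() {
  echo "mpirun_rsh -np $NP -hostfile \$PBS_NODEFILE $APP $APP_ARGS < $STDIN_FILE"
}

pbs_header
launch_line
```

Run as-is, it prints the five lines of the job script above. After editing the variables for your own code, compare its output with your usual job script; the variable values are what you type into DDT.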

And to break it down even further (or if you are color blind)

Once all of this is entered correctly, hold your breath and click "Submit" at the bottom. If everything is set correctly, you should see your job enter the queue:

[Screenshot: Step 11]

If your job exits with an error, DDT will tell you immediately.

Once in the queue, your job may have to wait a while to be scheduled and launched, just as a non-debugging job does. Unfortunately, we do not have dedicated development or debugging queues like some other supercomputing sites do, so the only ways to shorten the wait are to request fewer nodes and a shorter walltime. Hopefully your job will launch without too much delay, at which point DDT will immediately pause it and let you set breakpoints, watchpoints, and other debugging parameters before your program starts executing.
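While you wait, you can also check on the job from another terminal on the login node with the standard PBS/Torque qstat command. The wrapper below is a hypothetical convenience function, a sketch that assumes qstat is on your PATH (it is part of the PBS/Torque tools that the #PBS directives above are written for).

```shell
#!/bin/bash
# Sketch: list your own batch jobs, including the one DDT submitted for you.
# my_jobs is a hypothetical helper; `qstat -u <user>` is standard Torque/PBS usage.
my_jobs() {
  if ! command -v qstat >/dev/null 2>&1; then
    echo "qstat not found; run this on a Gordon login node" >&2
    return 1
  fi
  qstat -u "${USER:-$(whoami)}"
}
```

The job DDT submitted should appear in the list; its state column changes from Q (queued) to R (running) once it starts.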