These are some notes I've written up on SDSC Gordon's architecture. There's a bit more to know than what we provide in the official user guide, simply because the gory details confuse most users and provide them no real benefit.

Torus Topology

Sixteen compute nodes (each with 16 cores) are attached to a single InfiniBand switch, and this 16-node unit is the fundamental building block of Gordon. These 16-node units (each corresponding to a single "switch") are connected to one another in a 4×4×4 torus, giving 64 switches of 16 nodes apiece and the full 1024 nodes of Gordon.

If a job requests nodes=128:ppn=16, then at 16 nodes per switch, it needs 8 (128/16) switches. The smallest chunk of torus that encapsulates these 8 switches is a 2×2×2 block of switches. However, for a compute node connected to the torus switch at position (x,y,z) = (0,0,0) to communicate with a compute node on the (1,1,1) switch, it must make three hops: one in x, one in y, and one in z.
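To make that arithmetic concrete, here is a minimal Python sketch of the idealized hop calculation: the shortest distance between two switch positions on the 4×4×4 torus, taking the wraparound links into account. This is my own illustration, not Gordon's actual routing logic.

    # Minimum hop count between two switches on Gordon's 4x4x4 torus.
    # Idealized shortest-path distance; real InfiniBand routing may differ.

    TORUS_DIM = 4  # switches per dimension (4 x 4 x 4 = 64 switches)

    def axis_hops(a, b, dim=TORUS_DIM):
        """Hops along one dimension, taking the shorter wraparound direction."""
        d = abs(a - b)
        return min(d, dim - d)

    def switch_hops(s1, s2):
        """Total hops between two switches given as (x, y, z) tuples."""
        return sum(axis_hops(a, b) for a, b in zip(s1, s2))

    print(switch_hops((0, 0, 0), (1, 1, 1)))  # 3, the example above
    print(switch_hops((0, 0, 0), (3, 0, 0)))  # 1, thanks to wraparound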

A relatively common problem occurs when people submit a large, multi-switch job but specify an unattainable value for Catalina_maxhops. For the sake of laying it all out: we don't allow jobs larger than 128 nodes on Gordon, so a 2×2×2 block of switches (three hops, as computed above) covers every allowed case. In theory, the maximum hop distance across all of Gordon is six hops: two in x, two in y, and two in z.
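As a sanity check on those numbers, here is a small back-of-the-envelope sketch (again my own illustration, not Catalina's actual placement logic) that computes the worst-case hop count within the smallest block of switches able to hold a job of a given size; any Catalina_maxhops value below this bound is unattainable:

    import itertools
    import math

    NODES_PER_SWITCH = 16
    TORUS_DIM = 4

    def dim_hops(extent):
        """Worst-case hops along one axis of a block `extent` switches wide;
        a full-width block gets to use the wraparound link."""
        return TORUS_DIM // 2 if extent == TORUS_DIM else extent - 1

    def min_maxhops(nodes):
        """Smallest worst-case hop count over all blocks of switches
        large enough to hold `nodes` compute nodes."""
        switches = math.ceil(nodes / NODES_PER_SWITCH)
        return min(
            sum(dim_hops(e) for e in block)
            for block in itertools.product(range(1, TORUS_DIM + 1), repeat=3)
            if block[0] * block[1] * block[2] >= switches
        )

    print(min_maxhops(128))   # 3: 8 switches fit in a 2x2x2 block
    print(min_maxhops(1024))  # 6: the whole machine, all 64 switches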

Hop Distance Calculator

Here is a little tool that lets you punch in two Gordon compute nodes and get the number of torus hops between them.

[Interactive calculator: Node 1 and Node 2 input fields]

For more information on Gordon's precise topology, you may want to play with my Gordon Topology Tool, a little Python script that calculates the torus position of every node and the hop distance between every pair of nodes in a given job. Bear in mind that the numbers presented by the calculator above and by my Python script are idealized; Gordon's actual routing for a given job on a given day may differ.

For what it's worth, each hop within Gordon adds between 0.1 and 0.2 microseconds of latency for a zero-sized message at the MPI layer:

[Figure: Gordon hop latency]
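For a rough sense of what that means end to end, the sketch below turns the per-hop figures into an estimated latency range. The 1.3-microsecond zero-hop baseline is an assumed, typical QDR InfiniBand number chosen for illustration, not a measured Gordon value.

    # Estimated zero-byte MPI latency as a function of hop count.
    # The 0.1-0.2 us/hop range comes from the measurements above; the
    # 1.3 us single-switch baseline is an assumption, not a Gordon number.

    BASELINE_US = 1.3
    PER_HOP_US = (0.1, 0.2)

    def latency_range_us(hops):
        """(low, high) estimated latency in microseconds for a hop count."""
        return (BASELINE_US + hops * PER_HOP_US[0],
                BASELINE_US + hops * PER_HOP_US[1])

    for hops in range(7):  # six hops is the worst case across the machine
        lo, hi = latency_range_us(hops)
        print(f"{hops} hops: {lo:.1f} to {hi:.1f} us")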

Topological Details

This section contains more detail than most people will ever care to see, but it is here for that curious minority.

The following diagram schematically represents the full topology of one of Gordon's two torus rails. The red lines form the actual torus, and each red line is really three QDR 4X links.

[Figure: Gordon torus graph]

As mentioned above, sixteen compute nodes (black) hang off of each torus node. Each torus node is connected to six other torus nodes: north, south, east, west, up, and down.
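Since each torus node's six neighbors fall directly out of its coordinates, here is one last sketch that enumerates them. As with everything above, this is an idealized coordinate picture and says nothing about how the physical cables are actually run.

    TORUS_DIM = 4

    def neighbors(switch):
        """The six switches adjacent to (x, y, z): one step forward and
        back along each axis, with wraparound at the torus edges."""
        result = []
        for axis in range(3):
            for step in (-1, 1):
                coord = list(switch)
                coord[axis] = (coord[axis] + step) % TORUS_DIM
                result.append(tuple(coord))
        return result

    print(neighbors((0, 0, 0)))
    # [(3, 0, 0), (1, 0, 0), (0, 3, 0), (0, 1, 0), (0, 0, 3), (0, 0, 1)]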