Scalability Results
Model Description
These scalability tests were run using CitcomS 3.2.0 with the default configuration. The mesh is a regional cap with 129×129×129 nodes, so the total number of velocity unknowns is 129^3 × 3 ≈ 6.4 million. Each model is run for 11 time steps, and the result reported is the total wall clock time. Each node on this cluster has two Xeon 5680 series 3.33 GHz hex-core processors with a 12 MB unified L3 cache and 24 GB RAM, for a total of 12 cores per node. The interconnect is QDR InfiniBand.
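As a quick check of the problem size quoted above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only and are not CitcomS parameters.

```python
# Problem size for the benchmark mesh described above.
nodes_per_side = 129        # nodes along each direction of the regional cap
components_per_node = 3     # three velocity components per node

total_nodes = nodes_per_side ** 3
velocity_unknowns = total_nodes * components_per_node

print(total_nodes)          # 2146689
print(velocity_unknowns)    # 6440067, i.e. about 6.4 million
```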
Partition | Total Procs | Wall Time (sec) | Speedup | Scalability |
1×1×1 | 1 | 47217 | 1.000 | 1.000 |
1×1×2 | 2 | 25466 | 1.854 | 0.927 |
1×1×4 | 4 | 14645 | 3.224 | 0.806 |
2×2×1 | 4 | 14438 | 3.270 | 0.818 |
2×2×2 | 8 | 8980 | 5.258 | 0.657 |
2×2×4 | 16 | 4432 | 10.654 | 0.666 |
4×4×1 | 16 | 5367 | 8.798 | 0.550 |
4×4×2 | 32 | 2460 | 19.194 | 0.600 |
4×4×4 | 64 | 1346 | 35.079 | 0.548 |
8×8×2 | 128 | 583 | 80.990 | 0.633 |
8×8×4 | 256 | 337 | 140.110 | 0.547 |
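The Speedup and Scalability columns can be reproduced from the wall times. The sketch below assumes that speedup is the single-process wall time divided by the parallel wall time, and that scalability is speedup divided by the number of processes (parallel efficiency); both assumptions match the tabulated values. The function names are illustrative.

```python
# Wall time of the 1×1×1 run, in seconds (from the table above).
BASELINE_SECONDS = 47217

def speedup(baseline_seconds, wall_seconds):
    """Speedup relative to the single-process run."""
    return baseline_seconds / wall_seconds

def scalability(baseline_seconds, wall_seconds, procs):
    """Speedup per process, i.e. parallel efficiency."""
    return speedup(baseline_seconds, wall_seconds) / procs

# A few rows from the table: (partition, total procs, wall time in seconds).
rows = [
    ("1×1×2", 2, 25466),
    ("2×2×2", 8, 8980),
    ("8×8×4", 256, 337),
]

for partition, procs, wall in rows:
    s = speedup(BASELINE_SECONDS, wall)
    e = scalability(BASELINE_SECONDS, wall, procs)
    print(f"{partition} | {procs} | {wall} | {s:.3f} | {e:.3f}")
```

For the 8×8×4 row this gives a speedup of 140.110 and a scalability of 0.547, in agreement with the table.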
The input file is available here. It is currently configured for 1×1×1 processors; to use a different processor division, change the nprocx, nprocy, and nprocz parameters. You must also create a folder named “scratch” in the working directory for the output files. The input file is written for the non-Python version of CitcomS, located at CitcomS-3.2.0/bin/CitcomSRegional.
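When running many partitions, the edits described above can be scripted. The following is only a sketch: it assumes the input file stores each parameter on its own line in `name=value` form, and the helper names (`set_partition`, `total_procs`, `ensure_scratch`) are hypothetical, not part of CitcomS. Check the result against your actual input file before running.

```python
import re
from pathlib import Path

def set_partition(text, nprocx, nprocy, nprocz):
    """Return input-file text with the three partition parameters replaced.

    Assumes lines of the form 'nprocx=1' (optional spaces around '=').
    """
    values = {"nprocx": nprocx, "nprocy": nprocy, "nprocz": nprocz}
    for key, value in values.items():
        pattern = rf"^([ \t]*{key}[ \t]*=[ \t]*)\d+"
        text, count = re.subn(pattern, rf"\g<1>{value}", text, flags=re.MULTILINE)
        if count == 0:
            raise ValueError(f"parameter {key} not found in input text")
    return text

def total_procs(nprocx, nprocy, nprocz):
    """Number of processes for a regional partition, as in the table above."""
    return nprocx * nprocy * nprocz

def ensure_scratch(workdir):
    """Create the 'scratch' output folder inside workdir if it is missing."""
    scratch = Path(workdir) / "scratch"
    scratch.mkdir(parents=True, exist_ok=True)
    return scratch

# Example on a made-up fragment of an input file.
sample = "nprocx=1\nnprocy=1\nnprocz=1\n"
print(set_partition(sample, 2, 2, 4))
print(total_procs(2, 2, 4))  # 16
```

The process count passed to the MPI launcher should equal the product of the three parameters, for example 16 for a 2×2×4 partition.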