  • Created 30 Dec 2020

Scaling

Scalability Results

Model Description

These scalability tests were run using CitcomS 3.2.0 with the default configuration. The mesh is a regional cap with 129×129×129 nodes, so the total number of velocity unknowns is 129^3 × 3 ≈ 6.4 million. Each model is run for 11 time steps, and the result reported is the total wall clock time. In the table, Speedup is the 1×1×1 wall time divided by the wall time of the given partition, and Scalability is the speedup divided by the total number of processors. Each node of the test cluster has two Xeon 5680 series 3.33 GHz hex-core processors, each with a 12 MB unified L3 cache, and 24 GB of RAM, for a total of 12 cores per node. The interconnect is QDR InfiniBand.
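
The problem size quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical and not part of CitcomS, and the per-processor element count assumes that the 128 elements along each axis are split evenly among the processors on that axis.

```python
def velocity_unknowns(nx, ny, nz, components=3):
    """Velocity unknowns: one value per velocity component at every mesh node."""
    return nx * ny * nz * components


def elements_per_proc(nodes, nprocs):
    """Elements along one axis per processor.

    Assumes an even split of the (nodes - 1) elements; this is an
    assumption made for illustration, not taken from the CitcomS source.
    """
    elements = nodes - 1
    if elements % nprocs != 0:
        raise ValueError(f"{elements} elements do not divide evenly by {nprocs}")
    return elements // nprocs


print(velocity_unknowns(129, 129, 129))  # 6440067, i.e. about 6.4 million
print(elements_per_proc(129, 8))         # 16
```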

Partition  Total Procs  Wall Time (sec)  Speedup  Scalability
1×1×1                1            47217    1.000        1.000
1×1×2                2            25466    1.854        0.927
1×1×4                4            14645    3.224        0.806
2×2×1                4            14438    3.270        0.818
2×2×2                8             8980    5.258        0.657
2×2×4               16             4432   10.654        0.666
4×4×1               16             5367    8.798        0.550
4×4×2               32             2460   19.194        0.600
4×4×4               64             1346   35.079        0.548
8×8×2              128              583   80.990        0.633
8×8×4              256              337  140.110        0.547
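
The Speedup and Scalability columns follow directly from the wall times. As a minimal sketch (the function and variable names are hypothetical, and only a few rows of the table are copied in), they can be recomputed in Python as follows:

```python
# (total processors, wall time in seconds) for selected rows of the table
wall_times = {
    "1x1x1": (1, 47217),
    "1x1x2": (2, 25466),
    "4x4x4": (64, 1346),
    "8x8x2": (128, 583),
    "8x8x4": (256, 337),
}


def scaling_metrics(procs, wall_time, serial_time):
    """Return (speedup, scalability) relative to the single-processor run."""
    speedup = serial_time / wall_time
    scalability = speedup / procs  # parallel efficiency
    return speedup, scalability


serial_time = wall_times["1x1x1"][1]
for partition, (procs, wall_time) in wall_times.items():
    speedup, scalability = scaling_metrics(procs, wall_time, serial_time)
    print(f"{partition:>6} {procs:>4} {wall_time:>6} {speedup:8.3f} {scalability:6.3f}")
```

Comparing rows with the same processor count (for example 2×2×4 against 4×4×1) shows that the choice of partition affects the wall time, not only the number of processors.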

The input file is available here: input.sample.zip (2 KB, uploaded by Denise Kwong 6 months 2 weeks ago). It is currently configured for a 1×1×1 processor partition. To use a different partition, change the nprocx, nprocy, and nprocz parameters. You must also create a folder named “scratch” in the working directory to hold the output files. The input file is intended for the non-Python version of CitcomS, located at CitcomS-3.2.0/bin/CitcomSRegional.
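
The setup steps above could be scripted along the following lines. This is a sketch under stated assumptions, not a tested workflow: it assumes the input file stores each parameter on its own line as key=value (for example nprocx=1), that the unzipped file is named input.sample, and that the MPI launcher is invoked as mpirun -np. The helper names are hypothetical.

```python
import re
from pathlib import Path


def set_partition(text, nprocx, nprocy, nprocz):
    """Return the input-file text with nprocx, nprocy and nprocz replaced.

    Assumes each parameter appears on its own line as key=value.
    """
    for key, value in (("nprocx", nprocx), ("nprocy", nprocy), ("nprocz", nprocz)):
        pattern = re.compile(rf"^(\s*{key}\s*=\s*)\d+", flags=re.MULTILINE)
        text, count = pattern.subn(rf"\g<1>{value}", text)
        if count == 0:
            raise ValueError(f"{key} not found in input file")
    return text


def prepare_run(input_path, nprocx, nprocy, nprocz, workdir="."):
    """Rewrite the partition, create the scratch folder, return a launch command."""
    path = Path(input_path)
    path.write_text(set_partition(path.read_text(), nprocx, nprocy, nprocz))
    # Output files go to a folder named "scratch" in the working directory.
    Path(workdir, "scratch").mkdir(exist_ok=True)
    nprocs = nprocx * nprocy * nprocz
    # The launcher name and flags are assumptions; adjust for the local MPI setup.
    return ["mpirun", "-np", str(nprocs),
            "CitcomS-3.2.0/bin/CitcomSRegional", str(path)]


# In-memory demonstration of the parameter rewrite only (no files are touched).
sample = "nprocx=1\nnprocy=1\nnprocz=1\n"
print(set_partition(sample, 2, 2, 4))
```

Calling prepare_run("input.sample", 2, 2, 4) would then return a command list for 16 processes that can be passed to subprocess.run or adapted to a batch scheduler script.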
