[cig-commits] [commit] devel: Vicious bug fix. With multiple GPUs per node, the error check after the call to cudaGetDeviceCount made several MPI tasks synchronize the same device. (1d2434c)

cig_noreply at geodynamics.org cig_noreply at geodynamics.org
Wed Feb 12 07:12:18 PST 2014


Repository : ssh://geoshell/specfem3d

On branch  : devel
Link       : https://github.com/geodynamics/specfem3d/compare/cc878e6a5c1692b8aaeaca1803d4685e56b20e41...1d2434c01aa85bb8e6d5f2e1c4897e5a23651615

>---------------------------------------------------------------

commit 1d2434c01aa85bb8e6d5f2e1c4897e5a23651615
Author: Matthieu Lefebvre <ml15 at princeton.edu>
Date:   Wed Feb 12 09:51:18 2014 -0500

    Vicious bug fix. With multiple GPUs per node, the error check after the call to cudaGetDeviceCount made several MPI tasks synchronize the same device.


>---------------------------------------------------------------

1d2434c01aa85bb8e6d5f2e1c4897e5a23651615
 src/cuda/initialize_cuda.cu | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/cuda/initialize_cuda.cu b/src/cuda/initialize_cuda.cu
index ef53f7a..79a7d2d 100644
--- a/src/cuda/initialize_cuda.cu
+++ b/src/cuda/initialize_cuda.cu
@@ -93,9 +93,12 @@ void FC_FUNC_(initialize_cuda_device,
   // Gets number of GPU devices
   device_count = 0;
   cudaGetDeviceCount(&device_count);
-
-  // checks if command failed
-  exit_on_cuda_error("CUDA runtime error: cudaGetDeviceCount failed\ncheck if driver and runtime libraries work together\nexiting...\n");
+  // Do not check whether the command failed:
+  // `exit_on_cuda_error` calls cudaDeviceSynchronize/cudaThreadSynchronize.
+  // If multiple MPI tasks access multiple GPUs per node, they will all try to
+  // synchronize GPU 0 and, depending on the order of the calls, an error will
+  // be raised when setting the device number. If MPS is enabled, some GPUs
+  // will silently not be used.
 
   // returns device count to fortran
   if (device_count == 0) exit_on_error("CUDA runtime error: there is no device supporting CUDA\n");
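
Note: the change above drops the error check entirely. As a minimal sketch only, and not what this commit does, a failed query could still be detected without an explicit synchronization call by inspecting the status returned by cudaGetDeviceCount itself. The helper name `query_device_count_nosync` is hypothetical and does not exist in SPECFEM3D.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption for illustration, not SPECFEM3D code).
// Queries the number of CUDA devices and detects failure from the status
// returned by the call itself, instead of calling a checker that issues
// cudaDeviceSynchronize/cudaThreadSynchronize.
// Returns the device count, or -1 if the query failed.
int query_device_count_nosync(void) {
  int device_count = 0;

  // cudaGetDeviceCount reports its own status as a cudaError_t.
  cudaError_t err = cudaGetDeviceCount(&device_count);
  if (err != cudaSuccess) {
    // cudaGetErrorString turns the status code into a readable message.
    fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return -1;
  }
  return device_count;
}
```

A caller would treat a negative return value as a runtime or driver problem and a return value of 0 as "no device supporting CUDA", which keeps the two failure messages of the original code distinct.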


