[CIG-LONG] Gale 2.0 in cluster

Walter Landry walter at geodynamics.org
Wed Feb 8 17:04:06 PST 2012


Hi Leonardo,

I am Cc'ing the list in case anyone else has similar problems.

I do not remember doing anything special to the input files.  I am
attaching an input file that I have run on 16 cores on Lonestar at
TACC.  Does Gale write the xml version of the file in the output
directory, or does it segfault before that?  Do you get any error
messages?  Have you tried specifying the complete path to the input
file?
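
One rough way to rule out a missing or malformed input file before MPI
gets involved is sketched below in Python.  This is only a sanity check
under assumptions: check_input is a hypothetical helper, not part of
Gale, and Python's strict json parser may not accept exactly the same
files that Gale's own reader does.

```python
import json
import os


def check_input(path):
    """Return (absolute_path, sorted top-level keys) for a JSON input file.

    Hypothetical helper, not part of Gale.  Raises FileNotFoundError if the
    file is missing and json.JSONDecodeError if it is not strict JSON.
    """
    # Resolve the path so the job script does not depend on the working
    # directory of whichever node ends up opening the file.
    full_path = os.path.abspath(path)
    with open(full_path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return full_path, sorted(data.keys())
```

Calling check_input("extension.json") from the directory that holds the
file returns the complete path, which can then be passed to Gale in the
job script.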

Manually setting shadowDepth should not be necessary.
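
For readers unfamiliar with the parameter, the manual passage quoted
further down describes shadowDepth as the width of the region of copied
points that each processor keeps.  A toy one-dimensional sketch in
Python follows.  It is not how Gale or StGermain implement this; the
function owned_and_shadow and the simple block decomposition are
assumptions made purely for illustration.

```python
def owned_and_shadow(n_points, n_procs, rank, shadow_depth=1):
    """Split n_points grid points into contiguous blocks, one per rank.

    Returns (owned, shadow): the indices this rank computes on, and the
    indices it copies from neighbouring ranks.  Illustrative only.
    """
    base, extra = divmod(n_points, n_procs)
    # The first `extra` ranks get one additional point each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    owned = list(range(start, stop))
    # Shadow points extend shadow_depth points past each block edge,
    # clipped to the ends of the grid.
    lo = max(0, start - shadow_depth)
    hi = min(n_points, stop + shadow_depth)
    shadow = list(range(lo, start)) + list(range(stop, hi))
    return owned, shadow


# 17 points (for example, the nodes of a 16-element row) on 4 ranks
for rank in range(4):
    print(rank, owned_and_shadow(17, 4, rank))
```

With a depth of 1, each interior rank copies a single point from each
neighbour, which is consistent with the manual's remark that the value
should not normally need changing.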

If none of that works, can you try inserting print statements into

  StGermain/Base/IO/src/IO_Handler.cxx

after lines 373, 376, 377, 380, and 381, and tell me where it dies.

Cheers,
Walter Landry

"Leonardo Cruz" <leocruz at stanford.edu> wrote:
> Hello Walter,
> 
> We are having some issues running Gale 2.0 on the cluster. Dennis is
> getting segmentation faults when running the .json files
> that come with the binaries package. Interestingly enough, he is
> able to run previous XML files with this new Gale build.
> Could you provide us with a JSON file that you have tested on your
> cluster?
> 
> Thanks in advance for your help.
> 
> Leonardo
> 
> ----- Forwarded Message -----
> From: "Leonardo Cruz" <leocruz at stanford.edu>
> To: "Dennis Michael" <dennis at stanford.edu>
> Cc: "George Hilley" <hilley at stanford.edu>, "María Helga Guðmundsdóttir" <mariahg at stanford.edu>
> Sent: Monday, February 6, 2012 9:30:07 AM
> Subject: Re: Gale 2.0 in cluster
> 
> Dennis,
> I will ask Walter Landry to provide one of the JSON files that he has tested in parallel, so we have something to start with.
> 
> Thanks
> Leonardo
> 
> ----- Original Message -----
> From: "Dennis Michael" <dennis at stanford.edu>
> To: "Leonardo Cruz" <leocruz at stanford.edu>
> Cc: "George Hilley" <hilley at stanford.edu>
> Sent: Monday, February 6, 2012 8:59:18 AM
> Subject: Re: Gale 2.0 in cluster
> 
> Leo,
> 
> The info about the parallel variable is interesting since Gale2 is failing at 
> the point when it starts to run MPI.  The master node sets up the job on the 
> various compute nodes, then gets a segmentation fault from Gale2 when the job 
> starts.
> 
> Dennis
> 
> On 2/4/2012 7:41 AM, Leonardo Cruz wrote:
>> Dennis,
>>
>> I have run two more files from the Gale package (extension.json & viscous.json) on my Mac (binary version) successfully.
>> I tried those files on the cluster unsuccessfully (data/cees/temp1/leocruz1/gale). Then I added a variable (shadowDepth) to extension.json that, according to the manual, needs to be included when running in parallel:
>>
>> "shadowDepth: When running in parallel, every processor only computes quantities over a portion of the
>> grid. To do this, each processor must keep copies of points that belong to other processors. This
>> parameter specifies how wide the region of copied points is. You should never need to change this from
>> 1."
>>
>> but I still got errors (galejob_extension.err).
>>
>>
>> ----- Original Message -----
>> From: "Dennis Michael"<dennis at stanford.edu>
>> To: "Leonardo Cruz"<leocruz at stanford.edu>
>> Cc: "George Hilley"<hilley at stanford.edu>
>> Sent: Friday, February 3, 2012 3:10:42 PM
>> Subject: Re: Gale 2.0 in cluster
>>
>>
>> I've reinstalled everything, and still the *.json packages do not work.  I've
>> tried different compilers and packages.
>>
>> Gale-2 runs fine on the Dragonsback*.xml file, so I don't think the binary is at
>> fault.
>>
>> Can we take a look at the *.json files?  Are they known to be good?
>>
>> The only thing I can do now is to download the latest development versions in
>> the hopes that there's a fix.
>>
>> Dennis
>>
>> On 2/1/2012 6:39 PM, Leonardo Cruz wrote:
>>> Dennis,
>>> I ran the file yielding.json using the script yielding1.sh (data/cees/temp1/leocruz1/gale) and got the error indicated in galejob_yielding.err.
>>>
>>> One thing I noticed is that you are running an XML file and I am using a new JSON file from the tested input files that come with the package.
>>>
>>> Any help is appreciated as usual!
>>>
>>> Thanks
>>> Leonardo
>>>
>>> -------------
>>> Leonardo Cruz
>>> Dept. Geological and Environmental Sciences
>>> 450 Serra Mall
>>> Braun Hall, Building 320
>>> Stanford University
>>> Stanford, CA 94305-2115
>>>
>>> ----- Original Message -----
>>> From: "Leonardo Cruz"<leocruz at stanford.edu>
>>> To: "Dennis Michael"<dennis at stanford.edu>
>>> Cc: "George Hilley"<hilley at stanford.edu>
>>> Sent: Tuesday, January 31, 2012 10:53:04 AM
>>> Subject: Re: Gale 2.0 in cluster
>>>
>>> Thanks Dennis!
>>> I will run my files this afternoon.
>>> Leo
>>>
>>> ----- Original Message -----
>>> From: "Dennis Michael"<dennis at stanford.edu>
>>> To: "Leonardo Cruz"<leocruz at stanford.edu>
>>> Cc: "George Hilley"<hilley at stanford.edu>
>>> Sent: Tuesday, January 31, 2012 10:51:07 AM
>>> Subject: Re: Gale 2.0 in cluster
>>>
>>>
>>> I have Gale-2.0.0 running.  It's been running almost 30 minutes so I think I
>>> fixed the segmentation faults.  I'm not sure how long it will run - it will be
>>> killed after 2 hours.
>>>
>>> Output is in /data/cees/dennis/Gale.   The script 'rundb.sh' shows how I ran it.
>>>
>>> The code is in /usr/local/Gale-2_0_0
>>>
>>> Dennis
>>>
>>> On 1/30/2012 9:05 AM, Leonardo Cruz wrote:
>>>> Hi Dennis,
>>>> I hope you have a speedy recovery and thanks for your help!
>>>>
>>>> Leo
>>>>
>>>> ----- Original Message -----
>>>> From: "Dennis Michael"<dennis at stanford.edu>
>>>> To: "Leonardo Cruz"<leocruz at stanford.edu>
>>>> Cc: "George Hilley"<hilley at stanford.edu>
>>>> Sent: Monday, January 30, 2012 8:00:33 AM
>>>> Subject: Re: Gale 2.0 in cluster
>>>>
>>>>
>>>> P.S. sorry for the late response.  I came down with a bad cold on Friday and I'm
>>>> still struggling.
>>>>
>>>> Dennis
>>>>
>>>> On 1/27/2012 5:59 PM, Leonardo Cruz wrote:
>>>>> Hi Dennis,
>>>>>
>>>>> I installed Gale 2.0 on my local machine and ran one of the cookbook files successfully a few minutes ago. I am using the same file to test it on the cluster, but I got some errors.
>>>>> Any help is appreciated as usual. I am attaching the input, job script, output and error files to this email.
>>>>>
>>>>> Thanks
>>>>> Leonardo
>>>>>
> 
> -- 
> Dennis Michael
> Manager, High Productivity Technical Computing
> Stanford Center for Computational Earth and Environmental Science (CEES)
> School of Earth Sciences
> Stanford University
> 397 Panama Mall Mitchell Building room 415
> http://cees.stanford.edu/
> phone # (650) 723 2014
> 
-------------- next part --------------
{
    "EulerDeform":
    {
        "systems": [
            {
                "mesh": "v-mesh",
                "p-mesh": "p-mesh",
                "remesher": "velocityRemesher",
                "velocityField": "VelocityField",
                "wrapTop": "True"
            }
        ]
    },
    "components":
    {
        "buoyancyForceTerm":
        {
            "Type": "BuoyancyForceTerm",
            "ForceVector": "mom_force",
            "Swarm": "gaussSwarm",
            "gravity": "gravity"
        },
        "backgroundShape":
        {
            "Type": "EquationShape",
            "equation": "1"
        },
        "backgroundViscosity":
        {
            "Type": "MaterialViscosity",
            "eta0": "1.0"
        },
        "viscous":
        {
            "Type": "RheologyMaterial",
            "Shape": "backgroundShape",
            "density": "1.0",
            "Rheology": [
                "backgroundViscosity",
                "storeViscosity",
                "storeStress"
            ]
        },
        "surfaceAdaptor":
        {
            "Type": "SurfaceAdaptor",
            "mesh": "v-mesh",
            "sourceGenerator": "v-mesh-generator",
            "topSurfaceType": "topo_data",
            "topSurfaceName": "test.topo",
            "topNx": "32",
            "topNz": "12",
            "topMinX": "minX",
            "topMaxX": "maxX",
            "topMinZ": "minZ",
            "topMaxZ": "maxZ",
            "bottomEquation": "x<1 ? -0.1*x : -0.1"
        }
    },
    "velocityBCs":
    {
        "type": "CompositeVC",
        "vcList": [
            {
                "type": "WallVC",
                "wall": "front",
                "variables": [
                    {
                        "name": "vz",
                        "value": "0.0"
                    }
                ]
            },
            {
                "type": "WallVC",
                "wall": "back",
                "variables": [
                    {
                        "name": "vz",
                        "value": "0.0"
                    }
                ]
            },
            {
                "type": "WallVC",
                "wall": "left",
                "variables": [
                    {
                        "name": "vx",
                        "value": "0"
                    }
                ]
            },
            {
                "type": "WallVC",
                "wall": "right",
                "variables": [
                    {
                        "name": "vx",
                        "value": "1.0"
                    }
                ]
            },
            {
                "type": "WallVC",
                "wall": "bottom",
                "variables": [
                    {
                        "name": "vy",
                        "value": "0"
                    }
                ]
            }
        ]
    },

    "FieldVariablesToCheckpoint": [
        "StrainRateInvariantField",
        "VelocityField",
        "PressureField"
    ],
    "maxTimeSteps": "10",
    "outputPath": "./output",
    "dim": "3",
    "minX": "0",
    "minY": "0",
    "minZ": "0",
    "maxX": "2",
    "maxY": "0.35",
    "maxZ": "0.3",
    "nx": "16",
    "ny": "4",
    "nz": "4",
    "particlesPerCell": "40",
    "seed": "13",
    "checkpointEvery": "1",
    "gravity": "1.0"
}

