[CIG-SHORT] Fwd: Re: Error message from running on cluster

Matthew Knepley knepley at mcs.anl.gov
Wed Jun 20 14:05:55 PDT 2012


On Wed, Jun 20, 2012 at 2:57 PM, Hongfeng Yang <hyang at whoi.edu> wrote:

> Hi Jonathan,
>
> Then I need to turn on the debugger in the code. How should I set the
> xterm display in order to attach gdb?
>

Just to clarify:

  The situation is that we have run successfully on a single node (12
processes), but at 24 processes we get a SEGV. It appears to be during file
access, but the best way to debug this error would be to use the debugger.

PETSc has the ability to spawn and attach gdb to make this possible in
parallel. On other clusters, we have just set the DISPLAY env var to enable
the xterm to spawn on the user machine. Hopefully this is possible on this
cluster.
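
As a rough sketch of what this could look like, the helper below assembles a launch command using the PETSc runtime options -start_in_debugger, -display, and -debugger_nodes (newer PETSc releases call the last one -debugger_ranks). The launcher name mpiexec, the executable ./my_petsc_app, the display mydesktop:0.0, and the choice of ranks are placeholders assumed for illustration, not values from this cluster, and how a particular application forwards PETSc options is not shown here.

```python
import os
import shlex


def petsc_debug_command(nprocs, executable, display, debugger_ranks=None):
    """Build a launch command asking PETSc to start gdb in xterm windows.

    All arguments are illustrative placeholders; adjust them to the
    launcher, executable, and X display that apply on your system.
    """
    cmd = [
        "mpiexec", "-n", str(nprocs), executable,
        "-start_in_debugger", "gdb",   # PETSc spawns gdb for each process
        "-display", display,           # X display where the xterms open
    ]
    if debugger_ranks:
        # Restrict the debugger to a few ranks so 24 xterms do not open.
        cmd += ["-debugger_nodes", ",".join(str(r) for r in debugger_ranks)]
    return cmd


if __name__ == "__main__":
    # Fall back to a hypothetical display name if DISPLAY is not set.
    display = os.environ.get("DISPLAY", "mydesktop:0.0")
    cmd = petsc_debug_command(24, "./my_petsc_app", display, [0, 12])
    print(shlex.join(cmd))
```

The script only prints the command so it can be checked before pasting it into a job script. The compute nodes must be able to reach the X server named in the display string for the xterms to appear.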

  Thanks,

     Matt


> Thanks,
>
> Hongfeng
>
> On 6/20/12 2:32 PM, Jonathan Murray wrote:
> > we use nfs to access the filesystem
> > the filesystem is either xfs or ext4
> >
> > how are you submitting your jobs? is there a shell script?
> >
> > thanks
> >
> > On 06/20/2012 04:18 PM, Hongfeng Yang wrote:
> >> Hi Jonathan,
> >>
> >> I am running a code on the cluster scylla. So far I can run the code
> >> on ONE compute node, but could not get it to proceed on two or more nodes.
> >>
> >> The code developers suspect that it could be related to the filesystem
> >> on our cluster as the code gets stuck in reading the mesh. Do you know
> >> what kind of filesystems are on the master and compute nodes on scylla?
> >>
> >> Thanks,
> >>
> >> Hongfeng
> >>
> >>
> >> -------- Original Message --------
> >> Subject: Re: Error message from running on cluster
> >> Date:    Wed, 20 Jun 2012 08:01:18 -0600
> >> From:    Matthew Knepley <knepley at mcs.anl.gov>
> >> To:      Hongfeng Yang <hyang at whoi.edu>
> >> CC:      cig-short at geodynamics.org
> >>
> >>
> >>
> >> On Wed, Jun 20, 2012 at 7:18 AM, Hongfeng Yang <hyang at whoi.edu> wrote:
> >>
> >>      Hi Matt,
> >>
> >>      Last night I did not send the error message of running pylith on
> >>      our cluster at WHOI. Sorry for that. Here it is.
> >>
> >>      Could you help figure out what the problem might be?
> >>
> >>
> >> So it looks like there is a problem when trying to read in the mesh,
> >> although it is hard to tell since we get a SEGV. My guess is that the
> >> filesystem is not exactly what you think it is. I recommend going
> >> through the cluster documentation to understand exactly how the
> >> filesystem is accessed from nodes other than the head node.
> >>
> >>     Matt
> >>
> >>
> >>      Thanks,
> >>
> >>      Hongfeng
> >>
> >>      --
> >>      Postdoctoral Investigator
> >>      Department of Geology and Geophysics
> >>      Woods Hole Oceanographic Institution
> >>      360 Woods Hole Rd, MS 24
> >>      Woods Hole, MA 02543
> >>
> >>
> >>
> >>
> >> --
> >> What most experimenters take for granted before they begin their
> >> experiments is infinitely more interesting than any results to which
> >> their experiments lead.
> >> -- Norbert Wiener
> >>
> >
>
>
> --
> Postdoctoral Investigator
> Department of Geology and Geophysics
> Woods Hole Oceanographic Institution
> 360 Woods Hole Rd, MS 24
> Woods Hole, MA 02543
>
> _______________________________________________
> CIG-SHORT mailing list
> CIG-SHORT at geodynamics.org
> http://geodynamics.org/cgi-bin/mailman/listinfo/cig-short
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener