[adf-list] ADF crashes when run on a node where ADF is already running

Reuti reuti at staff.uni-marburg.de
Tue Mar 18 19:01:26 CET 2014

On 18.03.2014 at 13:25, Reuti wrote:

> Hi,
> 
> On 18.03.2014 at 12:07, SCM Support (Hans van Schoot) wrote:
> 
>> Yes, both ADF2013.01d and the snapshot builds of the ADF2013 intelMPI
>> version are built with impi 4.1; you can download them here:
>> http://www.tofoba.com/Downloads/Snapshots?branch=fix2013
> 
> Using IntelMPI 4.1 now is fine, but as we have no IntelMPI:

I read this:

http://software.intel.com/sites/default/files/article/327178/intelmpi4.1-releasenotes-linux.pdf

and thought that IntelMPI 4.1 now supports SGE directly.

===

I checked it and my findings are:

- IntelMPI still uses an mpd-ring startup by default instead of Hydra.

- You can change to Hydra startup by using the supplied `mpiexec.hydra`.

- This version of Hydra still fails to detect that it's running under SGE and should use SGE's startup mechanism: it picks up the hostlist from SGE (which is fine of course), but for unknown reasons still wants to start up via `ssh`: "... --rmk sge --launcher ssh ..."
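As an alternative to changing command-line options, the launcher can in principle also be selected through Hydra's environment variables; a minimal sketch, assuming this Intel MPI build honors `I_MPI_HYDRA_BOOTSTRAP` (set it in the job script before `$ADFBIN/start` runs):

```shell
#!/bin/sh
# Tell Hydra to start remote processes through SGE's qrsh -inherit
# instead of ssh. I_MPI_HYDRA_BOOTSTRAP is Intel MPI's knob for the
# Hydra launcher; whether this particular build honors it is an
# assumption and should be verified with a test job.
I_MPI_HYDRA_BOOTSTRAP=sge
export I_MPI_HYDRA_BOOTSTRAP

# mpiexec.hydra started from here on would inherit the setting
echo "$I_MPI_HYDRA_BOOTSTRAP"
```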

===

Nevertheless, you can get it working by changing the line for the SGE case in $ADFBIN/start:

  elif test "$SGE_JOB_SPOOL_DIR" != "" -a -f $SGE_JOB_SPOOL_DIR/pe_hostfile; then
#  This has partially been tested
    mpiexec.hydra -bootstrap sge "$PROG" "[email protected]" -DSCM_EXPORT="$SCM_EXPORT" -DSCM_DEBUG="$SCM_DEBUG" \

i.e. change the original `mpirun` call to `mpiexec.hydra` and add "-bootstrap sge" to it.
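For reference, the pe_hostfile that this branch of the script tests for contains one line per granted host in the format `hostname slots queue processor-range`; a sketch of expanding it into a flat machinefile, with made-up sample hosts:

```shell
#!/bin/sh
# Fake a pe_hostfile as SGE writes it into $SGE_JOB_SPOOL_DIR
# (hostnames and slot counts below are illustrative only)
cat > pe_hostfile.sample <<'EOF'
node01 2 all.q@node01 <NULL>
node02 2 all.q@node02 <NULL>
EOF

# Repeat each hostname once per granted slot -- the flat layout
# many MPI machinefiles expect
awk '{ for (i = 0; i < $2; i++) print $1 }' pe_hostfile.sample > machinefile
cat machinefile
```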

Important side note: this version of Hydra also fails to detect that the hostname of the master node of the parallel job is one of the machines in the SGE-supplied hostfile. It therefore tries to make a local `qrsh -inherit ...` call. To allow this, the parallel environment in SGE needs these settings:

$ qconf -sp adf
pe_name            adf
...
control_slaves     TRUE
job_is_first_task  FALSE
...
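The two settings above can also be applied non-interactively by exporting, editing, and re-importing the PE definition; a sketch (the `qconf` calls need SGE admin rights and are therefore shown commented out, and the sample definition below is minimal, not a full PE):

```shell
#!/bin/sh
# On a real cluster, export the current definition of the "adf" PE:
#   qconf -sp adf > adf.pe
# Here we start from a minimal sample file instead:
cat > adf.pe <<'EOF'
pe_name            adf
slots              999
control_slaves     FALSE
job_is_first_task  TRUE
EOF

# Flip the two attributes that Hydra's qrsh -inherit startup depends on
sed -i -e 's/^control_slaves.*/control_slaves     TRUE/' \
       -e 's/^job_is_first_task.*/job_is_first_task  FALSE/' adf.pe

# Re-import the modified PE (admin rights required):
#   qconf -Mp adf.pe
grep -E '^(control_slaves|job_is_first_task)' adf.pe
```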

===

The original MPICH2 (on which IntelMPI is based) has integrated well with SGE out of the box for some time now. As Hydra is a separate package, I don't know which version of Hydra Intel used.

HTH -- Reuti


> So the default therein is still to use `mpirun` and to start the mpd-ring before the job? The integration into SGE will only work if the `mpirun` call(s) in $ADFBIN/start are changed to use `mpiexec.hydra` instead (MPICH2/3 has defaulted to Hydra alone for quite some time now).
> 
> I'll look into it and will post my result.
> 
> -- Reuti
> 
> 
>> Hans
>> 
>> On 03/18/2014 09:54 AM, Reuti wrote:
>>> On 18.03.2014 at 09:16, SCM Support (Hans van Schoot) wrote:
>>> 
>>>> Dear Edrisse,
>>>> 
>>>> Please try the intelMPI version.
>>> Was the version updated to IntelMPI 4.1? The IntelMPI 4.0.3 doesn't play nicely with SGE.
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> The true problem is most likely somewhere deep inside the cluster
>>>> configuration, but switching MPI versions might help you get around it.
>>>> 
>>>> Best regards,
>>>> Hans van Schoot
>>>> 
>>>> 
>>>> On 03/18/2014 04:56 AM, Edrisse Chermak wrote:
>>>>> Hi,
>>>>> 
>>>>> I added in $SGE_ROOT/default/common/sge_request the line :
>>>>> 
>>>>> -v PSM_RANKS_PER_CONTEXT=4
>>>>> 
>>>>> then restarted the sge_qmaster and launched two ADF jobs.
>>>>> The second job still crashes, but now with only one error line:
>>>>> ======================================================================
>>>>> c1bay3.7ipath_userinit: assign_context command failed: Network is down
>>>>> ======================================================================
>>>>> And I'm using the built-in platform-mpi provided by ADF.
>>>>> Do you think it would be worthwhile to test other MPI versions? If
>>>>> so, which one would you propose?
>>>>> 
>>>>> Thanks & Regards,
>>>>> Edrisse
>>>>> 
>>>>> On 03/17/2014 03:20 PM, Reuti wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> On 17.03.2014 at 13:10, Edrisse Chermak wrote:
>>>>>> 
>>>>>>> Thanks for your prompt answer :
>>>>>>> - We have 64 cores per machine
>>>>>>> - We request 16 cores per job
>>>>>>> - Our IB devices are: QLogic Corp. IBA7322 QDR InfiniBand
>>>>>>> HCA (InfiniPath_QMH7342)
>>>>>>> - Our queuing system is Grid Engine 2011.11
>>>>>> You can try to put in $SGE_ROOT/default/common/sge_request the line:
>>>>>> 
>>>>>> -v PSM_RANKS_PER_CONTEXT=4
>>>>>> 
>>>>>> or test it beforehand on the command line to distribute this
>>>>>> environment variable to the job. Although the default should be to
>>>>>> allow 64 processes, maybe it's not working as intended. Do you use
>>>>>> Platform-MPI or Intel-MPI of ADF?
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> 
>>>>>>> You're probably right, it may be an IB problem.
>>>>>>> But this is strange since we are using single-node restricted
>>>>>>> parallel calculations.
>>>>>>> Would you have any idea on how to confirm the probable IB issue ?
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Edrisse
>>>>>>> 
>>>>>>> On 03/16/2014 05:31 PM, Reuti wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> On 16.03.2014 at 06:21, Edrisse Chermak wrote:
>>>>>>>> 
>>>>>>>>> Dear ADF developers and users,
>>>>>>>>> 
>>>>>>>>> We noticed on our cluster that ADF crashes if it is launched on a
>>>>>>>>> node where another ADF job is already running:
>>>>>>>>> 
>>>>>>>>> ============================================================================
>>>>>>>>> 
>>>>>>>>> c1bay6.1ipath_userinit: assign_context command failed: Network is
>>>>>>>>> down
>>>>>>>>> c1bay6.1can't open /dev/ipath, network down (err=26)
>>>>>>>> Might be an IB problem.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> adf.exe: Rank 0:1: MPI_Init: psm_ep_open() failed
>>>>>>>>> adf.exe: Rank 0:1: MPI_Init: Can't initialize RDMA device
>>>>>>>>> adf.exe: Rank 0:1: MPI_Init: Internal Error: Cannot initialize
>>>>>>>>> RDMA protocol
>>>>>>>>> MPI Application rank 1 exited before MPI_Init() with status 1
>>>>>>>>> mpirun: Broken pipe
>>>>>>>>> ============================================================================
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Is there any way to overcome this, to run more than 1 ADF job per
>>>>>>>>> node ?
>>>>>>>> How many cores per machine do you have?
>>>>>>>> How many cores per job do you request?
>>>>>>>> What brand/type of IB cards?
>>>>>>>> 
>>>>>>>> And just for interest: what type of queuing system?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Notes: - I checked that this issue is independent of both (i) the
>>>>>>>>> number of available CPUs on the node
>>>>>>>> You mean that even two jobs with 2 cores each will already crash
>>>>>>>> the second job?
>>>>>>>> 
>>>>>>>> -- Reuti
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> and (ii) the queuing system.
>>>>>>>>>     - We are running ADF in parallel on single nodes. The
>>>>>>>>> version is
>>>>>>>>> 2013.01.
>>>>>>>>> 
>>>>>>>>> Thanks in advance & Best Regards,
>>>>>>>>> Edrisse
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> ADFlist mailing list
>>>>>>>>> ADFlist at scm.com
>>>>>>>>> http://lists.tofoba.com/mailman/listinfo/adflist
>> 
> 



More information about the ADFlist mailing list