[adf-list] Problem: 'MPI Application rank 13 exited before MPI_Finalize() with status 154'

Alexei Yakovlev yakovlev at scm.com
Mon May 12 14:36:51 CEST 2014

Hi all,

On 09/05/2014 21:21, Reuti wrote:
> Hi,
>
> Am 09.05.2014 um 17:39 schrieb Karina Muñoz:
>
>> I have a recurring problem with some large opt calculations (with large outputs). The job crashes while the program is printing a large amount of information, showing the following warning
>> (in this case, after the sentence 'This molecular quadrupole moment is calculated with analytic integration') -->
> Are you running these interactively or inside a queuing system? SIGTERM could be a warning from, e.g., Torque.
SIGTERM with the stacktrace actually comes from the slave processes that 
are killed by the MPI run-time after rank 13 has crashed. The real 
culprit is behind the "MPI Application rank 13 exited before 
MPI_Finalize() with status 154" message.
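
To find that message in a long output without scrolling through the stack traces, a small shell helper along these lines may be handy. It is only a sketch: the function name find_crashed_rank is made up for illustration, and the pattern assumes the exact wording of the message quoted in this thread (other MPI run-times phrase it differently).

```shell
#!/bin/sh
# Print "<rank> <status>" for every rank that the MPI run-time reports as
# having exited before MPI_Finalize(). The wording matched here is the one
# quoted in this thread; it is an assumption for any other MPI run-time.
find_crashed_rank() {
    logfile=$1
    # In basic regular expressions "()" is literal and "\(...\)" captures.
    sed -n 's/.*MPI Application rank \([0-9][0-9]*\) exited before MPI_Finalize() with status \([0-9][0-9]*\).*/\1 \2/p' "$logfile"
}
```

Called as "find_crashed_rank job.out" (job.out being a placeholder for your own output file), it prints one line per crashed rank, which for the log in this thread would be "13 154". The SIGTERM traces of the other ranks are ignored on purpose, since they are only a consequence of the first crash.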

Karina, please check the job execution directory for a file named 
KidOutput__13. It may contain some useful messages if it exists. 
Otherwise it will be hard to debug. First thing I'd suggest in this case 
is to try the latest adf2013 bugfix snapshot 
(http://www.tofoba.com/Downloads/Snapshots?branch=fix2013). If it still 
crashes then you can set the SCM_TRACETIMER environment variable before 
the $ADFBIN/adf command in your run script as:

export SCM_TRACETIMER=ADF
$ADFBIN/adf <<eor
...

This will increase the amount of output significantly but the last lines 
in the output should point to the place where ADF is crashing. You 
should send them to support at scm.com.
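
The two checks above (looking for the KidOutput file of the crashed rank, and keeping only the last lines of the large trace output) could be scripted roughly as follows. This is a sketch under assumptions: the function names show_kid_output and save_tail are made up, the per-rank file name is assumed to follow the KidOutput__<rank> pattern of the KidOutput__13 example, and the line counts are arbitrary.

```shell
#!/bin/sh
# Show what a given rank left behind in the job execution directory.
# Assumption: per-rank files are named KidOutput__<rank>, following the
# KidOutput__13 example; adjust if your files are named differently.
show_kid_output() {
    jobdir=$1
    rank=$2
    kidfile="$jobdir/KidOutput__$rank"
    if [ -s "$kidfile" ]; then
        echo "== last lines of $kidfile =="
        tail -n 20 "$kidfile"
    else
        echo "no KidOutput__$rank (or it is empty) in $jobdir"
        return 1
    fi
}

# Copy the last lines of a large output file into <file>.tail so that only
# the part near the crash needs to be mailed. 200 lines is an arbitrary default.
save_tail() {
    outfile=$1
    lines=${2:-200}
    tail -n "$lines" "$outfile" > "$outfile.tail"
    echo "$outfile.tail"
}
```

For example, "show_kid_output . 13" checks the current directory for rank 13, and "save_tail job.out" (job.out again being a placeholder) writes job.out.tail and prints its name.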

Kind regards,
Alexei

>
> -- Reuti
>
>
>> MPI Application rank 13 exited before MPI_Finalize() with status 154
>> forrtl: error (78): process killed (SIGTERM)
>> Image              PC                Routine            Line        Source
>> libc.so.6          0000003DEC0CEBB7  Unknown               Unknown  Unknown
>> libpcmpi.so        00002AEEC4CD94C6  Unknown               Unknown  Unknown
>> libpcmpi.so        00002AEEC4CE976A  Unknown               Unknown  Unknown
>> libpcmpi.so        00002AEEC4CC7FBF  Unknown               Unknown  Unknown
>> libpcmpi.so        00002AEEC4D55FB0  Unknown               Unknown  Unknown
>> libpcmpi.so        00002AEEC4D55C1F  Unknown               Unknown  Unknown
>> libmpi.so          00002AEEC4B4C3E9  Unknown               Unknown  Unknown
>> adf.exe            000000000148EC96  Unknown               Unknown  Unknown
>> adf.exe            0000000001067613  Unknown               Unknown  Unknown
>> adf.exe            0000000001042F9D  Unknown               Unknown  Unknown
>> adf.exe            0000000000C77322  Unknown               Unknown  Unknown
>> adf.exe            00000000008A7747  Unknown               Unknown  Unknown
>> adf.exe            000000000088FB90  Unknown               Unknown  Unknown
>> adf.exe            0000000000598DA1  Unknown               Unknown  Unknown
>> adf.exe            0000000000442CA8  Unknown               Unknown  Unknown
>> adf.exe            00000000004127F8  Unknown               Unknown  Unknown
>> adf.exe            000000000041254C  Unknown               Unknown  Unknown
>> ....
>> ....
>> ....(continue...)
>>
>> (Sometimes it crashes while the program is printing other information, not necessarily the quadrupole moment as in this example.)
>>
>> Thank you for your help
>>
>> Karina
>>
>> _______________________________________________
>> ADFlist mailing list
>> ADFlist at scm.com
>> http://lists.tofoba.com/mailman/listinfo/adflist


