【Posted】: 2021-11-30 21:35:16
【Problem description】:
When I run the batch script
#!/bin/sh
#SBATCH -n 1
#SBATCH --output=serial_test_%j.log
hostname
echo "test" > test.log
neither serial_test_142.log nor test.log is written. /var/log/slurm_JobComp lists
JobId=142 UserId=ebryer(1000) GroupId=ebryer(1000) Name=hw.sh JobState=FAILED Partition=n TimeLimit=UNLIMITED StartTime=2021-11-30T13:52:14 EndTime=2021-11-30T13:52:14 NodeList=n2 NodeCnt=1 ProcCnt=1 WorkDir=/home/ebryer ReservationName= Gres= Account= QOS= WcKey= Cluster=unknown SubmitTime=2021-11-30T13:52:12 EligibleTime=2021-11-30T13:52:12 DerivedExitCode=0:0 ExitCode=1:0
and /var/log/slurmctld lists
[2021-11-30T13:52:12.678] _slurm_rpc_submit_batch_job: JobId=142 InitPrio=4294901740 usec=2069
[2021-11-30T13:52:14.894] sched: Allocate JobId=142 NodeList=n2 #CPUs=1 Partition=n
[2021-11-30T13:52:14.894] prolog_running_decr: Configuration for JobId=142 is complete
[2021-11-30T13:52:14.962] _job_complete: JobId=142 WEXITSTATUS 1
[2021-11-30T13:52:14.962] _job_complete: JobId=142 done
If I change the batch script line to #SBATCH --output=/tmp/serial_test_%j.log, the job exits with status 0, as shown in /var/log/slurmctld:
[2021-11-30T13:52:35.265] _slurm_rpc_submit_batch_job: JobId=143 InitPrio=4294901739 usec=2067
[2021-11-30T13:52:35.970] sched: Allocate JobId=143 NodeList=n2 #CPUs=1 Partition=n
[2021-11-30T13:52:35.971] prolog_running_decr: Configuration for JobId=143 is complete
[2021-11-30T13:52:36.247] _job_complete: JobId=143 WEXITSTATUS 0
[2021-11-30T13:52:36.247] _job_complete: JobId=143 done
scontrol reports success (JobState=COMPLETED Reason=None Dependency=(null)), but no log file appears in /tmp and test.log is not written either. Can anyone explain why this happens?
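
One possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: Slurm creates the --output file on the compute node that runs the batch script (n2 in the logs above), so a log under /tmp lands in n2's /tmp rather than on the submitting host. The variant below assumes the working directory (/home/ebryer) may be missing or not writable on n2, and records where the job actually runs. `check_dir` is a hypothetical helper introduced for illustration, and `uname -n` stands in for `hostname` only for portability.

```shell
#!/bin/sh
#SBATCH -n 1
#SBATCH --output=/tmp/serial_test_%j.log

# Hypothetical helper (not part of Slurm): prints "writable" if the
# directory given as $1 exists and is writable, else "not writable".
check_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable"
    else
        echo "not writable"
    fi
}

# Report the node the script runs on and the state of its working directory.
echo "host: $(uname -n)"
echo "cwd: $(pwd)"
echo "cwd status: $(check_dir "$(pwd)")"
```

After the job finishes, the log to inspect would be /tmp/serial_test_<jobid>.log on the node named in NodeList, not on the machine where sbatch was invoked.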
【Discussion】:
Tags: slurm