[Question title]: Running command on master node from qsub submission script
[Posted]: 2017-08-17 21:42:40
[Question description]:

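Using Sun Grid Engine, is there a way to run a command on the master node from within a qsub submission script? If I run /bin/hostname in the qsub script, I am already on one of the queue machines, not the master node. In short, I want to run qstat automatically on the job I have just submitted. If I try to run qstat from one of the worker nodes, I get an error telling me that the worker node is neither a submit host nor an admin host.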

I realize I could do this outside the qsub script, but the script defines a lot of useful variables, such as the job name and the SGE job ID.

[Comments]:

    Tags: cluster-computing qsub sungridengine


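    [Solution 1]:

    If your goal is only to get details about the submitted job, you are better off using the environment variables provided by the submitting client, which are available inside the job script. See the ENVIRONMENTAL VARIABLES section of the qsub man page (man qsub):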

    ENVIRONMENTAL VARIABLES
         SGE_ROOT       Specifies the location of the Sun Grid Engine
                        standard configuration files.
    
         SGE_CELL       If set, specifies the default Sun Grid Engine
                        cell. To address a Sun Grid Engine cell qsub,
                        qsh, qlogin or qalter use (in  the  order  of
                        precedence):
    
                             The name of the cell  specified  in  the
                             environment  variable SGE_CELL, if it is
                             set.
    
                             The  name  of  the  default  cell,  i.e.
                             default.
    
    
         SGE_DEBUG_LEVEL
                        If  set,  specifies  that  debug  information
                        should  be written to stderr. In addition the
                        level of detail in which debug information is
                        generated is defined.
    
         SGE_QMASTER_PORT
                        If set,  specifies  the  tcp  port  on  which
                        sge_qmaster(8) is expected to listen for com-
                        munication requests.  Most installations will
                        use  a  services  map  entry  for the service
                        "sge_qmaster" instead to define that port.
    
         DISPLAY        For qsh jobs the DISPLAY has to be  specified
                        at job submission.  If the DISPLAY is not set
                        by using the -display or the -v  switch,  the
                        contents  of the DISPLAY environment variable
                        are used as default.
    
         In addition to those environment variables specified  to  be
         exported  to the job via the -v or the -V option (see above)
         qsub, qsh, and qlogin add the following variables  with  the
         indicated values to the variable list:
    
    
         SGE_O_HOME     the home directory of the submitting client.
    
         SGE_O_HOST     the name of the host on which the  submitting
                        client is running.
    
         SGE_O_LOGNAME  the LOGNAME of the submitting client.
    
         SGE_O_MAIL     the MAIL of the submitting  client.  This  is
                        the mail directory of the submitting client.
    
         SGE_O_PATH     the executable search path of the  submitting
                        client.
    
         SGE_O_SHELL    the SHELL of the submitting client.
    
         SGE_O_TZ       the time zone of the submitting client.
    
         SGE_O_WORKDIR  the absolute  path  of  the  current  working
                        directory of the submitting client.
    
         Furthermore, Sun Grid Engine sets additional variables  into
         the job's environment, as listed below.
    
         ARC
    
         SGE_ARCH       The Sun Grid Engine architecture name of  the
                        node on which the job is running. The name is
                        compiled-in into the sge_execd(8) binary.
    
         SGE_CKPT_ENV   Specifies the checkpointing  environment  (as
                        selected with the -ckpt option) under which a
                        checkpointing  job  executes.  Only  set  for
                        checkpointing jobs.
    
         SGE_CKPT_DIR   Only set  for  checkpointing  jobs.  Contains
                        path  ckpt_dir  (see  checkpoint(5)  ) of the
                        checkpoint interface.
    
         SGE_STDERR_PATH
                        the pathname of the file to which  the  stan-
                        dard  error  stream  of  the job is diverted.
                        Commonly used for enhancing the  output  with
                        error  messages from prolog, epilog, parallel
                        environment   start/stop   or   checkpointing
                        scripts.
    
         SGE_STDOUT_PATH
                        the pathname of the file to which  the  stan-
                        dard  output  stream  of the job is diverted.
                        Commonly used for enhancing the  output  with
                        messages   from   prolog,   epilog,  parallel
                        environment   start/stop   or   checkpointing
                        scripts.
    
         SGE_STDIN_PATH the pathname of the file from which the stan-
                        dard  input  stream of the job is taken. This
                        variable might be used  in  combination  with
                        SGE_O_HOST   in   prolog/epilog   scripts  to
                        transfer the input file from  the  submit  to
                        the execution host.
    
         SGE_JOB_SPOOL_DIR
                        The  directory  used  by  sge_shepherd(8)  to
                        store  job related data during job execution.
                        This directory is owned by root or by  a  Sun
                        Grid  Engine  administrative account and com-
                        monly is not open for read or write access to
                        regular users.
    
         SGE_TASK_ID    The index number of  the  current  array  job
                        task (see -t option above). This is an unique
                        number in each array job and can be  used  to
                        reference  different  input data records, for
                        example. This environment variable is set  to
                        "undefined"  for non-array jobs. It is possi-
                        ble to change the predefined  value  of  this
                        variable with -v or -V (see options above).
    
         SGE_TASK_FIRST The index number of the first array job  task
                        (see  -t  option  above).  It  is possible to
                        change the predefined value of this  variable
                        with -v or -V (see options above).
    
         SGE_TASK_LAST  The index number of the last array  job  task
                        (see  -t  option  above).  It  is possible to
                        change the predefined value of this  variable
                        with -v or -V (see options above).
    
         SGE_TASK_STEPSIZE
                        The step size of the array job  specification
                        (see  -t  option  above).  It  is possible to
                        change the predefined value of this  variable
                        with -v or -V (see options above).
    
         ENVIRONMENT    The ENVIRONMENT variable is set to  BATCH  to
                        identify that the job is being executed under
                        Sun Grid Engine control.
    
         HOME           The  user's  home  directory  path  from  the
                        passwd(5) file.
    
         HOSTNAME       The hostname of the node on which the job  is
                        running.
    
         JOB_ID         A   unique   identifier   assigned   by   the
                        sge_qmaster(8)  when  the  job was submitted.
                        The job ID is a decimal integer in the  range
                        1 to 99999.
    
         JOB_NAME       The job name. For batch jobs or jobs  submit-
                        ted  by  qrsh with a command, the job name is
                        built as basename of the qsub script filename
                        resp. the qrsh command.  For interactive jobs
                        it is set  to  `INTERACTIVE'  for  qsh  jobs,
                        `QLOGIN'  for  qlogin  jobs and `QRLOGIN' for
                        qrsh jobs without a command.
    
                        This default may be overwritten  by  the  -N.
                        option.
    
         JOB_SCRIPT     The path to the job script which is executed.
                        The value can not be overwritten by the -v or
                        -V option.
    
         LOGNAME        The user's  login  name  from  the  passwd(5)
                        file.
    
         NHOSTS         The number of hosts in use by a parallel job.
    
         NQUEUES        The number of queues allocated  for  the  job
                        (always 1 for serial jobs).
    
         NSLOTS         The number of queue slots in use by a  paral-
                        lel job.
    
         PATH           A default shell search path of:
                        /usr/local/bin:/usr/ucb:/bin:/usr/bin
    
         SGE_BINARY_PATH
                        The path where the Sun Grid  Engine  binaries
                        are installed. The value is the concatenation
                        of   the    cluster    configuration    value
                        binary_path   and   the   architecture   name
                        $SGE_ARCH environment variable.
    
         PE             The parallel environment under which the  job
                        executes (for parallel jobs only).
    
         PE_HOSTFILE    The path of a file containing the  definition
                        of the virtual parallel machine assigned to a
                        parallel job by  Sun  Grid  Engine.  See  the
                        description  of the $pe_hostfile parameter in
                        sge_pe(5) for details on the format  of  this
                        file. The environment variable is only avail-
                        able for parallel jobs.
    
         QUEUE          The name of the cluster queue  in  which  the
                        job is running.
    
         REQUEST        Available for batch jobs only.
    
                        The request name of a job as  specified  with
                        the  -N  switch  (see  above) or taken as the
                        name of the job script file.
    
         RESTARTED      This variable is set to 1 if a job  was  res-
                        tarted either after a system crash or after a
                        migration in case of a checkpointing job. The
                        variable has the value 0 otherwise.
    
         SHELL          The user's login  shell  from  the  passwd(5)
                        file. Note: This is not necessarily the shell
                        in use for the job.
    
         TMPDIR         The absolute  path  to  the  job's  temporary
                        working directory.
    
         TMP            The same as TMPDIR; provided for  compatibil-
                        ity with NQS.
    
         TZ             The  time   zone   variable   imported   from
                        sge_execd(8) if set.
    
         USER           The user's  login  name  from  the  passwd(5)
                        file.
    
         SGE_JSV_TIMEOUT
                        If the response time of  the  client  JSV  is
                        greater than this timeout value, then the JSV
                        will attempt to be  re-started.  The  default
                        value  is  10 seconds, and this value must be
                        greater than  0.  If  the  timeout  has  been
                        reached,  the  JSV  will only try to re-start
                        once, if the  timeout  is  reached  again  an
                        error will occur.
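
    As a minimal sketch of how a job script might read a few of the variables listed above: the job name demo_job and the function name describe_job are made up for illustration, and the ${VAR:-unset} defaults are there only so the script also runs outside Sun Grid Engine.

```shell
#!/bin/sh
#$ -N demo_job
#$ -cwd

# Print job details that SGE exports into the job's environment.
# The ${VAR:-unset} fallbacks are illustrative; inside a real job
# most of these variables are set by SGE.
describe_job() {
    echo "job id:      ${JOB_ID:-unset}"
    echo "job name:    ${JOB_NAME:-unset}"
    echo "task id:     ${SGE_TASK_ID:-unset}"
    echo "queue:       ${QUEUE:-unset}"
    echo "exec host:   ${HOSTNAME:-unset}"
    echo "submit host: ${SGE_O_HOST:-unset}"
    echo "submit dir:  ${SGE_O_WORKDIR:-unset}"
}

describe_job
```

    Submitted with qsub, the output would end up in the job's standard output file, so no call back to the master node is needed for these details.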
    

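    [Discussion]:

      [Solution 2]:

      The client commands must be reachable from the node on which the job runs. You can try giving the full path to qstat, which may match its location on the head node. If it is not found there, you will have to install it on the compute nodes (or ask the administrator to do so).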
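
      A rough sketch of that lookup, assuming the SGE_BINARY_PATH variable from the qsub man page is set inside the job. The helper name find_qstat is hypothetical, and finding the binary does not guarantee the node is allowed to query the qmaster.

```shell
#!/bin/sh
# Locate qstat on the execution host: try PATH first, then the directory
# SGE reports in SGE_BINARY_PATH. Prints the path, or returns 1 if neither works.
find_qstat() {
    if command -v qstat >/dev/null 2>&1; then
        command -v qstat
    elif [ -n "${SGE_BINARY_PATH:-}" ] && [ -x "$SGE_BINARY_PATH/qstat" ]; then
        echo "$SGE_BINARY_PATH/qstat"
    else
        return 1
    fi
}

if qstat_bin=$(find_qstat); then
    echo "using $qstat_bin"
    # JOB_ID is only set inside a running job, so skip the query otherwise.
    if [ -n "${JOB_ID:-}" ]; then
        "$qstat_bin" -j "$JOB_ID"
    fi
else
    echo "qstat not found on this node; ask the administrator to install it" >&2
fi
```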

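      Edit: some administrators don't like doing this, because "qstat spam" can overload the server on a busy enough system. If you are able to call qstat, do so sparingly and be polite: don't call it every few seconds.

      [Discussion]: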
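
      One way to keep the polling polite is to cap both the rate and the number of calls. The sketch below is only an illustration: POLL_INTERVAL, MAX_CHECKS and poll_politely are made-up names, not SGE settings, and the five-minute default is an arbitrary choice.

```shell
#!/bin/sh
# Run a status command at most MAX_CHECKS times, pausing POLL_INTERVAL
# seconds between calls, and stop early as soon as the command fails.
# Both names are illustrative and can be overridden from the environment.
POLL_INTERVAL=${POLL_INTERVAL:-300}
MAX_CHECKS=${MAX_CHECKS:-12}

poll_politely() {
    checks=0
    while [ "$checks" -lt "$MAX_CHECKS" ]; do
        "$@" || return 0              # command failed: stop polling
        checks=$((checks + 1))
        if [ "$checks" -lt "$MAX_CHECKS" ]; then
            sleep "$POLL_INTERVAL"    # wait before the next call
        fi
    done
    return 0
}

# Inside a job script, where SGE sets JOB_ID, this could be used as:
#   poll_politely qstat -j "$JOB_ID"
```

      With the defaults this makes at most twelve calls, five minutes apart, rather than one every few seconds.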
