File parsing with Beanstalkd Queue

【Question title】: File parsing with Beanstalkd Queue
【Posted】: 2015-02-16 16:35:59
【Question】:

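I am currently rewriting a file uploader. The existing parsing scripts for the different data types are Perl scripts; the program itself is written in PHP. At the moment it only allows single-file uploads: once the file is on the server, it calls the Perl script for the data type of the uploaded file. We have more than 20 data types.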

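What I have done so far is write a new system that allows multiple file uploads. It first lets you validate your attributes before uploading, then compresses the files with zipjs, uploads the zipped file, uncompresses it on the server, and calls the parser for each file.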

I am at the part where, for each file, I need to put the parser call into a queue. I cannot run multiple parsers at once. A rough sketch is below.

foreach ($files as $file) {
    $job = "exec('location/to/file/parser.pl file');";
    // using the pheanstalk library
    $this->pheanstalk->useTube('testtube')->put($job);
}

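Depending on the file, parsing can take anywhere from 2 to 20 minutes. When I put the jobs on the queue, I need to make sure that the parser for file2 only starts after the parser for file1 has finished. How can I accomplish that? Thanks.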

【Comments】:

Tags: php file-upload queue beanstalkd

【Answer 1】:

Beanstalk has no notion of dependencies between jobs. You seem to have two jobs:

Job A: parse file 1
Job B: parse file 2

If you need job B to run only after job A, the most straightforward way is for job A to create job B as its last action.
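One way to picture this chaining is a payload that carries the list of files still to be parsed: the worker parses the first file and only then enqueues a follow-up job for the rest. The sketch below is illustrative only. `handleChainedJob`, the JSON payload layout, and the in-memory `$queue` array are assumptions made for the example and are not part of Beanstalkd or Pheanstalk; in a real worker the follow-up payload would be put on the tube with the same `useTube(...)->put(...)` call shown in the question.

```php
<?php
// Hypothetical helper: handle one chained job and return the payload of the
// follow-up job, or null when no files are left.
// $runParser is any callable that parses a single file and blocks until done.
function handleChainedJob(string $payload, callable $runParser): ?string
{
    $files = json_decode($payload, true);   // remaining files, in order
    if (!is_array($files) || count($files) === 0) {
        return null;
    }
    $current = array_shift($files);         // file handled by this job
    $runParser($current);                   // returns only when parsing is finished
    return count($files) > 0 ? json_encode($files) : null;
}

// In-memory stand-in for the tube, so the sketch runs without a queue server.
$queue  = [json_encode(['file1', 'file2', 'file3'])];
$parsed = [];

while (($payload = array_shift($queue)) !== null) {
    $next = handleChainedJob($payload, function (string $file) use (&$parsed) {
        $parsed[] = $file;                  // a real worker would exec() the Perl parser here
    });
    if ($next !== null) {
        // With Pheanstalk this would be: $pheanstalk->useTube('testtube')->put($next);
        $queue[] = $next;
    }
}

print_r($parsed);
```

Because the job for the next file does not exist until the current parser has returned, the ordering holds even if more than one worker watches the tube.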

【Comments】:

【Answer 2】:

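I have implemented what I wanted, which is to request more time when the parser takes longer than a minute. The worker is a PHP script, and when I run the `exec` command for the parser executable I can get its process ID. I am currently using the code snippet below in my worker.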

$job = $pheanstalk->watch( $tubeName )->reserve();
// do some more stuff here ... then
// while the parser is running on the server
while( file_exists( "/proc/$pid" ) )
{
    // make sure the job is still reserved on the queue server
    if( $job ) {
        // get the time left on the queue server for the job
        $jobStats = $pheanstalk->statsJob( $job );
        // when there is not enough time, request more
        if( $jobStats['time-left'] < 5 ) {
            echo "requested more time for the job at ".$jobStats['time-left']." secs left \n";
            $pheanstalk->touch( $job );
        }
    }
    // pause between checks so the loop does not hammer the queue server
    sleep( 1 );
}
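The snippet above assumes `$pid` is already known. As a rough sketch of the "start the parser, then watch it" part, the example below swaps in `proc_open` and `proc_get_status` instead of `exec` plus a `/proc/$pid` check, since they report both the PID and whether the process is still running. `runAndWatch` and its `$onTick` callback are hypothetical names for this illustration, `sleep 1` stands in for the Perl parser, and passing the command as an array assumes PHP 7.4 or later on a Unix-like system.

```php
<?php
// Hypothetical helper: start a command and call $onTick($pid) roughly once
// per second while it is still running. Returns the command's exit code.
function runAndWatch(array $command, callable $onTick): int
{
    // Discard stdout and stderr; an array command (PHP 7.4+) bypasses the shell.
    $descriptors = [
        1 => ['file', '/dev/null', 'w'],
        2 => ['file', '/dev/null', 'w'],
    ];
    $proc = proc_open($command, $descriptors, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('could not start the command');
    }

    $status = proc_get_status($proc);
    while ($status['running']) {
        $onTick($status['pid']);   // e.g. check time-left and touch the job here
        sleep(1);
        $status = proc_get_status($proc);
    }

    proc_close($proc);
    return $status['exitcode'];    // taken from the first status call after exit
}

// Stand-in for the parser: a command that simply sleeps for one second.
$ticks = 0;
$exitCode = runAndWatch(['sleep', '1'], function (int $pid) use (&$ticks) {
    $ticks++;
});
```

In the worker, the callback would contain the `statsJob` / `touch` logic from the snippet above, so the job is extended only while the parser process is still alive.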

【Comments】: