【问题标题】:I need to know how to implement multithreading我需要知道如何实现多线程
【发布时间】:2018-09-15 21:35:46
【问题描述】:

我正在尝试实现多线程功能 perl 脚本以提高速度 我正在尝试实现多线程功能 perl 脚本以提高速度

我需要知道如何为下面的perl代码实现多线程

#!/usr/bin/perl

use if $^O eq "MSWin32", Win32::Console::ANSI;
use Getopt::Long;
use HTTP::Request;
use LWP::UserAgent;
use IO::Select;
use HTTP::Headers;
use IO::Socket;
use HTTP::Response;
use Term::ANSIColor;
use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);
use URI::URL;
use IO::Socket::INET;
use Data::Dumper;
use LWP::Simple;
use LWP;
use URI;
use JSON qw( decode_json encode_json );
use threads;

my $ua = LWP::UserAgent->new;
$ua    = LWP::UserAgent->new(keep_alive => 1);
$ua->agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31");


{
    chomp($site);
    push(@threads, threads->create (\&ask, \&baidu, $site));
    sleep(1) while(scalar threads->list(threads::running) >= 50);
}

eval {
    $_->join foreach @threads;
    @threads = ();
};

########### ASK ###########

sub ask {
    for ( $i = 0; $i < 20; $i += 1) {

        my $url = "https://www.ask.com/web?o=0&l=dir&qo=pagination&q=site%3A*.fb.com+-www.fb.com&qsrc=998&page=$i";

        my $request  = $ua->get($url);
        my $response = $request->content;

        while( $response =~ m/((https?):\/\/([^"\>]*))/g ) {

            my $link = $1;

            my $site = URI->new($link)->host;
            if ( $site =~ /$s/ ) {
                if ( $site !~ /</ ) {   
                    print "ask: $site\n";
                }
            } 
        }
    } 
}


########### Baidu ###########

sub baidu {
    for ( my $ii = 10; $ii <= 760; $ii += 10 ) {

        my $url = "https://www.baidu.com/s?pn=$ii&wd=site:fb.com&oq=site:fb.com";


        my $request  = $ua->get($url);
        my $response = $request->content;

        while ( $response =~ m/(style="text-decoration:none;">([^\/]*))/g ) {
            my $site = $1;
            $site =~ s/style="text-decoration:none;">//g;
            if ( $site =~ /$s/ ) {
                print "baidu: $site\n";
            }
        }
    }
}

如果运行此代码,我只会得到来自Ask.com 的结果。 我该如何解决这个问题并感谢大家?

C:\Users\USER\Desktop>k.pl -d fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com    ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: messenger.fb.com
ask: yourbusinessstory.fb.com
ask: research.fb.com
ask: communities.fb.com
ask: shemeansbusiness.fb.com
ask: nonprofits.fb.com
ask: politics.fb.com
ask: communities.fb.com
ask: live.fb.com
ask: techprep.fb.com
ask: newsroom.fb.com
ask: rightsmanager.fb.com

【问题讨论】:

标签: multithreading perl


【解决方案1】:

好的,所以首先 - 你在这里做的一些看起来真的很恶心的事情,我建议你需要退后一步检查你的代码。由于以下原因,它看起来有点“货物崇拜”:

use HTTP::Request::Common qw(POST);
use HTTP::Request::Common qw(GET);

或者:

my $ua = LWP::UserAgent->new;
$ua    = LWP::UserAgent->new(keep_alive => 1);

...您正在创建一个新的LWP::UserAgent 实例,然后...创建另一个具有不同参数的实例。

您还有很多没有看到的错误,因为您没有包含最重要的use 项目:

use strict;
use warnings qw ( all );

先打开这些,然后修复错误。

但这里例如:

push(@threads, threads->create (\&ask, \&baidu, $site));

你认为这条线应该做什么?因为这里实际发生的是你尝试调用ask sub,然后将代码引用的参数传递给baidu sub,以及一个字符串$site——此时代码中尚未定义。但那是学术性的,因为你从不在你的子程序中阅读它们。

因此,您的代码无法正常工作并不奇怪 - 这是无稽之谈。

但除此之外 - perls 线程模型经常被误解。它不是您可能在其他编程语言中想到的轻量级线程 - 实际上它是相当重量级的。

您每次迭代都会创建和生成一个线程,这也不是很有效。

你真正想做的是使用Thread::Queue

为每个任务生成少量“工作”线程,让它们从队列中读取,然后单独完成它们的工作。

end 队列完成后,让线程退出并被主进程收割。

类似这个答案:Perl daemonize with child daemons

...但是你确定没有一个模块可以做你想要的吗?

【讨论】:

    猜你喜欢
    • 2016-07-14
    • 2015-06-24
    • 2017-01-20
    • 1970-01-01
    • 2012-01-12
    • 2012-07-18
    • 1970-01-01
    • 2011-04-15
    • 1970-01-01
    相关资源
    最近更新 更多