【Question title】: MySQL: selecting millions of records to generate URLs
【Posted】: 2013-10-01 04:03:08
【Question】:

I'm currently fetching 2 million records from several tables to generate the URLs for a sitemap. The script takes too many resources and uses 100% of the server's capacity.

The query:

 SELECT CONCAT("/url/profile/id/", u.id, "/", nickname) AS url
 FROM users AS u
 UNION ALL
 SELECT CONCAT("url/city/", c.id, "/paramId/", p.id, "/", REPLACE(p.title, " ", "+"), "/", r.region_Name, "/", c.city_Name) AS url
 FROM city c
 JOIN region r ON r.id = c.id_region
 JOIN country country ON country.id = c.id_country
 CROSS JOIN param p
 WHERE country.used = 1
   AND p.active = 1

// I store the results in an array, $url_list, and then build the sitemap from it, but this takes time and a lot of resources.

// I tried fetching the data in batches with LIMIT 0,50000, but getting the total row count for the pagination also takes time. The code doesn't look good either, because I have to run the same heavy query twice:
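One thing worth noting about `LIMIT $start,$limit` batching: MySQL still scans and discards all `$start` skipped rows, so later pages get slower and slower. A common alternative is keyset ("seek") pagination, which filters on the last id seen instead of using an offset. Below is a minimal sketch of that iteration pattern; `$fetchBatch` is a hypothetical callback standing in for the real query (e.g. `SELECT id, url FROM users WHERE id > ? ORDER BY id LIMIT ?`), not something from the question:

```php
<?php
// Keyset pagination sketch: resume each batch after the last id seen,
// instead of LIMIT $start,$limit which rescans all skipped rows.
// $fetchBatch($lastId, $limit) must return rows ordered by id ascending.
function iterateByKeyset(callable $fetchBatch, int $limit): Generator
{
    $lastId = 0;
    while (true) {
        $rows = $fetchBatch($lastId, $limit);
        if (empty($rows)) {
            break;                      // no more data
        }
        foreach ($rows as $row) {
            yield $row;                 // stream rows out one by one
        }
        $lastId = end($rows)['id'];     // resume after the last id seen
    }
}
```

This also removes the need for the up-front `COUNT(*)` query: the loop simply stops when a batch comes back empty.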

$url_list = array();


$maxrow = // pseudocode: the count query below, run through the DB layer
 SELECT COUNT(*) AS max FROM (
    SELECT CONCAT("/url/profile/id/", u.id, "/", nickname) AS url FROM users AS u
    UNION ALL
    SELECT CONCAT("url/city/", c.id, "/paramId/", p.id, "/", REPLACE(p.title, " ", "+"), "/", r.region_Name, "/", c.city_Name) AS url
    FROM city c
    JOIN region r ON r.id = c.id_region
    JOIN country country ON country.id = c.id_country
    CROSS JOIN param p
    WHERE country.used = 1
    AND p.active = 1) AS tmp

$limit = 50000; // note: a comma is not valid in a PHP integer literal
$bybatch = ceil($maxrow/$limit);
$start = 0;
for($i = 0;$i < $bybatch; $i++){
   // run query and store to $result
       (SELECT CONCAT("/url/profile/id/", u.id, "/", nickname) AS url FROM users AS u
        UNION ALL
        SELECT CONCAT("url/city/", c.id, "/paramId/", p.id, "/", REPLACE(p.title, " ", "+"), "/", r.region_Name, "/", c.city_Name) AS url
        FROM city c
        JOIN region r ON r.id = c.id_region
        JOIN country country ON country.id = c.id_country
        CROSS JOIN param p
        WHERE country.used = 1
        AND p.active = 1 LIMIT $start, $limit);

     $start += $limit;
     // append this batch to $url_list
     $url_list = array_merge($url_list, $result);
}

// Once that is done, I use the array to build the sitemap files:

$linkCount = 1;
$fileNomb = 1;
foreach ($url_list as $ul) {

            if ($linkCount == 1) {
                $doc  = new DOMDocument('1.0', 'utf-8');
                $doc->formatOutput = true;
                $root = $doc->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
                $doc->appendChild($root);
            }


            $url= $doc->createElement("url");
            $loc= $doc->createElement("loc", $ul['url']); 
            $url->appendChild($loc);
            $priority= $doc->createElement("priority",1); 
            $url->appendChild($priority);


            $root->appendChild($url);

            $linkCount += 1;

            if ($linkCount == 49999) {
                $f = fopen($this->siteMapMulti . $fileNomb . '.xml', "w");
                fwrite($f, $doc->saveXML());
                fclose($f);

                $linkCount = 1;
                $fileNomb += 1;
            }

        }

        // write out the last, partially filled sitemap file
        if ($linkCount > 1) {
            $f = fopen($this->siteMapMulti . $fileNomb . '.xml', "w");
            fwrite($f, $doc->saveXML());
            fclose($f);
        }
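An alternative to accumulating a `DOMDocument` per file is to stream each sitemap with PHP's `XMLWriter`, which writes entries as they arrive so memory use stays flat no matter how many URLs there are (the 50,000-URLs-per-file cap comes from the sitemaps.org protocol). A minimal sketch, assuming a plain iterable of URL strings; the `writeSitemap` name is mine, not from the question:

```php
<?php
// Stream one sitemap file with XMLWriter instead of building a DOM in memory.
function writeSitemap(iterable $urls, string $path): int
{
    $w = new XMLWriter();
    $w->openUri($path);
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('urlset');
    $w->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    $count = 0;
    foreach ($urls as $url) {
        $w->startElement('url');
        $w->writeElement('loc', $url);        // XMLWriter escapes &, <, > for us
        $w->writeElement('priority', '1.0');
        $w->endElement();                     // </url>
        $count++;
    }

    $w->endElement();                         // </urlset>
    $w->endDocument();
    $w->flush();                              // push buffered XML to disk
    return $count;
}
```

In the batching loop above, you would call this once per 50,000-URL slice, bumping `$fileNomb` each time, rather than keeping everything in `$url_list`.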

Is there a better way to do this, or a way to speed it up?

Added:

Why is the following faster than the SQL query above, which was consuming 100% of the server's resources and performance?

$this->db->query('SELECT c.id, c.city_name, r.region_name, cr.country_name FROM city AS c, region AS r, country AS cr  WHERE r.id = c.id_region AND cr.id = c.id_country AND cr.id IN (SELECT id FROM country WHERE used = 1)');

$arrayCity = $this->db->recordsArray(MYSQL_ASSOC);

 $this->db->query('SELECT id, title FROM param WHERE active = 1');

$arrayParam = $this->db->recordsArray(MYSQL_ASSOC);

foreach ($arrayCity as $city) {
        foreach ($arrayParam as $param) {
          $paramTitle = str_replace(' ', '+', $param['title']);
          $url = 'url/city/'. $city['id'] .'/paramId/'. $param['id'] .'/'. $paramTitle .'/'. $city['region_name'] .'/'. $city['city_name'];
          $this->addChild($url);
        }
}
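This version is cheaper for MySQL because the database only returns two small result sets (cities and params) and the cross product is produced in PHP, instead of MySQL materializing and shipping the full `city × param` join. The same nested loop can be written as a pure function; as a sketch, I've also swapped the bare `str_replace` for `rawurlencode` so characters other than spaces are path-safe too (the function name and array shapes are mine, matching the column aliases in the question):

```php
<?php
// Build the city/param URL list in PHP from two small result sets.
function buildCityUrls(array $cities, array $params): array
{
    $urls = [];
    foreach ($cities as $city) {
        foreach ($params as $param) {
            // Encode the title, then map the encoded spaces to '+',
            // matching the original "spaces become +" convention.
            $title = str_replace('%20', '+', rawurlencode($param['title']));
            $urls[] = 'url/city/' . $city['id'] . '/paramId/' . $param['id']
                    . '/' . $title . '/' . $city['region_name'] . '/' . $city['city_name'];
        }
    }
    return $urls;
}
```

For very large cross products you would still want to write each URL out as it is generated (as the comments below suggest) rather than return the whole array.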

【Question comments】:

  • Well, if you're stuck with MySQL and PHP, the best approach is simply to queue the task and reduce the limit from 50000 to 10000, then create a scheduled job to work through the queue.
  • Do your tables have indexes?
  • Yes. It is still slow. I also tried putting the results in a temporary table, but creating the temporary table is slow as well.
  • Note that you loop over the large data set twice: once inside the ORM ($this->db->recordsArray(MYSQL_ASSOC);) and then again yourself. In this case you're better off skipping the ORM and, as I said before, using a construct like while ($row = mysql_fetch_assoc($result)) { /* concatenate and write to the file row by row instead of collecting the data in memory */ }.

Tags: php mysql


【Solution 1】:

I suggest you drop the UNION and simply issue two separate queries; that will speed up the queries themselves. Also, as you mentioned above, fetching the data in batches is a good idea.

Finally, don't collect all the data in memory. Write it to the file immediately, inside the loop.

Just open the file at the start, write each URL entry inside the loop, and close the file at the end:

  • Open the file for writing
  • Run a COUNT query against the users table
  • SELECT in batches with LIMIT inside a loop (as you already do)
  • Inside that loop, write each row to the file with while ($row = mysql_fetch_array($result)) { ... }

Then repeat the same algorithm for the other table. It would be useful to implement a small function that writes the data to the file, so you can just call it and keep the code DRY.
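The write-immediately step can be factored into exactly that kind of helper. A minimal sketch, assuming an already-open file handle and rows shaped like `['url' => ...]` to match the SELECT alias in the question (the `appendUrlsToFile` name is mine):

```php
<?php
// Append a batch of rows to an already-open file handle, one URL per line,
// so nothing accumulates in memory between batches.
function appendUrlsToFile($handle, iterable $rows): int
{
    $written = 0;
    foreach ($rows as $row) {
        fwrite($handle, $row['url'] . "\n");
        $written++;
    }
    return $written;
}
```

You would open the file once, call this helper for every batch of both queries (users, then city/param), and close the file at the end, which is the open/loop/close structure described above.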

【Discussion】:

  • Processing this much data is not a trivial job for the DB. You could try fetching the raw fields and concatenating them in PHP instead; that may help.