Dijkstra 算法优化/缓存答案

【问题标题】：Dijkstra algorithm optimization/cachingDijkstra 算法优化/缓存
【发布时间】：2011-06-29 09:49:45
【问题描述】：

我有以下具有 3 个输入变量（开始、停止和时间）的 Dijkstra 算法。大约需要0.5-1s才能完成。我的托管服务提供商说它使用了太多资源，我应该实现一些缓存机制。我的问题是，怎么做？

因为我有 3 个变量，如果其中只有一个发生变化 - 整个结果是不同的（因为我有一些关于时间的附加语句，没关系）。那么如何实现一些缓存机制或者做一些优化呢？

我有 1700 个节点。

<?php require_once("../includes/db_connection.php"); ?>
<?php require("../includes/functions.php"); ?>
<?php require("../includes/global_variables.php"); ?>
<?php
    // Function to put "maxValues" into time (in my case 10 000 because I know it can't take longer than that from source to end node)
    function array_push_key(&$array, $key, $value) {
        $array[$key] = $value;
    }

    // Start the counter
    $timeM = microtime(); $timeM = explode(' ', $timeM); $timeM = $timeM[1] + $timeM[0]; $start = $timeM;

    // Get provided values
    $startStop = $_GET["start"];
    $endStop = $_GET["end"];
    $startTime = $_GET["time"];

    // Initialize arrays
    $time = array();
    $previousNode = array();
    $allStops = array();

    // [5] = 119 --> You can get to stop no. 5 by line no. 119
    // line to source node is 0
    $lineToThisStop = array();
    $lineToThisStop[$startStop] = 0;

    // Populate arrays
    $result=mysql_query("SELECT stop_id FROM db_stops", $connection);
    potvrdi_unos($result);
    $counter = 0;
    while($rows = mysql_fetch_array($result)){
        array_push_key($time, $rows["stop_id"], 10000);
        array_push($allStops, $rows["stop_id"]);
        // Locate startStop in the allStops array to unset it few lines later
        if ($rows["id"] == $startStop) {
            $poz = $brojac;
        }
        $counter++;
    }

    // Set starting time to starting stop
    $time[$startStop] = $startTime;
    // Set it activeNode
    $activeNode = $startStop;

    // Unset it in allStops array (so it doens't have to be checked later)
    unset($allStops[$poz]);
    $allStops = array_values($allStops);

    // I can put "while (true)" because all nodes are connected in "one piece", there is NO UNCONNECTED nodes
    while (true) {       
        $result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);
        potvrdi_unos($result);

        while($rows = mysql_fetch_array($result)) {         
            // Draw paths from active node to all other (connected) stops
            $nextStopArray = $rows["next_stop"];

            // nextStopArray is something like "0,34,123,3123,213" - those are all stops from current, active node/stop
            $nextStopArray = explode(",", $nextStopArray);

            // sometimes it's just "4" to convert it into array
            if (!is_array($nextStopArray)) {
                $nextStopArray[0] = $nextStopArray;
            }

            for ($p = 0; $p < sizeof($nextStopArray); $p++) {
                $nextStop = $nextStopArray[$p];

                $walkToTheStop = false;

                // Few checks                   
                if ($p == 0) {
                    if ($nextStop != 0) {
                        $pathDuration = 2;                          

                        if ($lineToThisStop[$activeNode] != $rows["route_id"]) {
                            $pathDuration = $pathDuration * 2;
                        }
                    }
                } else {
                    $walkToTheStop = true;

                    $pathDuration = 1;                          
                }

                // If that's shortest path from ActiveNode to nextStop insert it into it's time array (time to get to that stop)
                if (($pathDuration + $time[$activeNode]) < $time[$nextStop]) {
                    $time[$nextStop] = $pathDuration + $time[$activeNode];

                    array_push_key($previousNode, $nextStop, $activeNode);

                    // Some aditional records (5000 means "you must walk to that stop")
                    if ($walkToTheStop) {
                        $lineToThisStop[$nextStop] = 5000;
                    } else {
                        $lineToThisStop[$nextStop] = $rows["route_id"];
                    }
                }
            }           
        }

        // Traži slijedeću stanicu (vrh) s najmanjom vrijednosti        
        $lowestValue = 10000 + 1;
        for ($j = 0; $j < sizeof($allStops); $j++) {
            if ($time[$allStops[$j]] < $lowestValue) {
                $lowestValue = $time[$allStops[$j]];                        
                $activeNode = $allStops[$j];

                // Record it's position so I can unset it later
                $activeNodePosition = $j;
            }
        }

        // Unset the active node from the array - so loop before will be shorter every time for one node/stop
        unset($allStops[$activeNodePosition]);
        $allStops = array_values($allStops);

        // If you get to the end stop, feel free to break out of the loop
        if ($activeNode == $endStop) {
            break;
        }
    }

    // How long did it take?
    $timeM = microtime(); $timeM = explode(' ', $timeM); $timeM = $timeM[1] + $timeM[0]; $finish = $timeM;

    $total_time = round(($finish - $start), 4);
    echo 'Total time '.$total_time.' seconds.'."<br />";
?>

<?php require_once("../includes/close_connection.php"); ?>

【问题讨论】：

标签： php caching optimization dijkstra

【解决方案1】：

微优化，但不做：

for ($p = 0; $p < sizeof($nextStopArray); $p++) { 
   ...
}

计算 sizeof($nextStopArray) 在循环之前，否则你在每次迭代都进行计数（并且这个值不会被改变）

$nextStopArraySize = sizeof($nextStopArray);
for ($p = 0; $p < $nextStopArraySize; ++$p) { 
   ...
}

这有几个地方应该改变。

如果你要迭代几千次，++$p 比 $p++ 快

但是分析函数...找出执行时间最长的部分，并寻求优化这些部分。

编辑

摆脱 array_push_key 作为函数，只需内联执行它...否则会花费您不必要的函数调用

在 while(true) 循环之外从数据库中构建所有节点的数组...在单个 SQL 查询中检索所有数据并构建查找数组。

更换

for ($p = 0; $p < sizeof($nextStopArray); $p++) {

与

$nextStopArraySize = sizeof($nextStopArray);
$p = -1
while (++$p < $nextStopArraySize) { 
   ...
}

也可能证明更快（只需检查逻辑是否循环通过正确的次数）。

【讨论】：

我听说过这个 ++p 和 p++ 但我不相信。现在我做 :D 现在大约。通过这三个优化，速度提高了 2 倍。你能提点别的吗？
您听到的一些优化是不正确的，而另一些则是。将真相与谣言区分开来的唯一方法是亲自测试它们。这些年来，我测试了很多方法，其中一些确实值得。
一开始我只做了一次查询...结果？经常低于 0.01 秒 :)（没有 memcached）
希望您的托管服务提供商对此感到满意。您将使用更多内存，但只是一小部分时间。

【解决方案2】：

一目了然（顺便说一句，您确实应该进行一些分析），罪魁祸首是您正在为每个图节点执行查询以查找其邻居：

$result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);

如果您有 1,700 个节点，这将发出大约一千个查询。与其经常访问数据库，不如将这些数据库结果缓存在 memcached 之类的内容中，并且仅在缓存未命中时回退到数据库。

【讨论】：

【解决方案3】：

它使用了太多资源

什么资源？（CPU？内存？网络带宽？数据库服务器上的 I/O 负载？）

while (true) {       
    $result=mysql_query("SELECT route_id, next_stop FROM db_stop_times WHERE stop_id = $activeNode", $connection);

如果我没看错，那么您在每次寻路尝试中都会对每个节点进行数据库调用。这些调用中的每一个都会阻塞一段时间，等待数据库的响应。即使您有一个快速的数据库，也必然需要几毫秒（除非数据库与您的代码在同一台服务器上运行）。所以我大胆猜测您的大部分执行时间都花在等待数据库的回复上。

此外，如果您的数据库缺少适当的索引，每个查询都可以进行全表扫描...

解决方案很简单：在应用程序启动时将 db_stop_times 加载到内存中，并在解析邻居节点时使用该内存中的表示。

编辑：是的，stop_id 上的索引将是此查询的正确索引。至于实际的缓存，我不知道 PHP，但是对于 Java（或 C#，或 C++，甚至 C）之类的东西，我会使用表单的表示形式

class Node {
    Link[] links;
}

class Link {
    int time;
    Node destination;
}

这会比 memcached 快一点，但假设您可以轻松地将整个表放入主内存中。如果你不能这样做，我会使用像 memcached 这样的缓存系统。

【讨论】：