Using random graph data, I can run 40 million edges in about 37 seconds (peaking at 4.4GiB of memory according to Massif):
/tmp$ od -Anone -w4 -t u2 -v /dev/urandom | head -n 40000000 > pairs.txt
/tmp$ time ./test
Reading 40000000 done in 5543ms
Building graph done in 3425ms
Algorithm done in 8957ms
writing file
Writing done in 52ms
real 0m37.339s
user 0m36.078s
sys 0m1.202s
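1. Memory allocation
Note, however, that I tuned it by using a vector for the edge list, so that the required capacity can be reserved up front: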
typedef adjacency_list<listS, vecS, undirectedS, no_property, no_property,
no_property, vecS> Graph;
This is what allows the `G.m_edges.reserve(as_read.size())` call in the full code below.
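To see why reserving matters, here is a minimal sketch using a plain `std::vector`, on the assumption that the `vecS` edge-list selector makes the graph's edge container behave like one. It counts how often the container reallocates while edges are appended. `count_reallocations` is a hypothetical helper, not part of Boost.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts how often a std::vector changes capacity (i.e. reallocates)
// while `n` edges are appended. `reserve_first` toggles the up-front reserve.
// Hypothetical helper for illustration; not part of Boost.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<std::pair<int, int>> edges;
    if (reserve_first)
        edges.reserve(n);

    std::size_t reallocations = 0;
    std::size_t last_capacity = edges.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        edges.emplace_back(static_cast<int>(i), static_cast<int>(i + 1));
        if (edges.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = edges.capacity();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, the function returns 0 for any `n`. Without it, the vector grows repeatedly, and each growth step copies or moves everything stored so far, which adds up at 40 million edges.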
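2. Vertex ID scaling
Another important note is that storage requirements scale with the number of vertices. More specifically, they scale with the vertex domain, i.e. the range of vertex IDs. For example, loading a file like this: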
1 7
2 7
5 6
4 9
will have far lower memory requirements than
1 70000
2 70000
5 60000
4 90000
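The difference between the two files can be made concrete with a small sketch. It assumes non-negative IDs and an index-addressed vertex container (as with `vecS`), so the container needs one slot per ID up to the largest one seen. The second function shows one common workaround, renumbering IDs into a dense range before building the graph. That remapping is not part of the original answer, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Number of vertex slots an index-addressed vertex container needs:
// one per ID from 0 up to the largest ID seen (IDs assumed non-negative).
std::size_t vertex_domain_size(const std::vector<Edge>& edges)
{
    int max_id = -1;
    for (const Edge& e : edges)
        max_id = std::max(max_id, std::max(e.first, e.second));
    return static_cast<std::size_t>(max_id + 1);
}

// Renumbers vertices to 0..k-1 in order of first appearance, so the
// domain size equals the number of distinct vertices. Returns k.
std::size_t remap_dense(std::vector<Edge>& edges)
{
    std::unordered_map<int, int> dense;
    auto lookup = [&dense](int id) {
        // emplace leaves an existing entry untouched and returns it
        auto it = dense.emplace(id, static_cast<int>(dense.size())).first;
        return it->second;
    };
    for (Edge& e : edges) {
        e.first = lookup(e.first);
        e.second = lookup(e.second);
    }
    return dense.size();
}
```

For the first file `vertex_domain_size` yields 10, for the second 90001, although both contain only 7 distinct vertices. After `remap_dense`, the second file's domain shrinks to 7. Reporting results in terms of the original IDs would additionally require keeping the inverse mapping.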
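In fact, re-running the benchmark above with exactly the same input, except that the first line is changed from
47662 60203
to
476624766 602036020
gives the following result: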
Reading 40000000 done in 5485ms
tcmalloc: large alloc 14448869376 bytes == 0x7c0f2000 @ 0x7f30f60aad9d 0x7f30f60caaa9 0x4023ab 0x4019d4 0x7f30f57d7de5 0x401e6a (nil)
Building graph done in 6754ms
tcmalloc: large alloc 2408144896 bytes == 0x49fe46000 @ 0x7f30f60aad9d 0x7f30f60caaa9 0x401ced 0x7f30f57d7de5 0x401e6a (nil)
tcmalloc: large alloc 2408144896 bytes == 0x52ffd0000 @ 0x7f30f60aad9d 0x7f30f60cb339 0x402e45 0x401d5e 0x7f30f57d7de5 0x401e6a (nil)
Algorithm done in 31644ms
writing file
Writing done in 75921ms
real 2m20.318s
user 1m30.224s
sys 0m49.821s
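As you can see, Google's malloc implementation (from gperftools) even warns about the unusually large allocations, and indeed the program runs much slower. (Oh, and memory usage became so large that Massif could no longer cope, but I have seen it reach 23GiB in htop.)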
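The allocation sizes in the log line up with the vertex domain. As a worked check, assuming vertex IDs start at 0, the largest ID 602036020 implies 602036021 vertex slots. Dividing the reported allocations by that number gives 24 bytes per slot for the first one and 4 bytes per slot for the other two. Reading the 24-byte allocation as the graph's per-vertex storage and the 4-byte ones as the `component` vector plus one internal per-vertex array is my interpretation, not something the log states. The helper names below are hypothetical.

```cpp
#include <cstdint>

// Vertex slots implied by the largest ID, assuming IDs start at 0.
constexpr std::uint64_t vertex_slots(std::uint64_t max_id) { return max_id + 1; }

// Whole bytes per vertex slot in an allocation of `alloc_bytes`.
constexpr std::uint64_t bytes_per_slot(std::uint64_t alloc_bytes, std::uint64_t slots)
{
    return alloc_bytes / slots;
}

// Figures copied from the log above; 602036020 is the largest ID in the input.
constexpr std::uint64_t kSlots = vertex_slots(602036020ULL);

// 14448869376 bytes: 24 bytes per slot (assumed: per-vertex graph storage).
static_assert(bytes_per_slot(14448869376ULL, kSlots) == 24, "graph storage");
// 2408144896 bytes: 4 bytes per slot (assumed: arrays of int-sized values).
static_assert(bytes_per_slot(2408144896ULL, kSlots) == 4, "per-vertex arrays");
```

The remainders of these divisions are under 5 KB in both cases, which is plausibly allocator rounding. The upshot is that a single out-of-range ID costs roughly 32 bytes for every unused ID below it.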
Full code
See it Live On Coliru, running on 4000 edges:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/connected_components.hpp>
#include <fstream>
#include <iostream>
#include <chrono>
#include <utility>
#include <vector>

using Clock = std::chrono::high_resolution_clock;

int main()
{
    using namespace boost;
    // the last template argument (vecS) selects a vector for the edge list
    typedef adjacency_list<listS, vecS, undirectedS, no_property, no_property, no_property, vecS> Graph;
    Graph G;

    // read edges
    auto start = Clock::now();
    std::ifstream infile("pairs.txt", std::ios::binary);
    std::vector<std::pair<int, int> > as_read;
    int node1, node2;
    while (infile >> node1 >> node2)
        as_read.emplace_back(node1, node2);
    std::cout << "Reading " << as_read.size() << " done in " << std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start).count() << "ms\n";
    start = Clock::now();

    // build graph, reserving the edge list capacity up front
    G.m_edges.reserve(as_read.size());
    for (auto& pair : as_read)
        add_edge(pair.first, pair.second, G);
    std::cout << "Building graph done in " << std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start).count() << "ms\n";
    start = Clock::now();

    // find connected components
    std::vector<int> component(num_vertices(G));
    int num = connected_components(G, &component[0]);
    std::cout << "Algorithm done in " << std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start).count() << "ms\n";
    start = Clock::now();

    // write output: one "vertex<TAB> component" line per vertex
    std::cout << "writing file" << std::endl;
    std::ofstream out;
    out.open("connected_component.txt");
    for (size_t i = 0; i != component.size(); ++i) {
        out << i << "\t " << component[i] << std::endl;
    }
    out.close();
    std::cout << "Writing done in " << std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start).count() << "ms\n";
}