【Question title】: MPI: gathering arrays of strings
【Posted】: 2020-08-09 19:50:19
【Question】:

I am trying to merge a collection of dictionaries into the root process. Here is a short example:

#define MAX_CF_LENGTH 55

    map<string, int> dict;

    if (rank == 0)
    {
        dict = {
            {"Accelerator Defective", 33},
            {"Aggressive Driving/Road Rage", 27},
            {"Alcohol Involvement", 19},
            {"Animals Action", 30}};
    }
    if (rank == 1)
    {
        dict = {
            {"Driver Inexperience", 6},
            {"Driverless/Runaway Vehicle", 46},
            {"Drugs (Illegal)", 38},
            {"Failure to Keep Right", 24}};
    }
    if (rank == 2)
    {
        dict = {
            {"Lost Consciousness", 1},
            {"Obstruction/Debris", 8},
            {"Other Electronic Device", 25},
            {"Other Lighting Defects", 43},
            {"Other Vehicular", 7}};
    }

    Scatterer scatterer(rank, MPI_COMM_WORLD, num_workers);
    scatterer.gatherDictionary(dict, MAX_CF_LENGTH);

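The idea inside gatherDictionary() is to put every key into a per-process char array (duplicates are allowed). After that, all keys are gathered at the root, which builds the final (merged) dictionary before broadcasting it. The code is as follows: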
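The packing step described above can be sketched without MPI. This is a minimal sketch, not the asker's implementation: `packKeys` is a hypothetical helper name, and it assumes a `std::vector<char>` is an acceptable send buffer (its `data()` pointer is contiguous, which is what `MPI_Gatherv` needs).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original Scatterer class):
// packs every key of `dict` into one contiguous buffer of fixed-width,
// zero-padded records, one record per key, in map order.
std::vector<char> packKeys(const std::map<std::string, int> &dict, std::size_t width)
{
    assert(width > 0);
    // Filled with '\0', so every record is padded before any key is copied.
    std::vector<char> buffer(dict.size() * width, '\0');
    std::size_t slot = 0;
    for (const auto &entry : dict)
    {
        // Copy at most width - 1 characters so each record keeps a terminating NUL.
        const std::size_t n = std::min(entry.first.size(), width - 1);
        std::memcpy(buffer.data() + slot * width, entry.first.data(), n);
        ++slot;
    }
    return buffer;
}
```

With a buffer like this, `buffer.size()` is the per-process send count, which corresponds to `totalLength` in the question.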

void Scatterer::gatherDictionary(map<string,int> &dict, int maxKeyLength)
{
    // Calculate destination dictionary size
    int numKeys = dict.size();
    int totalLength = numKeys * maxKeyLength;
    int finalNumKeys = 0;
    MPI_Reduce(&numKeys, &finalNumKeys, 1, MPI_INT, MPI_SUM, 0, comm);

    // Computing number of elements that are received from each process
    int *recvcounts = NULL;
    if (rank == 0)
        recvcounts = new int[num_workers];

    MPI_Gather(&totalLength, 1, MPI_INT, recvcounts, 1, MPI_INT, 0, comm);

    // Computing displacement relative to recvbuf at which to place the incoming data from each process
    int *displs = NULL;
    if (rank == 0)
    {
        displs = new int[num_workers];

        displs[0] = 0;
        for (int i = 1; i < num_workers; i++)
            displs[i] = displs[i - 1] + recvcounts[i - 1] + 1;
    }

    char(*dictKeys)[maxKeyLength];
    char(*finalDictKeys)[maxKeyLength];
    dictKeys = (char(*)[maxKeyLength])malloc(numKeys * sizeof(*dictKeys));
    if (rank == 0)
        finalDictKeys = (char(*)[maxKeyLength])malloc(finalNumKeys * sizeof(*finalDictKeys));

    // Collect keys for each process
    int i = 0;
    for (auto pair : dict)
    {
        strncpy(dictKeys[i], pair.first.c_str(), maxKeyLength);
        i++;
    }

    MPI_Gatherv(dictKeys, totalLength, MPI_CHAR, finalDictKeys, recvcounts, displs, MPI_CHAR, 0, comm);

    // Create new dictionary and distribute it to all processes
    dict.clear();
    if (rank == 0)
    {
        for (int i = 0; i < finalNumKeys; i++)
            dict[finalDictKeys[i]] = dict.size();
    }

    delete[] dictKeys;
    if (rank == 0)
    {
        delete[] finalDictKeys;
        delete[] recvcounts;
        delete[] displs;
    }

    broadcastDictionary(dict, maxKeyLength);
}
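The merge step at the root can also be sketched on its own. The helper name `mergeKeys` and the use of `std::vector<char>` are assumptions made for illustration, not part of the posted code. The sketch uses `emplace` rather than `dict[key] = dict.size()`, so a duplicate key keeps its first index and indices are not reused.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: rebuilds a dictionary from fixed-width, NUL-padded
// records stored back to back in `buffer`.
std::map<std::string, int> mergeKeys(const std::vector<char> &buffer, std::size_t width)
{
    assert(width > 0);
    std::map<std::string, int> merged;
    const std::size_t numKeys = buffer.size() / width;
    for (std::size_t i = 0; i < numKeys; ++i)
    {
        const char *record = buffer.data() + i * width;
        // Stop at the first NUL, or at the record boundary if there is none.
        std::size_t len = 0;
        while (len < width && record[len] != '\0')
            ++len;
        // merged.size() is evaluated before the insertion happens;
        // emplace leaves an existing key (a duplicate) untouched.
        merged.emplace(std::string(record, len), static_cast<int>(merged.size()));
    }
    return merged;
}
```

Owning the buffers through `std::vector` also sidesteps a separate problem in the posted function: memory obtained with `malloc` is released there with `delete[]`, which is undefined behavior (it should be paired with `free`).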

I am confident that broadcastDictionary() is correct because I have already tested it. Debugging the gather function, I get the following partial results:

Recvcounts:
220
220
275

Displacements:
0
221
442

FinalDictKeys:
Rank:0 Accelerator Defective
Rank:0 Aggressive Driving/Road Rage
Rank:0 Alcohol Involvement
Rank:0 Animals Action
Rank:0 
Rank:0 
Rank:0 
Rank:0 
Rank:0 
Rank:0 
Rank:0 
Rank:0 
Rank:0 

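Since only the root's data is collected, I wonder whether this has to do with the char allocation, even though it should be contiguous. I do not think it is related to a missing null character at the end, because every string/key already has plenty of padding. Thanks in advance for pointing out anything that is missing or could be improved; please comment if you need any additional information.

If you want to test it yourself, I have put all the code in a single file that is ready to compile and run (with 3 MPI processes, of course). Code Here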

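【Comments】:

  • It says so in the first line, but I have added it to the parameter list for clarity. Thanks for pointing that out.
  • displs[i] = displs[i - 1] + recvcounts[i - 1] + 1; Why the +1 at the end?
  • Thanks for the suggestion. The code has many dependencies, so I pasted only the essentials. Link added.
  • @DanielLangr Because the count is, for example, 220, I assumed the next displacement should start at 221, and so on. I did not take into account that I am actually starting from 0... That is rather embarrassing, since the +1 changes everything. Thank you very much.
  • Yes, it is. If the gather-broadcast combination is not performance critical, you are probably fine. If the code is meant to run on a large supercomputer with tens or hundreds of thousands of MPI processes, I would choose between MPI_Gatherv and MPI_Allgatherv depending on the circumstances.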
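To see why a single extra byte per displacement makes every non-root key print as empty, the placement can be simulated without MPI. This is only a sketch: `placeAndRead` is a hypothetical stand-in for what MPI_Gatherv does at the root (copy block r to recvbuf + displs[r]), and the receive buffer is zero-filled and slightly oversized so the shifted writes stay in bounds. In the posted code the buffer holds 13 * 55 = 715 bytes, while the last block is written at 442 + 275 = 717, so the real call also writes past the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simulation: places each rank's fixed-width records at that
// rank's displacement, then reads the buffer back as consecutive records.
// `extra` is added to every displacement step (0 = correct, 1 = the bug).
std::vector<std::string> placeAndRead(const std::vector<std::vector<std::string>> &keysPerRank,
                                      std::size_t width, std::size_t extra)
{
    assert(width > 0);
    std::size_t totalKeys = 0;
    for (const auto &keys : keysPerRank)
        totalKeys += keys.size();

    // Zero-filled, with room for the shifted blocks (an assumption of this sketch).
    std::vector<char> recvbuf(totalKeys * width + extra * keysPerRank.size(), '\0');

    std::size_t displ = 0;
    for (const auto &keys : keysPerRank)
    {
        for (std::size_t i = 0; i < keys.size(); ++i)
        {
            const std::size_t n = std::min(keys[i].size(), width - 1);
            std::copy(keys[i].begin(), keys[i].begin() + n,
                      recvbuf.begin() + displ + i * width);
        }
        displ += keys.size() * width + extra;
    }

    // The root reads record i at offset i * width, regardless of the displacements.
    std::vector<std::string> records;
    for (std::size_t i = 0; i < totalKeys; ++i)
    {
        const char *record = recvbuf.data() + i * width;
        std::size_t len = 0;
        while (len < width && record[len] != '\0')
            ++len;
        records.emplace_back(record, len);
    }
    return records;
}
```

With `extra = 1`, every record after the root's block starts on a padding byte of the shifted data, so it reads as an empty string, which matches the blank "Rank:0" lines in the question's output.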

Tags: c++ string mpi


【Solution 1】:
displs[i] = displs[i - 1] + recvcounts[i - 1] + 1;

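The + 1 at the end is superfluous. Displacements are zero-based offsets into the receive buffer, so a block of 220 elements that starts at 0 ends at 219 and the next block starts at 220. Change it to: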

displs[i] = displs[i - 1] + recvcounts[i - 1];
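Put differently, the displacements are an exclusive prefix sum of the receive counts. A minimal sketch follows; `computeDispls` is a hypothetical helper name, and `std::vector<int>` is used here in place of the raw arrays from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: displs[i] is the sum of recvcounts[0..i-1],
// i.e. the zero-based offset at which block i starts in the receive buffer.
std::vector<int> computeDispls(const std::vector<int> &recvcounts)
{
    std::vector<int> displs(recvcounts.size(), 0);
    for (std::size_t i = 1; i < recvcounts.size(); ++i)
    {
        assert(recvcounts[i - 1] >= 0); // MPI counts must be non-negative
        displs[i] = displs[i - 1] + recvcounts[i - 1];
    }
    return displs;
}
```

For the counts reported in the question (220, 220, 275) this gives displacements 0, 220, 440, and the last block ends exactly at 715 = 13 * 55, the size of the receive buffer.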
