【Question title】: Algorithm: use union find to count number of islands
【Posted】: 2019-08-12 12:30:27
【Question description】:

Suppose you need to count the number of islands in a matrix:

                    {1, 1, 0, 0, 0},
                    {0, 1, 0, 0, 1},
                    {1, 0, 0, 1, 1},
                    {0, 0, 0, 0, 0},
                    {1, 0, 1, 0, 1}

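When the input matrix fits in memory, we can simply use DFS or BFS.

But what should we do if the input matrix is so large that it does not fit in memory?

I could chunk/split the input matrix into separate smaller files and read them one at a time.

But how do I merge them?

I am stuck on how to merge them. My idea is that when merging, we have to read some overlapping part. But what is a concrete way to do that?

Trying to understand Matt's solution.

When I draw the following example on a whiteboard and process it row by row, merging left and then merging up, it does not seem to work.

From Matt's solution:

I don't understand what topidx and botidx mean: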

            int topidx = col * 2;
            int botidx = topidx + 1;

【Comments】:

How do you define an "island"?
Connected 1s are considered an island.

Tags: algorithm disjoint-sets

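【Solution 1】:

Using union-find, the basic algorithm (without worrying about memory) is:

Create a set for each 1.
Merge the sets for each pair of adjacent 1s. The order in which you find them doesn't matter, so reading order is usually fine.
Count the number of root sets; there is one for each island.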
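
The three steps above can be sketched for the in-memory case as follows. This is only an illustration, not the implementation given later in this answer: the class name `BasicIslands` and its methods are made up for the example, it assumes a rectangular grid and 4-directional adjacency (up, down, left, right), and it skips union-by-size for brevity.

```java
// A minimal in-memory sketch; class and method names are illustrative only.
class BasicIslands {
    // parent[i] == i marks a root set, parent[i] == -1 marks water.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps the trees shallow
            x = parent[x];
        }
        return x;
    }

    static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra != rb) {
            parent[ra] = rb;
        }
    }

    // Assumes a rectangular grid and 4-directional adjacency.
    static int countIslands(int[][] grid) {
        int rows = grid.length;
        int cols = rows == 0 ? 0 : grid[0].length;
        int[] parent = new int[rows * cols];

        // Step 1: create a singleton set for each 1.
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int id = r * cols + c;
                parent[id] = grid[r][c] == 1 ? id : -1;
            }
        }
        // Step 2: merge the sets of adjacent 1s, visiting cells in reading order.
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (grid[r][c] != 1) continue;
                int id = r * cols + c;
                if (c > 0 && grid[r][c - 1] == 1) union(parent, id, id - 1);
                if (r > 0 && grid[r - 1][c] == 1) union(parent, id, id - cols);
            }
        }
        // Step 3: count the root sets, one per island.
        int count = 0;
        for (int id = 0; id < parent.length; id++) {
            if (parent[id] == id) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] grid = {
            {1, 1, 0, 0, 0},
            {0, 1, 0, 0, 1},
            {1, 0, 0, 1, 1},
            {0, 0, 0, 0, 0},
            {1, 0, 1, 0, 1}
        };
        System.out.println(countIslands(grid)); // prints 6 under 4-directional adjacency
    }
}
```

With the matrix from the question this counts 6 islands, because diagonal neighbours such as (1,1) and (2,0) are not treated as connected here.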

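Easy, and with a little care you can do this using sequential access to the matrix and only 2 rows' worth of memory:

Initialize the island count to 0.
Read the first row, create a set for each 1, and merge the sets in adjacent columns.
For each additional row:
1. Read the row, create a set for each 1, and merge the sets in adjacent columns;
2. Merge sets in the new row with adjacent sets in the previous row. Always point the links downward, so that you never end up with a set in the new row linked to a parent in the old row.
3. Count the remaining root sets in the previous row, and add that number to your island count. These will never be able to merge with anything else.
4. Discard all the sets in the previous row. You will never need them again, because you have already counted them and nothing links to them.
Finally, count the root sets in the last row and add them to your island count.

The key to this, of course, is always pointing the links downward whenever you link sets in different rows. This doesn't hurt the complexity of the algorithm, and it is easy to do if you're using your own union-find. If you're using a library data structure, then you can use it just for each row and keep track of the links between root sets in different rows yourself.

Since this is actually one of my favorite algorithms, here is an implementation in Java. It's not the most readable implementation, since it involves some low-level trickery, but it is super efficient and short; it's the kind of thing I'd write where performance is very important: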

import java.util.Arrays;

public class Islands
{
    private static final String[] matrix=new String[] {
        "  #############  ###   ",
        "  #      #####   ##    ",
        "  #  ##  ##   #   #    ",
        "    ###      ##   #  # ",
        "  #   #########  ## ## ",
        "          ##       ##  ",
        "          ##########   ",
    };

    // find with path compression.
    // If sets[s] < 0 then it is a link to ~sets[s].  Otherwise it is size of set
    static int find(int[] sets, int s)
    {
        int parent = ~sets[s];
        if (parent>=0)
        {
            int root = find(sets, parent);
            if (root != parent)
            {
                sets[s] = ~root;
            }
            return root;
        }
        return s;
    }

    // union-by-size
    // If sets[s] < 0 then it is a link to ~sets[s].  Otherwise it is size of set
    static boolean union(int[] sets, int x, int y)
    {
        x = find(sets,x);
        y = find(sets,y);
        if (x!=y)
        {
            if ((sets[x] < sets[y]))
            {
                sets[y] += sets[x];
                sets[x] = ~y;
            }
            else
            {
                sets[x] += sets[y];
                sets[y] = ~x;
            }
            return true;
        }
        return false;
    }

    // Count islands in matrix

    public static void main(String[] args)
    {
        // two rows of union-find sets.
        // top row is at even indexes, bottom row is at odd indexes.  This arrangement is chosen just
        // to make resizing this array easier.
        // For each value x:
        // x==0 => no set. x>0 => root set of size x. x<0 => link to ~x
        int cols=4;
        int[] setrows= new int[cols*2];

        int islandCount = 0;

        for (String s : matrix)
        {
            System.out.println(s);
            //Make sure our rows are big enough
            if (s.length() > cols)
            {
                cols=s.length();
                if (setrows.length < cols*2)
                {
                    int newlen = Math.max(cols,setrows.length)*2;
                    setrows = Arrays.copyOf(setrows, newlen);
                }
            }
            //Create sets for land in bottom row, merging left
            for (int col=0; col<s.length(); ++col)
            {
                if (!Character.isWhitespace(s.charAt(col)))
                {
                    int idx = col*2+1;
                    setrows[idx]=1; //set of size 1
                    if (idx>=2 && setrows[idx-2]!=0)
                    {
                        union(setrows, idx, idx-2);
                    }
                }
            }
            //merge up
            for (int col=0; col<cols; ++col)
            {
                int topidx = col*2;
                int botidx = topidx+1;
                if (setrows[topidx]!=0 && setrows[botidx]!=0)
                {
                    int toproot=find(setrows,topidx);
                    if ((toproot&1)!=0)
                    {
                        //top set is already linked down
                        union(setrows, toproot, botidx);
                    }
                    else
                    {
                        //link top root down.  It does not matter that we aren't counting its size, since
                        //we will shortly throw it away
                        setrows[toproot] = ~botidx;
                    }
                }
            }
            //count root sets, discard top row, and move bottom row up while fixing links
            for (int col=0; col<cols; ++col)
            {
                int topidx = col * 2;
                int botidx = topidx + 1;
                if (setrows[topidx]>0)
                {
                    ++islandCount;
                }
                int v = setrows[botidx];
                setrows[topidx] = (v>=0 ? v : v|1); //fix up link if necessary
                setrows[botidx] = 0;
            }
        }

        //count remaining root sets in top row
        for (int col=0; col<cols; ++col)
        {
            if (setrows[col*2]>0)
            {
                ++islandCount;
            }
        }

        System.out.println("\nThere are "+islandCount+" islands there");
    }

}
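
The index arithmetic in the code above (`col*2`, `col*2+1`, and the `v|1` fix-up) can be checked in isolation. The sketch below is only an illustration: the class `RowLayout` and its helper names are hypothetical and do not appear in the implementation. It relies on two facts taken from that code: the previous (top) row lives at even slots and the current (bottom) row at odd slots, and a link to slot `t` is stored as `~t`.

```java
// Illustrative helper names; only the index arithmetic comes from the code above.
class RowLayout {
    // Even slots hold the previous (top) row.
    static int topIndex(int col) {
        return col * 2;
    }

    // Odd slots hold the current (bottom) row.
    static int bottomIndex(int col) {
        return col * 2 + 1;
    }

    // A link to slot t is stored as ~t. When the bottom row is moved up, an odd
    // target t must become the even slot t - 1. Setting the low bit of the stored
    // value clears the low bit of its complement, which is exactly that shift.
    static int moveLinkUp(int link) {
        return link | 1;
    }

    public static void main(String[] args) {
        for (int col = 0; col < 4; col++) {
            int link = ~bottomIndex(col);  // link pointing at a bottom-row slot
            int moved = moveLinkUp(link);  // same link after the row is moved up
            System.out.println("col " + col + ": link to slot " + (~link)
                    + " becomes link to slot " + (~moved));
        }
    }
}
```

So `topidx` and `botidx` are simply the two slots for one column: the cell in the previous row and the cell in the row just read.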

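【Discussion】:

Hey @matt, thanks for the reply. What does linking downward mean? I believe your idea is to read two rows at a time. For a grid cell G[i][j], you have to check its surroundings i-1, i, i+1, j-1, j, j+1. It's not clear how your idea translates into a code implementation.
First, you only need to check two directions from each grid point, for example up and left, because the down and right directions are covered by the up and left checks from other cells. Re. linking downward: to merge two sets in a union-find structure, you call find() on each set to get their root sets, and then link one root set to the other. Pointing links downward means that if you need to link together two sets in different rows, you always make the upper one point to the lower one, and not the other way around.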
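
The claim that up and left checks are enough can be verified on a small grid. The following is a rough sketch with made-up names (`NeighbourPairs`, `upLeftPairs`, `fourWayVisits`), assuming 4-directional adjacency: looking in all four directions visits every adjacent pair of land cells twice, once from each end, while looking only up and left visits each pair exactly once.

```java
// Illustrative sketch; class and method names are not from the answer above.
class NeighbourPairs {
    static boolean land(int[][] g, int r, int c) {
        return r >= 0 && r < g.length && c >= 0 && c < g[r].length && g[r][c] == 1;
    }

    // Adjacent land pairs found by looking only up and left from every land cell.
    static int upLeftPairs(int[][] g) {
        int pairs = 0;
        for (int r = 0; r < g.length; r++) {
            for (int c = 0; c < g[r].length; c++) {
                if (!land(g, r, c)) continue;
                if (land(g, r - 1, c)) pairs++;
                if (land(g, r, c - 1)) pairs++;
            }
        }
        return pairs;
    }

    // Visits made when looking in all four directions; each pair is seen twice.
    static int fourWayVisits(int[][] g) {
        int[][] dirs = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        int visits = 0;
        for (int r = 0; r < g.length; r++) {
            for (int c = 0; c < g[r].length; c++) {
                if (!land(g, r, c)) continue;
                for (int[] d : dirs) {
                    if (land(g, r + d[0], c + d[1])) visits++;
                }
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        int[][] grid = {
            {1, 1, 0, 0, 0},
            {0, 1, 0, 0, 1},
            {1, 0, 0, 1, 1},
            {0, 0, 0, 0, 0},
            {1, 0, 1, 0, 1}
        };
        System.out.println("up/left pairs:   " + upLeftPairs(grid));
        System.out.println("four-way visits: " + fourWayVisits(grid));
    }
}
```

Since every pair is already covered once, a row-by-row pass never needs to look at the row below, which is why two rows of memory are enough.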