Emulating non-rectangular arrays (answers)

【Question title】: Emulating non-rectangular arrays
【Posted】: 2018-09-16 12:13:38
【Question】:

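Often you want the performance of an array over a linked list, but your data does not fit the requirements of a rectangular array.

Take a hexagonal grid as an example. Here the 1-distance neighbors of cell (3, 3) are shown in medium gray and the 2-distance neighbors in light gray. Suppose we want an array containing, for every cell, the indices of each of its 1-distance and 2-distance neighbors. A slight complication is that cells have different numbers of X-distance neighbors: cells on the border of the grid have fewer neighbors than cells near the center.

(For performance reasons, we want an array of neighbor indices rather than a function from cell coordinates to neighbor indices.)

We can solve this by keeping track of how many neighbors each cell has. Suppose you have an array neighbors2 of size R x C x N x 2, where R is the number of grid rows, C is the number of columns, and N is the maximum number of 2-distance neighbors of any cell in the grid. Then, by keeping an additional array n_neighbors2 of size R x C, we can track which indices in neighbors2 are filled and which are just zero padding. For example, to retrieve the 2-distance neighbors of cell (2, 5), we simply index the array like this: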

someNeigh = neighbors2[2, 5, 0..n_neighbors2[2, 5], ..]

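someNeigh will be an n_neighbors2[2, 5] x 2 array (or view) of indices, where someNeigh[0, 0] yields the row of the first neighbor, someNeigh[0, 1] yields the column of the first neighbor, and so on. Note that the elements at positions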

neighbors2[2, 5, n_neighbors2[2, 5].., ..]

are irrelevant; that space is just padding to keep the matrix rectangular.

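Suppose we have a function that finds the d-distance neighbors of any cell:

import           Data.Bits (shift)

rows, cols :: Int
(rows, cols) = (7, 7)

type Cell = (Int, Int)

-- All on-grid cells at exactly hex distance d from cell1
generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 = [ (row2, col2)
                            | row2 <- [0..rows-1]
                            , col2 <- [0..cols-1]
                            , hexDistance cell1 (row2, col2) == d]

-- Hex distance in axial coordinates; shift x (-1) is a right shift, i.e. halves x
hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = shift (abs rd + abs (rd + cd) + abs cd) (-1)
  where
    rd = r1 - r2
    cd = c1 - c2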
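The question assumes N is known in advance. A quick way to obtain it is sketched below on the same 7x7 axial grid; `maxNeighs` is a hypothetical helper, not part of the question, and `shiftR` stands in for `shift x (-1)` (both halve the non-negative sum).

```haskell
import Data.Bits (shiftR)

-- Same 7x7 grid and axial hex distance as in the question.
rows, cols :: Int
(rows, cols) = (7, 7)

type Cell = (Int, Int)

hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = (abs rd + abs (rd + cd) + abs cd) `shiftR` 1
  where
    rd = r1 - r2
    cd = c1 - c2

-- All on-grid cells at exactly hex distance d from the given cell.
generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 =
  [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1], hexDistance cell1 (r, c) == d ]

-- Hypothetical helper: the largest number of d-distance neighbours of any cell,
-- i.e. the padding width N from the question.
maxNeighs :: Int -> Int
maxNeighs d =
  maximum [ length (generateNeighs d (r, c)) | r <- [0 .. rows - 1], c <- [0 .. cols - 1] ]

main :: IO ()
main = do
  print (length (generateNeighs 2 (3, 3)))  -- interior cell
  print (length (generateNeighs 2 (0, 0)))  -- corner cell: fewer neighbours
  print (maxNeighs 2)                       -- padding width N for d = 2
```

On this grid the interior cell (3, 3) has 12 two-distance neighbours while the corner (0, 0) has only 3, which is exactly the raggedness the padded layout has to absorb.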

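How do we create the arrays neighbors2 and n_neighbors2 described above? Assume we know the maximum number of 2-distance neighbors N in advance. Then generateNeighs can be modified to always return a list of the same length, since the remaining entries can be padded with (0, 0). As I see it, this leaves two problems: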

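1. We need a function to fill neighbors2 that operates not on each individual index but on a slice; in our case it should fill one cell at a time.
2. n_neighbors2 should be filled at the same time as neighbors2.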
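Both points can be handled in a single pass over the cells if each neighbour list is computed once and shared by both arrays. The sketch below assumes the boxed `Data.Array` from the `array` package in place of repa or accelerate, stores each neighbour as a `Cell` tuple instead of a trailing dimension of size 2, and uses hypothetical helpers `padTo`, `buildNeighs` and `neighsOf`.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Bits (shiftR)

rows, cols :: Int
(rows, cols) = (7, 7)

type Cell = (Int, Int)

hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = (abs rd + abs (rd + cd) + abs cd) `shiftR` 1
  where
    rd = r1 - r2
    cd = c1 - c2

generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 =
  [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1], hexDistance cell1 (r, c) == d ]

-- Pad (or truncate) a list to exactly n entries using the filler z.
padTo :: Int -> a -> [a] -> [a]
padTo n z xs = take n (xs ++ repeat z)

-- Build the padded R x C x N neighbour array and the R x C count array together.
-- Each cell's neighbour list is computed once (the shared 'lists') and feeds both.
buildNeighs :: Int -> Int -> (Array (Int, Int, Int) Cell, Array (Int, Int) Int)
buildNeighs d n = (neighbors, counts)
  where
    -- Row-major order, matching the index order listArray fills in.
    cells = [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1] ]
    lists = map (generateNeighs d) cells
    neighbors = listArray ((0, 0, 0), (rows - 1, cols - 1, n - 1))
                          (concatMap (padTo n (0, 0)) lists)
    counts = listArray ((0, 0), (rows - 1, cols - 1)) (map length lists)

-- The valid neighbours of one cell: the first counts ! (r, c) entries of its slice.
neighsOf :: (Array (Int, Int, Int) Cell, Array (Int, Int) Int) -> Cell -> [Cell]
neighsOf (neighbors, counts) (r, c) =
  [ neighbors ! (r, c, k) | k <- [0 .. counts ! (r, c) - 1] ]

main :: IO ()
main = do
  let arrs = buildNeighs 2 12  -- N = 12 is the maximum for d = 2 on this grid
  print (neighsOf arrs (2, 5))
  print (snd arrs ! (2, 5))
```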

Solutions using repa or accelerate arrays are welcome.

【Question discussion】:

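Sorry, I didn't read your question carefully, but just looking at your image, the array is perfectly rectangular; you just have a special neighbor relation. Looking at your image skewed 30 degrees to the right, it is easier to see how to define this relation in terms of the 6 line segments around the chosen center tile. Apologies if you already know all this and wrote about it in the question (which I skipped).
@WillNess The problem is not storing the grid as an array, but storing, for each cell, a "list" of its neighbors. That won't be rectangular, because cells have different numbers of neighbors.
I'm almost certain that, unless the grid is small, using an array of neighbor indices will be slower than computing them on the fly. CPU cache is a very limited and extremely precious resource; spending some of it on values that can be computed with a few cheap instructions doesn't sound like a good trade.
I suggest you consider an alternative to the monolithic-array-with-some-dummy-entries approach. That's a very Matlab thing to do. I would consider arranging all neighbors of all cells in a single 1D unboxed vector, and then storing, for each cell, its offset into that vector and its number of neighbors. That is nicer to build in a purely functional way, and it also needs less total memory if the number of neighbors varies a lot.
@davidcox That leads to wasted space when storing the grid in a rectangular array. See redblobgames.com/grids/hexagons for an excellent resource on hexagonal grids.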

Tags: haskell repa accelerate-haskell
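The flattened layout suggested in the discussion (all neighbours in one 1D unboxed vector, plus per-cell offsets) can be sketched with `UArray` from the `array` package. As assumptions for illustration, cells are encoded as the row-major index `r * cols + c` so they fit in an unboxed array, and each cell's count is derived from an offsets array with one extra entry rather than stored separately; `Flat`, `buildFlat`, `neighboursOf` and `countOf` are hypothetical names.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Bits (shiftR)

rows, cols :: Int
(rows, cols) = (7, 7)

type Cell = (Int, Int)

hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = (abs rd + abs (rd + cd) + abs cd) `shiftR` 1
  where
    rd = r1 - r2
    cd = c1 - c2

generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 =
  [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1], hexDistance cell1 (r, c) == d ]

-- Cells are stored as one Int (row-major index) so they fit in an unboxed array.
encode :: Cell -> Int
encode (r, c) = r * cols + c

decode :: Int -> Cell
decode i = i `divMod` cols

-- Flat neighbours and offsets: cell i's neighbours occupy
-- flat[offsets ! i .. offsets ! (i + 1) - 1], so its count is the difference.
data Flat = Flat (UArray Int Int) (UArray Int Int)

buildFlat :: Int -> Flat
buildFlat d = Flat (listArray (0, total - 1) (concat lists)) (listArray (0, nCells) offs)
  where
    nCells = rows * cols
    lists  = [ map encode (generateNeighs d (decode i)) | i <- [0 .. nCells - 1] ]
    offs   = scanl (+) 0 (map length lists)  -- nCells + 1 entries
    total  = last offs

neighboursOf :: Flat -> Cell -> [Cell]
neighboursOf (Flat fl offs) cell = [ decode (fl ! k) | k <- [lo .. hi - 1] ]
  where
    i  = encode cell
    lo = offs ! i
    hi = offs ! (i + 1)

countOf :: Flat -> Cell -> Int
countOf (Flat _ offs) cell = offs ! (encode cell + 1) - offs ! encode cell

main :: IO ()
main = do
  let t = buildFlat 2
  print (neighboursOf t (2, 5))
  print (countOf t (2, 5))
```

Storing `nCells + 1` offsets is one common choice; keeping an explicit count array, as the comment describes, works equally well.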

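【Solution 1】:

Here is your picture skewed 30 degrees to the right:

As you can see, your array is actually perfectly rectangular.

The indices on the periphery of a neighborhood are easy to find: they lie on six straight line segments around the chosen center cell, e.g. (imagine n == 2 is the distance from the periphery to the center (i,j) == (3,3) in the picture):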

periphery n (i,j) = 
   --     2 (3,3)
  let 
    ((i1,j1):ps1) = reverse . take (n+1) . iterate (\(i,j)->(i,j+1)) $ (i-n,j) 
    --                                                                 ( 1, 3)
    ((i2,j2):ps2) = reverse . take (n+1) . iterate (\(i,j)->(i+1,j)) $ (i1,j1) 
    --                                                                 ( 1, 5)
    .....
    ps6           = .......                                          $ (i5,j5)
  in filter isValid (ps6 ++ ... ++ ps2 ++ ps1)

The whole neighborhood is simply

neighborhood n (i,j) = (i,j) : concat [ periphery k (i,j) | k <- [1..n] ]

For each cell/distance combination, just generate the neighborhood indices on the fly and access your array in O(1) time for each index pair.
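A minimal runnable sketch of this on-the-fly approach, assuming the axial conventions of the question's `hexDistance`. Rather than the answer's `reverse`/`iterate` construction, it walks each ring from the vertex (i - n, j) through six fixed step directions (a common ring-walking technique, described for example on redblobgames); `ring`, `neighborhood` and `sumNeighborhood` are hypothetical names, and `sumNeighborhood` merely stands in for whatever per-neighbour array access you need.

```haskell
import Data.Array (Array, listArray, (!))

rows, cols :: Int
(rows, cols) = (7, 7)

type Cell = (Int, Int)

isValid :: Cell -> Bool
isValid (r, c) = r >= 0 && r < rows && c >= 0 && c < cols

-- The six axial step directions, in the order the answer walks the hexagon:
-- start at vertex (i - n, j), go along (0, +1), then (+1, 0), and so on.
directions :: [Cell]
directions = [(0, 1), (1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1)]

-- All (possibly off-grid) cells at hex distance exactly d >= 1: each side of the
-- hexagon contributes d cells, and the next side starts d steps further along.
ring :: Int -> Cell -> [Cell]
ring d (r, c) = go (r - d, c) directions
  where
    go _ [] = []
    go (r0, c0) ((dr, dc) : ds) =
      [ (r0 + k * dr, c0 + k * dc) | k <- [0 .. d - 1] ]
        ++ go (r0 + d * dr, c0 + d * dc) ds

-- The cell itself plus all on-grid cells within distance n, computed on the fly.
neighborhood :: Int -> Cell -> [Cell]
neighborhood n cell = cell : filter isValid (concatMap (`ring` cell) [1 .. n])

-- Example use: read each neighbourhood cell from a rectangular array in O(1).
sumNeighborhood :: Array Cell Int -> Int -> Cell -> Int
sumNeighborhood grid n cell = sum [ grid ! p | p <- neighborhood n cell ]

main :: IO ()
main = do
  let grid = listArray ((0, 0), (rows - 1, cols - 1)) (replicate (rows * cols) 1)
  print (sumNeighborhood grid 2 (3, 3))  -- 1 + 6 + 12 = 19 cells
  print (sumNeighborhood grid 2 (0, 0))  -- corner: clipped by the border
```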

【Discussion】:

【Solution 2】:

Writing out @WillNess's answer in full, and combining it with @leftroundabout's proposal to store the indices in a 1D vector, we get:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM1, DIM2, DIM3, Z(..), (:.)(..), (!), fromList, use)

rows, cols :: Int
rows = 7
cols = 7

type Cell = (Int, Int)

-- Computed once at top level and shared by every call to getNeighs
(neighs, nNeighs) = generateNeighs

-- Return a vector of indices of cells at distance 'd' or less from the given cell
getNeighs :: Int -> Cell -> Acc (Array DIM1 Cell)
getNeighs d (r,c) = A.take n $ A.drop start neighs
  where
    start = nNeighs ! A.constant (Z :. r :. c :. 0)
    n = nNeighs ! A.constant (Z :. r :. c :. d)

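-- neighsArr holds every cell's neighborhood (the cell itself plus distances 1..4),
-- concatenated in row-major cell order. nNeighsArr ! (r, c, 0) is the offset of
-- cell (r, c) in neighsArr, and nNeighsArr ! (r, c, d) is the number of cells
-- within distance d of it (including the cell itself).
generateNeighs :: (Acc (Array DIM1 Cell), Acc (Array DIM3 Int))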
generateNeighs = (neighsArr, nNeighsArr)
  where
    idxs = concat [[(r, c) | c <- [0..cols-1]] | r <- [0..rows-1]]
    (neighsLi, nNeighsLi, n) = foldl inner ([], [], 0) idxs
    neighsArr = use $ fromList (Z :. n) neighsLi
    nNeighsArr = use $ fromList (Z :. rows :. cols :. 5) nNeighsLi
    inner (neighs', nNeighs', n') idx = (neighs' ++ cellNeighs, nNeighs'', n'')
      where
        (cellNeighs, cellNNeighs) = neighborhood idx
        n'' = n' + length cellNeighs
        nNeighs'' = nNeighs' ++ n' : cellNNeighs

neighborhood :: Cell -> ([Cell], [Int])
neighborhood (r,c) = (neighs, nNeighs)
  where
    neighsO = [ periphery d (r,c) | d <- [1..4] ]
    neighs = (r,c) : concat neighsO
    nNeighs = tail $ scanl (+) 1 $ map length neighsO

periphery :: Int -> Cell -> [Cell]
periphery d (r,c) =
  -- The set of d-distance neighbors form a hexagon shape. Traverse each of
  -- the sides of this hexagon and gather up the cell indices.
  let 
    ps1 = take d . iterate (\(r,c)->(r,c+1))   $ (r-d,c)
    ps2 = take d . iterate (\(r,c)->(r+1,c))   $ (r-d,c+d)
    ps3 = take d . iterate (\(r,c)->(r+1,c-1)) $ (r,c+d)
    ps4 = take d . iterate (\(r,c)->(r,c-1))   $ (r+d,c)
    ps5 = take d . iterate (\(r,c)->(r-1,c))   $ (r+d,c-d)
    ps6 = take d . iterate (\(r,c)->(r-1,c+1)) $ (r,c-d)
  in filter isValid (ps6 ++ ps5 ++ ps4 ++ ps3 ++ ps2 ++ ps1)


isValid :: Cell -> Bool
isValid (r, c)
  | r < 0 || r >= rows = False
  | c < 0 || c >= cols = False
  | otherwise = True
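To inspect the layout `generateNeighs` produces without running Accelerate, here is a plain-list sketch of the same packing, assuming only the `array` package. `flat`, `meta` and `within` are hypothetical names. The periphery walk and concatenation order follow the answer, so the contents should match; offsets come from one `scanl` over the lengths, which avoids the quadratic `neighs' ++ cellNeighs` inside the left fold.

```haskell
import Data.Array (Array, listArray, (!))

rows, cols, maxD :: Int
rows = 7
cols = 7
maxD = 4  -- neighbourhoods up to distance 4, as in the answer

type Cell = (Int, Int)

isValid :: Cell -> Bool
isValid (r, c) = r >= 0 && r < rows && c >= 0 && c < cols

-- Same six-segment walk (and concatenation order) as the answer's 'periphery'.
periphery :: Int -> Cell -> [Cell]
periphery d (r, c) = filter isValid (concat [ps6, ps5, ps4, ps3, ps2, ps1])
  where
    side step start = take d (iterate step start)
    ps1 = side (\(i, j) -> (i, j + 1))     (r - d, c)
    ps2 = side (\(i, j) -> (i + 1, j))     (r - d, c + d)
    ps3 = side (\(i, j) -> (i + 1, j - 1)) (r, c + d)
    ps4 = side (\(i, j) -> (i, j - 1))     (r + d, c)
    ps5 = side (\(i, j) -> (i - 1, j))     (r + d, c - d)
    ps6 = side (\(i, j) -> (i - 1, j + 1)) (r, c - d)

-- The cell, then rings 1..maxD; plus cumulative counts for distance <= 1..maxD.
neighborhood :: Cell -> ([Cell], [Int])
neighborhood cell = (cell : concat rings, tail (scanl (+) 1 (map length rings)))
  where
    rings = [ periphery d cell | d <- [1 .. maxD] ]

-- flat: all neighbourhoods concatenated in row-major cell order.
-- meta ! (r, c, 0): offset of (r, c) in flat; meta ! (r, c, d): count within distance d.
flat :: Array Int Cell
meta :: Array (Int, Int, Int) Int
(flat, meta) =
  ( listArray (0, total - 1) (concat lists)
  , listArray ((0, 0, 0), (rows - 1, cols - 1, maxD)) metaList )
  where
    cells = [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1] ]
    (lists, counts) = unzip (map neighborhood cells)
    offsets = scanl (+) 0 (map length lists)
    total = last offsets
    metaList = concat (zipWith (:) offsets counts)

-- Cells within distance d (1 <= d <= maxD) of (r, c), via its offset and count.
within :: Int -> Cell -> [Cell]
within d (r, c) = [ flat ! k | k <- [start .. start + n - 1] ]
  where
    start = meta ! (r, c, 0)
    n     = meta ! (r, c, d)

main :: IO ()
main = do
  print [ meta ! (3, 3, d) | d <- [0 .. maxD] ]
  print (within 1 (3, 3))
```

For the centre cell, `meta ! (3, 3, 1)`, `meta ! (3, 3, 2)` and `meta ! (3, 3, 3)` are 7, 19 and 37: the cell itself plus full rings of 6, 12 and 18.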

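【Discussion】:

By the way, the reason I used reverse is that I didn't want to call last to find the next starting point, i.e. the next vertex of the enclosing polygon. But if we compute each of the polygon's six vertices explicitly (which isn't hard; I was just lazy), we don't need any of that: we can eliminate the reverse, and also turn the take...iterate into a simple enumeration in the style of zip [i,i+1..i+d-1] (repeat j). :) It seems the d==1 case should also be special-cased (no enumeration, just the vertices).
@WillNess I incorporated your first suggestion into the post. I also benchmarked the code (suggestion #1) against a version using zips and repeats. According to Criterion (test code: github.com/tsoernes/haskelldca/blob/master/test/…), the latter approach is about 50% slower.
You're right, I also thought about this later (special-casing d==1 isn't necessary) but didn't comment. So iterate compiles to faster code than the enumeration; interesting. Ah, indeed: zip isn't a "good producer" (i.e. it doesn't fuse), but iterate is, so that makes sense. Did you make sure to compile with the -O2 switch, BTW?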
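The explicit-vertex variant described in the comments can be sketched as follows: each side becomes a plain enumeration starting from one of the six vertices (r-d, c), (r-d, c+d), (r, c+d), (r+d, c), (r+d, c-d), (r, c-d), so neither `reverse` nor `last` is needed and d == 1 requires no special case. `peripheryZip`, `peripheryIterate` and `sameCells` are hypothetical names; the sketch only checks that both versions yield the same cells and makes no claim about their relative speed.

```haskell
rows, cols :: Int
rows = 7
cols = 7

type Cell = (Int, Int)

isValid :: Cell -> Bool
isValid (r, c) = r >= 0 && r < rows && c >= 0 && c < cols

-- Explicit-vertex variant: each side is an enumeration of d cells from one vertex.
peripheryZip :: Int -> Cell -> [Cell]
peripheryZip d (r, c) = filter isValid $ concat
  [ zip (repeat (r - d)) [c .. c + d - 1]
  , zip [r - d .. r - 1] (repeat (c + d))
  , zip [r .. r + d - 1] [c + d, c + d - 1 .. c + 1]
  , zip (repeat (r + d)) [c, c - 1 .. c - d + 1]
  , zip [r + d, r + d - 1 .. r + 1] (repeat (c - d))
  , zip [r, r - 1 .. r - d + 1] [c - d .. c - 1]
  ]

-- The take/iterate version from the answer above, for comparison.
peripheryIterate :: Int -> Cell -> [Cell]
peripheryIterate d (r, c) = filter isValid (ps6 ++ ps5 ++ ps4 ++ ps3 ++ ps2 ++ ps1)
  where
    ps1 = take d . iterate (\(i, j) -> (i, j + 1))     $ (r - d, c)
    ps2 = take d . iterate (\(i, j) -> (i + 1, j))     $ (r - d, c + d)
    ps3 = take d . iterate (\(i, j) -> (i + 1, j - 1)) $ (r, c + d)
    ps4 = take d . iterate (\(i, j) -> (i, j - 1))     $ (r + d, c)
    ps5 = take d . iterate (\(i, j) -> (i - 1, j))     $ (r + d, c - d)
    ps6 = take d . iterate (\(i, j) -> (i - 1, j + 1)) $ (r, c - d)

-- Same cells, ignoring order (ring cells are distinct, so no multiset issues).
sameCells :: [Cell] -> [Cell] -> Bool
sameCells xs ys = length xs == length ys && all (`elem` ys) xs

main :: IO ()
main = print (and [ sameCells (peripheryZip d (r, c)) (peripheryIterate d (r, c))
                  | d <- [1 .. 4], r <- [0 .. rows - 1], c <- [0 .. cols - 1] ])
```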

【Solution 3】:

This can be done with the permute function, filling in the neighbors one cell at a time.

import Data.Bits (shift)
import Data.Array.Accelerate as A
import qualified Prelude as P
import Prelude hiding ((++), (==))

rows = 7
cols = 7
channels = 70

type Cell = (Int, Int)

(neighs, nNeighs) = fillNeighs

getNeighs :: Cell -> Acc (Array DIM1 Cell)
getNeighs (r, c) = A.take (nNeighs ! sh1) $ slice neighs sh2
  where
    sh1 = constant (Z :. r :. c)
    sh2 = constant (Z :. r :. c :. All)

fillNeighs :: (Acc (Array DIM3 Cell), Acc (Array DIM2 Int))
fillNeighs = (neighs2, nNeighs2)
  where
    sh = constant (Z :. rows :. cols :. 18) :: Exp DIM3
    neighZeros = fill sh (lift (0 :: Int, 0 :: Int)) :: Acc (Array DIM3 Cell)
    -- nNeighZeros = fill (constant (Z :. rows :. cols)) 0 :: Acc (Array DIM2 Int)
    (neighs2, nNeighs2li) = foldr inner (neighZeros, []) indices
    nNeighs2 = use $ fromList (Z :. rows :. cols) nNeighs2li
    -- Generate indices by varying column fastest. This assures that fromList, which fills
    -- the array in row-major order, gets nNeighs in the correct order.
    indices = foldr (\r acc -> foldr (\c acc2 -> (r, c):acc2 ) acc [0..cols-1]) [] [0..rows-1]
    inner :: Cell
      -> (Acc (Array DIM3 Cell), [Int])
      -> (Acc (Array DIM3 Cell), [Int])
    inner cell (neighs, nNeighs) = (newNeighs, n : nNeighs)
      where
        (newNeighs, n) = fillCell cell neighs


-- Given a cell and a 3D array to contain cell neighbors,
-- fill in the neighbors for the given cell
-- and return the number of neighbors filled in
fillCell :: Cell -> Acc (Array DIM3 Cell) -> (Acc (Array DIM3 Cell), Int)
fillCell (r, c) arr = (permute const arr indcomb neighs2arr, nNeighs)
  where
    (ra, ca) = (lift r, lift c) :: (Exp Int, Exp Int)
    neighs2li = generateNeighs 2 (r, c)
    nNeighs = P.length neighs2li
    neighs2arr = use $ fromList (Z :. nNeighs) neighs2li
    -- Traverse the 3rd dimension of the given cell
    indcomb :: Exp DIM1 -> Exp DIM3
    indcomb nsh = index3 ra ca (unindex1 nsh)


generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 = [ (row2, col2)
                            | row2 <- [0..rows-1]
                            , col2 <- [0..cols-1]
                            , hexDistance cell1 (row2, col2) P.== d]


-- Manhattan distance between two cells in a hexagonal grid with an axial coordinate system
hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = shift (abs rd + abs (rd + cd) + abs cd) (-1)
  where
    rd = r1 - r2
    cd = c1 - c2
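For readers without Accelerate, the per-cell scatter that `permute const` performs here can be mimicked on the boxed `Data.Array` with `(//)`, which writes the given index/value pairs into a copy of the array. This is a sketch under that assumption, not Accelerate code: its `generateNeighs` enumerates only on-grid rows and columns (`[0 .. rows - 1]`), it keeps the answer's padding width of 18, and `fillCell`/`fillNeighs` here are pure re-implementations rather than the answer's functions.

```haskell
import Data.Array (Array, listArray, (!), (//))
import Data.Bits (shiftR)

rows, cols, width :: Int
rows = 7
cols = 7
width = 18  -- padding width used by the answer (12 would suffice for d = 2)

type Cell = (Int, Int)

hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = (abs rd + abs (rd + cd) + abs cd) `shiftR` 1
  where
    rd = r1 - r2
    cd = c1 - c2

generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 =
  [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1], hexDistance cell1 (r, c) == d ]

-- Pure analogue of 'permute const': scatter one cell's neighbour list into its
-- slice (r, c, 0 .. n - 1) of the padded array, leaving every other entry as is.
fillCell :: Cell -> Array (Int, Int, Int) Cell -> (Array (Int, Int, Int) Cell, Int)
fillCell (r, c) arr = (arr // [ ((r, c, k), nb) | (k, nb) <- zip [0 ..] ns ], length ns)
  where
    ns = generateNeighs 2 (r, c)

fillNeighs :: (Array (Int, Int, Int) Cell, Array (Int, Int) Int)
fillNeighs = (filled, listArray ((0, 0), (rows - 1, cols - 1)) cnts)
  where
    zeros = listArray ((0, 0, 0), (rows - 1, cols - 1, width - 1))
                      (replicate (rows * cols * width) (0, 0))
    cells = [ (r, c) | r <- [0 .. rows - 1], c <- [0 .. cols - 1] ]
    -- foldr conses each count in front, so cnts stays in row-major order
    (filled, cnts) = foldr step (zeros, []) cells
    step cell (acc, ns) = let (acc', n) = fillCell cell acc in (acc', n : ns)

main :: IO ()
main = do
  let (arr, cnts) = fillNeighs
  print (cnts ! (0, 0))
  print [ arr ! (0, 0, k) | k <- [0 .. cnts ! (0, 0) - 1] ]
```

Since `(//)` copies the whole array on every call, this version is quadratic in the number of cells and only meant to illustrate the data flow.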

【Discussion】: