Counting connected non-zeros along rows and columns, but not diagonally, in a matrix in a shell script: answers

【Question title】: Count the number of connected non-zeros along rows and columns, but not diagonally, in a matrix in a shell script
【Posted】: 2020-03-24 16:57:38
【Question description】:

I have a matrix, for example a 10 x 15 matrix:

$ cat input.txt
2  3  4  5  10 0  2  2  0  1  0  0  0  1  0
0  3  4  6  2  0  2  0  0  0  0  1  2  3  40
0  0  0  2  3  0  3  0  3  1  2  3  1  0  0
1  2  0  4  0  3  4  0  4  1  2  0  0  1  1
0  0  0  0  0  0  10 3  4  12 4  5  12 3  10
26 3  4  5  10 0  2  2  4  1  0  0  0  1  0
0  30 4  6  2  0  2  0  0  0  0  1  2  3  40
0  0  0  2  3  0  0  0  0  1  0  3  1  0  0
1  2  10 4  0  0  4  0  4  1  2  0  0  1  1
0  0  0  0  0  0  0  3  0  0  4  5  0  3  10

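I am looking for every group of connected non-zero values, together with the number of cells in the group and its maximum value. Cells count as connected only along rows and columns, not diagonally. Please see this figure.

So the desired output is: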

outfile.txt
12 10    (12 connected non-zeros, shaded yellow, with maximum value 10)
42 40    (42 connected non-zeros, shaded light red, with maximum value 40)
 1  1    (the isolated non-zero shaded light purple)
 2  2    (the connected non-zeros shaded light green)
15 30    and so on
 6  5
 1  4
 4 10 
 1  3

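I am finding it hard to work out a suitable algorithm that I could then implement in Fortran or a shell script. I am considering the following algorithm, but I cannot work out the next step.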

step 1: #Assign the entries with a[ij], i-row, j-column
        #Now make different non-zero connected cell arrays (e.g. c1[k],c2[k],c3[k],....etc) 
        for i in {1..10};do
          for j in {1..15};do
            if [ a(i,j) != 0 ];then c1[k]=a(i,j); a(i,j)="*" #(assign a(i,j) to c1[k] and 
                                                             #replace its original value to "*" 
                                                             #because it should not be considered further)
Step 2: #Now check the left-right-up-down elements of a(i,j), if non-zero
                if [ a(i,j-1) !=0 ] && [ a(i,j) !=* ]; then c1[k]=a(i,j-1); a(i,j-1)="*"
                if [ a(i,j+1) !=0 ] && [ a(i,j) !=* ]; then c1[k]=a(i,j+1); a(i,j+1)="*"
                if [ a(i-1,j) !=0 ] && [ a(i,j) !=* ]; then c1[k]=a(i-1,j); a(i-1,j)="*"
                if [ a(i+1,j) !=0 ] && [ a(i,j) !=* ]; then c1[k]=a(i+1,j); a(i+1,j)="*"

Step 3: #continue the same process for each non-zeros until a zero at all-end.

Step 4: #Count the number of elements in c1[k] and find its maximum
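
For readers following the four steps above, here is a minimal Python sketch of the same idea (Python is one of the languages the asker accepts in the comments). It is not the asker's or the answerer's code. The names `connected_groups`, `parse_matrix` and `SAMPLE` are made up for illustration, and a separate `seen` table stands in for overwriting visited cells with "*". An explicit stack is used so that no recursion is needed, and the matrix is assumed to be rectangular.

```python
from typing import List, Tuple


def connected_groups(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (size, max) for each group of non-zero cells connected
    left/right/up/down, in the order a row-by-row scan first meets them."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]  # plays the role of "*"
    groups = []
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 0 or seen[i][j]:
                continue
            # Step 1: an unvisited non-zero cell starts a new group.
            seen[i][j] = True
            stack = [(i, j)]
            size, largest = 0, grid[i][j]
            # Step 3: keep going until the group has no unvisited neighbours.
            while stack:
                r, c = stack.pop()
                size += 1
                largest = max(largest, grid[r][c])
                # Step 2: left, right, up, down only (no diagonals).
                for nr, nc in ((r, c - 1), (r, c + 1), (r - 1, c), (r + 1, c)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] != 0 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # Step 4: record the cell count and the maximum of the group.
            groups.append((size, largest))
    return groups


def parse_matrix(text: str) -> List[List[int]]:
    """Parse whitespace-separated integers, one matrix row per line."""
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


# The 10 x 15 matrix from the question (contents of input.txt).
SAMPLE = """\
2  3  4  5  10 0  2  2  0  1  0  0  0  1  0
0  3  4  6  2  0  2  0  0  0  0  1  2  3  40
0  0  0  2  3  0  3  0  3  1  2  3  1  0  0
1  2  0  4  0  3  4  0  4  1  2  0  0  1  1
0  0  0  0  0  0  10 3  4  12 4  5  12 3  10
26 3  4  5  10 0  2  2  4  1  0  0  0  1  0
0  30 4  6  2  0  2  0  0  0  0  1  2  3  40
0  0  0  2  3  0  0  0  0  1  0  3  1  0  0
1  2  10 4  0  0  4  0  4  1  2  0  0  1  1
0  0  0  0  0  0  0  3  0  0  4  5  0  3  10
"""

if __name__ == "__main__":
    for size, largest in connected_groups(parse_matrix(SAMPLE)):
        print(size, largest)
```

Because groups are reported in the order the row-by-row scan first reaches them, the sample matrix should give the nine pairs in the same order as the desired output listed in the question.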

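【Question comments】:

Shell is not a good language for this compared with Perl, Python, C++ and so on.
Thanks @Shawn. Perl, Python or C++ would be fine too.
A quick search turned up this article; the code in it is of poor quality, but the algorithm might give you a working starting point.
For larger data sets, this can be done in a single pass, without recursion.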
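
The last comment does not say which non-recursive method it has in mind. One common candidate is a union-find (disjoint-set) labelling, sketched below in Python as an assumption rather than as the commenter's actual approach. Each non-zero cell is merged with its left and upper neighbours during a single scan, and a second, cheap pass then collects the size and maximum per group. The names `groups_union_find` and `DEMO` are hypothetical, and the grid is assumed to be rectangular.

```python
from typing import Dict, List, Tuple


def groups_union_find(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (size, max) per 4-connected group of non-zero cells,
    using union-find instead of recursion."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    parent = list(range(rows * cols))  # cell (i, j) has index i * cols + j

    def find(a: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root, so each root is the first
            # cell of its group in row-by-row order.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    # Scan: only the left and upper neighbours need checking, because the
    # right and lower ones are handled when the scan reaches them.
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 0:
                continue
            k = i * cols + j
            if j > 0 and grid[i][j - 1] != 0:
                union(k, k - 1)
            if i > 0 and grid[i - 1][j] != 0:
                union(k, k - cols)

    # Aggregation: count cells and track the maximum per root.
    stats: Dict[int, List[int]] = {}
    for i in range(rows):
        for j in range(cols):
            value = grid[i][j]
            if value == 0:
                continue
            entry = stats.setdefault(find(i * cols + j), [0, value])
            entry[0] += 1
            entry[1] = max(entry[1], value)
    return [(stats[root][0], stats[root][1]) for root in sorted(stats)]


# Small demo grid: the 3 and the 5 touch only diagonally, so they stay apart.
DEMO = [
    [1, 0, 2],
    [3, 0, 0],
    [0, 5, 0],
]

if __name__ == "__main__":
    for size, largest in groups_union_find(DEMO):
        print(size, largest)
```

The trade-off against a flood fill is mainly about memory and call depth: nothing here grows with the size of a single group, which matters when one group covers most of a very large matrix.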

Tags: python shell perl awk fortran

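【Solution 1】:

Here is one in GNU awk (it relies on gawk's arrays of arrays, so it will not run in other awk implementations):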

awk '
function check(x,y,s,   v) {                        # recursively check neighbours
    v=m[x][y]                                       # store val
    delete m[x][y]                                  # del to avoid loops
    if(((x+1) in m) && (y in m[x+1]) && m[x+1][y])  # test if neighbour exists
        check(x+1,y,s)                              # and check it
    if((x in m) && ((y+1) in m[x]) && m[x][y+1])
        check(x,y+1,s)
    if(((x-1) in m) && (y in m[x-1]) && m[x-1][y])
        check(x-1,y,s)
    if((x in m) && ((y-1) in m[x]) && m[x][y-1])
        check(x,y-1,s)
    if(v>max[s])
        max[s]=v                                    # keep max
    set[s]++                                        # count set size
}
{
    for(x=1;x<=NF;x++)                              # hash values
        m[x][NR]=$x
}
END {
    for(x in m)                                     # in no particular order
        for(y in m[x])                              
            if(m[x][y])
                check(x,y,++s)                      # start checking
    for(i in set)                                   # output
        print set[i],max[i]
}' file

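Output: one line per connected group, giving the number of cells in the group followed by its maximum value. The lines come out in no particular order, because the groups are found and printed with awk's unordered for-in loops.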

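【Comments】:

@RavinderSingh13 Thanks. I just woke up, so there may be some funny stuff in there... :D
You rock. It looks good to me, at least on the given example. I don't have this version of awk, so I will post it in a few hours or so. Cheers :)
Really great, @James Brown. This is very good. I have been trying since last week with zero success. Thank you very much.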