【Question title】: AWK to Consolidate Files (Follow-Up)
【Posted】: 2018-12-13 08:07:47
【Question】:

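This is a follow-up to this post (updated as requested in the comments).

I am starting over with actual (disguised) sample data and the result of running my script on that data.

Raw data sample:

The headers and first rows of the first two files in the directory. They are the same data; this is not an error. (It can happen in this data set.)

File 1: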

Provider,,,,,,,,,,,,,,
02/01/2018 - 02/28/2018,,,,,,,,,,,,,,
Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
B002H5QR44,803814064988,803814064988,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
B002H5QRBC,803814064988,803814064988,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
B002H5QSC0,803814064988,803814064988,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
B002H5QU3M,505545471538,505545471538,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
B002H5QUAK,417248985349,417248985349,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
B002H5QUBY,417248985349,417248985349,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
B002H5QUCI,505545471538,505545471538,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
B002H5QUVO,804699101426,804699101426,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
B002H5QUZ0,804699101426,804699101426,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
B002H5QXF2,803814064988,803814064988,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
B002H5QXWK,803814064988,803814064988,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX

File 2:

Provider,,,,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,,,,
Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,,Field12,Field13,Field14,Field15
B002H5N3AA,245462777033,245462777033,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,bar1,foo2,bar1,RkG7D,BrandY
B002H5N3IM,245462777033,245462777033,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,bar1,foo2,bar1,jqiGj,BrandY
B002H5N3R8,245462777033,245462777033,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,bar1,foo2,bar1,Ylrcy,BrandY
B002H5N6X4,766193337142,766193337142,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,bar2,foo3,bar2,WHxLZ,BrandX
B002H5N756,766193337142,766193337142,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,bar2,foo3,bar2,VVw34,BrandX
B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,bar3,foo4,bar3,X48HD,BrandZ
B002H5NA3U,932053704970,932053704970,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,bar4,foo5,bar4,ghftn,BrandY
B002H5NJ6S,245675038659,245675038659,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,bar5,foo6,bar5,TVY19,BrandX
B002H5NJ6S,245675038659,245675038659,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,bar5,foo6,bar5,M2j1i,BrandX
B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,bar6,foo7,bar6,1UnN3,BrandY
B002H5OU5C,491559514618,491559514618,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,bar7,foo8,bar7,eybpO,BrandX
B002H5OU66,491559514618,491559514618,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,bar7,foo8,bar7,yS9xk,BrandX
B002H5OU6Q,491559514618,491559514618,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,bar7,foo8,bar7,0IXqS,BrandX

Desired output:

Filename,Report_Period,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
FILENAME,02/01/2018 - 02/28/2018,B002H5QQJA,8.03814E+11,8.04E+11,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QQTU,8.03814E+11,8.04E+11,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QR44,8.03814E+11,8.04E+11,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QRBC,8.03814E+11,8.04E+11,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QSC0,8.03814E+11,8.04E+11,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QU3M,5.05545E+11,5.06E+11,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUAK,4.17249E+11,4.17E+11,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUBY,4.17249E+11,4.17E+11,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUCI,5.05545E+11,5.06E+11,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUVO,8.04699E+11,8.05E+11,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUZ0,8.04699E+11,8.05E+11,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QXF2,8.03814E+11,8.04E+11,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QXWK,8.03814E+11,8.04E+11,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N3AA,2.45463E+11,2.45E+11,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,foo2,bar1,RkG7D,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N3IM,2.45463E+11,2.45E+11,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,foo2,bar1,jqiGj,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N3R8,2.45463E+11,2.45E+11,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,foo2,bar1,Ylrcy,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N6X4,7.66193E+11,7.66E+11,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,foo3,bar2,WHxLZ,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N756,7.66193E+11,7.66E+11,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,foo3,bar2,VVw34,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,foo4,bar3,X48HD,BrandZ
FILENAME,02/01/2018 - 02/28/2018,B002H5NA3U,9.32054E+11,9.32E+11,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,foo5,bar4,ghftn,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5NJ6S,2.45675E+11,2.46E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,foo6,bar5,TVY19,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5NJ6S,2.45675E+11,2.46E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,foo6,bar5,M2j1i,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,foo7,bar6,1UnN3,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5OU5C,4.9156E+11,4.92E+11,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,foo8,bar7,eybpO,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5OU66,4.9156E+11,4.92E+11,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,foo8,bar7,yS9xk,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5OU6Q,4.9156E+11,4.92E+11,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,foo8,bar7,0IXqS,BrandX

My script (adapted from the accepted answer in my O.P.):

It almost works, but it includes lines 1-3 of each file:

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
  }
BEGIN { FS=OFS="," }
NR < 3 {
    if ( NR == 2 ) {
        hdr = "Report_Period" OFS
        val = val $1 OFS
    }
    next
}
FNR>3 {
    print "Filename", hdr $0
    next
}
{ print basename(FILENAME), val $0 }
' OFS="," /path/to/input/files/*.csv > ~/path/to/output/file/SampleOutput.csv

Actual output

This is the entire content of the resulting file. The problem seems to be that the headers are repeated:

Sample1.csv,02/01/2018 - 02/28/2018,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
Filename,Report_Period,B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
Filename,Report_Period,B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
Filename,Report_Period,B002H5QR44,803814064988,803814064988,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
Filename,Report_Period,B002H5QRBC,803814064988,803814064988,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
Filename,Report_Period,B002H5QSC0,803814064988,803814064988,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
Filename,Report_Period,B002H5QU3M,505545471538,505545471538,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
Filename,Report_Period,B002H5QUAK,417248985349,417248985349,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
Filename,Report_Period,B002H5QUBY,417248985349,417248985349,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
Filename,Report_Period,B002H5QUCI,505545471538,505545471538,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
Filename,Report_Period,B002H5QUVO,804699101426,804699101426,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
Filename,Report_Period,B002H5QUZ0,804699101426,804699101426,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
Filename,Report_Period,B002H5QXF2,803814064988,803814064988,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
Filename,Report_Period,B002H5QXWK,803814064988,803814064988,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX
Sample2.csv,02/01/2018 - 02/28/2018,Provider,,,,,,,,,,,,,,
Sample2.csv,02/01/2018 - 02/28/2018,01/01/2018 - 01/31/2018,,,,,,,,,,,,,,
Sample2.csv,02/01/2018 - 02/28/2018,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
Filename,Report_Period,B002H5N3AA,2.45463E+11,2.45463E+11,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,foo2,bar1,RkG7D,BrandY
Filename,Report_Period,B002H5N3IM,2.45463E+11,2.45463E+11,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,foo2,bar1,jqiGj,BrandY
Filename,Report_Period,B002H5N3R8,2.45463E+11,2.45463E+11,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,foo2,bar1,Ylrcy,BrandY
Filename,Report_Period,B002H5N6X4,7.66193E+11,7.66193E+11,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,foo3,bar2,WHxLZ,BrandX
Filename,Report_Period,B002H5N756,7.66193E+11,7.66193E+11,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,foo3,bar2,VVw34,BrandX
Filename,Report_Period,B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,foo4,bar3,X48HD,BrandZ
Filename,Report_Period,B002H5NA3U,9.32054E+11,9.32054E+11,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,foo5,bar4,ghftn,BrandY
Filename,Report_Period,B002H5NJ6S,2.45675E+11,2.45675E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,foo6,bar5,TVY19,BrandX
Filename,Report_Period,B002H5NJ6S,2.45675E+11,2.45675E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,foo6,bar5,M2j1i,BrandX
Filename,Report_Period,B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,foo7,bar6,1UnN3,BrandY
Filename,Report_Period,B002H5OU5C,4.9156E+11,4.9156E+11,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,foo8,bar7,eybpO,BrandX
Filename,Report_Period,B002H5OU66,4.9156E+11,4.9156E+11,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,foo8,bar7,yS9xk,BrandX
Filename,Report_Period,B002H5OU6Q,4.9156E+11,4.9156E+11,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,foo8,bar7,0IXqS,BrandX

Thanks (again)!

【Comments】:

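  • Clean up what you can on your own (e.g. you want a header named "Period" but you are adding the string "Report_Period", you set OFS in the BEGIN section and then set it again after the script, and you show us NR but must be using FNR to get the symptoms you describe). Now look at the change you made when converting the hdr assignment line from the previous script (hdr = hdr $1 OFS) to this script (hdr = "Report_Period" OFS), and think about what that actually does while the first 3 lines are being processed. Finally, consider what FNR < 3 actually means, given your statement that "it includes lines 1-3".
  • Basically, just make sure the script you post is THE script you are having a problem with and that it produces the symptoms you describe. Also, if you are running it with multiple input files, then SHOW that by including at least 2 sample input files, the associated expected output, and the associated correct call to awk (right now you are passing it 1 input file).
  • Thanks again, Ed. I need a little time to work through this. I am really new to awk. (And I have been working for a long time.) But I will figure out each of your suggestions, try them, and update the post accordingly.
  • Make sure the desired output you post is the exact output you want to get from the input files you post. Posting sample input followed by output that cannot be derived from that input is not useful.
  • Thanks again, Ed. Next time I will know how to structure my sample data from the start. I really appreciate your help.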
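
The NR versus FNR point raised in the comments can be illustrated with a minimal sketch. The two throwaway files and their contents (a.csv, b.csv, rowA, rowB) are made up for illustration and are not the asker's data. NR keeps counting across all input files, while FNR restarts at 1 for each file, so only the FNR test skips the three-line preamble of every file.

```shell
# Hypothetical sample files: a 3-line preamble followed by one data row each.
tmp=$(mktemp -d)
printf 'Provider\nPeriodA\nHeader\nrowA\n' > "$tmp/a.csv"
printf 'Provider\nPeriodB\nHeader\nrowB\n' > "$tmp/b.csv"

# NR keeps counting across files, so only the first file's preamble is skipped.
with_nr=$(awk 'NR <= 3 { next } { print }' "$tmp/a.csv" "$tmp/b.csv")

# FNR restarts at 1 for every file, so each file's preamble is skipped.
with_fnr=$(awk 'FNR <= 3 { next } { print }' "$tmp/a.csv" "$tmp/b.csv")

printf 'NR test:\n%s\n' "$with_nr"
printf 'FNR test:\n%s\n' "$with_fnr"
rm -rf "$tmp"
```

With the NR test the second file's Provider, PeriodB and Header lines leak into the output, which resembles the symptom described in the question.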

Tags: awk


【Solution 1】:

This may be what you are trying to do:

$ cat tst.awk
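BEGIN { FS=OFS="," }
# Print the combined header once, using line 3 of the first file only.
NR==3 {
    print "Filename", "Report_Period", $0
}
# Line 2 of every file holds the report period; save it with the base file name.
FNR==2 {
    fname = FILENAME
    sub(".*/","",fname)
    period = $1
}
# Everything after the 3-line preamble is a data row.
FNR>3 {
    print fname, period, $0
}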

I made up my own data to test it (file#_row#_col#):

$ cat file1
Provider,,
02/01/2018 - 02/28/2018,,
Field1,Field2,Field3
f1_r1_c1,f1_r1_c2,f1_r1_c3
f1_r2_c1,f1_r2_c2,f1_r2_c3

$ cat file2
Provider,,
01/01/2018 - 01/31/2018,,
Field1,Field2,Field3
f2_r1_c1,f2_r1_c2,f2_r1_c3
f2_r2_c1,f2_r2_c2,f2_r2_c3

$ awk -f tst.awk file1 file2
Filename,Report_Period,Field1,Field2,Field3
file1,02/01/2018 - 02/28/2018,f1_r1_c1,f1_r1_c2,f1_r1_c3
file1,02/01/2018 - 02/28/2018,f1_r2_c1,f1_r2_c2,f1_r2_c3
file2,01/01/2018 - 01/31/2018,f2_r1_c1,f2_r1_c2,f2_r1_c3
file2,01/01/2018 - 01/31/2018,f2_r2_c1,f2_r2_c2,f2_r2_c3

If that is not what you want, edit your question to clarify your requirements, using minimal sample input/output.

【Comments】:
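
File 2 in the question has an unnamed extra column (its header row reads Field11,,Field12), and the desired output appears to leave that column out. A possible variant, under the assumption that any column whose header cell is empty should be discarded, is sketched below. The sample files f1.csv and f2.csv and the names keep, hdr and printed are hypothetical, and the sketch assumes no field contains a quoted comma.

```shell
# Hypothetical sample files; f2.csv has an unnamed second column.
tmp=$(mktemp -d)
printf 'Provider,,\n02/01/2018 - 02/28/2018,,\nField1,Field2,Field3\nx1,x2,x3\n' > "$tmp/f1.csv"
printf 'Provider,,,\n01/01/2018 - 01/31/2018,,,\nField1,,Field2,Field3\ny1,junk,y2,y3\n' > "$tmp/f2.csv"

out=$(awk '
BEGIN { FS = OFS = "," }
FNR == 2 {                        # line 2 holds the report period
    period = $1
    fname = FILENAME
    sub(".*/", "", fname)         # strip the directory part
}
FNR == 3 {                        # header row: remember the named columns
    n = 0
    hdr = ""
    for (i = 1; i <= NF; i++)
        if ($i != "") { keep[++n] = i; hdr = hdr OFS $i }
    if (!printed++) print "Filename", "Report_Period" hdr
}
FNR > 3 {                         # data rows: emit only the kept columns
    row = fname OFS period
    for (j = 1; j <= n; j++) row = row OFS $(keep[j])
    print row
}
' "$tmp/f1.csv" "$tmp/f2.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The column map is rebuilt for every file, so files with and without the extra column can be mixed. The header is taken from the first file only, which assumes that all files share the same named columns in the same order.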

    【Solution 2】:

    Attempt two. I forgot to include the header.

    (pi 580) $ cat /tmp/x.sh
    #!/bin/sh
    
    gawk '
      BEGIN {FS=OFS=","}
      FNR == 1 {file=FILENAME; sub(".*/", "", file); next}
      FNR == 2 {period=$1; next}
      NR  == 3 {print "file","period",$0; next}
      FNR == 3 {next}
      {print file,period,$0}
    ' "$@"
    
    (pi 581) $ /tmp/x.sh /tmp/f?.*
    file,period,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
    f1.txt,02/01/2018 - 02/28/2018,B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
    f1.txt,02/01/2018 - 02/28/2018,B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
    f2.txt,01/01/2018 - 01/31/2018,B002H5N3AA,245462777033,245462777033,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,bar1,foo2,bar1,RkG7D,BrandY
    f2.txt,01/01/2018 - 01/31/2018,B002H5N3IM,245462777033,245462777033,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,bar1,foo2,bar1,jqiGj,BrandY
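
The wrapper script hands its command-line arguments on to gawk. A minimal sketch of why quoting them as "$@" is the safer choice when a file name contains a space follows. The helper functions count_args, unquoted and quoted, and the file name "my file.csv", are made up for illustration; no file is actually read.

```shell
# count_args reports how many arguments it received.
count_args() { echo "$#"; }

# An unquoted $* is re-split on whitespace; "$@" keeps each argument intact.
unquoted() { count_args $*; }
quoted()   { count_args "$@"; }

n_unquoted=$(unquoted "my file.csv")
n_quoted=$(quoted "my file.csv")

echo "unquoted \$* passes $n_unquoted argument(s)"
echo "quoted \"\$@\" passes $n_quoted argument(s)"
```

With the unquoted form, awk would be asked to open two files named "my" and "file.csv" instead of the one that exists.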
    

    【Comments】:

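    • By the way, questions and answers like this remind me of how smart "A", "W", and "K" were.
    • IMHO, by far the most impressive thing about awk is how the maintainers of all the awk variants have stuck to the original concept of the language: only introduce language constructs for things that are difficult to do otherwise. The result is a small, simple, yet extremely powerful language without the code bloat that all the other text-processing tools have accumulated. Very impressive, IMHO.
    • I notice that my host has perl 5, version 2x. I think I made the right decision 6 years ago not to get that perl book.
    • I am proud to say that I spotted the period=1 bug. Literally, I started 12 months ago with zero programming knowledge, and I have just finished a complete R application that pulls raw data from about 7K source files, all with different structures and delimiters. Most of what I learned was in R, but I did have to support it with some command-line work for ETL. Again, I started from zero knowledge. Here is an interactive chart of the database (click and drag to rotate, scroll to zoom): mvabl.com/projects/UFO/RB_network.html. Thanks, everyone!
    • Oh, I am familiar with mike bostock's work. I sometimes use it for inspiration.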