【Posted】: 2021-06-14 17:22:28
【Problem description】:
Input$Freq
Freq
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:.
AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:.
AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:.
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:.
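This is one column of a data frame. Each value is a string of label:value pairs separated by commas, with a colon between each label and its value. I want to extract the dot or the number that follows EAS:. The output I want looks like this: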
Output
Freq EAS
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. .
AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729 0.0825
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. .
AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729 0.0825
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. .
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. .
I tried extract from tidyr:
maf_snv_intervar <- extract(Input, Freq, into = 'EAS',
                            "^[^,]+,[^,]+,([^,]+),.*", remove = FALSE, convert = TRUE)
But the output I got looks like this:
Output
Freq EAS
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. EAS:.
AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729 EAS:0.0825
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. EAS:.
AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729 EAS:0.0825
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. EAS:.
AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:. EAS:.
I don't know how to modify the regular expression.
【Discussion】:
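The attempt in the question captures the whole third field, so the EAS: label ends up inside the capture group. One possible fix is to move the literal EAS: outside the parentheses so that only the value after it is captured. The following is a minimal sketch in base R, not a definitive solution: the two-row data frame is a reduced copy of the sample data from the question, and the use of sub() instead of tidyr is a substitution made here to keep the example dependency-free.

```r
# Reduced copy of the sample data from the question (two representative rows).
Input <- data.frame(
  Freq = c(
    "AFR:.,AMR:.,EAS:.,FIN:.,NFE:.,OTH:.,ASJ:.",
    "AFR:0.1546,AMR:0.2581,EAS:0.0825,FIN:0.2270,NFE:0.0822,OTH:0.1706,ASJ:0.0729"
  ),
  stringsAsFactors = FALSE
)

# "EAS:" sits outside the capture group, so only the text between
# "EAS:" and the next comma is kept. The surrounding ".*" consume the
# rest of the string so that sub() replaces the whole value with group 1.
# Note: if a string contains no "EAS:" field, sub() returns it unchanged.
Input$EAS <- sub("^.*EAS:([^,]+).*$", "\\1", Input$Freq)

print(Input$EAS)
```

The result is kept as character because the column mixes dots and numbers. The same idea should carry over to the tidyr call in the question by changing the pattern to something like "^[^,]+,[^,]+,EAS:([^,]+),.*", though that variant assumes EAS is always the third field.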