[Posted]: 2022-01-09 02:20:32
[Question]:
Suppose I have more than 200 files, each structured like this:
# Peptide length 11
# Rank Threshold for Strong binding peptides 0.500
# Rank Threshold for Weak binding peptides 2.000
-----------------------------------------------------------------------------------
pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel
-----------------------------------------------------------------------------------
0 HLA-B4402 GSHDLGIILQK GSHDLGIIL 0 0 0 0 0 GSHDLGIIL NM_000094_3_COL 0.015 42580.79 90.00
1 HLA-B4402 SHDLGIILQKI SLGIILQKI 0 0 0 1 2 SHDLGIILQKI NM_000094_3_COL 0.024 38731.55 65.00
2 HLA-B4402 HDLGIILQKIR HDLIILQKI 0 0 0 3 1 HDLGIILQKI NM_000094_3_COL 0.024 38400.24 65.00
3 HLA-B4402 DLGIILQKIRD DLGIILQKI 0 0 0 0 0 DLGIILQKI NM_000094_3_COL 0.011 44267.78 95.00
4 HLA-B4402 LGIILQKIRDM LGIILQRDM 0 0 0 6 2 LGIILQKIRDM NM_000094_3_COL 0.024 38411.46 65.00
5 HLA-B4402 GIILQKIRDMP GIILQIRDM 0 0 0 5 1 GIILQKIRDM NM_000094_3_COL 0.017 41463.75 80.00
6 HLA-B4402 IILQKIRDMPY IILQKIRDY 0 0 0 8 2 IILQKIRDMPY NM_000094_3_COL 0.025 38152.18 65.00
7 HLA-B4402 ILQKIRDMPYM ILQKIRMPY 0 0 0 6 1 ILQKIRDMPY NM_000094_3_COL 0.025 37993.98 60.00
8 HLA-B4402 LQKIRDMPYMD QKIRDMPYM 1 0 0 0 0 QKIRDMPYM NM_000094_3_COL 0.015 42595.54 90.00
9 HLA-B4402 QKIRDMPYMDP QKIRDMPYM 0 0 0 0 0 QKIRDMPYM NM_000094_3_COL 0.017 41645.82 85.00
10 HLA-B4402 KIRDMPYMDPS KDMPYMDPS 0 0 0 1 2 KIRDMPYMDPS NM_000094_3_COL 0.023 39039.53 70.00
11 HLA-B4402 IRDMPYMDPSX RDMPYMPSX 1 0 0 6 1 RDMPYMDPSX NM_000094_3_COL 0.036 33871.57 41.00
-----------------------------------------------------------------------------------
Protein NM_000094_3_COL. Allele HLA-B4402. Number of high binders 0. Number of weak binders 0. Number of peptides 12
-----------------------------------------------------------------------------------
# Rank Threshold for Strong binding peptides 0.500
# Rank Threshold for Weak binding peptides 2.000
-----------------------------------------------------------------------------------
pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel
-----------------------------------------------------------------------------------
0 HLA-B4402 PVTGYKVQYTS TGYKVQYTS 2 0 0 0 0 TGYKVQYTS NM_000094_3_COL 0.011 44190.25 95.00
1 HLA-B4402 VTGYKVQYTSL VTGYQYTSL 0 0 0 4 2 VTGYKVQYTSL NM_000094_3_COL 0.020 40061.36 75.00
2 HLA-B4402 TGYKVQYTSLT TGYKVYTSL 0 0 0 5 1 TGYKVQYTSL NM_000094_3_COL 0.020 40487.08 75.00
3 HLA-B4402 GYKVQYTSLTG YVQYTSLTG 1 0 0 1 1 YKVQYTSLTG NM_000094_3_COL 0.017 41521.20 80.00
4 HLA-B4402 YKVQYTSLTGL YQYTSLTGL 0 0 0 1 2 YKVQYTSLTGL NM_000094_3_COL 0.031 35710.76 49.00
5 HLA-B4402 KVQYTSLTGLG KVQYTSLTL 0 0 0 8 1 KVQYTSLTGL NM_000094_3_COL 0.029 36392.20 55.00
6 HLA-B4402 VQYTSLTGLGQ VQYTSLTGL 0 0 0 0 0 VQYTSLTGL NM_000094_3_COL 0.016 42180.50 85.00
7 HLA-B4402 QYTSLTGLGQP QYTSLTGLG 0 0 0 0 0 QYTSLTGLG NM_000094_3_COL 0.011 44293.17 95.00
8 HLA-B4402 YTSLTGLGQPL YTSLLGQPL 0 0 0 4 2 YTSLTGLGQPL NM_000094_3_COL 0.034 34547.04 44.00
9 HLA-B4402 TSLTGLGQPLP SLTGLGQPL 1 0 0 0 0 SLTGLGQPL NM_000094_3_COL 0.024 38475.10 65.00
10 HLA-B4402 SLTGLGQPLPS SLTGLGQPL 0 0 0 0 0 SLTGLGQPL NM_000094_3_COL 0.026 37575.76 60.00
11 HLA-B4402 LTGLGQPLPSX LLGQPLPSX 0 0 0 1 2 LTGLGQPLPSX NM_000094_3_COL 0.014 42874.84 90.00
-----------------------------------------------------------------------------------
Protein NM_000094_3_COL. Allele HLA-B4402. Number of high binders 0. Number of weak binders 0. Number of peptides 12
-----------------------------------------------------------------------------------
# Rank Threshold for Strong binding peptides 0.500
# Rank Threshold for Weak binding peptides 2.000
-----------------------------------------------------------------------------------
pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel
-----------------------------------------------------------------------------------
0 HLA-B4402 FLRLLDLAQEE RLLDLAQEE 2 0 0 0 0 RLLDLAQEE NM_000106_5_CYP 0.014 42841.45 90.00
1 HLA-B4402 LRLLDLAQEEL RLLDLAQEL 1 0 0 7 1 RLLDLAQEEL NM_000106_5_CYP 0.029 36648.25 55.00
2 HLA-B4402 RLLDLAQEELK RLLDLAQEL 0 0 0 7 1 RLLDLAQEEL NM_000106_5_CYP 0.029 36350.87 55.00
3 HLA-B4402 LLDLAQEELKE LLDLAQEEL 0 0 0 0 0 LLDLAQEEL NM_000106_5_CYP 0.013 43487.79 95.00
4 HLA-B4402 LDLAQEELKEE LDQEELKEE 0 0 0 2 2 LDLAQEELKEE NM_000106_5_CYP 0.008 45629.40 99.00
5 HLA-B4402 DLAQEELKEES AQEELKEES 2 0 0 0 0 AQEELKEES NM_000106_5_CYP 0.009 45287.57 99.00
6 HLA-B4402 LAQEELKEESG AEELKEESG 1 0 0 1 1 AQEELKEESG NM_000106_5_CYP 0.013 43568.32 95.00
7 HLA-B4402 AQEELKEESGF AELKEESGF 0 0 0 1 2 AQEELKEESGF NM_000106_5_CYP 0.231 4113.65 2.50
8 HLA-B4402 QEELKEESGFL QELKEESGF 0 0 0 1 1 QEELKEESGF NM_000106_5_CYP 0.123 13202.71 6.00
9 HLA-B4402 EELKEESGFLR EELKEESGF 0 0 0 0 0 EELKEESGF NM_000106_5_CYP 0.076 21904.46 13.00
10 HLA-B4402 ELKEESGFLRE ELKEESGFL 0 0 0 0 0 ELKEESGFL NM_000106_5_CYP 0.030 36301.74 55.00
11 HLA-B4402 LKEESGFLREX KEESFLREX 1 0 0 4 1 KEESGFLREX NM_000106_5_CYP 0.060 26205.35 19.00
-----------------------------------------------------------------------------------
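As you can see, each file is essentially a series of tables (all with the same header), with text between them. I want to keep only the tables and, if possible, drop the dashed lines as well, leaving just the header and the data rows, with the fields in each row separated by \t.
The ideal result would look like this: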
pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel
0 HLA-B4402 GSHDLGIILQK GSHDLGIIL 0 0 0 0 0 GSHDLGIIL NM_000094_3_COL 0.015 42580.79 90.00
1 HLA-B4402 SHDLGIILQKI SLGIILQKI 0 0 0 1 2 SHDLGIILQKI NM_000094_3_COL 0.024 38731.55 65.00
2 HLA-B4402 HDLGIILQKIR HDLIILQKI 0 0 0 3 1 HDLGIILQKI NM_000094_3_COL 0.024 38400.24 65.00
3 HLA-B4402 DLGIILQKIRD DLGIILQKI 0 0 0 0 0 DLGIILQKI NM_000094_3_COL 0.011 44267.78 95.00
4 HLA-B4402 LGIILQKIRDM LGIILQRDM 0 0 0 6 2 LGIILQKIRDM NM_000094_3_COL 0.024 38411.46 65.00
5 HLA-B4402 GIILQKIRDMP GIILQIRDM 0 0 0 5 1 GIILQKIRDM NM_000094_3_COL 0.017 41463.75 80.00
6 HLA-B4402 IILQKIRDMPY IILQKIRDY 0 0 0 8 2 IILQKIRDMPY NM_000094_3_COL 0.025 38152.18 65.00
7 HLA-B4402 ILQKIRDMPYM ILQKIRMPY 0 0 0 6 1 ILQKIRDMPY NM_000094_3_COL 0.025 37993.98 60.00
8 HLA-B4402 LQKIRDMPYMD QKIRDMPYM 1 0 0 0 0 QKIRDMPYM NM_000094_3_COL 0.015 42595.54 90.00
9 HLA-B4402 QKIRDMPYMDP QKIRDMPYM 0 0 0 0 0 QKIRDMPYM NM_000094_3_COL 0.017 41645.82 85.00
10 HLA-B4402 KIRDMPYMDPS KDMPYMDPS 0 0 0 1 2 KIRDMPYMDPS NM_000094_3_COL 0.023 39039.53 70.00
11 HLA-B4402 IRDMPYMDPSX RDMPYMPSX 1 0 0 6 1 RDMPYMDPSX NM_000094_3_COL 0.036 33871.57 41.00
0 HLA-B4402 PVTGYKVQYTS TGYKVQYTS 2 0 0 0 0 TGYKVQYTS NM_000094_3_COL 0.011 44190.25 95.00
1 HLA-B4402 VTGYKVQYTSL VTGYQYTSL 0 0 0 4 2 VTGYKVQYTSL NM_000094_3_COL 0.020 40061.36 75.00
2 HLA-B4402 TGYKVQYTSLT TGYKVYTSL 0 0 0 5 1 TGYKVQYTSL NM_000094_3_COL 0.020 40487.08 75.00
3 HLA-B4402 GYKVQYTSLTG YVQYTSLTG 1 0 0 1 1 YKVQYTSLTG NM_000094_3_COL 0.017 41521.20 80.00
4 HLA-B4402 YKVQYTSLTGL YQYTSLTGL 0 0 0 1 2 YKVQYTSLTGL NM_000094_3_COL 0.031 35710.76 49.00
5 HLA-B4402 KVQYTSLTGLG KVQYTSLTL 0 0 0 8 1 KVQYTSLTGL NM_000094_3_COL 0.029 36392.20 55.00
6 HLA-B4402 VQYTSLTGLGQ VQYTSLTGL 0 0 0 0 0 VQYTSLTGL NM_000094_3_COL 0.016 42180.50 85.00
7 HLA-B4402 QYTSLTGLGQP QYTSLTGLG 0 0 0 0 0 QYTSLTGLG NM_000094_3_COL 0.011 44293.17 95.00
8 HLA-B4402 YTSLTGLGQPL YTSLLGQPL 0 0 0 4 2 YTSLTGLGQPL NM_000094_3_COL 0.034 34547.04 44.00
9 HLA-B4402 TSLTGLGQPLP SLTGLGQPL 1 0 0 0 0 SLTGLGQPL NM_000094_3_COL 0.024 38475.10 65.00
10 HLA-B4402 SLTGLGQPLPS SLTGLGQPL 0 0 0 0 0 SLTGLGQPL NM_000094_3_COL 0.026 37575.76 60.00
11 HLA-B4402 LTGLGQPLPSX LLGQPLPSX 0 0 0 1 2 LTGLGQPLPSX NM_000094_3_COL 0.014 42874.84 90.00
0 HLA-B4402 FLRLLDLAQEE RLLDLAQEE 2 0 0 0 0 RLLDLAQEE NM_000106_5_CYP 0.014 42841.45 90.00
1 HLA-B4402 LRLLDLAQEEL RLLDLAQEL 1 0 0 7 1 RLLDLAQEEL NM_000106_5_CYP 0.029 36648.25 55.00
2 HLA-B4402 RLLDLAQEELK RLLDLAQEL 0 0 0 7 1 RLLDLAQEEL NM_000106_5_CYP 0.029 36350.87 55.00
3 HLA-B4402 LLDLAQEELKE LLDLAQEEL 0 0 0 0 0 LLDLAQEEL NM_000106_5_CYP 0.013 43487.79 95.00
4 HLA-B4402 LDLAQEELKEE LDQEELKEE 0 0 0 2 2 LDLAQEELKEE NM_000106_5_CYP 0.008 45629.40 99.00
5 HLA-B4402 DLAQEELKEES AQEELKEES 2 0 0 0 0 AQEELKEES NM_000106_5_CYP 0.009 45287.57 99.00
6 HLA-B4402 LAQEELKEESG AEELKEESG 1 0 0 1 1 AQEELKEESG NM_000106_5_CYP 0.013 43568.32 95.00
7 HLA-B4402 AQEELKEESGF AELKEESGF 0 0 0 1 2 AQEELKEESGF NM_000106_5_CYP 0.231 4113.65 2.50
8 HLA-B4402 QEELKEESGFL QELKEESGF 0 0 0 1 1 QEELKEESGF NM_000106_5_CYP 0.123 13202.71 6.00
9 HLA-B4402 EELKEESGFLR EELKEESGF 0 0 0 0 0 EELKEESGF NM_000106_5_CYP 0.076 21904.46 13.00
10 HLA-B4402 ELKEESGFLRE ELKEESGFL 0 0 0 0 0 ELKEESGFL NM_000106_5_CYP 0.030 36301.74 55.00
11 HLA-B4402 LKEESGFLREX KEESFLREX 1 0 0 4 1 KEESGFLREX NM_000106_5_CYP 0.060 26205.35 19.00
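This is what I am struggling with:
1. How can I concatenate all the tables within a single file into one table?
2. Is it possible to concatenate all the tables from all the files into one table?
If there is a way to do this in R, that would be fine too.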
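For illustration, here is a minimal Python sketch of one way both steps could be approached. It rests on assumptions drawn only from the sample above: fields are separated by whitespace, the header line starts with `pos`, and every data row starts with an integer position. The function names `table_rows` and `merge_files` are hypothetical, and a BindLevel value that contains spaces would be split into extra fields by this approach.

```python
def table_rows(lines):
    """Return (header, rows) for one file's lines.

    header is the first line whose first field is "pos", split into fields;
    rows are all lines whose first field is an integer (the data rows).
    Comment lines, dashed separators and "Protein ..." summaries are skipped.
    """
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0] == "pos":
            if header is None:
                header = fields  # keep only the first header seen
        elif fields[0].isdigit():
            rows.append(fields)
    return header, rows


def merge_files(paths, out_path):
    """Write one tab-separated table built from every file in paths."""
    header_written = False
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as fh:
                header, rows = table_rows(fh)
            # Emit the header once, taken from the first file that has one.
            if header and not header_written:
                out.write("\t".join(header) + "\n")
                header_written = True
            for row in rows:
                out.write("\t".join(row) + "\n")
```

Passing a single path answers question 1, and passing a list of all the files answers question 2, for example `merge_files(sorted(glob.glob("results/*.txt")), "merged.tsv")` after `import glob`, where both the directory and the output name are placeholders.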
Thank you very much!
PS: I looked through the Similar Questions section but could not find a solution along these lines.
[Comments]: