【Posted】: 2019-10-27 17:29:33
【Question】:
I need to extract a subset of a large dataset using a list of keywords. The large dataset shown here (gene_infoNCBI) contains the keywords:
> head(gene_infoNCBI)
X.tax_id GeneID Symbol LocusTag Synonyms dbXrefs chromosome map_location
1 7 5692769 NEWENTRY - - - - -
2 9 1246500 At1g00930 pLeuDn_01 - - - -
3 9 1246501 repA2 At1g13580 - - - -
4 9 1246502 leuA pLeuDn_04 - - - -
5 9 1246503 leuB pLeuDn_05 - - - -
6 9 1246504 leuC pLeuDn_06 - - - -
description
1 Record to support submission of GeneRIFs for a gene not in Gene (Azotirhizobium caulinodans. Use when strain, subtype, isolate, etc. is unspecified, or when different from all specified ones in Gene.).
2 putative replication-associated protein
3 putative replication-associated protein
4 2-isopropylmalate synthase
5 3-isopropylmalate dehydrogenase
6 isopropylmalate isomerase large subunit
type_of_gene Symbol_from_nomenclature_authority Full_name_from_nomenclature_authority
1 other - -
2 protein-coding - -
3 protein-coding - -
4 protein-coding - -
5 protein-coding - -
6 protein-coding - -
Nomenclature_status Other_designations Modification_date Feature_type
1 - - 20190202 -
2 - - 20180129 -
3 - - 20180129 -
4 - - 20180129 -
5 - - 20180129 -
6 - - 20180129 -
keyword.txt contains keywords that match values in the "Symbol" and "LocusTag" columns of the gene_infoNCBI file:
1 At1g00930 NA NA
2 At1g00930 NA NA
3 At1g00930 NA NA
4 At1g00930 NA NA
5 At1g00930 NA NA
6 At1g13580 NA NA
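One possible way to do this kind of subsetting in base R is sketched below. It is only an illustration under stated assumptions: the toy `gene_infoNCBI` data frame reproduces just three columns from the `head()` output above, the column name `keyword` is hypothetical (the real keyword.txt has no visible header), and matching is assumed to be exact string equality against either `Symbol` or `LocusTag`.

```r
# Toy stand-in for gene_infoNCBI: only the columns needed for matching,
# with values copied from the head() output in the question.
gene_infoNCBI <- data.frame(
  GeneID   = c(5692769, 1246500, 1246501, 1246502, 1246503, 1246504),
  Symbol   = c("NEWENTRY", "At1g00930", "repA2", "leuA", "leuB", "leuC"),
  LocusTag = c("-", "pLeuDn_01", "At1g13580", "pLeuDn_04", "pLeuDn_05", "pLeuDn_06"),
  stringsAsFactors = FALSE
)

# Assumption: the first column of keyword.txt holds the keywords.
# The column name "keyword" is made up for this example.
keywords <- data.frame(
  keyword = c(rep("At1g00930", 5), "At1g13580"),
  stringsAsFactors = FALSE
)

# Drop duplicate keywords so each one is tested only once.
kw <- unique(keywords$keyword)

# Keep rows whose Symbol OR LocusTag exactly equals one of the keywords.
hits <- gene_infoNCBI$Symbol %in% kw | gene_infoNCBI$LocusTag %in% kw
subset_genes <- gene_infoNCBI[hits, ]

print(subset_genes)
```

With the real files, the two data frames would come from something like `read.delim()` or `read.table()` instead of being typed in by hand. If partial matches are wanted rather than exact ones, `%in%` would not be enough and a pattern-based approach such as `grepl()` would be needed instead.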
【Comments】:
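- Please do not post images of code/data/errors: they cannot be copied or searched (SEO), they break screen readers, and they may not fit well on some mobile devices. Ref: meta.stackoverflow.com/a/285557/3358272 (and xkcd.com/2116). Please include the code or data directly (e.g., dput(head(x)) or data.frame(...)).
- Also, it is unclear how your Keyword.txt values are supposed to match your image of data. Please make this question reproducible. This includes sample code (including a list of any non-base R packages), unambiguous sample data (e.g., dput(head(x)) or data.frame(x=...,y=...)), and the expected output. Refs: stackoverflow.com/questions/5963269, stackoverflow.com/help/mcve, and stackoverflow.com/tags/r/info.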
Tags: r perl search merge extract