【问题标题】:R: (Pegas) problems with haplotypes - (error: 'h' must be of class 'haplotype')R:(Pegas)单倍型问题-(错误:'h'必须属于'haplotype'类)
【发布时间】:2016-05-03 22:56:41
【问题描述】:

我最近开始研究单倍型数据,我正在处理来自 1000 个基因组项目的数据,并尝试使用 R 中的 Pegas 包对其进行操作。到目前为止,我已经走到了这一步:

library(pegas)
a <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502"
b <- "ALL.chrY.phase3_integrated_v1b.20130502.genotypes.vcf.gz"
url <- paste(a, b, sep = "/")
download.file(url, "chrY.vcf.gz")
(info <- VCFloci("chrY.vcf.gz"))
SNP <- is.snp(info)
X.SNP <- read.vcf("chrY.vcf.gz", which.loci = which(SNP))
h <- haplotype(X.SNP, 6020:6030)
net <- haploNet(h)
plot(net)

我想绘制一个单倍型网络,但它不执行它。我收到以下消息:'h' must be of class 'haplotype'

如果我打印出 h 我得到:

> h
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
. "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"   "C"   "C"   "C"   "C"   "C"   "C"   "T"   "C"   "C"  
. "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"   "G"   "G"   "G"   "G"   "G"   "G"   "G"   "A"   "G"  
. "C"  "C"  "C"  "C"  "C"  "C"  "T"  "C"  "C"  "C"   "C"   "C"   "C"   "C"   "C"   "C"   "C"   "C"   "C"  
. "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"   "C"   "T"   "T"   "T"   "T"   "T"   "T"   "T"   "T"  
. "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"   "G"   "G"   "G"   "A"   "G"   "G"   "G"   "G"   "G"  
. "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"   "T"   "C"   "T"   "T"   "T"   "T"   "T"   "T"   "T"  
. "A"  "A"  "A"  "A"  "A"  "A"  "A"  "A"  "A"  "C"   "A"   "A"   "A"   "A"   "A"   "A"   "A"   "A"   "A"  
. "G"  "G"  "G"  "."  "G"  "G"  "G"  "G"  "G"  "G"   "G"   "G"   "A"   "G"   "G"   "G"   "G"   "G"   "G"  
. "."  "T"  "C"  "T"  "T"  "C"  "T"  "."  "."  "."   "T"   "T"   "T"   "T"   "C"   "T"   "T"   "T"   "T"  
. "."  "A"  "."  "A"  "."  "C"  "A"  "A"  "C"  "."   "A"   "A"   "A"   "A"   "A"   "C"   "A"   "A"   "A"  
. "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"   "T"   "T"   "T"   "T"   "T"   "T"   "T"   "T"   "C"  
attr(,"class")
[1] "haplotype.loci"
attr(,"freq")
 [1]   18 1142    2    5   25    6    1    4    2    1    2    5    1    9    1    3    1    4    1

它显然分配了 19 个单倍型。数据的呈现方式一定有问题。有什么建议吗?此外,关于 Pegas 以及如何使用 Pegas 操作 VCF 文件的资料也很少。有没有人知道一个很好的资源(网页或书籍)来获取有关如何从 VCF 文件中操作单倍型的信息,它甚至不一定适用于 Pegas,任何 R 库都可以,或者 Python ......真的。

谢谢你的帮助,彼得

【问题讨论】:

  • 我可能会建议直接联系包作者(联系信息CRAN)。无论出于何种原因,haplotype.loci 对象和haplotype 对象之间存在很多差异,haploNet 想要后者,但read.vcf 返回会触发基因座版本。

标签: r bioinformatics vcf-variant-call-format


【解决方案1】:

我知道这是一篇旧帖子,但如果其他人遇到同样的问题,我已经找到了解决该问题的方法。使用 pacakage "vcfR" 您可以使用 read.vcfR() 读取 vcf,然后使用 vcfR2DNAbin() 将其转换为 DNAbin。在 DNAbin 上使用 haplotype() 会导致类“haplotype”而不是“haplotype.loci”。

【讨论】:

    【解决方案2】:

    这是一个预期的结果:目前 haploNet() 仅适用于从 DNA 序列生成的“单倍型”类(“DNAbin”类)。 read.vcf() 的输出属于“基因座”类,而 haplotype() 是适用于这两个类的通用函数。

    如果您只处理 SNP,您可以通过以下方式避免这种情况:

    class(h) <- NULL
    h <- as.DNAbin(h)
    

    (最终)目标是让 haploNet() 也适用于“haplotype.loci”类(仍在开发中)以及其他类。

    干杯,伊曼纽尔

    【讨论】:

    • 感谢您的回答!我已经尝试了上面列出的解决方案,但它似乎错误地读取了 haplotype.loci 对象。 &gt; h 11 DNA sequences in binary format stored in a matrix. All sequences of same length: 19 Labels: . . . . . . ... Base composition: ` a c g t ` NaN NaN NaN NaN 它似乎是按行而不是按列读取序列。
    猜你喜欢
    • 1970-01-01
    • 2014-06-11
    • 2013-08-03
    • 2014-11-03
    • 1970-01-01
    • 2023-03-24
    • 1970-01-01
    • 2020-09-24
    • 1970-01-01
    相关资源
    最近更新 更多