【Question title】: Split by paragraph in R
【Posted】: 2017-09-28 20:31:25
【Question】:

I am trying to split a document by paragraph in R.

test.text <- c("First paragraph.  Second sentence of 1st paragraph.

           Second paragraph.")
# When we run the below, we see separation of \n\n between the 2nd and 3rd sentences
test.text

# This outputs the desired 2 blank lines in the console
writeLines("\n\n")

a <- strsplit(test.text, "\\n\\n")

It does not split correctly.

【Comments】:

    Tags: r regex


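    【Solution 1】:

    strsplit returns a list. In addition, the \n\n is followed by spaces. So we need to account for those spaces in the pattern, and extract the character vector from the list with [[.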

    a <- strsplit(test.text, "\n+\\s+")[[1]]
    a
    #[1] "First paragraph.  Second sentence of 1st paragraph." "Second paragraph."        
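
The same idea can be sketched a little more defensively. The helper below is hypothetical (split_paragraphs is not a base R function) and assumes that a paragraph break is a blank line, possibly containing stray spaces or tabs. Unlike the pattern above, it leaves a single newline inside a paragraph alone, and it trims leftover whitespace instead of relying on the pattern to consume it.

```r
# Hypothetical helper, not part of base R: split one string into paragraphs.
# Assumption: paragraphs are separated by at least one blank line, and that
# blank line may contain spaces or tabs.
split_paragraphs <- function(x) {
  # newline, optional spaces/tabs, newline, then any further whitespace
  parts <- strsplit(x, "\n[ \t]*\n\\s*", perl = TRUE)[[1]]
  parts <- trimws(parts)   # drop leading/trailing whitespace of each piece
  parts[nzchar(parts)]     # drop empty pieces (e.g. text starting with blank lines)
}

test.text <- "First paragraph.  Second sentence of 1st paragraph.\n\n           Second paragraph."
paragraphs <- split_paragraphs(test.text)
print(paragraphs)
```

Whether a single newline should count as a paragraph break depends on the input, so the pattern may need adjusting for other documents.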
    

    【Discussion】:

    • Why does \n work without the doubled \\? Do you have a reference? (Not about regex, about R's behavior.)
    • @RuiBarradas According to ?regex: "Escaping non-metacharacters with a backslash is implementation-dependent. The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as LF, \r as CR and \t as TAB. (Note that these will be interpreted by R's parser in literal character strings.)"
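
The behavior described in that quotation can be checked with a small experiment. This is only an illustrative sketch; the variable names are made up for the example. In "\n" R's parser produces a real newline character before the regex engine sees the pattern, while in "\\n" the engine receives a backslash plus n and interprets that escape itself. Both end up matching a line feed.

```r
x <- "a\nb"

nchar("\n")    # 1: the parser already converted this to a newline character
nchar("\\n")   # 2: a backslash followed by the letter n

# Both patterns match the newline when treated as regular expressions.
literal_split <- strsplit(x, "\n")[[1]]
escaped_split <- strsplit(x, "\\n")[[1]]
print(identical(literal_split, escaped_split))

# With fixed = TRUE there is no regex interpretation, so "\\n" is searched
# for as the two literal characters backslash + n and nothing is split.
fixed_split <- strsplit(x, "\\n", fixed = TRUE)[[1]]
print(length(fixed_split))
```

The difference only becomes visible when regex interpretation is switched off, as in the fixed = TRUE call.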