【Question Title】: RegEx: find and replace EOL between quotes
【Posted】: 2017-07-25 13:38:20
【Question Description】:

In a multi-line string like this:

She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week
I love you.
Eight days a week
Is not enough to show I care."

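I would like to replace each EOL (\r\n) that falls between quotes with a substitute character such as "¶" (character code 182 in Latin-1/ANSI; it is not part of ASCII proper), so that every record fits on a single line.

The result would be: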

She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah¶She loves you, yeah, yeah, yeah¶She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week¶I love you.¶Eight days a week¶Is not enough to show I care."

I tried various RegEx-related solutions found on StackOverflow but could not adapt them to what I need.

I will use this RegEx expression in the AHK function:

RegExReplace(Haystack, NeedleRegEx [, Replacement = "", OutputVarCount = "", Limit = -1, StartingPosition = 1])

RegExReplace(MyText, NeedleRegEx???, "¶")

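Any help is appreciated.

【Question Discussion】:

  • No, that won't fix it. The challenge is to replace only between quotes.
  • I will make my question clearer.
  • My question was not clear enough. I mentioned the "between quotes" requirement only in the title and did not restate it in the question itself. Sorry about that. Please see the edited question.
  • Knowing whether you are between quotes is context-sensitive, which regex handles poorly; it is better left to a smarter parser, or to a dumb but custom-made one. I don't know AHK at all, but I think you should aim to extract the quoted parts, search/replace within them, then write them back (or rebuild the whole string). Either that or a custom parser: read a character; if it is a quote, negate an "inside-quotes" boolean; change a newline to ¶ or not depending on the boolean; output the character.
  • Yes. That is what I do in my current script: parse the whole file and replace while tracking the quote context. But I wondered whether a RegEx expression could do it more efficiently. Thanks for your input, Aaron.

Tags: regex autohotkey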


【Solution 1】:

Could you parse the string and manipulate it this way?

str = 
(
She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week
I love you.
Eight days a week
Is not enough to show I care."
)
outStr := ""
Loop, Parse, str, `"
{
    field := A_LoopField
    StringReplace, field, field, `r,, All
    StringReplace, field, field, `n, ¶, All
    outStr .= field
}
MsgBox % outStr
ExitApp

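【Discussion】:

  • I'm not looking for a "simple replace". The challenge is to replace only between quotes, not everywhere as your simple solution does.
  • You will have to parse the string. I modified my answer.
  • Thanks fischgeek. This is close to what is needed (if no RegEx can do it faster on very large files). But there is one issue with this script: it must not remove the quotes around the values. For example, "Past Masters, Vol. 1" and "She loves you, ... yeah" must keep their quote encapsulation. A friend on the AHK forum sent me a variant. I'll post it soon.
【Solution 2】:

Since there seems to be no RegEx-only solution, I'm posting here a solution written by maestrith (on the AHK forum). It does replace the EOLs inside quotes while preserving the quote encapsulators. It uses StrSplit to read and process the whole content, isolating the quoted parts, and a combination of RegExReplace and StringReplace to process them. I still have to test it on a very large file to see how it performs compared with another script I wrote that processes one character at a time.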

#SingleInstance,Force
info=
(
She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week
I love you.
Eight days a week
Is not enough to show I care."
)
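for a,b in StrSplit(info,Chr(34)){ ; split on the double quote, Chr(34)
    if(!Mod(A_Index,2)){ ; even-numbered pieces are the parts that were inside quotes
        replace:=RegExReplace(b,"\R",chr(182)) ; \R matches any newline sequence
        StringReplace,info,info,%b%,%Replace% ; without "All", only the first occurrence of b in info is replaced
    }
}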
Gui,Font,s10
Gui,Add,Edit,w1000 h200 -Wrap,%Info%
Gui,Show
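
Because `StringReplace` without `All` replaces the first occurrence of the piece wherever it appears in `info`, an identical piece of text occurring earlier in the data could be modified instead of the intended one. A possible variant, sketched here for AutoHotkey v1.1 and not benchmarked, rebuilds the string piece by piece instead. The names `out`, `index` and `part` are illustrative and not part of maestrith's script; balanced quotes are assumed.

```autohotkey
#SingleInstance, Force
info =
(
She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week
I love you.
Eight days a week
Is not enough to show I care."
)
out := ""
for index, part in StrSplit(info, Chr(34)) {
    ; Even-numbered pieces were inside quotes (assumes balanced quotes).
    if (!Mod(index, 2))
        part := RegExReplace(part, "\R", Chr(182))
    ; Put back the quote that StrSplit removed between pieces.
    out .= (index = 1 ? "" : Chr(34)) . part
}
MsgBox % out
```

Each piece is appended exactly once, so identical text elsewhere in the data is left alone and the quotes around each value are preserved.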

【Discussion】:

【Solution 3】:

I'm adding this as an answer even though it does not answer my original question. It does not use RegEx, but it turned out to be faster than the attempt in the earlier answer (about 3 to 5 times faster on a 3 MB CSV file).

#SingleInstance,Force
info=
(
She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week
I love you.
Eight days a week
Is not enough to show I care."
)
blnInsideEncapsulators := false
Loop, Parse, info
    ; parsing on a temporary copy of info - so we can update the original info inside the loop
{
    if (A_Index = 1)
        info := ""
    if (blnInsideEncapsulators AND A_Loopfield = "`n")
        info := info . Chr(182)
    else
        info := info . A_Loopfield
    if (A_Loopfield = """")
        blnInsideEncapsulators := !blnInsideEncapsulators ; beginning or end of encapsulated text
}
Gui,Font,s10
Gui,Add,Edit,w1000 h200 -Wrap,%Info%
Gui,Show

I'll leave this thread without an accepted answer in case someone comes up with a full RegEx solution. You never know...

Thanks to all for your input.
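
For reference, a RegEx-only version seems feasible with a lookahead: a newline is inside quotes when an odd number of double quotes follows it up to the end of the text. The sketch below targets AutoHotkey v1.1 (PCRE) and assumes the quotes in the text are balanced. `MyText` comes from the question; `Needle` and `Result` are illustrative names. The lookahead rescans the rest of the text for every newline, so the cost grows roughly quadratically with the input size, and very large inputs may also run into the regex engine's internal limits. It has not been benchmarked against the scripts above.

```autohotkey
#SingleInstance, Force
MyText =
(
She Loves You [Mono],"Past Masters, Vol. 1",4,"She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah
She loves you, yeah, yeah, yeah, yeah"
Eight Days A Week,Beatles For Sale,8,"Eight days a week
I love you.
Eight days a week
Is not enough to show I care."
)
; \R          any newline sequence (CRLF, LF or CR)
; (?= ... $)  lookahead: only if an odd number of quotes follows up to the end:
;             one quote, then any number of quote pairs, then no more quotes.
; Inside an AHK expression string, a literal quote is written as "".
Needle := "\R(?=[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)"
Result := RegExReplace(MyText, Needle, Chr(182))
MsgBox % Result
```

On small inputs this keeps the quote encapsulators and leaves the newlines between records untouched; for multi-megabyte files the character-by-character loop is likely to remain the safer choice.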

【Discussion】:
