【Question title】: Convert RTF to Plain text Array
【Posted】: 2015-05-10 22:57:57
【Question】:
$string = "
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue255;}\loch\hich\dbch\pard\plain\ltrpar\itap0{\lang1033\fs18\f2\cf0 \cf0\ql
{\f2\cf2 {\ltrch <- MBisono--2/13/2015 12:01:25 PM ->}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them. }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Alexis}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Met with Admin and Benefits to discuss MAcGuffin's benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid. }\li0\ri0\sa0\sb0\fi0\ql\par}
}
}";

I have an RTF string like the one above. How can I convert it to plain text? I want the result as an array like this:

array(
    '<- MBisono--2/13/2015 12:01:25 PM ->',
    'How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them.',
    'Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.',
    'Alexis',
    'Met with Admin and Benefits to discuss MAcGuffin\'s benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid.'
)

Each string always starts with "\ltrch" and ends with "}\li0". Hope that helps. Thanks, regex experts!

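【Comments】:

  • Maybe you can find the answer here, via Google: webcheatsheet.com/php/reading_the_clean_text_from_rtf.php, which is also mentioned in a stackoverflow answer: stackoverflow.com/questions/9273937/rtf-to-plain-text. With those functions you could first split out each part of the string and convert it to plain text... or you could use this directly: sourceforge.net/projects/phprtf
  • Have you tried anything? It doesn't look like it.
  • @l'L'l, I tried using preg_replace. I just need to strip the strings that start with a backslash \. So far I only have this: preg_replace("/\\/", "", $input_lines); but it only removes the backslash, not the string after it
  • You need to include the other elements you do or don't want in your regex; that's how it works.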

Tags: php regex


【Answer 1】:

I'm not familiar with RTF, but I've put together a snippet that works for your input.

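Store the substrings that come immediately before and after the target text, then prepare them for the regex engine by calling preg_quote(), which escapes the backslashes, braces, and any other special characters so that they are matched literally.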
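As a minimal sketch of what preg_quote() does to the opening marker (the variable names $rawStart and $quoted are illustrative and not part of the answer):

```php
<?php
// The opening marker exactly as it appears in the RTF text.
// Single quotes keep the backslash as a literal character.
$rawStart = '{\ltrch ';

// preg_quote() prefixes regex metacharacters with a backslash;
// the second argument names the pattern delimiter so it is escaped as well.
$quoted = preg_quote($rawStart, '/');

// Prints the marker with the brace and the backslash each escaped.
echo $quoted, PHP_EOL;

// The quoted marker now matches itself literally inside a pattern.
var_dump(preg_match("/$quoted/", 'x {\ltrch Alexis}') === 1);
```

Because the quoted value is interpolated into the pattern as-is, no further escaping step should be needed for these markers.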

\S requires the captured text to start with a non-whitespace character, which skips the blank entries.

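The \s* after the capture group acts like rtrim(): it consumes any unwanted trailing whitespace so it is left out of the captured text.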
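The effect of \S and the trailing \s* can be seen in isolation on a tiny, made-up input. This is only a sketch: the square-bracket markers and the $sample string are hypothetical stand-ins for the RTF delimiters.

```php
<?php
// Hypothetical miniature input: one entry with trailing spaces,
// one whitespace-only entry, and one entry with a single trailing space.
$sample = '[Alexis   ][ ][Hope all is well. ]';

// \S forces the capture to begin with a non-whitespace character,
// so the whitespace-only entry "[ ]" cannot match at all.
// The lazy .*? followed by \s* leaves trailing spaces outside the capture group.
preg_match_all('/\[(\S.*?)\s*\]/', $sample, $matches);

var_export($matches[1]);
```

Only the two non-blank entries are captured, and neither keeps its trailing spaces.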

Code:

$string = <<<'TEXT'
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue255;}\loch\hich\dbch\pard\plain\ltrpar\itap0{\lang1033\fs18\f2\cf0 \cf0\ql
{\f2\cf2 {\ltrch <- MBisono--2/13/2015 12:01:25 PM ->}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them. }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Alexis}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch  }\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Met with Admin and Benefits to discuss MAcGuffin's benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid. }\li0\ri0\sa0\sb0\fi0\ql\par}
}
}
TEXT;

// preg_quote() alone escapes the braces and backslashes (and the / delimiter) so both markers match literally.
$start = preg_quote('{\ltrch ', '/');
$end = preg_quote('}\li0\ri0\sa0\sb0\fi0\ql\par}', '/');
var_export(
    preg_match_all(
        "/$start(\S.*?)\s*$end/",
        $string,
        $matches
    )
    ? $matches[1]
    : 'no matches'
);

Output:

array (
  0 => '<- MBisono--2/13/2015 12:01:25 PM ->',
  1 => 'How are you? Hope all is well.  Just wanted to drop you a note that our benefits seem to be getting screwed up every time we have a new employee or if someone changes something. We have certain rules set up for Class 1 and Class 2 and it does not seem like the benefits dept is following them.',
  2 => 'Payroll is great we love Christine. It just seems like there is always something wrong with our benefits.',
  3 => 'Alexis',
  4 => 'Met with Admin and Benefits to discuss MAcGuffin\'s benefits.  Admin has had no issues, Benefits advised that recently an employee was set up with contributions, when it should have been 100% employer paid.',
)

If \ri0\sa0\sb0\fi0\ql\par is variable text, you can remove that part from the $end declaration.
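A sketch of that variant, assuming only "}\li0" is guaranteed after each entry. The two-line sample below is hypothetical, with the control words after \li0 deliberately differing between lines:

```php
<?php
// Hypothetical sample: the text following "}\li0" varies from line to line.
$string = <<<'TEXT'
{\f2\cf2 {\ltrch Alexis}\li0\ri0\sa0\sb0\fi0\ql\par}
{\f2\cf2 {\ltrch Payroll is great. }\li0\ri0\sa120\sb0\qj\par}
TEXT;

$start = preg_quote('{\ltrch ', '/');
// Keep only the fixed prefix of the closing text.
$end = preg_quote('}\li0', '/');

preg_match_all("/$start(\S.*?)\s*$end/", $string, $matches);

var_export($matches[1]);
```

The shorter closing marker is less specific, so it is worth checking against real data that "}\li0" never occurs inside the text you want to keep.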
