【Question title】: Sentence with specific size AND boundaries detection
【Posted】: 2012-07-08 11:55:53
【Question】:

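Here is my problem: I have a large string (nearly 8,000 characters) and I want two things:

  1. Detect sentence boundaries such as "." and
  2. Keep each sentence to at most 600 characters

I know that in some cases both cannot be satisfied. In that case, find a space and split the sentence there.

The solution ridgerunner provided for condition 1 works very well (see the original link, http://goo.gl/PqI6d), but it often outputs sentences longer than 600 characters. Any ideas? Thanks in advance!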

【Question comments】:

  • Check whether this regex does what you want: /(?:[^.]{1,20}(?: |\.)|\w{20,}(?: |\.)?)/. You can change 20 to 600 to fit your case. Test case: This is a short sentence. This is a very very very very very very long long long long long long sentence. Andthisisaverylongwordwithoutspaces.

Tags: php regex size boundary sentence


【Solution 1】:

You may be better off matching the strings. Your matching regex might look like this:

(.{0,600}?\.)|(.{0,600}(?=\ ))

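In short, you first look for the shortest possible string that ends in a period. If there is none, you look for the longest possible string that is followed by a space. The next match then continues from where the previous one left off.

Note that this is a generic regex. Your PHP implementation may differ.

【Comments】: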
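
For PHP, the idea above could be sketched roughly as follows. This is only a sketch under a few assumptions, not the answerer's exact code: the function name `splitSentences` is made up for illustration, the input is assumed to be valid UTF-8, and two things are added to the generic regex (a possessive `\s*+` that skips whitespace between pieces, and a last-resort hard cut for a run longer than the limit that contains no space).

```php
<?php
// Hypothetical helper (not from the original answer): returns pieces of at
// most $max characters, preferring to end each piece at a period.
function splitSentences(string $text, int $max = 600): array
{
    $pattern = '/\s*+('                        // skip whitespace between pieces
        . '.{0,' . ($max - 1) . '}?\.'         // 1) shortest run ending in a period
        . '|.{1,' . $max . '}(?=\s|$)'         // 2) else longest run followed by whitespace or the end
        . '|.{1,' . $max . '}'                 // 3) else a hard cut (no space available)
        . ')/su';                              // s: dot matches newlines, u: count UTF-8 characters

    if (!preg_match_all($pattern, $text, $matches)) {
        return [];                             // no match, or invalid UTF-8 input
    }

    // Pieces taken by branch 2 or 3 may end in whitespace; trim it off.
    return array_map('trim', $matches[1]);
}

// Small demonstration with a limit of 20 instead of 600.
$pieces = splitSentences('One two. Three four five six seven eight nine ten. End', 20);
print_r($pieces);
```

Because the first branch simply stops at the first period, abbreviations and decimal numbers such as "3.5" are also treated as sentence ends; a stricter boundary rule would have to replace that branch.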

    【Solution 2】:

    Thanks, nhahtdh. Please see whether I am missing something. Below are an excerpt of my string and the output I get using your suggestion.

    <?php
        // Pattern from nhahtdh's comment, with the limit raised from 20 to 600.
        $ptn = "/(?:[^.]{1,600}(?: |\.)|\w{600,}(?: |\.)?)/";
        $str = "Amblyopia occurs when the nerve pathway from one eye to the brain does not develop during childhood. This occurs because the abnormal eye sends a blurred image or the wrong image to the brain. This confuses the brain, and the brain may learn to ignore the image from the weaker eye. Strabismus is the most common cause of amblyopia. There is often a family history of this condition. The term \"lazy eye\" refers to amblyopia, which often occurs along with strabismus. However, amblyopia can occur without strabismus and people can have strabismus without amblyopia.First, any eye condition that is causing poor vision in the amblyopic eye (such as cataracts) needs to be corrected. Children with a refractive error (nearsightedness, farsightedness, or astigmatism) will need glasses. Next, a patch is placed on the normal eye. This forces the brain to recognize the image from the eye with amblyopia. Sometimes, drops are used to blur the vision of the normal eye instead of putting a patch on it. Children whose vision will not fully recover, and those with only good eye due to any disorder should wear glasses with protective polycarbonate lenses. Polycarbonate glasses are shatter- and scratch-resistant. Children who get treated before age 5 will usually recover almost completely normal vision, although they may continue to have problems with depth perception. Delaying treatment can result in permanent vision problems. After age 10, only a partial recovery of vision can be expected. Early recognition and treatment of the problem in children can help to prevent permanent visual loss. All children should have a complete eye examination at least once between ages 3 and 5. Special techniques are needed to measure visual acuity in a child who is too young to speak. Most eye care professionals can perform these techniques.";
        // Split $str on the pattern and keep only the non-empty pieces.
        $result = preg_split($ptn, $str, -1, PREG_SPLIT_NO_EMPTY);
        print_r($result);
        ?>
    

    Result (what I need is the sentences from the string, each shorter than 600 characters):

     Array
    (
    [0] => childhood.
    [1] => brain.
    [2] => eye.
    [3] => amblyopia.
    [4] => condition.
    [5] => strabismus.
    [6] => amblyopia.
    [7] => corrected.
    [8] => glasses.
    [9] => eye.
    [10] => amblyopia.
    [11] => it.
    [12] => lenses.
    [13] => scratch-resistant.
    [14] => perception.
    [15] => problems.
    [16] => expected.
    [17] => loss.
    [18] => 5.
    [19] => speak.
    [20] => techniques
    )
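
One thing worth checking in the snippet above is the choice of function: `preg_split()` treats every match of the pattern as a delimiter and throws it away, while `preg_match_all()` returns the matches themselves. The pattern from the comment describes the sentences, not the gaps between them, so matching may be the better fit. A minimal illustration with a deliberately simple pattern and string (both chosen here for the example, not taken from the post):

```php
<?php
$pattern = '/[^.]+\./';          // a run of non-period characters plus the period
$text = 'One. Two. Three.';

// preg_split() removes the matched sentences and returns what lies between
// them. Here nothing lies between them, so no pieces are left.
$pieces = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

// preg_match_all() collects the matched sentences in $matches[0].
preg_match_all($pattern, $text, $matches);
$sentences = $matches[0];

print_r($pieces);
print_r($sentences);
```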
    

    【Comments】:
