【Question title】:Removing strings in brackets for unicode lines - python
【Posted】:2011-09-23 23:15:32
【Question】:

I'm having some trouble with my regex for removing the strings inside brackets.

Here is my code:

import sys, re
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("input",'r','utf-8')
p = re.compile('s/[\[\(].+?[\]\)]//g', re.DOTALL)
# I've also tried several other regexes, but none of them worked
# p = re.compile('\{\{*?.*?\}\}', re.DOTALL)
# p = re.compile('\{\{*.*?\}\}', re.DOTALL)

for row in reader:
    if ("(" in row) and (")" not in row):
        continue
    if row.count("(") != row.count(")"):
        continue
    else:
        row2 = p.sub('', row)
        print row2
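The pattern string `'s/[\[\(].+?[\]\)]//g'` uses sed substitution syntax. Python's `re` module has no `s/.../.../g` form: `re.compile` treats the leading `s/` and the trailing `//g` as literal characters to match, so the pattern never matches these lines and `p.sub` hands them back unchanged. In Python the pattern and the replacement are separate arguments, and `re.sub` already replaces every match. A minimal Python 3 comparison on the first sample line (an illustration, not code from the thread); raw strings are used so the backslashes reach the regex engine untouched:

```python
import re

line = "가시 돋친(신랄한)평 spinosity"

# sed-style pattern: "s/" and "//g" are matched literally, so nothing is found
sed_style = re.compile(r"s/[\[\(].+?[\]\)]//g")
print(sed_style.sub("", line))   # the line comes back unchanged

# Plain pattern: an opening bracket, as few characters as possible, a closing bracket
plain = re.compile(r"[\[\(].+?[\]\)]")
print(plain.sub("", line))       # 가시 돋친평 spinosity
```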

The input text file looks like this:

가시 돋친(신랄한)평 spinosity
가장 완전한 (같은 종류의 것 중에서)   unabridged
(알코올이)표준강도(50%) 이하의 underproof
(암초 awash
치명적인(fatal) capital
열을) 전도하다    transmit

The desired output should look like this:

가시 돋친평  spinosity
가장 완전한  unabridged
표준강도 이하의    underproof
치명적인    capital
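One way to get this output is sketched below in Python 3. It is an illustration under stated assumptions, not code from the thread: the helper names `is_balanced`, `strip_brackets`, `clean_lines` and `print_cleaned` are hypothetical, only round brackets are handled, and whitespace around a removed group is left as-is. Instead of comparing bracket counts (which would accept `)x(`), it tracks nesting depth, and it repeatedly deletes the innermost `(...)` group so nested brackets are stripped too. In Python 3, `open(..., encoding="utf-8")` decodes the file, so neither `codecs` nor `sys.setdefaultencoding` is needed.

```python
import re

# Innermost group: "(" then any run of non-bracket characters, then ")"
INNERMOST = re.compile(r"\([^()]*\)")


def is_balanced(text):
    """Return True if every ")" closes an earlier "(" and no "(" is left open.

    Hypothetical helper; stricter than comparing counts, which would accept ")x(".
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "(" before it
                return False
    return depth == 0


def strip_brackets(text):
    """Delete (...) groups innermost-first until none remain, so nesting works."""
    while True:
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text


def clean_lines(lines):
    """Yield each balanced line with its bracketed text removed; skip the rest."""
    for line in lines:
        line = line.rstrip("\n")
        if is_balanced(line):
            yield strip_brackets(line)


def print_cleaned(path):
    """Print the cleaned lines of a UTF-8 text file (e.g. the question's "input")."""
    with open(path, encoding="utf-8") as reader:
        for row in clean_lines(reader):
            print(row)
```

Calling `print_cleaned("input")` on the six sample lines should print the four desired lines (the spacing before the English headword may differ); the `(암초` and `열을)` lines are skipped because their brackets do not balance.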


Tags: python regex string dictionary brackets


【Answer 1】:

Does this work for you?

# -*- coding: utf-8 -*-
import sys, re
import codecs

#reload(sys)
#sys.setdefaultencoding('utf-8')

# preparing the example lines to work on
writer = codecs.open("input.txt",'w','utf-8')
examples = [u'가시 돋친(신랄한)평 spinosity',
            u'가장 완전한 (같은 종류의 것 중에서)',
            u'알코올이)표준강도(50%) 이하의 underproof',
            u'(암초 awash',
            u'치명적인(fatal) capital']
for exampl in examples:
    writer.write(exampl+"\n")
writer.close()

reader = codecs.open("input.txt",'r','utf-8')

#order of patterns is important,
#if you remove brackets first, the other won't find anything
patterns_to_remove = [r"\(.{1,}\)",r"[\(\)]"]

# a single combined pattern would work just as well; the loop is a bit clearer
#pat = r"(\(.{1,}\))|([\(\)])"    
#for row in reader:
#    row = re.sub(pat,'',row)#,re.U)
#    print row

reader.seek(0)
for row in reader:
    for pat in patterns_to_remove:
        row = re.sub(pat,'',row)#,re.U)
    print row
reader.close()
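A caveat on the first entry in `patterns_to_remove`: `{1,}` is greedy, so on a line with two bracket groups `r"\(.{1,}\)"` matches from the first `(` to the last `)` and also deletes the text between the groups. The answer's sample list drops the leading `(` from the `알코올이` line, which hides this; the question's actual line `(알코올이)표준강도(50%) 이하의 underproof` exposes it. Appending `?` makes the quantifier lazy, so each match ends at the first `)`. A small Python 3 comparison (not part of the original answer):

```python
import re

line = "(알코올이)표준강도(50%) 이하의 underproof"

# Greedy: .{1,} runs to the LAST ")" on the line, swallowing "표준강도" as well
greedy = re.sub(r"\(.{1,}\)", "", line)

# Lazy: .{1,}? stops at the FIRST ")" after each "(", so each group goes separately
lazy = re.sub(r"\(.{1,}?\)", "", line)

print(repr(greedy))  # ' 이하의 underproof'
print(repr(lazy))    # '표준강도 이하의 underproof'
```

Note also that the second pattern, `r"[\(\)]"`, strips a stray bracket instead of skipping the line, so `(암초 awash` becomes `암초 awash` rather than being dropped as the desired output requires.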

【Discussion】:

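  • Nice regex, it works. Just a few clarifications: "re.U" means re in Unicode mode, right? Why is it r"[\(\)]", and why wouldn't r"\(\)" work? Could you also explain {1,}? I'm not very good at regexes. Thanks a lot, the code works. Normally I'd skip the seek(0) and close().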
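  • Honestly, I don't think re.U is needed at all; it's probably left over from my experimenting, so I'll comment it out. r"[\(\)]" removes every bracket, so it turns "some(thing)else" or "something)else" into "somethingelse", whereas r"\(\)" looks only for the literal "()", so it turns "something()else" into "somethingelse". According to the docs, [] is used to indicate a set of characters.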
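The points in this exchange can be checked directly. The Python 3 snippet below is an illustration, not code from the thread: a character class `[()]` matches any single bracket, the escaped `\(\)` matches only the literal pair `()`, and an unescaped `()` is an empty capturing group that matches the empty string. It also covers the two questions the reply leaves open: `{1,}` is the same "one or more" quantifier as `+`, and `re.U` is the short name of `re.UNICODE`, which Python 3 already applies to `str` patterns. If the commented-out `#,re.U)` in the answer were re-enabled as `re.sub(pat,'',row,re.U)`, `re.U` would land in the fourth positional parameter, `count`, not in `flags`.

```python
import re

# [()] is a character class: it matches any single "(" or ")" anywhere
class_a = re.sub(r"[()]", "", "some(thing)else")    # 'somethingelse'
class_b = re.sub(r"[()]", "", "something)else")     # 'somethingelse'

# \(\) matches only the literal two-character sequence "()"
literal_a = re.sub(r"\(\)", "", "something()else")  # 'somethingelse'
literal_b = re.sub(r"\(\)", "", "some(thing)else")  # unchanged: no "()" in it

# Unescaped () is an empty capturing group: it matches the empty string everywhere
empty_group = re.sub(r"()", "-", "ab")               # '-a-b-'

# {1,} means "one or more", exactly like +
same_quantifier = re.findall(r"o{1,}", "foo bo") == re.findall(r"o+", "foo bo")

# re.U is an alias of re.UNICODE; pass it by keyword, because the fourth
# positional parameter of re.sub is count, not flags
unicode_alias = re.U == re.UNICODE
keyword_flags = re.sub(r"\(.+?\)", "", "치명적인(fatal) capital", flags=re.U)

print(class_a, class_b, literal_a, literal_b, empty_group)
print(same_quantifier, unicode_alias, keyword_flags)
```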