【Question title】: Python regex replace anchors
【Posted】: 2015-09-17 17:20:56
【Question】:

I'm trying to rewrite the code I saw in this answer:

import re

# Linkify bare URLs such as http://... (group 2 = whole URL, group 3 = scheme + host)
pat1 = re.compile(r"(^|[\n ])(([\w]+?://[\w\#$%&~.\-;:=,?@\[\]+]*)(/[\w\#$%&~/.\-;:=,?@\[\]+]*)?)", re.IGNORECASE | re.DOTALL)

# Linkify scheme-less www./ftp. addresses (no stray leading "#" left over from a PHP delimiter)
pat2 = re.compile(r"(^|[\n ])(((www|ftp)\.[\w\#$%&~.\-;:=,?@\[\]+]*)(/[\w\#$%&~/.\-;:=,?@\[\]+]*)?)", re.IGNORECASE | re.DOTALL)


urlstr = 'http://www.example.com/foo/bar.html'

urlstr = pat1.sub(r'\1<a href="\2" target="_blank">\3</a>', urlstr)
urlstr = pat2.sub(r'\1<a href="http://\2" target="_blank">\3</a>', urlstr)

print(urlstr)
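To see what the backreferences `\1`, `\2` and `\3` in the replacement strings refer to, here is a minimal Python 3 sketch that prints the groups `pat1` captures for the example URL. The pattern is copied unchanged from `pat1` above.

```python
import re

# Same pattern as pat1: group 1 = leading start-of-string/space/newline,
# group 2 = whole URL, group 3 = scheme + host, group 4 = optional path
pat1 = re.compile(r"(^|[\n ])(([\w]+?://[\w\#$%&~.\-;:=,?@\[\]+]*)(/[\w\#$%&~/.\-;:=,?@\[\]+]*)?)",
                  re.IGNORECASE | re.DOTALL)

m = pat1.search('http://www.example.com/foo/bar.html')
print(m.groups())
# ('', 'http://www.example.com/foo/bar.html', 'http://www.example.com', '/foo/bar.html')
```

So the link text produced by `\3` is only `http://www.example.com`, while the `href` gets the full URL from `\2`.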

Specifically, I tried this:

pattern = re.compile(r'<a href="javascript:rt\(([0-9]+)\)">Download</a>')

rawtable = pattern.sub(r'\1', rawtable) 

where I want to replace something like this:

<a href="javascript:rt(2061)">Download</a>

with this:

2061

I also want to turn this:

<a href="#" onclick="javascript:ra('Name of object one')"
  title="Some title Text">Name of Object two</a>

into just

Name of Object two

by doing

pattern = re.compile('<a href="#" onclick="javascript:ra\('(:?[a-zA-Z0-9 +)'\)" title="Some title Text">([a-zA-Z0-9 ]+)</a>');

rawtable = pattern.sub(r'\1', rawtable) 

But that doesn't work either. Any suggestions?

【Question comments】:

    Tags: python regex


    【Solution 1】:

    where I want to replace something like this:

    <a href="javascript:rt(2061)">Download</a>
    

    Your first pattern works. Test here
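A quick Python 3 check that the first pattern does what the question wants. The surrounding `<td>` is a hypothetical stand-in for the rest of `rawtable`; writing the pattern as a raw string is a small change from the question, so that recent Python versions do not warn about the invalid escape sequence `\(` in a normal string.

```python
import re

# Raw string: "\(" and "\)" reach the regex engine unchanged and Python
# does not warn about invalid escape sequences
pattern = re.compile(r'<a href="javascript:rt\(([0-9]+)\)">Download</a>')

# Hypothetical table cell standing in for the real rawtable contents
rawtable = '<td><a href="javascript:rt(2061)">Download</a></td>'

# sub() replaces each whole match with capture group 1 (the digits)
rawtable = pattern.sub(r'\1', rawtable)
print(rawtable)  # <td>2061</td>
```

Only the matched anchor is replaced; text outside the match is left alone.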



    I also want to turn this:

    <a href="#" onclick="javascript:ra('Name of object one')" title="Some title Text">Name of Object two</a>
    

    As for the second one, look at what I've marked here:

    pattern = re.compile('<a href="#" onclick="javascript:ra\('(:?[a-zA-Z0-9 +)'\)" title="Some title Text">([a-zA-Z0-9 ]+)</a>');
                                                              | | |         |  ^ unescaped quote (in the string passed to re.compile() )
                                                              | | |         |
                                                              | | ^---------^ you didn't close the character class (as in [a-z]).. add a "]"
                                                              | ^ correct syntax is (?: pattern ) ... However, no point in using it here
                                                              ^ another unescaped quote
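The two regex-level mistakes marked above can be reproduced in isolation. This is a minimal Python 3 sketch; the unescaped quotes are a Python syntax error in the source file itself, so they are not demonstrated here.

```python
import re

# Error 1: "[a-zA-Z0-9 +" never closes its character class, so the
# regex engine refuses to compile it and raises re.error.
try:
    re.compile('[a-zA-Z0-9 +')
    err = None
except re.error as exc:
    err = exc
print('re.error:', err)

# Error 2: "(:?" is an ordinary capturing group that starts with an
# optional colon; only "(?:" opens a non-capturing group.
capturing = re.match('(:?ab)', 'ab').groups()
non_capturing = re.match('(?:ab)', 'ab').groups()
print(capturing)      # ('ab',)
print(non_capturing)  # ()
```

The `(:?` typo would also shift the group numbering, so `\1` in the replacement would no longer refer to the link text.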
    

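    Code:

    # Python 3.4.3
    import re

    rawtable = '<a href="#" onclick="javascript:ra(\'Name of object one\')" title="Some title Text">Name of Object two</a>'

    # Fixed pattern: inner quotes escaped as \', character class closed with "]",
    # and the pointless (:? ... ) group dropped
    pattern = re.compile(r'<a href="#" onclick="javascript:ra\(\'[a-zA-Z0-9 ]+\'\)" title="Some title Text">([a-zA-Z0-9 ]+)</a>')

    # Replace the whole anchor with capture group 1 (the link text)
    rawtable = pattern.sub(r'\1', rawtable)
    print(rawtable)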
    

    Run this code

    Output:

    Name of Object two
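The pattern above only accepts letters, digits and spaces in both names, and it expects exactly one space before `title=`, whereas the sample in the question breaks the line there. The following looser variant is not part of the original answer and assumes the attributes always appear in this order. It replaces the fixed classes with negated ones and allows any whitespace between the attributes.

```python
import re

# Looser variant (an assumption about the data): [^']* accepts any
# ra('...') argument, \s+ accepts the line break before title=,
# [^"]* any title, and [^<]* any link text that contains no tags.
pattern = re.compile(
    r'''<a href="#" onclick="javascript:ra\('[^']*'\)"\s+title="[^"]*">([^<]*)</a>'''
)

# Sample laid out as in the question, with title= on its own line
rawtable = ('<a href="#" onclick="javascript:ra(\'Name of object one\')"\n'
            '  title="Some title Text">Name of Object two</a>')

result = pattern.sub(r'\1', rawtable)
print(result)  # Name of Object two
```

A raw triple-quoted string lets the pattern contain both `'` and `"` without any backslash escaping of the quotes.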
    

    【Comments】:
