【Question title】: How to add to the number near the end of each line
【Posted】: 2011-07-15 12:29:29
【Question】:

Suppose there is some text in a file:

(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 2" "#2")
("1.2 Illustrative Examples 4" "#4")
("1.3 Guidelines for Model Construction 26" "#26")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)

How can I add 11 to the last number on each line, if there is one, i.e.

(bookmarks
("Chapter 1 Introduction 1" "#12"
("1.1 Problem Statement and Basic Definitions 2" "#13")
("1.2 Illustrative Examples 4" "#15")
("1.3 Guidelines for Model Construction 26" "#37")
("Exercises 30" "#41")
("Notes and References 34" "#45"))
)

using sed, awk, Python, Perl, regex, ...?

Thanks and regards!

【Comments】:

Tags: python regex perl sed awk

【Answer 1】:

Emacs Lisp

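Preliminaries

Here we will use functions from the third-party libraries dash and s, which you can install from MELPA using Emacs's package system (see "How to install packages in Emacs"). dash is a list and tree manipulation library; it also contains various functions that make code more concise and functional. s is a string manipulation library. If you write Elisp code often, I strongly recommend installing these packages to simplify your coding.

-map is the same as mapcar: it walks a list, calls a function on each element, and returns a list of the results. E.g. (-map '1+ '(1 2 3)) ; returns (2 3 4). However, -map also has an anaphoric macro version, which lets you write concise code instead of passing a lambda. Anaphoric versions start with two dashes. E.g. (--map (+ 10 it) '(1 2 3)) is equivalent to (-map (lambda (x) (+ 10 x)) '(1 2 3)).

->> is the threading macro from dash, similar to function composition but in reverse order. E.g. (number-to-string (-sum (-map '1+ '(1 2 3)))), which returns "9", is equivalent to (->> '(1 2 3) (-map '1+) -sum number-to-string).

String approach

Suppose you have the whole structure stored in a string s. Then you have to find every occurrence of a character sequence of the form #[some_number], probably with a regular expression, and replace it with the number increased by 11.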

(let* ((old (-map 'car (s-match-strings-all "#[0-9]\\{1,3\\}" s)))
       (new (--map (->> it
                        (s-chop-prefix "#")
                        string-to-number
                        (+ 11)
                        number-to-string
                        (s-prepend "#"))
                   old)))
  (s-replace-all (-zip old new) s))
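
For readers who do not use Emacs, the same string approach can be sketched in Python. This is not part of the original answer: the function name shift_pages and its delta parameter are made up for illustration, and the sketch assumes that every "#" followed by digits is a page reference.

```python
import re

def shift_pages(text, delta=11):
    """Add delta to every number that directly follows a '#'."""
    # re.sub calls the lambda once per match and inserts its return value,
    # so all numbers are rewritten in a single pass over the text.
    return re.sub(r"#(\d+)", lambda m: "#" + str(int(m.group(1)) + delta), text)

sample = '("1.2 Illustrative Examples 4" "#4")'
print(shift_pages(sample))  # ("1.2 Illustrative Examples 4" "#15")
```

The number inside the title ("4") is left alone because it is not preceded by "#".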

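Tree approach

But wait a minute, your structure is recursively enclosed in parentheses: it is an s-expression! We can traverse it as a tree and replace every string that starts with # and contains a numeric value with a new value increased by 11. -tree-map-nodes is like -map for trees. It applies the function only when the predicate returns true. In other words, elements for which the predicate does not hold are left unchanged.

-tree-map-nodes recurses both ways, in breadth and in depth. This means that it also treats lists as ordinary elements, the first element visited being the whole list. E.g. (-tree-map-nodes 'zerop '1+ '(0 (1 (0) 1) 0)) is incorrect and throws this error: *** Eval error *** Wrong type argument: numberp, (0 (1 (0) 1) 0). Instead, you should first check whether the element is a number. E.g. (--tree-map-nodes (and (numberp it) (zerop it)) (1+ it) '(0 (1 (0) 1) 0)) returns (1 (1 (1) 1) 1). Suppose your tree is in the variable q. Then the solution is as follows; it returns a new, modified s-expression: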

(--tree-map-nodes (and (stringp it)
                       (eq (elt it 0) ?#)
                       (s-numeric? (s-chop-prefix "#" it)))
                  (->> it
                       (s-chop-prefix "#")
                       string-to-number
                       (+ 11)
                       number-to-string
                       (s-prepend "#"))
                  q)
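
The tree approach can also be illustrated in Python, as a rough sketch rather than a port of dash. It assumes the s-expression has already been read into nested Python lists of strings; map_nodes, is_page_ref and shift are hypothetical helper names, not library functions.

```python
def map_nodes(pred, fn, tree):
    """Apply fn to every node satisfying pred; recurse into lists otherwise."""
    if pred(tree):
        return fn(tree)
    if isinstance(tree, list):
        return [map_nodes(pred, fn, child) for child in tree]
    return tree  # leaf that does not satisfy pred: keep unchanged

def is_page_ref(node):
    # The type check comes first because whole lists are passed to the predicate too.
    return isinstance(node, str) and node.startswith("#") and node[1:].isdecimal()

def shift(node, delta=11):
    return "#" + str(int(node[1:]) + delta)

toc = ["bookmarks",
       ["Chapter 1 Introduction 1", "#1",
        ["1.1 Problem Statement and Basic Definitions 2", "#2"],
        ["Exercises 30", "#30"]]]

print(map_nodes(is_page_ref, shift, toc))
```

As in the Elisp version, the predicate has to check the node type before looking at its contents, since lists are visited as nodes as well.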

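【Comments】:

【Answer 2】:

This syntax is s-expressions (sexps for short), which are most easily manipulated in Lisp and related languages such as Scheme. Most easily for complex tasks, that is; if you can assume that your input is tame enough (e.g. no "# in a chapter title, line breaks where they are shown, etc.), then a text processing tool, as shown in the other answers, is preferable for this task.

In Lisp or Scheme, reading and writing data as structured data is as simple as (read) and (write data). Other things are less easy; for example, there is no standard way to read command-line arguments in Lisp or Scheme.

Here is a Lisp program that performs the required transformation. It treats the data as structured data, so you don't have to worry about presentation. The first line, which fetches the first command-line argument, is specific to CLisp; the rest is portable Common Lisp.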

(setq delta (parse-integer (car ext:*args*)))
(defun shift-page (page)
  (format nil "#~D" (+ delta (parse-integer page :start 1))))
(defun shift-pages (entry)
  (let ((title (car entry))
        (page (cadr entry))
        (subentries (cddr entry)))
    (cons title (cons (shift-page page) (mapcar #'shift-pages subentries)))))
(let ((toc (read)))
  (write (cons 'bookmarks (mapcar #'shift-pages (cdr toc)))))

【Comments】:

【Answer 3】:

Python:

import re

file_name = "bin/SO/bookmarks.txt"

print("unmodified file:")
with open(file_name) as f:
    for line in f:
        print(line.rstrip())

print()

print("modified file:")
i = 11  # amount to add to each page number
with open(file_name) as f:
    for line in f:
        # Split the line into: everything up to "#, the digits, and the rest.
        m = re.match(r'(^.*"#)(\d+)(.*$)', line)
        if m:
            new_line = m.group(1) + str(int(m.group(2)) + i) + m.group(3)
            print(new_line)
        else:
            print(line.rstrip())

Output:

unmodified file:
(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 2" "#2")
("1.2 Illustrative Examples 4" "#4")
("1.3 Guidelines for Model Construction 26" "#26")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)

modified file:
(bookmarks
("Chapter 1 Introduction 1" "#12"
("1.1 Problem Statement and Basic Definitions 2" "#13")
("1.2 Illustrative Examples 4" "#15")
("1.3 Guidelines for Model Construction 26" "#37")
("Exercises 30" "#41")
("Notes and References 34" "#45"))
)

【Comments】:

【Answer 4】:

awk -F'#' 'NF>1{split($2,a,"[0-9]+");print $1 FS $2+11 a[2];next}1' infile

Proof of concept

$ awk -F'#' 'NF>1{split($2,a,"[0-9]+");print $1 FS $2+11 a[2];next}1' infile
(bookmarks
("Chapter 1 Introduction 1" "#12"
("1.1 Problem Statement and Basic Definitions 2" "#13")
("1.2 Illustrative Examples 4" "#15")
("1.3 Guidelines for Model Construction 26" "#37")
("Exercises 30" "#41")
("Notes and References 34" "#45"))
)

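【Comments】:

Thanks! One problem: some lines in the output are missing the right half of their parentheses.
@Tim My bad, I've updated my answer to accommodate your parentheses. The result is also a bit simpler than my previous answer.
SiegeX: Thanks! Some questions: (1) Why does "$2+11" work, given that $2 is not entirely a number? (2) What is the purpose of "next"? Would it work the same without "next"? (3) What does the "1" after "next}" mean? Would it work the same without the "1"?
@Tim: (1) awk knows that you can only do math on numbers, so it does the best it can and ignores the rest. (2) 'next' tells awk to read the next input line immediately and start processing over, skipping any further processing that would otherwise have happened. (3) The '1' outside the braces is a shortcut that tells awk to print $0 unconditionally. (4) 'next' and '1' work together. Because of the '1', anything that does not match the pattern (NF>1) is printed unconditionally. Any line that does match is changed and then printed explicitly via 'print'; 'next' prevents it from being printed twice.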
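
To make the explanation above concrete, here is a rough Python rendering of what the awk one-liner does on each line. It is only a sketch under the assumption that each line contains at most one "#"; shift_line is a made-up name, and the comments map each step back to the awk construct it imitates.

```python
import re

def shift_line(line, delta=11):
    """Rough Python rendering of the awk one-liner (a sketch, not awk itself)."""
    fields = line.split("#")              # like -F'#'
    if len(fields) > 1:                   # like the pattern NF>1
        # $2+11 uses only the leading numeric prefix of $2, e.g. 26 in '26")'
        m = re.match(r"[0-9]+", fields[1])
        number = int(m.group()) if m else 0
        # split($2, a, "[0-9]+"): a[2] is whatever follows the first run of digits
        parts = re.split(r"[0-9]+", fields[1])
        tail = parts[1] if len(parts) > 1 else ""
        return fields[0] + "#" + str(number + delta) + tail   # print ...; next
    return line                           # the trailing 1: print $0 unchanged

for line in ['(bookmarks', '("Exercises 30" "#30")']:
    print(shift_line(line))
```

The early return plays the role of 'next': a line that was rewritten is emitted once and never reaches the unconditional print.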

【Answer 5】:

Based on my answer to your previous question:

awk '{n = $NF + 11; print "(\"" $0 "\" \"#" n "\")"}' inputfile

or

awk 'BEGIN {q="\x22"} {n = $NF + 11; print "(" q $0 q " " q "#" n q ")"}' inputfile

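This works with the data you provided in your previous question. I can't tell how you got from that to the example you posted in this question, since the parentheses are nested differently. You also don't say whether the (bookmarks ) wrapper is already present in the original input, or whether some code we can't see adds it along with the other additions.

What you're doing is starting to look a bit like XML. Perhaps you should use the real thing and manipulate it with the appropriate tools.

【Comments】:

Thanks! (1) I am learning to create bookmarks for djvu files from this link: ubuntuforums.org/showthread.php?t=1522901#3. (2) The previous question and the current one are just two steps in producing my bookmark text file. I manually added "bookmarks" as the first line between the two steps. It would be great to have one script that does all the steps! (3) I don't know XML and don't know whether djvu is related to XML. Is it? If so, what are "the real thing" and "the appropriate tools" to manipulate it?
@Tim: You can use one of the commands in this answer as one step, then add (bookmarks ) around the result. I don't know the first thing about djvu or djvused, but a quick Google search turns up djvuxml. XML files can be manipulated easily with the appropriate Python or Perl modules or with shell utilities such as xmlstarlet.
@DennisWilliamson What is this newfangled XML thing you speak of? These are good old sexps!

【Answer 6】:

In Python, try: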

import re
m = re.search(r'(?<=#)([0-9]+)',txt)

to find the next number. Then set:

txt = txt[:m.start()] + str(int(m.group())+11) + txt[m.end():]

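Repeat this (e.g. in a while loop) until search finds no further match. Each search should resume after the number that was just replaced (for example by compiling the pattern and passing a start position to its search method); otherwise the number that was just incremented is matched again.

Note: the regular expression (?<=#)([0-9]+) matches any sequence of digits that follows a # character. start() yields the start position of the match, end() yields the end position, and group() yields the matched text. The expression str(int(m.group())+11) converts the matched digits to an int, adds 11, and converts the result back to a string.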
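
The loop described above might look like the following sketch. The names PAGE and add_to_pages are illustrative, not from the answer; the sketch tracks a position pos and uses the compiled pattern's search(string, pos) so that each search starts after the number that was just rewritten.

```python
import re

PAGE = re.compile(r"(?<=#)([0-9]+)")

def add_to_pages(txt, delta=11):
    """Increment every number that follows '#', one match at a time."""
    pos = 0
    while True:
        m = PAGE.search(txt, pos)         # look for the next number from pos on
        if m is None:
            break                         # no further match: done
        replacement = str(int(m.group()) + delta)
        txt = txt[:m.start()] + replacement + txt[m.end():]
        # Continue after the replacement so the new number is not matched again.
        pos = m.start() + len(replacement)
    return txt

print(add_to_pages('("Exercises 30" "#30")'))  # ("Exercises 30" "#41")
```

Without the pos bookkeeping, searching from the start of txt each time would keep finding the first number and adding 11 to it forever.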

【Comments】:

【Answer 7】:

In Python:

import re

dh = '''"Chapter 1 Introduction 1" "#1"
"1.1 Problem Statement and Basic Definitions 2" "#2"
"1.2 Illustrative Examples 4" "#4"
"1.3 Guidelines for Model Construction 26" "#26"
"Exercises 30" "#30"
"Notes and References 34" "#34"'''

# Group 2 is the number ending the title; \2 requires the "#..." value to repeat it.
pat = re.compile(r'^(".+?(\d+)" *"#)\2" *$', re.M)

def zoo(mat):
    # Keep everything up to "#, then append the shifted number and the closing quote.
    return '%s%s"' % (mat.group(1), str(int(mat.group(2)) + 11))

print(dh)
print()
print(pat.sub(zoo, dh))

Result

"Chapter 1 Introduction 1" "#1"
"1.1 Problem Statement and Basic Definitions 2" "#2"
"1.2 Illustrative Examples 4" "#4"
"1.3 Guidelines for Model Construction 26" "#26"
"Exercises 30" "#30"
"Notes and References 34" "#34"

"Chapter 1 Introduction 1" "#12"
"1.1 Problem Statement and Basic Definitions 2" "#13"
"1.2 Illustrative Examples 4" "#15"
"1.3 Guidelines for Model Construction 26" "#37"
"Exercises 30" "#41"
"Notes and References 34" "#45"


But starting from the earlier string that you posted in your other question:

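import re

eh = '''Chapter 3 Convex Functions 97 
3.1 Definitions 98  
3.2 Basic Properties 103'''

# Group 1 is the whole title, group 2 its final number; trailing spaces are dropped.
pat = re.compile(r'^(.+?(\d+)) *$', re.M)

def zaa(mat):
    # Quote the title and add a second quoted field holding the shifted number.
    return '"%s" "%s"' % (mat.group(1), str(int(mat.group(2)) + 11))

print(eh)
print()
print(pat.sub(zaa, eh))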

Result

Chapter 3 Convex Functions 97 
3.1 Definitions 98  
3.2 Basic Properties 103

"Chapter 3 Convex Functions 97" "108"
"3.1 Definitions 98" "109"
"3.2 Basic Properties 103" "114"

Is all this homework?

Edit:

I corrected the first code above:

import re

dh = '''(bookmarks
("Chapter 1 Introduction 1" "#1")
("1.1 Problem Statement and Basic Definitions 2" "#2")
("1.2 Illustrative Examples 4" "#4")
("1.3 Guidelines for Model Construction 26" "#26")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)'''

# Group 3 captures the closing parenthesis (or two) so it can be put back.
pat = re.compile(r'^(\(".+?(\d+)" *"#)\2" *(\)\)?)$', re.M)

def zoo(mat):
    return '%s%s"%s' % (mat.group(1), str(int(mat.group(2)) + 11), mat.group(3))

print(dh)
print()
print(pat.sub(zoo, dh))

Result

(bookmarks
("Chapter 1 Introduction 1" "#1")
("1.1 Problem Statement and Basic Definitions 2" "#2")
("1.2 Illustrative Examples 4" "#4")
("1.3 Guidelines for Model Construction 26" "#26")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)

(bookmarks
("Chapter 1 Introduction 1" "#12")
("1.1 Problem Statement and Basic Definitions 2" "#13")
("1.2 Illustrative Examples 4" "#15")
("1.3 Guidelines for Model Construction 26" "#37")
("Exercises 30" "#41")
("Notes and References 34" "#45"))
)

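【Comments】:

@Siegex OK, I corrected the input text and put in the parentheses; I didn't believe the text really came with them. Is it right now?
@SiegeX: Thanks! (1) It's not homework. No homework looks like this. I am learning to create bookmarks for djvu files, and doing it by hand is tedious and error-prone. (2) I want to read from one file and write to another. How can I do that?

【Answer 8】:

If you can use Ruby (1.9+):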

$ ruby -ne 'puts $_=/#/?$_.gsub(/(.*#)(\d+)(.*)/){"#{$1}"+($2.to_i+11).to_s+"#{$3}"}:$_' file
(bookmarks
("Chapter 1 Introduction 1" "#12"
("1.1 Problem Statement and Basic Definitions 2" "#13")
("1.2 Illustrative Examples 4" "#15")
("1.3 Guidelines for Model Construction 26" "#37")
("Exercises 30" "#41")
("Notes and References 34" "#45"))
)

【Comments】:

【Answer 9】:

use strict;
use warnings;
while(my $line = <DATA>){
  $line =~ s/#(\d+)/'#'.($1 + 11)/e;
  print $line;  # emit the (possibly modified) line
}
__DATA__
(bookmarks
("Chapter 1 Introduction 1" "#1"
("1.1 Problem Statement and Basic Definitions 2" "#2")
("1.2 Illustrative Examples 4" "#4")
("1.3 Guidelines for Model Construction 26" "#26")
("Exercises 30" "#30")
("Notes and References 34" "#34"))
)

Output:

(bookmarks
("Chapter 1 Introduction 1" "#12"
("1.1 Problem Statement and Basic Definitions 2" "#13")
("1.2 Illustrative Examples 4" "#15")
("1.3 Guidelines for Model Construction 26" "#37")
("Exercises 30" "#41")
("Notes and References 34" "#45"))
)

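【Comments】:

@SiegeX: Yes, you're right; I've fixed it now. Thanks for pointing out the problem.
Thanks! I want to read from one file and write to another, so I tried "perl -pe 's/#(\d+)/'#'.($1 + 11)/e' output.txt " in my terminal, but now it leaves the last number blank; for example, on the second line, "("Chapter 1 Introduction 1" "#12"" becomes "("Chapter 1 Introduction 1" """.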