Regexp Emacs for R comments: answers

【Question title】: Regexp Emacs for R comments
【Posted】: 2012-11-18 10:13:12
【Question】:

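I would like to build a regular expression in Emacs to clean up my R code.

One problem I ran into is that there are different types of comments. Either the comment is preceded only by some amount of whitespace (1), e.g.: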

        # This is a comment:
# This is also a comment

Or you have a situation like this (2):

require(lattice) # executable while the comment is informative

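My idea is to align the comments of the second kind (those that follow something executable) while excluding the comments of the first kind.

Ideally, it would align all the comments that sit between comments of the first kind, but not the comments of the first kind themselves.

Example: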

funfun <- function(a, b) {
# This is a function
    if (a == b) { # if a equals b
      c <- 1 # c is 1 
    }
  }  
#

To:

funfun <- function(a, b) {
# This is a function
    if (a == b) { # if a equals b
      c <- 1      # c is 1 
    }
  }  
#

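I found a regex to substitute the first kind, so I was then able to align the rest per paragraph (mark-paragraph). That worked reasonably well.

The problem is then the back-substitution: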

(replace-regexp "^\\s-+#+" "bla" nil (point-min) (point-max))

This replaces, from the start of the line, any amount of whitespace followed by any number of comment characters, such as:

     #########

with

bla

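The problem is that I would like to turn them back into what they originally were, so "bla" has to become the same amount of whitespace and the same number of # again.

Hopefully someone understands what I am trying to do and either has a better idea for an approach or knows how to solve this regex part.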

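【Comments on the question】:

I am having trouble understanding exactly what you are asking. Do you want to replace the initial whitespace and comment marker with some text, then run a transformation on the lines, and then restore the whitespace and comment marker?
@wvxvw I'll tell you: it is just that I realized there are only 2 types of comments. So this is the only problem standing in the way of a script that would really meet any user's needs!
@user4815162342 I actually just want to temporarily change the type (1) comments, so that they are excluded when aligning the type (2) comments of each paragraph.

Tags: regex r emacs lisp elisp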

【Solution 1】:

Well, here is a rough attempt at what I think you are after. It seems to work, but it needs a lot of testing and polishing:

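;; `loop', `return', `every' and `incf' below come from the old `cl'
;; library; newer Emacs versions prefer the `cl-'-prefixed names
;; from `cl-lib'.
(require 'cl)

(defun has-face-at-point (face &optional position)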
  (unless position (setq position (point)))
  (unless (consp face) (setq face (list face)))
  (let ((props (text-properties-at position)))
    (loop for (key value) on props by #'cddr
          do (when (and (eql key 'face) (member value face))
               (return t)))))

(defun face-start (face)
  (save-excursion
    (while (and (has-face-at-point face) (not (bolp)))
      (backward-char))
    (- (point) (save-excursion (move-beginning-of-line 1)) (if  (bolp) 0 -1))))

(defun beautify-side-comments ()
  (interactive)
  ;; Because this function does a lot of insertion, it would
  ;; be better to execute it in the temporary buffer, while
  ;; copying the original text of the file into it, such as
  ;; to prevent junk in the formatted buffer's history
  (let ((pos (cons (save-excursion
                     (beginning-of-line)
                     (count-lines (point-min) (point)))
                   (- (save-excursion (end-of-line) (point)) (point))))
        (content (buffer-string))
        (comments '(font-lock-comment-face font-lock-comment-delimiter-face)))
    (with-temp-buffer
      (insert content)
      (goto-char (point-min))
      ;; thingatpt breaks if there are overlays with their own faces
      (let* ((commentp (has-face-at-point comments))
             (margin
              (if commentp (face-start comments) 0))
             assumed-margin pre-comment commented-lines)
        (while (not (eobp))
          (move-end-of-line 1)
          (cond
           ((and (has-face-at-point comments)
                 commentp)            ; this is a comment continued from
                                        ; the previous line
            (setq assumed-margin (face-start comments)
                  pre-comment
                  (buffer-substring-no-properties
                   (save-excursion (move-beginning-of-line 1))
                   (save-excursion (beginning-of-line) 
                                   (forward-char assumed-margin) (point))))
            (if (every
                 (lambda (c) (or (char-equal c ?\ ) (char-equal c ?\t)))
                 pre-comment)
                ;; This is the comment preceded by whitespace
                (setq commentp nil margin 0 commented-lines 0)
              (if (<= assumed-margin margin)
                  ;; The comment found starts on the left of
                  ;; the margin of the comments found so far
                  (save-excursion
                    (beginning-of-line) 
                    (forward-char assumed-margin)
                    (insert (make-string (- margin assumed-margin) ?\ ))
                    (incf commented-lines))
                ;; This could be optimized by going forward and
                ;; collecting as many comments as there are, but
                ;; it is simpler to return and re-indent comments
                ;; (assuming there won't be many such cases anyway).
                (setq margin assumed-margin)
                (move-end-of-line (1- (- commented-lines))))))
           ((has-face-at-point comments)
            ;; This is a fresh comment.
            ;; This entire block needs refactoring; it is
            ;; a repetition of half of the previous block.
            (setq assumed-margin (face-start comments)
                  pre-comment
                  (buffer-substring-no-properties
                   (save-excursion (move-beginning-of-line 1))
                   (save-excursion (beginning-of-line) 
                                   (forward-char assumed-margin) (point))))
            (unless (every
                     (lambda (c)
                       (or (char-equal c ?\ ) (char-equal c ?\t)))
                     pre-comment)
              (setq commentp t margin assumed-margin commented-lines 0)))
           (commentp
            ;; This is the line directly after a block of comments
            (setq commentp nil margin assumed-margin commented-lines 0)))
          (unless (eobp) (forward-char)))
        ;; Retrieve the formatted content
        (setq content (buffer-string))))
    (erase-buffer)
    (insert content)
    (goto-char (point-min))
    (forward-line (car pos))
    (end-of-line)
    (backward-char (cdr pos))))

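I have also copied it to pastebin for readability: http://pastebin.com/C2L9PRDM

Edit: this should restore the cursor position, but not the scroll position (that could probably be made to work; I would just need to look up how the scroll position is stored).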

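【Comments】:

That is a lot of code. :) Do you really need to find the comments through their font-lock faces? forward-comment uses the built-in syntax scanner.
forward-comment can take a negative count, but I have not tested it with R, so I do not know how well it works in practice.
Originally I had a solution that used just a single while loop. The problem was that even for a small amount of code it took about 2 seconds, so I assume it would take very long on a large buffer. At the very least it was not efficient. With this much code I find it hard to believe that it is fast. Is it?
It keeps repeating: continued comment 34 34 pre-comment
I did not know that forward-comment would not go backwards, but my tests seem to confirm it. Either way, the effort you put into the answer makes my upvote well deserved.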
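As a rough illustration of the syntax-scanner idea raised in these comments, the sketch below classifies the comment on the current line using `syntax-ppss` instead of font-lock faces. The function names `my/line-comment-start` and `my/comment-kind` are hypothetical, and the sketch assumes the buffer's syntax table already treats `#` as a comment starter and newline as a comment ender, as an R major mode would normally arrange. It has not been tested against ESS.

```elisp
;; Sketch only: `my/line-comment-start' and `my/comment-kind' are
;; hypothetical names, not part of Emacs or ESS.

(defun my/line-comment-start ()
  "Return the position where a comment starts on the current line, or nil.
Relies on the syntax table of the current buffer."
  (save-excursion
    (let ((state (syntax-ppss (line-end-position))))
      ;; Element 4 is non-nil inside a comment; element 8 is the
      ;; position where that comment (or string) started.
      (when (and (nth 4 state)
                 (>= (nth 8 state) (line-beginning-position)))
        (nth 8 state)))))

(defun my/comment-kind ()
  "Return `whole-line' (type 1), `side' (type 2), or nil for the current line."
  (let ((start (my/line-comment-start)))
    (when start
      (save-excursion
        (goto-char start)
        ;; Only whitespace before the comment means a type 1 comment.
        (skip-chars-backward " \t")
        (if (bolp) 'whole-line 'side)))))
```

Because the decision comes from the syntax parser, a `#` inside a string literal is not mistaken for a comment, which a plain regexp cannot guarantee.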

【Solution 2】:

align-regexp is the awesome bit of Emacs magic you need:

(defun align-comments ()
  "align R comments depending on whether at start or in the middle."
  (interactive)
  (align-regexp (point-min) (point-max)  
    "^\\(\\s-*?\\)\\([^[:space:]]+\\)\\(\\s-+\\)#" 3 1 nil) ;type 2 regex
  (align-regexp (point-min) (point-max)  
    "^\\(\\s-*\\)\\(\\s-*\\)#" 2 0 nil))                    ;type 1 regex

Before:

# a comment type 1
      ## another comment type 1
a=1 ###### and a comment type 2 with lots of #####'s
a.much.longer.variable.name=2          # and another, slightly longer type 2 comment    
      ## and a final type 1

After:

      # a comment type 1
      ## another comment type 1
a=1                           ###### and a comment type 2 with lots of #####'s
a.much.longer.variable.name=2 # and another, slightly longer type 2 comment    
      ## and a final type 1
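For comparison, here is a minimal sketch that does not use align-regexp at all: it pads only the side (type 2) comments in a region to a common column and leaves whole-line (type 1) comments where they are. The name `my/align-side-comments` is hypothetical, and the sketch assumes that `#` never appears inside a string literal on the affected lines.

```elisp
;; Sketch only: `my/align-side-comments' is a hypothetical name.
;; Assumption: no `#' occurs inside string literals in the region.

(defun my/align-side-comments (beg end)
  "Align trailing `#' comments between BEG and END to one column.
Lines whose first non-blank character is `#' are left untouched."
  (interactive "r")
  ;; Group 1 is the whitespace between the code and the first `#'.
  ;; The character class after the indentation rejects lines that
  ;; start with `#', i.e. whole-line comments.
  (let ((re "^[ \t]*[^# \t\n][^#\n]*?\\([ \t]*\\)#")
        (col 0))
    (save-excursion
      (save-restriction
        (narrow-to-region beg end)
        ;; Pass 1: find the column where the widest code part ends.
        (goto-char (point-min))
        (while (re-search-forward re nil t)
          (goto-char (match-beginning 1))
          (setq col (max col (current-column))))
        ;; Pass 2: replace the gap before each comment with padding
        ;; so that every `#' lands one column past the widest code.
        (goto-char (point-min))
        (while (re-search-forward re nil t)
          (goto-char (match-beginning 1))
          (delete-region (point) (match-end 1))
          (insert (make-string (- (1+ col) (current-column)) ?\s)))))))
```

Calling it on a marked paragraph aligns that paragraph only, which is roughly the per-paragraph behaviour described in the question. Since it uses a single column for the whole region, comments separated by type 1 lines are not aligned independently.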

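【Comments】:

Close! If you could extend it so that type 1 comments can still have whitespace in front of them, it would be solved (but I think that is exactly the problem).
Actually, it works perfectly in the scratch buffer, but in the example it also includes the type 1 comments and aligns everything at the level of type 2.
Almost there. We just need a regex for type 2 that says: match anything, but do not match if it is all whitespace. Something like (.+?)|~(\s-+).
Maybe edit your answer and include this? Sadly, I do not know how or what to change.
I made an edit to allow the whitespace to be preserved, but it still aligns all type 1 comments to a single vertical position.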

【Solution 3】:

Try

(replace-regexp "^\\(\\s-+\\)#" "\\1bla" nil (point-min) (point-max))

and then

(replace-regexp "^\\(\\s-+\\)bla+" "\\1#" nil (point-min) (point-max))

But if I understand you correctly, I would probably do something like:

(align-string "\b\s-#" begin end)
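The two replace-regexp calls above restore the leading whitespace through the `\\1` back-reference, but they collapse any run of `#` into a single one. A minimal sketch of a round trip that also keeps the number of `#` characters is to swap each `#` for one placeholder character and swap it back afterwards. The function names and the placeholder character below are hypothetical choices, and the sketch assumes the placeholder never occurs in the buffer.

```elisp
;; Sketch only: the names and the placeholder are illustrative choices.

(defconst my/comment-placeholder ?\x01
  "Character standing in for `#'; assumed not to occur in the buffer.")

(defun my/mask-whole-line-comments ()
  "Replace the leading `#' run of each whole-line comment.
One placeholder is written per `#', so the run length is preserved."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "^[ \t]*\\(#+\\)" nil t)
      (replace-match (make-string (- (match-end 1) (match-beginning 1))
                                  my/comment-placeholder)
                     t t nil 1))))

(defun my/unmask-whole-line-comments ()
  "Undo `my/mask-whole-line-comments'."
  (save-excursion
    (goto-char (point-min))
    (let ((re (concat "^[ \t]*\\("
                      (regexp-quote (string my/comment-placeholder))
                      "+\\)")))
      (while (re-search-forward re nil t)
        (replace-match (make-string (- (match-end 1) (match-beginning 1)) ?#)
                       t t nil 1)))))
```

The intended use would be to mask, run whatever alignment command treats `#` as the anchor, and then unmask. Only sub-expression 1 is replaced, so the indentation in front of the comment is never touched.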

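【Comments】:

align-strings is not a function?
The problem is that when you have a block of #'s, it does not keep track of the number of #'s needed (you would then have to replace multiple #'s, but more importantly, how would you get them back?).
The correct function name is align-string (without the s; I have edited my answer).
align-string does not exist in my Emacs. Do you know whether it comes with a specific package?
It is in the "emacs-goodies-el" Debian package and is also available at pvv.org/~markusk/align-string.el. That said, I believe the solution using align-regexp is probably more flexible.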