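【Question title】: Recommend a C front-end that preserves preprocessor directives
【Posted】: 2010-01-26 23:41:42
【Question】:

I'd like to start a project that involves transforming C code, but I want to include the preprocessor directives. I don't want to reinvent the wheel by writing my own C parser, so does anyone know of a front-end that can parse both C preprocessor directives and C code, and produce an AST that can be used to regenerate (or pretty-print) the original source?

For example: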

#define FILENAME "filename"
#include <stdio.h>

FILE *f=0;
...
if (file_is_open) {
#ifdef CAN_OPEN_IT
    f = fopen(FILENAME, "r");
#else
    printf("Unable to open file.\n");
#endif
}

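The code above should be parsed into some in-memory representation that can be used to regenerate the source. In other words, it should not be handled in two phases as in normal C, where the PP directives are processed first and the resulting pure C code is parsed afterwards. Instead, the representation should capture the whole compile-time logic, including preprocessor variables.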
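
To make the request concrete, here is a minimal sketch of the kind of in-memory representation meant here: a tree in which an #ifdef group is itself a node that holds both branches, so a printer can regenerate the directives along with the code. The names Node, NODE_TEXT, NODE_IFDEF, emit and print_nodes are made up for illustration and do not come from any existing front-end; a real tool would need many more node kinds than "line of text" and "#ifdef group".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds: an ordinary source line, or an #ifdef group. */
typedef enum { NODE_TEXT, NODE_IFDEF } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;               /* NODE_TEXT: the line; NODE_IFDEF: macro name */
    const struct Node *then_branch; /* NODE_IFDEF only */
    const struct Node *else_branch; /* NODE_IFDEF only, NULL if there is no #else */
    const struct Node *next;        /* next sibling in the same sequence */
} Node;

/* Append s to buf (capacity cap, current length len); truncates if full.
 * Returns the new length. */
static size_t emit(char *buf, size_t cap, size_t len, const char *s)
{
    size_t n = strlen(s);

    assert(cap > 0 && len < cap);
    if (n > cap - len - 1)
        n = cap - len - 1;
    memcpy(buf + len, s, n);
    buf[len + n] = '\0';
    return len + n;
}

/* Regenerate source text from the tree, keeping BOTH branches of every
 * conditional instead of choosing one. Returns the new length of buf. */
static size_t print_nodes(const Node *n, char *buf, size_t cap, size_t len)
{
    for (; n != NULL; n = n->next) {
        if (n->kind == NODE_TEXT) {
            len = emit(buf, cap, len, n->text);
            len = emit(buf, cap, len, "\n");
        } else {
            len = emit(buf, cap, len, "#ifdef ");
            len = emit(buf, cap, len, n->text);
            len = emit(buf, cap, len, "\n");
            len = print_nodes(n->then_branch, buf, cap, len);
            if (n->else_branch != NULL) {
                len = emit(buf, cap, len, "#else\n");
                len = print_nodes(n->else_branch, buf, cap, len);
            }
            len = emit(buf, cap, len, "#endif\n");
        }
    }
    return len;
}
```

Building a NODE_IFDEF node for CAN_OPEN_IT, with the fopen line as then_branch and the printf line as else_branch, and passing it to print_nodes with an empty buffer gives back the five-line conditional from the example, directives included.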

【Comments】:

    Tags: c compiler-construction c-preprocessor abstract-syntax-tree


    【Answer 1】:

    【Comments】:

    • I don't believe Clang captures preprocessor directives in its AST.
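    【Answer 2】:

    Our DMS Software Reengineering Toolkit has a C front end (and a C++ front end) that:

    • parses (compilable) C source code in a variety of dialects into ASTs,
    • preserves preprocessor directives as AST nodes in most cases,
    • can regenerate compilable C code from the ASTs (with comments and preprocessor directives),
    • can collect thousands of files in a single image to allow cross-file analysis and transformation,
    • provides full symbol table construction and access,
    • provides procedural access to the ASTs through a large library of AST operations, including navigate, inspect, insert, delete, replace, match, ...
    • provides source-to-source transformations using patterns, written in C notation, that match against the ASTs.

    For C (not yet for C++), DMS additionally provides:

    • control and data flow analysis
    • local and global points-to analysis
    • global call graph construction

    DMS has been used to process extremely large C applications, both to extract facts and to generate new derived code from the original source base.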

    (EDIT: Feb 2016)

    It can handle the OP's example (slightly corrected to make it valid). Here is the slightly modified source:

    #define FILENAME "filename"
    #include <stdio.h>
    
    FILE *f;
    main() {
      f=0;
    if (file_is_open) {
    #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
    #else
        printf("Unable to open file.\n");
    #endif
    }
    
    }
    

    Here is the resulting AST:

    C~GCC4 Domain Parser Version 3.0.1(28449)
    Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
    Powered by DMS (R) Software Reengineering Toolkit
    AST Optimizations: remove constant tokens, remove unary productions, compact sequences
    Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I
    (translation_unit@C~GCC4=2#4a7e0e0^0 Line 1 Column 1 File C:/temp/test.c
     (declaration_seq@C~GCC4=605#4a77580^1#4a7e0e0:1 {4} Line 1 Column 1 File C:/temp/test.c
      (control_line@C~GCC4=1094#4a775c0^1#4a77580:1 Line 1 Column 1 File C:/temp/test.c
       ('#'@C~GCC4=1548#4a771c0^1#4a775c0:1[Keyword:0] Line 1 Column 1 File C:/temp/test.c)'#'
       (IDENTIFIER@C~GCC4=1531#4a77200^1#4a775c0:2[`FILENAME'] Line 1 Column 9 File C:/temp/test.c)IDENTIFIER
       (<!MacroDefinition>@C~GCC4=1603#4a77180^2#4a775c0:3#4a7f300:1[`FILENAME'] Line 1 Column 18 File C:/temp/test.c
    $VOID$ [Child 1]
       |(STRING_LITERAL@C~GCC4=1525#4a77160^2#4a77180:2#4a7f300:2[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
    $VOID$ [Child 3]
       )<!MacroDefinition>#4a77180
       (new_line@C~GCC4=1578#4a77260^1#4a775c0:4[Keyword:0] Line 1 Column 28 File C:/temp/test.c)new_line
      )control_line#4a775c0
      (control_line@C~GCC4=1104#4a77460^1#4a77580:2 Line 2 Column 1 File C:/temp/test.c
       ('#'@C~GCC4=1548#4a77340^1#4a77460:1[Keyword:0] Line 2 Column 1 File C:/temp/test.c)'#'
       (ANGLED_HEADER_NAME@C~GCC4=1589#4a77380^1#4a77460:2[`stdio.h'] Line 2 Column 10 File C:/temp/test.c)ANGLED_HEADER_NAME
       (new_line@C~GCC4=1578#4a773c0^1#4a77460:3[Keyword:0] Line 2 Column 19 File C:/temp/test.c)new_line
      )control_line#4a77460
      (simple_declaration@C~GCC4=631#4a774c0^1#4a77580:3 Line 4 Column 1 File C:/temp/test.c
       (IDENTIFIER@C~GCC4=1531#4a77360^1#4a774c0:1[`FILE'] Line 4 Column 1 File C:/temp/test.c)IDENTIFIER
       (declarator@C~GCC4=850#4a77520^1#4a774c0:2 Line 4 Column 6 File C:/temp/test.c
       |(ptr_operator@C~GCC4=866#4a77560^1#4a77520:1 Line 4 Column 6 File C:/temp/test.c)ptr_operator
       |(IDENTIFIER@C~GCC4=1531#4a77480^1#4a77520:2[`f'] Line 4 Column 7 File C:/temp/test.c)IDENTIFIER
       )declarator#4a77520
      )simple_declaration#4a774c0
      (function_definition@C~GCC4=966#4a77be0^1#4a77580:4 Line 5 Column 1 File C:/temp/test.c
       (direct_declarator@C~GCC4=852#4a77440^1#4a77be0:1 Line 5 Column 1 File C:/temp/test.c
       |(IDENTIFIER@C~GCC4=1531#4a774e0^1#4a77440:1[`main'] Line 5 Column 1 File C:/temp/test.c)IDENTIFIER
       |(parameter_declaration_clause@C~GCC4=900#4a77220^1#4a77440:2 Line 5 Column 6 File C:/temp/test.c)parameter_declaration_clause
       )direct_declarator#4a77440
       (compound_statement@C~GCC4=507#4a77b20^1#4a77be0:2 Line 5 Column 8 File C:/temp/test.c
       |(statement_seq@C~GCC4=511#4a77d20^1#4a77b20:1 {2} Line 6 Column 3 File C:/temp/test.c
       | (AMBIGUITY<statement=358>@C~GCC4=1602#4a77680^1#4a77d20:1{2} Line 6 Column 3 File C:/temp/test.c
       |  (expression_statement@C~GCC4=503#4a7e040^1#4a77680:1 Line 6 Column 3 File C:/temp/test.c
       |   (assignment_expression@C~GCC4=457#4a77f00^1#4a7e040:1 Line 6 Column 3 File C:/temp/test.c
       |   |(assignment_target@C~GCC4=470#4a77a00^1#4a77f00:1 Line 6 Column 3 File C:/temp/test.c
       |   | (IDENTIFIER@C~GCC4=1531#4a77400^2#4a77a00:1#4a77fc0:1[`f'] Line 6 Column 3 File C:/temp/test.c)IDENTIFIER
       |   |)assignment_target#4a77a00
       |   |(INT_LITERAL@C~GCC4=1471#4a77a60^2#4a77f00:2#4a77f60:1[0] Line 6 Column 5 File C:/temp/test.c)INT_LITERAL
       |   )assignment_expression#4a77f00
       |  )expression_statement#4a7e040
       |  (simple_declaration@C~GCC4=630#4a7e060^1#4a77680:2 Line 6 Column 3 File C:/temp/test.c
       |   (init_declarator@C~GCC4=835#4a77fc0^1#4a7e060:1 Line 6 Column 3 File C:/temp/test.c
       |   |(IDENTIFIER@C~GCC4=1531#4a77400^2... [ALREADY PRINTED] ...)
       |   |(initializer@C~GCC4=983#4a77f60^1#4a77fc0:2 Line 6 Column 4 File C:/temp/test.c
       |   | (INT_LITERAL@C~GCC4=1471#4a77a60^2... [ALREADY PRINTED] ...)
       |   |)initializer#4a77f60
       |   )init_declarator#4a77fc0
       |  )simple_declaration#4a7e060
       | )AMBIGUITY#4a77680
       | (selection_statement@C~GCC4=527#4a77b40^1#4a77d20:2 Line 7 Column 1 File C:/temp/test.c
       |  (IDENTIFIER@C~GCC4=1531#4a7e0c0^1#4a77b40:1[`file_is_open'] Line 7 Column 5 File C:/temp/test.c)IDENTIFIER
       |  (compound_statement@C~GCC4=507#4a77ae0^1#4a77b40:2 Line 7 Column 19 File C:/temp/test.c
       |   (statement@C~GCC4=490#4a7f840^1#4a77ae0:1 Line 8 Column 1 File C:/temp/test.c
       |   |(if_directive@C~GCC4=1088#4a7f1c0^1#4a7f840:1 Line 8 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f240^1#4a7f1c0:1[Keyword:0] Line 8 Column 1 File C:/temp/test.c)'#'
       |   | (IDENTIFIER@C~GCC4=1531#4a7ee60^1#4a7f1c0:2[`CAN_OPEN_IT'] Line 8 Column 8 File C:/temp/test.c)IDENTIFIER
       |   | (new_line@C~GCC4=1578#4a7f1e0^1#4a7f1c0:3[Keyword:0] Line 8 Column 19 File C:/temp/test.c)new_line
       |   |)if_directive#4a7f1c0
       |   |(AMBIGUITY<statement=358>@C~GCC4=1602#4a77d40^1#4a7f840:2{2} Line 9 Column 5 File C:/temp/test.c
       |   | (expression_statement@C~GCC4=503#4a7f4a0^1#4a77d40:1 Line 9 Column 5 File C:/temp/test.c
       |   |  (assignment_expression@C~GCC4=457#4a7f3c0^1#4a7f4a0:1 Line 9 Column 5 File C:/temp/test.c
       |   |   (assignment_target@C~GCC4=470#4a7eec0^1#4a7f3c0:1 Line 9 Column 5 File C:/temp/test.c
       |   |   |(IDENTIFIER@C~GCC4=1531#4a7eee0^2#4a7eec0:1#4a7f400:1[`f'] Line 9 Column 5 File C:/temp/test.c)IDENTIFIER
       |   |   )assignment_target#4a7eec0
       |   |   (postfix_expression@C~GCC4=201#4a7f2e0^1#4a7f3c0:2 Line 9 Column 9 File C:/temp/test.c
       |   |   |(IDENTIFIER@C~GCC4=1531#4a7f120^2#4a7f2e0:1#4a7f160:1[`fopen'] Line 9 Column 9 File C:/temp/test.c)IDENTIFIER
       |   |   |(expression_list@C~GCC4=228#4a7f260^2#4a7f2e0:2#4a7f160:2 Line 9 Column 15 File C:/temp/test.c
       |   |   | (<!MacroCall>@C~GCC4=1607#4a7f300^1#4a7f260:1[`FILENAME'] Line 9 Column 15 File C:/temp/test.c
       |   |   |  (<!MacroDefinition>@C~GCC4=1603#4a77180^2... [ALREADY PRINTED] ...)
       |   |   |  (STRING_LITERAL@C~GCC4=1525#4a77160^2... [ALREADY PRINTED] ...)
       |   |   |  $VOID$ [Child 3]
       |   |   |  (STRING_LITERAL@C~GCC4=1525#4a7f2c0^1#4a7f300:4[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
       |   |   |  $VOID$ [Child 5]
       |   |   | )<!MacroCall>#4a7f300
       |   |   | (STRING_LITERAL@C~GCC4=1525#4a7f140^1#4a7f260:2[`r'] Line 9 Column 25 File C:/temp/test.c)STRING_LITERAL
       |   |   |)expression_list#4a7f260
       |   |   )postfix_expression#4a7f2e0
       |   |  )assignment_expression#4a7f3c0
       |   | )expression_statement#4a7f4a0
       |   | (simple_declaration@C~GCC4=630#4a7f480^1#4a77d40:2 Line 9 Column 5 File C:/temp/test.c
       |   |  (init_declarator@C~GCC4=835#4a7f400^1#4a7f480:1 Line 9 Column 5 File C:/temp/test.c
       |   |   (IDENTIFIER@C~GCC4=1531#4a7eee0^2... [ALREADY PRINTED] ...)
       |   |   (initializer@C~GCC4=983#4a7f3e0^1#4a7f400:2 Line 9 Column 7 File C:/temp/test.c
       |   |   |(postfix_expression@C~GCC4=201#4a7f160^1#4a7f3e0:1 Line 9 Column 9 File C:/temp/test.c
       |   |   | (IDENTIFIER@C~GCC4=1531#4a7f120^2... [ALREADY PRINTED] ...)
       |   |   | (expression_list@C~GCC4=228#4a7f260^2... [ALREADY PRINTED] ...)
       |   |   |)postfix_expression#4a7f160
       |   |   )initializer#4a7f3e0
       |   |  )init_declarator#4a7f400
       |   | )simple_declaration#4a7f480
       |   |)AMBIGUITY#4a77d40
       |   |(else_directive@C~GCC4=1091#4a7f4c0^1#4a7f840:3 Line 10 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f500^1#4a7f4c0:1[Keyword:0] Line 10 Column 1 File C:/temp/test.c)'#'
       |   | (new_line@C~GCC4=1578#4a7f4e0^1#4a7f4c0:2[Keyword:0] Line 10 Column 6 File C:/temp/test.c)new_line
       |   |)else_directive#4a7f4c0
       |   |(expression_statement@C~GCC4=503#4a7f7c0^1#4a7f840:4 Line 11 Column 5 File C:/temp/test.c
       |   | (postfix_expression@C~GCC4=201#4a77ba0^1#4a7f7c0:1 Line 11 Column 5 File C:/temp/test.c
       |   |  (IDENTIFIER@C~GCC4=1531#4a7f640^1#4a77ba0:1[`printf'] Line 11 Column 5 File C:/temp/test.c)IDENTIFIER
       |   |  (STRING_LITERAL@C~GCC4=1525#4a77c20^1#4a77ba0:2[`Unable to open file.
    '] Line 11 Column 12 File C:/temp/test.c)STRING_LITERAL
       |   | )postfix_expression#4a77ba0
       |   |)expression_statement#4a7f7c0
       |   |(endif_directive@C~GCC4=1092#4a7f7e0^1#4a7f840:5 Line 12 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f720^1#4a7f7e0:1[Keyword:0] Line 12 Column 1 File C:/temp/test.c)'#'
       |   | (new_line@C~GCC4=1578#4a7f700^1#4a7f7e0:2[Keyword:0] Line 12 Column 7 File C:/temp/test.c)new_line
       |   |)endif_directive#4a7f7e0
       |   )statement#4a7f840
       |  )compound_statement#4a77ae0
       | )selection_statement#4a77b40
       |)statement_seq#4a77d20
       )compound_statement#4a77b20
      )function_definition#4a77be0
     )declaration_seq#4a77580
    )translation_unit#4a7e0e0
    

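    You can see the preprocessor conditional at source line 8 as an "if_directive" node.

    And yes, DMS can also prettyprint this tree. The following command runs the parser to produce the AST and then runs the DMS prettyprinter to regenerate the source from just the tree. The round trip is accurate; you can recompile and get the same result. Comments are preserved as well.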

    C:\DMS\Domains\C\GCC4\Tools\PrettyPrinter>run domainprettyprinter \temp\test.c
    C~GCC4 PrettyPrinter Version 1.2.13
    Copyright (C) 2004-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
    Powered by DMS (R) Software Reengineering Toolkit
    
    #define FILENAME "filename"
    #include <stdio.h>
    FILE *f;
    
    main()
    {
      f = 0;
      if (file_is_open)
        {
          #ifdef CAN_OPEN_IT
            f = fopen(FILENAME, "r");
          #else
            printf("Unable to open file.\n");
          #endif
        }
    }
    

    You can see how DMS handles C++. At this point it handles all of C++14 in the GCC and MS dialects.

    【Comments】:

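      【Answer 3】:

      Take the GNU gcc compiler as an example: the command to preprocess a source file is gcc -E mysource.c; further information can be found here. As for pretty-printing, there is indent, whose usage is explained here; it is a bit old but still worth a mention. There is also cflow, which can generate graphs of the source.

      Sorry if I have misunderstood what you are looking for...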
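
For context on why a preprocess-first pass does not produce what the question asks for, the toy sketch below resolves a single #ifdef group the way a preprocessing stage would: one branch is copied through, and the other branch and the directives themselves disappear from the output. The function resolve_ifdef is a hypothetical stand-in written for this illustration and is not how gcc is implemented; it assumes one un-nested #ifdef/#else/#endif group and ignores every other directive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a preprocessing pass (an assumption for illustration).
 * Copies src into out (capacity cap), keeping only the branch selected by
 * `defined`. Directive lines are dropped; lines that do not fit are skipped. */
static void resolve_ifdef(const char *src, int defined, char *out, size_t cap)
{
    int in_cond = 0;  /* currently inside an #ifdef group? */
    int keep = 1;     /* is the current region copied to the output? */
    size_t len = 0;

    assert(cap > 0);
    out[0] = '\0';
    while (*src != '\0') {
        const char *eol = strchr(src, '\n');
        size_t n = eol ? (size_t)(eol - src) + 1 : strlen(src);
        const char *p = src + strspn(src, " \t");  /* skip leading blanks */

        if (strncmp(p, "#ifdef", 6) == 0) {
            in_cond = 1;
            keep = defined;
        } else if (in_cond && strncmp(p, "#else", 5) == 0) {
            keep = !defined;
        } else if (in_cond && strncmp(p, "#endif", 6) == 0) {
            in_cond = 0;
            keep = 1;
        } else if (keep && len + n < cap) {
            memcpy(out + len, src, n);
            len += n;
            out[len] = '\0';
        }
        src += n;
    }
}
```

Running this over the if block from the question with defined set to 1 leaves only the fopen line between the braces, and with defined set to 0 only the printf line. Either way the alternative branch cannot be recovered from the output, which is the information a directive-preserving AST is meant to keep.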

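      【Comments】:

      • Why the downvote? I mentioned indent and cflow... but since the question brings up "pretty-printing", it is not at all clear why an AST is needed. If you downvote, please leave a comment explaining why rather than staying silent; anything else goes against the spirit of SO.
      • Downvotes; they are a nuisance. Usually they do no irreparable damage to your reputation.
      • @Jonathan: quick question, there were 3 upvotes earlier on stackoverflow.com/questions/2142796/… but it now shows as 5 instead of 30; why?
      • Sorry if that was unclear. I'm looking for something that can parse both C and preprocessor code, not necessarily a pretty-printer; I only mentioned pretty-printing because a pretty-printer would probably have to parse the CPP code. What I want is something that can generate an AST that includes the CPP logic. I don't care about pretty-printing per se.
      • @Steve: OK, the best answer I can give is to look at the Antlr grammars for parsing here... antlr.org/grammar/list... using Antlr you can generate an AST, and there are multiple language interfaces, i.e. C#, C, CPP, Java, that can use the Antlr library for parsing, if that's what you are looking for... :)
      【Answer 4】:

      【Comments】:

      • This appears to be about an (ANTLR) parser generator that produces parsers implemented in C. The OP wants something that parses C. Am I missing something?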