Manipulating the V8 AST

【Question title】: Manipulating the V8 AST
【Posted】: 2013-03-25 00:18:23
【Question】:

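I intend to implement JS code coverage directly in the V8 code. My initial goal is to add a simple print for every statement in the abstract syntax tree. I saw that there is an AstVisitor class, which lets you traverse the AST. So my question is: how can I add a statement to the AST after the statement the visitor is currently visiting?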

【Comments】:

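A basic block is a construct of control flow graphs, not of ASTs. Do you intend to create a CFG from the AST?
I may be mixing the two up, but I thought the nodes of the AST were also basic blocks?
Which nodes? In any case, I don't know of any common AST node that matches a basic block (though of course it is possible to have a data structure that also maintains CFG-ish information and call it an "AST"). For instance, a loop is usually a single AST node, but many loops consist of several BBs. A loop node may contain a list of statement nodes, but some of those statements correspond to part of one BB (e.g. simple assignments) while others expand into several BBs (e.g. any inline conditional or nested loop). Perhaps you are misusing the term "basic block"?
You are right, I mixed the two up; the AST won't help me add commands. Does V8 use a CFG when parsing?
I don't know; I have never actually looked at V8's internals.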

Tags: javascript compilation v8 abstract-syntax-tree

【Answer 1】:

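OK, I'll summarize my experiments. First, what I write applies to V8 as used in Chromium revision r157275, so it may no longer work - but I will still link to the locations in the current version.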

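As said before, you need your own AST visitor, say MyAstVisitor, which inherits from AstVisitor and has to implement a bunch of VisitXYZ methods from there. The only one needed to instrument or inspect the executed code is VisitFunctionLiteral. The executed code is either a function or a set of loose statements in a source (file), which V8 wraps in a function that is then executed.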

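Then, right before the parsed AST is converted to code, you pass the visitor to the function literal, which will call VisitFunctionLiteral on the visitor. This has to be done in two places: where a function is compiled from loose statements, and where a predefined function is compiled at runtime the first time it is executed: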

MyAstVisitor myAV(info);
info->function()->Accept(&myAV);
// next line is the V8 compile call
if (!MakeCode(info)) {
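For readers who have not met the Accept/VisitXYZ pattern before, here is a minimal sketch of the double dispatch outside V8. All types (ToyVisitor, ToyNode, ToyFunctionLiteral, ToyReturnStatement, RecordingVisitor) and the helper VisitSample are made up for illustration and are not V8 classes; the real AstVisitor has many more VisitXYZ methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for V8's AST classes; none of these are V8 types.
struct ToyFunctionLiteral;
struct ToyReturnStatement;

struct ToyVisitor {
  virtual ~ToyVisitor() = default;
  virtual void VisitFunctionLiteral(ToyFunctionLiteral* node) = 0;
  virtual void VisitReturnStatement(ToyReturnStatement* node) = 0;
};

struct ToyNode {
  virtual ~ToyNode() = default;
  // Each concrete node forwards to the matching VisitXYZ method.
  virtual void Accept(ToyVisitor* v) = 0;
};

struct ToyReturnStatement : ToyNode {
  void Accept(ToyVisitor* v) override { v->VisitReturnStatement(this); }
};

struct ToyFunctionLiteral : ToyNode {
  std::vector<ToyNode*> body;
  void Accept(ToyVisitor* v) override { v->VisitFunctionLiteral(this); }
};

// Records which VisitXYZ methods ran, in order.
struct RecordingVisitor : ToyVisitor {
  std::vector<std::string> log;
  void VisitFunctionLiteral(ToyFunctionLiteral* node) override {
    log.push_back("FunctionLiteral");
    for (ToyNode* stmt : node->body) stmt->Accept(this);
  }
  void VisitReturnStatement(ToyReturnStatement*) override {
    log.push_back("ReturnStatement");
  }
};

// Hypothetical helper: builds a one-statement function, visits it,
// and returns the visit log.
std::vector<std::string> VisitSample() {
  ToyReturnStatement ret;
  ToyFunctionLiteral fn;
  fn.body.push_back(&ret);
  RecordingVisitor visitor;
  fn.Accept(&visitor);
  return visitor.log;
}
```

Calling Accept on the function literal lands in VisitFunctionLiteral, which is the same entry point the V8 snippet above relies on.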

I pass the CompilationInfo pointer info to the custom visitor because it is needed to modify the AST. The constructor looks like this:

MyAstVisitor(CompilationInfo* compInfo) :
    _ci(compInfo), _nf(compInfo->isolate(), compInfo->zone()), _z(compInfo->zone()){};

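_ci and _z are pointers to the CompilationInfo and the Zone. _nf is an AstNodeFactory<AstNullVisitor>; judging by how it is constructed above and used below (_nf.NewCall(...), &_nf), it is held as an object rather than a pointer.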

Now, in VisitFunctionLiteral, you can iterate over the function body and insert statements as needed.

void MyAstVisitor::VisitFunctionLiteral(FunctionLiteral* funLit){
    // fetch the function body
    ZoneList<Statement*>* body = funLit->body();
    // create a statement list used to collect the instrumented statements
    ZoneList<Statement*>* _stmts = new (_z) ZoneList<Statement*>(body->length(), _z);
    // iterate over the function body and rewrite each statement
    for (int i = 0; i < body->length(); i++) {
       // the rewritten statements are put into the collector
       rewriteStatement(body->at(i), _stmts);
    }
    // replace the original function body with the instrumented one
    body->Clear();
    body->AddAll(_stmts->ToVector(), _z);
}

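In the rewriteStatement method you can now inspect the statement. The _stmts pointer holds a list of statements that will eventually replace the original function body. So, to add a print statement after every statement, you first add the original statement and then your own print statement: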

void MyAstVisitor::rewriteStatement(Statement* stmt, ZoneList<Statement*>* collector){
    // add original statement
    collector->Add(stmt, _z);

    // create and add print statement, assuming you define print somewhere in JS:

    // 1) create handle (VariableProxy) for print function
    Vector<const char> fName("print", 5);
    Handle<String> fNameStr = Isolate::Current()->factory()->NewStringFromAscii(fName, TENURED);
    fNameStr = Isolate::Current()->factory()->SymbolFromString(fNameStr);
    // create the proxy - (it is vital to use _ci->function()->scope(), _ci->scope() crashes)
    VariableProxy* _printVP = _ci->function()->scope()->NewUnresolved(&_nf, fNameStr, Interface::NewUnknown(_z), 0);

    // 2) create message
    Vector<const char> tmp("Hello World!", 12);
    Handle<String> v8String = Isolate::Current()->factory()->NewStringFromAscii(tmp, TENURED);
    Literal* msg = _nf.NewLiteral(v8String);

    // 3) create argument list, call expression, expression statement and add the latter to the collector
    ZoneList<Expression*>* args = new (_z) ZoneList<Expression*>(1, _z);
    args->Add(msg, _z);
    Call* printCall = _nf.NewCall(_printVP, args, 0);
    ExpressionStatement* printStmt = _nf.NewExpressionStatement(printCall);
    collector->Add(printStmt, _z);   
}

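The last parameter of NewCall and NewUnresolved is a number specifying the position in the script. I assume it is used for debug/error messages, to tell where an error occurred. At least I never ran into problems setting it to 0 (there is also a constant kNoPosition somewhere).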

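A final remark: this will not actually add a print statement after every statement, because Blocks (e.g. loop bodies) are statements that represent a list of statements, and loops are statements with a condition expression and a body block. So you need to check which kind of statement is currently being processed and look into it recursively. Rewriting a block is pretty much the same as rewriting a function body.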
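The recursion described above can be tried out on a toy tree first. This is only a sketch under my own assumptions: ToyStmt, Make, Instrument, Dump and InstrumentSample are hypothetical names, std::vector and std::unique_ptr stand in for ZoneList and zone allocation, and nothing here touches V8. It reuses the collector idea from VisitFunctionLiteral, but descends into blocks and loop bodies before adding the statement itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy statement node; not a V8 type. A node is a leaf statement, a block
// (children = statements), a loop (children = body), or an inserted print.
struct ToyStmt {
  enum Kind { kSimple, kBlock, kLoop, kPrint };
  Kind kind;
  std::string text;                                // label for leaf nodes
  std::vector<std::unique_ptr<ToyStmt>> children;  // used by kBlock / kLoop
};

std::unique_ptr<ToyStmt> Make(ToyStmt::Kind kind, std::string text = "") {
  auto s = std::make_unique<ToyStmt>();
  s->kind = kind;
  s->text = std::move(text);
  return s;
}

// Rewrites a statement list so every statement is followed by a print node,
// descending into blocks and loop bodies first.
void Instrument(std::vector<std::unique_ptr<ToyStmt>>& stmts) {
  std::vector<std::unique_ptr<ToyStmt>> collector;
  collector.reserve(stmts.size() * 2);
  for (auto& stmt : stmts) {
    if (stmt->kind == ToyStmt::kBlock || stmt->kind == ToyStmt::kLoop) {
      Instrument(stmt->children);  // same rewrite as for a function body
    }
    collector.push_back(std::move(stmt));                 // original statement
    collector.push_back(Make(ToyStmt::kPrint, "print"));  // inserted statement
  }
  stmts = std::move(collector);  // replace the old list with the new one
}

// Flattens the tree into a string so the result is easy to inspect.
std::string Dump(const std::vector<std::unique_ptr<ToyStmt>>& stmts) {
  std::string out;
  for (const auto& s : stmts) {
    if (s->kind == ToyStmt::kBlock) {
      out += "{" + Dump(s->children) + "}";
    } else if (s->kind == ToyStmt::kLoop) {
      out += "loop{" + Dump(s->children) + "}";
    } else {
      out += s->text + ";";
    }
  }
  return out;
}

// Hypothetical helper: builds `a; loop { b; }`, instruments it,
// and returns the dump.
std::string InstrumentSample() {
  std::vector<std::unique_ptr<ToyStmt>> body;
  body.push_back(Make(ToyStmt::kSimple, "a"));
  auto loop = Make(ToyStmt::kLoop);
  loop->children.push_back(Make(ToyStmt::kSimple, "b"));
  body.push_back(std::move(loop));
  Instrument(body);
  return Dump(body);
}
```

For the sample input, InstrumentSample() yields "a;print;loop{b;print;}print;", i.e. the print lands after the statement inside the loop body as well as after the loop itself. In V8 the same check would have to be made on the concrete statement types, which the toy Kind enum only imitates.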

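But you will run into problems once you start replacing or modifying existing statements, because the AST also carries information about branching. So if you replace a jump target for some condition, you break your code. I guess this could be covered by adding rewriting capabilities directly to the individual expression and statement types, instead of creating new ones to replace them.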

I hope this helps so far.

【Comments】:

Gad. As an alternative, consider the approach I outline in "Branch Coverage for Arbitrary Languages Made Easy": semdesigns.com/Company/Publications/TestCoverage.pdf