Boost Spirit: parse a boolean expression and reduce it to canonical normal form (answer)

【Question title】: Boost Spirit: parse boolean expression and reduce to canonical normal form
【Posted】: 2020-02-25 03:36:56
【Question】:

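I want to parse a common boolean expression with `or`, `and` and `not` operators, which I think I've done using Boost Spirit below. In phase 2 (or perhaps as part of the parsing itself), I want to transform the AST of the boolean expression into disjunctive canonical normal form, which essentially "flattens" the expression and removes all grouping operators.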

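In one of my attempts I created the Boost `static_visitor` below, named `transformer`. I first tried to eliminate double NOT operators by assigning the child to its grandparent if both the child and the parent are NOT operators. My problem is referring to the parent of the current node. There seems to be no way to refer to the current node's parent, because once a node is visited, the visit function is overloaded on the inner type of the `variant`, which discards the variant nature of the object. Any help is appreciated.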

```cpp
struct op_or  {};
struct op_and {};
struct op_not {};

typedef std::string var;
template <typename tag> struct binop;
template <typename tag> struct uniop;

typedef boost::variant
    <
        var,
        boost::recursive_wrapper<uniop<op_not>>,
        boost::recursive_wrapper<binop<op_and>>,
        boost::recursive_wrapper<binop<op_or>>
    >
    expr;

template <typename tag> struct uniop
{
    explicit uniop(expr const& o) : exp_u(o) { }
    expr exp_u;
};

template <typename tag> struct binop
{
    explicit binop(expr const& l, expr const& r) : exp_l(l), exp_r(r) { }
    expr exp_l, exp_r;
};

struct transformer : boost::static_visitor<void>
{
    std::deque<std::reference_wrapper<expr>> stk;

    transformer(expr & e)
    {
        stk.push_back(e);
    }

    void operator()(var const& v) const { }

    void operator()(uniop<op_not> & u)
    {
        if (boost::get<uniop<op_not>>(&stk.back().get()) != nullptr)
        {
            stk.back() = u.exp_u;
        }
        else
        {
            stk.push_back(std::ref(u));  // <<=== Fails with "no matching function for call"
            boost::apply_visitor(*this, u.exp_u);
            stk.pop_back();
        }
    }
    void operator()(binop<op_and> & b)
    {
        stk.push_back(std::ref(u));
        boost::apply_visitor(*this, b.exp_l);
        boost::apply_visitor(*this, b.exp_r);
        stk.pop_back();
    }
    void operator()(binop<op_or> & b)
    {
        stk.push_back(std::ref(u));
        boost::apply_visitor(*this, b.exp_l);
        boost::apply_visitor(*this, b.exp_r);
        stk.pop_back();
    }
};

template <typename It, typename Skipper = boost::spirit::qi::space_type>
struct parser : boost::spirit::qi::grammar<It, expr(), Skipper>
{
    parser() : parser::base_type(expr_)
    {
        using namespace boost::phoenix;
        using namespace boost::spirit::qi;

        using boost::spirit::qi::_1;

        expr_  = or_.alias();

        or_  = and_ [ _val = _1 ] >> *("or" >> and_ [ _val = construct<binop<op_or>>(_val, _1) ]);
        and_ = not_ [ _val = _1 ] >> *("and" >> not_ [ _val = construct<binop<op_and>>(_val, _1) ]);
        not_ = "not" > simple [ _val = construct<uniop<op_not>>(_1) ] | simple [ _val = _1 ];

        simple =  '(' > expr_ > ')' | var_;
        var_ = lexeme[ +alpha ];
    }

private:
    boost::spirit::qi::rule<It, var() , Skipper> var_;
    boost::spirit::qi::rule<It, expr(), Skipper> not_, and_, or_, simple, expr_;
};
```

【Question comments】:

Tags: c++ grammar boost-spirit

【Answer 1】:

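It would appear that conversion to DCNF is NP-hard (at the very least, the canonical form can be exponentially larger than the input). So you can expect to have to make concessions.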
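To see where the cost comes from, here is a brute-force sketch of what the canonical form actually is: one minterm per satisfying assignment, each mentioning every variable. It's an illustration under stated assumptions, not code from the question: the AST is a hypothetical C++17 `std::variant` version of the question's (with `std::shared_ptr` standing in for `boost::recursive_wrapper`, so it builds without Boost), and `var`, `NOT`, `AND`, `OR`, `eval`, `collect` and `minterms` are made-up helper names. The outer loop runs over all 2^n assignments of the n variables, which is exactly what makes the full canonical form impractical beyond a handful of variables.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical std-only AST; std::shared_ptr plays the role of boost::recursive_wrapper.
struct Not;
struct And;
struct Or;
using Expr = std::variant<std::string, std::shared_ptr<Not>, std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr e; };
struct And { Expr l, r; };
struct Or  { Expr l, r; };

Expr var(std::string name) { return name; }
Expr NOT(Expr e) { return std::make_shared<Not>(Not{std::move(e)}); }
Expr AND(Expr l, Expr r) { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr OR(Expr l, Expr r) { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

using Env = std::map<std::string, bool>;

// Evaluates the expression under one truth assignment.
bool eval(const Expr& e, const Env& env) {
    if (auto v = std::get_if<std::string>(&e)) return env.at(*v);
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e)) return !eval((*n)->e, env);
    if (auto a = std::get_if<std::shared_ptr<And>>(&e)) return eval((*a)->l, env) && eval((*a)->r, env);
    const auto& o = std::get<std::shared_ptr<Or>>(e);
    return eval(o->l, env) || eval(o->r, env);
}

// Collects the distinct variable names.
void collect(const Expr& e, std::set<std::string>& vars) {
    if (auto v = std::get_if<std::string>(&e)) { vars.insert(*v); return; }
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e)) { collect((*n)->e, vars); return; }
    if (auto a = std::get_if<std::shared_ptr<And>>(&e)) { collect((*a)->l, vars); collect((*a)->r, vars); return; }
    const auto& o = std::get<std::shared_ptr<Or>>(e);
    collect(o->l, vars);
    collect(o->r, vars);
}

// Disjunctive canonical normal form by brute force: one minterm per satisfying
// assignment, every variable in every minterm. Costs 2^n evaluations.
std::vector<std::string> minterms(const Expr& e) {
    std::set<std::string> vs;
    collect(e, vs);
    const std::vector<std::string> names(vs.begin(), vs.end());  // sorted by std::set
    const std::size_t n = names.size();
    assert(n < 24 && "truth-table enumeration is exponential in the number of variables");
    std::vector<std::string> out;
    for (std::size_t mask = 0; mask < (std::size_t{1} << n); ++mask) {
        Env env;
        for (std::size_t i = 0; i < n; ++i) env[names[i]] = (mask >> i) & 1u;  // bit i -> variable i
        if (!eval(e, env)) continue;
        std::string term;
        for (std::size_t i = 0; i < n; ++i) {
            if (i != 0) term += " and ";
            term += env[names[i]] ? names[i] : "not " + names[i];
        }
        out.push_back(term);
    }
    return out;
}
```

With this sketch, `minterms(OR(var("a"), var("b")))` returns `a and not b`, `not a and b` and `a and b`, while `minterms(AND(var("a"), NOT(var("a"))))` returns no minterms at all.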

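Your much-simplified subtask only eliminates double negations. It looks like you tried to keep a stack of parent expression references (`stk`), but:

- you don't show any way to extract or return the simplified expression (the original expression isn't changed)
- you try to push a `uniop<>` node as a reference to an `expr` node, which is a type mismatch: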
```
stk.push_back(std::ref(u));  // <<=== Fails with "no matching function for call"
```
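It seems to me that

```
transformer(expr & e) {
    stk.push_back(e);
}
```

simply fails to recurse into the subexpressions. If it did, you could trust that the enclosing `expr&` is already on the stack. The `binop` handlers are similar: they try to push a reference to `u`, which isn't even in scope there. Presumably they meant to push the current node, which would run into the same type mismatch.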
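A minimal reproduction of that type mismatch, assuming `std::variant` in place of `boost::variant` and the hypothetical names `Node` (for `expr`) and `Neg` (for `uniop<op_not>`). The `static_assert`s encode why a reference to the alternative can't go into a `deque<reference_wrapper<expr>>`, and the `Tracker` sketch shows what can be pushed instead: the enclosing variant, before dispatching on it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins: Node plays the role of `expr`, Neg of `uniop<op_not>`.
struct Neg { std::string operand; };
using Node = std::variant<std::string, Neg>;

// A Neg& is not a Node&: the Neg lives *inside* some Node, but isn't one.
static_assert(!std::is_convertible_v<Neg&, Node&>);
// ...so a reference_wrapper<Node> can't be made from it (the question's compile error).
static_assert(!std::is_constructible_v<std::reference_wrapper<Node>, Neg&>);

// What *can* be tracked is the enclosing variant, pushed before dispatching on it.
struct Tracker {
    std::deque<std::reference_wrapper<Node>> stk;
    int visited = 0;

    void visit(Node& n) {
        stk.push_back(n);  // n is a Node&, so this compiles
        std::visit([&](auto& alt) {
            (void)alt;
            // inside the dispatch, stk.back() is the variant that holds `alt`
            assert(&stk.back().get() == &n);
            ++visited;
        }, n);
        stk.pop_back();
    }
};
```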

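First: `simplify`

I think it's much easier to write these in a functional style: rather than "manipulating" the object graph, have the transformation return the transformed result.

This means you can leave every node type as-is, except for nested negations. Here's what that looks like: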

```cpp
// NB: unop<> plays the role of the question's uniop<>
struct simplifier {
    typedef expr result_type;

    // in general, just identity transform
    template <typename E> auto operator()(E const& e) const { return e; }

    // only handle these:
    auto operator()(expr const& e) const { return apply_visitor(*this, e); }
    expr operator()(unop<op_not> const& e) const {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            return nested_negation->exp_u;
        }
        return e;
    }
};
```

A simple test program to exercise it:

Live On Coliru

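```cpp
// Excerpt from main(). NOT/AND/OR are shorthands for unop<op_not>, binop<op_and>
// and binop<op_or>, and str() quotes an expression for printing (neither shown here).
std::vector<expr> tests {
    "a",
    NOT{"a"},
    AND{"a", "b"},
    OR{"a","b"},
    AND{NOT{"a"},NOT{"b"}},
    NOT{{NOT{"a"}}},
};

const simplifier simplify{};

for (expr const& expr : tests) {
    std::cout << std::setw(30) << str(expr) << " -> " << simplify(expr) << "\n";
}
```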

Prints:

```
                       "a" -> "a"
                  NOT{"a"} -> NOT{"a"}
              AND{"a","b"} -> AND{"a","b"}
               OR{"a","b"} -> OR{"a","b"}
    AND{NOT{"a"},NOT{"b"}} -> AND{NOT{"a"},NOT{"b"}}
             NOT{NOT{"a"}} -> "a"
```

Using a stack/mutation

Doing the same with a stack *seems* just as easy:

Here be dragons

```cpp
struct stack_simplifier {
    typedef void result_type;
    std::deque<std::reference_wrapper<expr>> stk;

    void operator()(expr& e) {
        stk.push_back(e);
        apply_visitor(*this, e);
        stk.pop_back();
    }

    template <typename Other>
    void operator()(Other&) {}

    void operator()(unop<op_not>& e) {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            stk.back().get() = nested_negation->exp_u;
        }
    }
};
```

Usage is no longer `const` (because the functor is impure), and neither is the `expr` argument (it gets mutated):

```cpp
for (expr expr : tests) {
    std::cout << std::setw(30) << str(expr);

    stack_simplifier{}(expr);
    std::cout << " -> " << expr << "\n";
}
```

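It *does* seem to work (Live On Coliru), but it also has clear drawbacks:

- the stack serves no real purpose, since only the top element is ever inspected (you could replace it with a pointer to the current expression node)
- the functor object is impure/non-const
- the expression tree is mutated while it is being traversed. That's a time bomb waiting for you to invoke Undefined Behaviour: in

```
void operator()(unop<op_not>& e) {
    if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
        stk.back().get() = nested_negation->exp_u;
    }
}
```

after the assignment to the expression on top of the stack, the reference `e` is dangling, and so is `nested_negation`. Dereferencing either of them after that point is UB.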

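Now, in this simple scenario (collapsing double negations), it doesn't seem too hard to convince yourself that this is actually fine. WRONG.

It turns out that `operator=` on a variant calls `variant_assign`, which looks like this: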

```cpp
void variant_assign(const variant& rhs)
{
    // If the contained types are EXACTLY the same...
    if (which_ == rhs.which_)
    {
        // ...then assign rhs's storage to lhs's content:
        detail::variant::assign_storage visitor(rhs.storage_.address());
        this->internal_apply_visitor(visitor);
    }
    else
    {
        // Otherwise, perform general (copy-based) variant assignment:
        assigner visitor(*this, rhs.which());
        rhs.internal_apply_visitor(visitor);
    }
}
```

The `assigner` visitor has a fatal detail (one of the nothrow-aware overloads gets selected):

```cpp
template <typename RhsT, typename B1, typename B2>
void assign_impl(
      const RhsT& rhs_content
    , mpl::true_ // has_nothrow_copy
    , B1 // is_nothrow_move_constructible
    , B2 // has_fallback_type
    ) const BOOST_NOEXCEPT
{
    // Destroy lhs's content...
    lhs_.destroy_content(); // nothrow

    // ...copy rhs content into lhs's storage...
    new(lhs_.storage_.address())
        RhsT( rhs_content ); // nothrow

    // ...and indicate new content type:
    lhs_.indicate_which(rhs_which_); // nothrow
}
```

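OOPS. It turns out the left-hand side is destroyed first. However, in

```cpp
stk.back().get() = nested_negation->exp_u;
```

the right-hand side is a subobject of the left-hand side (!!!). The unintuitive way to avoid UB is to make a temporary copy¹:

```cpp
expr tmp = nested_negation->exp_u;
stk.back().get() = tmp;
```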
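Combining the two fixes (a reference to the current node instead of a stack, plus the temporary copy) gives roughly the following sketch. It is not the answer's code: it assumes a hypothetical C++17 `std::variant` AST with `std::shared_ptr` in place of `boost::recursive_wrapper`, and `var`, `NOT`, `AND`, `OR`, `str` and `collapse_in_place` are made-up names. As far as I can tell from the standard's wording, `std::variant`'s copy assignment can likewise destroy the old alternative before copying in the new one (when the new alternative is nothrow copy constructible, as `std::shared_ptr` is), so the temporary matters here too.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical std-only AST; std::shared_ptr plays the role of boost::recursive_wrapper.
struct Not;
struct And;
struct Or;
using Expr = std::variant<std::string, std::shared_ptr<Not>, std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr e; };
struct And { Expr l, r; };
struct Or  { Expr l, r; };

Expr var(std::string name) { return name; }
Expr NOT(Expr e) { return std::make_shared<Not>(Not{std::move(e)}); }
Expr AND(Expr l, Expr r) { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr OR(Expr l, Expr r) { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

// Prints in the same notation as the answer's output, e.g. AND{"a",NOT{"b"}}
std::string str(const Expr& e) {
    if (auto v = std::get_if<std::string>(&e)) return '"' + *v + '"';
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e)) return "NOT{" + str((*n)->e) + "}";
    if (auto a = std::get_if<std::shared_ptr<And>>(&e))
        return "AND{" + str((*a)->l) + "," + str((*a)->r) + "}";
    const auto& o = std::get<std::shared_ptr<Or>>(e);
    return "OR{" + str(o->l) + "," + str(o->r) + "}";
}

// In-place double-negation collapse. `current` replaces the stack: the top of the
// stack was the only element ever used. Assumes no subtree is shared with another
// tree, because nodes are mutated.
void collapse_in_place(Expr& current) {
    if (auto n = std::get_if<std::shared_ptr<Not>>(&current)) {
        if (auto inner = std::get_if<std::shared_ptr<Not>>(&(*n)->e)) {
            // (*inner)->e is owned, through *n, by `current` itself. Assigning it
            // directly may destroy the old content (and the source with it) before
            // copying, so take an independent copy first.
            Expr tmp = (*inner)->e;
            current = std::move(tmp);   // n and inner dangle from here on
            collapse_in_place(current); // the replacement may be NOT{NOT{...}} again
            return;
        }
        collapse_in_place((*n)->e);
    } else if (auto a = std::get_if<std::shared_ptr<And>>(&current)) {
        collapse_in_place((*a)->l);
        collapse_in_place((*a)->r);
    } else if (auto o = std::get_if<std::shared_ptr<Or>>(&current)) {
        collapse_in_place((*o)->l);
        collapse_in_place((*o)->r);
    }
    // plain variables: nothing to do
}
```

Unlike the stack version, this also recurses into the children, so `NOT{NOT{AND{"a",NOT{NOT{"b"}}}}}` ends up as `AND{"a","b"}`.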

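Now imagine you're applying transformations like De Morgan's laws. What if there are (also) nested negations inside the subexpressions?

To me, the mutating approach is just needlessly error-prone.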
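For comparison, in the functional style De Morgan's laws and nested negations can be handled in a single pass by carrying a "negated" flag down the tree. The result is the negation normal form (NOT only ever wraps a variable), which is also the usual first step towards a DNF. This is a swapped-in technique rather than code from the answer, written against a hypothetical C++17 `std::variant` AST (`std::shared_ptr` stands in for `boost::recursive_wrapper`); `var`, `NOT`, `AND`, `OR`, `str` and `nnf` are made-up names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical std-only AST; std::shared_ptr plays the role of boost::recursive_wrapper.
struct Not;
struct And;
struct Or;
using Expr = std::variant<std::string, std::shared_ptr<Not>, std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr e; };
struct And { Expr l, r; };
struct Or  { Expr l, r; };

Expr var(std::string name) { return name; }
Expr NOT(Expr e) { return std::make_shared<Not>(Not{std::move(e)}); }
Expr AND(Expr l, Expr r) { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr OR(Expr l, Expr r) { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

// Prints in the same notation as the answer's output, e.g. AND{"a",NOT{"b"}}
std::string str(const Expr& e) {
    if (auto v = std::get_if<std::string>(&e)) return '"' + *v + '"';
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e)) return "NOT{" + str((*n)->e) + "}";
    if (auto a = std::get_if<std::shared_ptr<And>>(&e))
        return "AND{" + str((*a)->l) + "," + str((*a)->r) + "}";
    const auto& o = std::get<std::shared_ptr<Or>>(e);
    return "OR{" + str(o->l) + "," + str(o->r) + "}";
}

// Negation normal form: returns a new tree equivalent to `e` (or to NOT e when
// `negated` is true) in which NOT only wraps variables. The input is not modified.
Expr nnf(const Expr& e, bool negated = false) {
    if (auto v = std::get_if<std::string>(&e))
        return negated ? NOT(*v) : Expr{*v};
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e))
        return nnf((*n)->e, !negated);                  // each NOT flips polarity: NOT NOT x == x
    if (auto a = std::get_if<std::shared_ptr<And>>(&e))  // NOT(l AND r) == NOT l OR NOT r
        return negated ? OR(nnf((*a)->l, true), nnf((*a)->r, true))
                       : AND(nnf((*a)->l), nnf((*a)->r));
    const auto& o = std::get<std::shared_ptr<Or>>(e);    // NOT(l OR r) == NOT l AND NOT r
    return negated ? AND(nnf(o->l, true), nnf(o->r, true))
                   : OR(nnf(o->l), nnf(o->r));
}
```

For example, `nnf(NOT(AND(var("a"), NOT(var("b")))))` prints as `OR{NOT{"a"},"b"}`: De Morgan turns the AND into an OR, and the inner double negation on `b` cancels along the way.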

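Recursive, immutable transformation a.k.a. Joy

There's another problem with the approaches so far: nested subexpressions aren't transformed. For example

```
NOT{NOT{AND{"a",NOT{NOT{"b"}}}}} -> AND{"a",NOT{NOT{"b"}}}
```

instead of the desired `AND{"a","b"}`. This is easy to fix in the purely functional approach: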

```cpp
struct simplifier {
    typedef expr result_type;

    template <typename T> auto operator()(T const& v) const { return call(v); }

  private:
    auto call(var const& e) const { return e; }
    auto call(expr const& e) const {
        auto s = apply_visitor(*this, e);
        return s;
    }
    expr call(unop<op_not> const& e) const {
        if (auto nested_negation = boost::strict_get<unop<op_not>>(&e.exp_u)) {
            return call(nested_negation->exp_u);
        }

        return unop<op_not> {call(e.exp_u)};
    }
    template <typename Op> auto call(binop<Op> const& e) const {
        return binop<Op> {call(e.exp_l), call(e.exp_r)};
    }
};
```

Everything is still immutable, but now every kind of expression is handled, so that we recurse into its subexpressions. Now it prints:

Live On Coliru

```
                               "a" -> "a"
                          NOT{"a"} -> NOT{"a"}
                      AND{"a","b"} -> AND{"a","b"}
                       OR{"a","b"} -> OR{"a","b"}
            AND{NOT{"a"},NOT{"b"}} -> AND{NOT{"a"},NOT{"b"}}
                     NOT{NOT{"a"}} -> "a"
  NOT{NOT{AND{"a",NOT{NOT{"b"}}}}} -> AND{"a","b"}
```
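The same recursive, immutable simplifier ports almost one-to-one to the standard library. The sketch below assumes a hypothetical C++17 `std::variant` AST (`std::shared_ptr` instead of `boost::recursive_wrapper`) and made-up names (`Simplifier`, `var`, `NOT`, `AND`, `OR`, `str`). `std::visit` requires every overload to return the same type, which the explicit `Expr` return types provide, much like `result_type` does for Boost.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical std-only AST; std::shared_ptr plays the role of boost::recursive_wrapper.
struct Not;
struct And;
struct Or;
using Expr = std::variant<std::string, std::shared_ptr<Not>, std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr e; };
struct And { Expr l, r; };
struct Or  { Expr l, r; };

Expr var(std::string name) { return name; }
Expr NOT(Expr e) { return std::make_shared<Not>(Not{std::move(e)}); }
Expr AND(Expr l, Expr r) { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr OR(Expr l, Expr r) { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

// Prints in the same notation as the answer's output, e.g. AND{"a",NOT{"b"}}
std::string str(const Expr& e) {
    if (auto v = std::get_if<std::string>(&e)) return '"' + *v + '"';
    if (auto n = std::get_if<std::shared_ptr<Not>>(&e)) return "NOT{" + str((*n)->e) + "}";
    if (auto a = std::get_if<std::shared_ptr<And>>(&e))
        return "AND{" + str((*a)->l) + "," + str((*a)->r) + "}";
    const auto& o = std::get<std::shared_ptr<Or>>(e);
    return "OR{" + str(o->l) + "," + str(o->r) + "}";
}

// Builds a new tree with every NOT{NOT{x}} collapsed, at any depth.
struct Simplifier {
    Expr operator()(const Expr& e) const { return std::visit(*this, e); }

    Expr operator()(const std::string& v) const { return v; }  // variables are unchanged
    Expr operator()(const std::shared_ptr<Not>& n) const {
        if (auto inner = std::get_if<std::shared_ptr<Not>>(&n->e))
            return (*this)((*inner)->e);   // NOT{NOT{x}} -> simplify(x)
        return NOT((*this)(n->e));         // otherwise recurse below the NOT
    }
    Expr operator()(const std::shared_ptr<And>& a) const { return AND((*this)(a->l), (*this)(a->r)); }
    Expr operator()(const std::shared_ptr<Or>& o) const { return OR((*this)(o->l), (*this)(o->r)); }
};
```

Usage mirrors the answer's test loop: `const Simplifier simplify{};` followed by `str(simplify(e))` for each test expression.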

For completeness, the same transformation applied to `stack_simplifier`: http://coliru.stacked-crooked.com/a/cc5627aa37f0c969

¹ You could actually use move semantics here, but I've left that out for clarity.

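【Comments】:

Thanks for the detailed answer. I'm a bit confused by the `auto call(expr const& e) const` method in the `simplifier` class, since the class doesn't derive from `boost::static_visitor`. It seems it might only be called once, right after the initial `operator()` call.
There's little value in deriving from `static_visitor`: the only relevant detail is the nested `result_type` typedef, so that's what I used. Moreover, with recent compiler and Boost versions the result type can be deduced in many cases, which allows plain (polymorphic) lambdas to be used directly as visitors. (C++ keeps improving, and it always takes a long time to shed old coding habits.)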
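The standard-library counterpart of that remark looks like the sketch below: `std::visit` deduces its result type from the visitor, so a single generic lambda with `if constexpr` can serve as the whole visitor, with no `result_type` and no base class. It assumes a hypothetical C++17 `std::variant` AST (`std::shared_ptr` instead of `boost::recursive_wrapper`), and `depth` is a made-up example function.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical std-only AST; std::shared_ptr plays the role of boost::recursive_wrapper.
struct Not;
struct And;
struct Or;
using Expr = std::variant<std::string, std::shared_ptr<Not>, std::shared_ptr<And>, std::shared_ptr<Or>>;
struct Not { Expr e; };
struct And { Expr l, r; };
struct Or  { Expr l, r; };

Expr var(std::string name) { return name; }
Expr NOT(Expr e) { return std::make_shared<Not>(Not{std::move(e)}); }
Expr AND(Expr l, Expr r) { return std::make_shared<And>(And{std::move(l), std::move(r)}); }
Expr OR(Expr l, Expr r) { return std::make_shared<Or>(Or{std::move(l), std::move(r)}); }

// Nesting depth of operators; the lambda's return type (int) is deduced per alternative.
int depth(const Expr& e) {
    return std::visit([](const auto& node) {
        using T = std::decay_t<decltype(node)>;
        if constexpr (std::is_same_v<T, std::string>) {
            return 0;                                      // a variable is a leaf
        } else if constexpr (std::is_same_v<T, std::shared_ptr<Not>>) {
            return 1 + depth(node->e);
        } else {                                           // shared_ptr<And> or shared_ptr<Or>
            return 1 + std::max(depth(node->l), depth(node->r));
        }
    }, e);
}
```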
