【Question title】: Handling malformed XML with serde / serde-xml-rs
【Posted】: 2021-09-09 16:59:31
【Question】:

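If you have ever used serde and serde-xml-rs, you have surely seen the sample code. All of their sample code always unwrap()s the from_reader() function call. But what happens if we need to actually handle the error, as in the expanded sample code below?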

Cargo.toml

[package]
name = "so-help"
version = "0.1.0"
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde = "1.0.130"
serde_derive = "1.0.130"
serde-xml-rs = "0.5.0"

src/main.rs

#[macro_use]
extern crate serde_derive;
extern crate serde;
extern crate serde_xml_rs;

use serde_xml_rs::from_reader;

#[derive(Debug, Deserialize)]
struct Item {
    pub name: String,
    pub source: String
}

#[derive(Debug, Deserialize)]
struct Project {
    pub name: String,

    #[serde(rename = "Item", default)]
    pub items: Vec<Item>
}

fn main() {
    let correct = r##"
        <Project name="my_project">
            <Item name="hello" source="world.rs" />
        </Project>
    "##;
    let project: Project = from_reader(correct.as_bytes()).unwrap();
    println!("{:#?}", project);

    let malformed = r##"
        <Project name="malformed">
            <malformed name="Hello" source="world.rs />
            <WeDontClose This>
        </Project>
    "##;
    let messedup: Project = from_reader(malformed.as_bytes()).unwrap();
    println!("{:#?}", messedup);
}

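The malformed variable contains malformed XML data that causes from_reader() to return an error, but because the examples always use unwrap(), they never show how to handle this error state. So when we run our code, we get...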

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/so-help`
Project {
    name: "my_project",
    items: [
        Item {
            name: "hello",
            source: "world.rs",
        },
    ],
}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax { source: Error { pos: 4:13, kind: Syntax("Unexpected token inside attribute value: <") } }', src/main.rs:37:63
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

What I want to do is handle the error with a match statement, using idiomatic Rust semantics. So I tried to handle the error by replacing this line...

    let messedup: Project = from_reader(malformed.as_bytes()).unwrap();

...with these lines...

    let messedup: Project = match from_reader(malformed.as_bytes())
    {
        Ok(v) => v,
        Err(e) => println!("Error reading malformed xml {:?}", e),
    };

...but I get this compile-time error...

$ cargo run
   Compiling so-help v0.1.0 (/home/dygear/so-help)
error[E0308]: mismatched types
  --> src/main.rs:40:19
   |
40 |         Err(e) => println!("Error reading malformed xml {:?}", e),
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Project`, found `()`
   |
   = note: this error originates in the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0308`.
error: could not compile `so-help` due to previous error

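So how am I supposed to handle the parsing error? I have asked about this in the Rust community, but no one seems to have an answer. The problem seems to come from the data type being used, but I want that data type returned, or else an error.

The sample code is also available on github, so you can see exactly where I am.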


Trying @Lagerbaer's code does not actually work.

fn main() {
    let correct = r##"
        <Project name="my_project">
            <Item name="hello" source="world.rs" />
        </Project>
    "##;
    let project: Project = from_reader(correct.as_bytes()).unwrap();
    println!("{:#?}", project);

    let malformed = r##"
        <Project name="malformed">
            <malformed name="Hello" source="world.rs />
            <WeDontClose This>
        </Project>
    "##;
    let potentially_messed_up: Result<Project, serde_xml_rs::Error> = from_reader(malformed.as_bytes());
    if let Err(e) = potentially_messed_up {
        println!("Error reading malformed xml {:?}", e);
    } else {
        // now here we do stuff that we _only_ do if there's no error
        potentially_messed_up.unwrap();
        // here we can unwrap without ever causing a panic, because the 
        // if let Err(e) part made sure that we don't enter this branch if 
        // there was an error
    }
    println!("{:#?}", potentially_messed_up);
}

It produces this error:

error[E0282]: type annotations needed for `Result<T, serde_xml_rs::Error>`
  --> src/main.rs:42:13
   |
37 |     let potentially_messed_up = from_reader(malformed.as_bytes());
   |         --------------------- consider giving `potentially_messed_up` the explicit type `Result<T, serde_xml_rs::Error>`, with the type parameters specified
...
42 |         let v = potentially_messed_up.unwrap();
   |             ^ cannot infer type

error: aborting due to previous error

So we now change line 37 to this...

    let potentially_messed_up: Result<Project, serde_xml_rs::Error> = from_reader(malformed.as_bytes());

...which produces this error...

error[E0382]: borrow of moved value: `potentially_messed_up`
    --> src/main.rs:47:23
     |
37   |     let potentially_messed_up: Result<Project, serde_xml_rs::Error> = from_reader(malformed.as_bytes());
     |         --------------------- move occurs because `potentially_messed_up` has type `Result<Project, serde_xml_rs::Error>`, which does not implement the `Copy` trait
...
42   |         potentially_messed_up.unwrap();
     |                               -------- `potentially_messed_up` moved due to this method call
...
47   |     println!("{:#?}", potentially_messed_up);
     |                       ^^^^^^^^^^^^^^^^^^^^^ value borrowed here after move
     |
note: this function takes ownership of the receiver `self`, which moves `potentially_messed_up`
help: consider calling `.as_ref()` to borrow the type's contents
     |
42   |         potentially_messed_up.as_ref().unwrap();
     |                               ^^^^^^^^^

error: aborting due to previous error

So we change the statements here...

    let potentially_messed_up: Result<Project, serde_xml_rs::Error> = from_reader(malformed.as_bytes());
    if let Err(ref e) = potentially_messed_up {
        println!("Error reading malformed xml {:?}", e);
    } else {
        // now here we do stuff that we _only_ do if there's no error
        potentially_messed_up.as_ref().unwrap();
        // here we can unwrap without ever causing a panic, because the 
        // if let Err(e) part made sure that we don't enter this branch if 
        // there was an error
    }
    println!("{:#?}", potentially_messed_up);

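This sort of works: it handles the Err when there is one, but when there is none it gives back the Project wrapped in Ok. That is not quite right either.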

Project {
    name: "my_project",
    items: [
        Item {
            name: "hello",
            source: "world.rs",
        },
    ],
}
Ok(
    Project {
        name: "my_project",
        items: [
            Item {
                name: "hello",
                source: "world.rs",
            },
        ],
    },
)
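
One way to avoid both the move error and the `Ok(...)` wrapper in the output is to match on a reference to the `Result` and print the inner value in each arm. The following is a minimal sketch using only the standard library; `parse_project`, its toy `name=<value>` input format, and `report` are hypothetical stand-ins for `serde_xml_rs::from_reader` and the printing code, not part of the original post.

```rust
#[derive(Debug, PartialEq)]
struct Project {
    name: String,
}

// Hypothetical stand-in for serde_xml_rs::from_reader.
// Assumption for this sketch: valid input looks like "name=<value>".
fn parse_project(input: &str) -> Result<Project, String> {
    match input.strip_prefix("name=") {
        Some(name) if !name.is_empty() => Ok(Project { name: name.to_string() }),
        _ => Err(format!("unexpected input {:?}", input)),
    }
}

// Matching on `&Result` only borrows, so the caller can keep using the value.
fn report(result: &Result<Project, String>) -> String {
    match result {
        // `project` is a `&Project` here; it is formatted without the `Ok` wrapper.
        Ok(project) => format!("{:#?}", project),
        Err(e) => format!("Error reading malformed xml {:?}", e),
    }
}
```

Calling `report(&result)` leaves `result` intact, so it can still be inspected or consumed afterwards.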

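@Jmb's comment below includes a gist that I followed to get closer to idiomatic Rust, but I can't seem to get that to work either. It handles the error for the malformed input, but the correct input also still ends up in the Err match arm. That is odd, because the same input is read correctly directly above.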

fn main() {
    let correct = r##"
        <Project name="my_project">
            <Item name="hello" source="world.rs" />
        </Project>
    "##;
    let project: Project = from_reader(correct.as_bytes()).unwrap();
    println!("{:#?}", project);

    let malformed = r##"
        <Project name="malformed">
            <malformed name="Hello" source="world.rs />
            <WeDontClose This>
        </Project>
    "##;

    let correct = r##"
        <Project name="my_project">
            <Item name="hello" source="world.rs" />
        </Project>
    "##;
    let xml = match from_reader(correct.as_bytes())
    {
        Err (e) =>
        {
            println!("Error reading malformed xml {:?}", e);
            return
        }
        Ok(xml) =>
        {
            xml
        }
    };
    println!("{:?}", xml);
}
warning: unused variable: `malformed`
  --> src/main.rs:31:9
   |
31 |     let malformed = r##"
   |         ^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_malformed`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: 1 warning emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
     Running `target/debug/so-help`
Project {
    name: "my_project",
    items: [
        Item {
            name: "hello",
            source: "world.rs",
        },
    ],
}
Error reading malformed xml UnexpectedToken { token: "&XmlEvent::EndElement { .. }", found: "StartElement(Item, {\"\": \"\", \"xml\": \"http://www.w3.org/XML/1998/namespace\", \"xmlns\": \"http://www.w3.org/2000/xmlns/\"}, [name -> hello, source -> world.rs])" }
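
A plausible explanation for the output above, not confirmed anywhere in this thread: `xml` has no type annotation and the `Err` arm diverges with `return`, so on the 2018 edition the compiler can fall back to the unit type `()` as the deserialization target, and `()` cannot be read from a `<Project>` element. Annotating the binding as `Project` would rule that out. The sketch below uses only the standard library; the `Decode` trait, `from_text`, `describe`, and the toy `name=<value>` format are hypothetical stand-ins for serde's `Deserialize` and `serde_xml_rs::from_reader`.

```rust
// Hypothetical stand-in for a deserialization trait: the caller picks the type.
trait Decode: Sized {
    fn decode(input: &str) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
struct Project {
    name: String,
}

impl Decode for Project {
    // Assumption for this sketch: valid input looks like "name=<value>".
    fn decode(input: &str) -> Result<Self, String> {
        match input.strip_prefix("name=") {
            Some(name) if !name.is_empty() => Ok(Project { name: name.to_string() }),
            _ => Err(format!("cannot decode a Project from {:?}", input)),
        }
    }
}

// The unit type only accepts empty input, so real data is rejected.
impl Decode for () {
    fn decode(input: &str) -> Result<Self, String> {
        if input.is_empty() {
            Ok(())
        } else {
            Err(format!("expected nothing, found {:?}", input))
        }
    }
}

// Generic entry point, analogous in shape to a generic `from_reader`.
fn from_text<T: Decode>(input: &str) -> Result<T, String> {
    T::decode(input)
}

fn describe(input: &str) -> String {
    // The `Project` annotation pins the generic parameter, so the diverging
    // `Err` arm cannot influence which type gets decoded.
    let project: Project = match from_text(input) {
        Ok(project) => project,
        Err(e) => return format!("error: {}", e),
    };
    format!("{:?}", project)
}
```

In the code from the post, the equivalent change would be `let xml: Project = match from_reader(correct.as_bytes()) { ... };`.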

【Question discussion】:

    Tags: rust error-handling match serde


    【Solution 1】:

    I'm surprised you didn't get an answer, because it should be relatively simple.

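    Your code doesn't compile because you use a match expression to assign a value to a variable of type Project, but only one match arm actually yields a value of that type. The other arm doesn't yield a Project at all: it just prints something, and println! evaluates to ().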
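
To make the point about match arms concrete, here is a minimal sketch of two ways to give both arms the same type: supply a fallback `Project` in the `Err` arm, or wrap the outcome in an `Option`. It uses only the standard library; `parse_project`, its toy `name=<value>` format, and the `"unknown"` fallback name are hypothetical stand-ins chosen for illustration, not taken from the question.

```rust
#[derive(Debug, PartialEq)]
struct Project {
    name: String,
}

// Hypothetical stand-in for serde_xml_rs::from_reader.
// Assumption for this sketch: valid input looks like "name=<value>".
fn parse_project(input: &str) -> Result<Project, String> {
    match input.strip_prefix("name=") {
        Some(name) if !name.is_empty() => Ok(Project { name: name.to_string() }),
        _ => Err(format!("unexpected input {:?}", input)),
    }
}

// Both arms evaluate to a `Project`: the Err arm logs, then yields a fallback.
fn load_or_default(input: &str) -> Project {
    match parse_project(input) {
        Ok(project) => project,
        Err(e) => {
            eprintln!("Error reading malformed xml {:?}", e);
            Project { name: String::from("unknown") }
        }
    }
}

// Both arms evaluate to an `Option<Project>`, keeping the failure visible.
fn try_load(input: &str) -> Option<Project> {
    match parse_project(input) {
        Ok(project) => Some(project),
        Err(e) => {
            eprintln!("Error reading malformed xml {:?}", e);
            None
        }
    }
}
```

Which variant fits depends on whether the rest of the program can do anything useful without a parsed project.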

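    Let's take it slowly, step by step. unwrap is a method of the Result enum and, as you know, it returns the value if there is one and panics if there is an error.

    So if we remove unwrap, what we are left with is the Result enum. That is all you need to handle the error, but of course now you have to decide what to do with the error. In your example you print an error message. But what should happen after that? You could do this: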

    let potentially_messed_up = from_reader(malformed.as_bytes());
    if let Err(e) = potentially_messed_up {
      println!("Error reading malformed xml {:?}", e);
    } else {
      // now here we do stuff that we _only_ do if there's no error
      let v = potentially_messed_up.unwrap();
      // here we can unwrap without ever causing a panic, because the 
      // if let Err(e) part made sure that we don't enter this branch if 
      // there was an error
    }
    

    【Discussion】:

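    • Big facepalm -- sorry, I'm still getting used to the syntax. Even if let Err(e) = potentially_messed_up surprised me. Are we assigning the error here? The compiler must be doing some gymnastics to make that syntax work. I now wonder whether the right way to handle this is actually to bubble the error up, so that the chain of things that would happen after it doesn't happen.
    • The syntax here is a form of pattern matching. What happens is that if potentially_messed_up holds the Err variant, we bind the contained error value to the variable e. In any case, you can of course use the ? syntax to bubble the error up (and then change your return type to an appropriate Result). The choice depends on the needs of your program.
    • @Lagerbaer Thank you for taking the time to answer this. Much appreciated!
    • Rather than if let + unwrap, it would be more idiomatic to keep the match but move the handling into it: gist
    • @Lagerbaer ...so... funny story... I actually had time to implement this. `37 | let potentially_messed_up = from_reader(malformed.as_bytes());` consider giving `potentially_messed_up` the explicit type `Result<T, serde_xml_rs::Error>`, with the type parameters specified ... `42 | let v = potentially_messed_up.unwrap();` cannot infer type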
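
The `?` approach mentioned in the comments can be sketched as follows: the function returns a `Result`, and `?` hands any parse error back to the caller instead of handling it on the spot. This is a minimal sketch using only the standard library; `ParseError`, `parse_project`, `load_project_name`, and the toy `name=<value>` format are hypothetical stand-ins for `serde_xml_rs::Error` and `serde_xml_rs::from_reader`.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, PartialEq)]
struct Project {
    name: String,
}

// Hypothetical error type standing in for serde_xml_rs::Error.
#[derive(Debug)]
struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.0)
    }
}

impl Error for ParseError {}

// Hypothetical stand-in for serde_xml_rs::from_reader.
// Assumption for this sketch: valid input looks like "name=<value>".
fn parse_project(input: &str) -> Result<Project, ParseError> {
    match input.strip_prefix("name=") {
        Some(name) if !name.is_empty() => Ok(Project { name: name.to_string() }),
        _ => Err(ParseError(format!("unexpected input {:?}", input))),
    }
}

// `?` returns early with the error, converted into the boxed error type;
// the lines after it only run when parsing succeeded.
fn load_project_name(input: &str) -> Result<String, Box<dyn Error>> {
    let project = parse_project(input)?;
    Ok(project.name)
}
```

The same pattern applies to `main` when it is declared as `fn main() -> Result<(), Box<dyn std::error::Error>>`, in which case a failed parse ends the program with the error printed rather than a panic from `unwrap()`.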