How to read a struct from a file in Rust? (Answers)

【Question title】: How to read a struct from a file in Rust?
【Posted】: 2014-08-20 16:36:09
【Question】:

Is there a way to read a struct directly from a file in Rust? My code is:

use std::fs::File;

struct Configuration {
    item1: u8,
    item2: u16,
    item3: i32,
    item4: [char; 8],
}

fn main() {
    let file = File::open("config_file").unwrap();

    let mut config: Configuration;
    // How to read struct from file?
}

How can I read my configuration from the file directly into config? Is this even possible?

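【Question comments】:

What format is your file in? The right answer depends heavily on how the data is actually represented in the file.
@VladimirMatveev Binary format. I don't want to read from the file and then copy into my struct; I want to use my struct as the buffer the file is read into.
Ah, now I understand what you need. You won't be able to do this without some unsafe code. I'll try to write a proof of concept now.
This crate seems to do exactly what you want: github.com/TyOverby/bincode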

Tags: io rust

【Solution 1】:

Here you go:

use std::io::Read;
use std::mem;
use std::slice;

#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct Configuration {
    item1: u8,
    item2: u16,
    item3: i32,
    item4: [char; 8],
}

const CONFIG_DATA: &[u8] = &[
    0xfd, // u8
    0xb4, 0x50, // u16
    0x45, 0xcd, 0x3c, 0x15, // i32
    0x71, 0x3c, 0x87, 0xff, // char
    0xe8, 0x5d, 0x20, 0xe7, // char
    0x5f, 0x38, 0x05, 0x4a, // char
    0xc4, 0x58, 0x8f, 0xdc, // char
    0x67, 0x1d, 0xb4, 0x64, // char
    0xf2, 0xc5, 0x2c, 0x15, // char
    0xd8, 0x9a, 0xae, 0x23, // char
    0x7d, 0xce, 0x4b, 0xeb, // char
];

fn main() {
    let mut buffer = CONFIG_DATA;

    let mut config: Configuration = unsafe { mem::zeroed() };

    let config_size = mem::size_of::<Configuration>();
    unsafe {
        let config_slice = slice::from_raw_parts_mut(&mut config as *mut _ as *mut u8, config_size);
        // `read_exact()` comes from `Read` impl for `&[u8]`
        buffer.read_exact(config_slice).unwrap();
    }

    println!("Read structure: {:#?}", config);
}

Try it here (updated for Rust 1.38)

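However, you need to be careful, because unsafe code is, well, unsafe. After the slice::from_raw_parts_mut() call, two mutable handles to the same data exist at the same time, which violates Rust's aliasing rules. You therefore want to keep the mutable slice created from the struct alive for as short a time as possible. I also assume you are aware of endianness issues: the code above is by no means portable and will return different results if compiled and run on different kinds of machines (ARM vs. x86, for example).

If you can choose the format and want a compact binary one, consider using bincode. Otherwise, if you need to parse some predefined binary structure, for example, the byteorder crate is the way to go.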
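To make the endianness point concrete, here is a minimal sketch that decodes the first three fields of the sample data with an explicit byte order, using only the standard library rather than bincode or byteorder. The helper name `decode_header` and the choice of little-endian are assumptions made for illustration; the original answer does not say which byte order the file uses.

```rust
// Decode the first fields of the sample data with an explicit byte order,
// so the result is the same on every platform.
// `decode_header` is a hypothetical helper, not part of the answer above.
fn decode_header(bytes: &[u8; 7]) -> (u8, u16, i32) {
    let item1 = bytes[0];
    let item2 = u16::from_le_bytes([bytes[1], bytes[2]]);
    let item3 = i32::from_le_bytes([bytes[3], bytes[4], bytes[5], bytes[6]]);
    (item1, item2, item3)
}

fn main() {
    // The first seven bytes of CONFIG_DATA.
    let data = [0xfd, 0xb4, 0x50, 0x45, 0xcd, 0x3c, 0x15];
    let (item1, item2, item3) = decode_header(&data);
    println!("{:#x} {:#x} {:#x}", item1, item2, item3);

    // The same two bytes interpreted as big-endian give a different number.
    let swapped = u16::from_be_bytes([0xb4, 0x50]);
    println!("{:#x}", swapped);
}
```

Because the byte order is spelled out in the code, the decoded values no longer depend on the machine the program runs on.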

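【Discussion】:

Yes, I'm aware of the endianness issues, but this is just a quick tool I'm writing that will run on about 3 computers.
@A.B., this, I believe. It is now located here.
In the end I went with mem::uninitialized instead of mem::zeroed. Initializing the memory to 0 doesn't seem to make much sense if it is going to be overwritten anyway.
This gives me a "warning, this warning will become an error" message: github.com/rust-lang/rust/issues/46043
While the general outline of this code is fine, this specific instance violates Rust's safety rules. The char data holds invalid values, outside the range char currently supports.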
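The concern in the last comment can be illustrated with a small sketch: instead of writing raw bytes straight into a `char`, read each 4-byte group as a `u32` and validate it with `char::from_u32`, which returns `None` for anything that is not a Unicode scalar value. The helper name `decode_chars` and the little-endian layout are assumptions for illustration only.

```rust
// Read each 4-byte group as a little-endian u32 and validate it before
// turning it into a `char`. `decode_chars` is a hypothetical helper.
fn decode_chars(bytes: &[u8]) -> Option<Vec<char>> {
    bytes
        .chunks_exact(4)
        .map(|chunk| {
            let code = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            // Returns None for surrogates and values above 0x10FFFF.
            char::from_u32(code)
        })
        .collect()
}

fn main() {
    // First "char" of the sample data: 0xff873c71 is not a Unicode scalar value.
    let bad = [0x71, 0x3c, 0x87, 0xff];
    println!("{:?}", decode_chars(&bad));

    // 'R' and 's' encoded as little-endian u32 values.
    let good = [0x52, 0, 0, 0, 0x73, 0, 0, 0];
    println!("{:?}", decode_chars(&good));
}
```

Collecting into `Option<Vec<char>>` makes the whole decode fail as soon as one group is invalid, so no invalid `char` is ever constructed.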

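【Solution 2】:

As Vladimir Matveev mentions, using the byteorder crate is often the best solution. That way you account for endianness issues, don't have to deal with any unsafe code, and don't have to worry about alignment or padding: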

use byteorder::{LittleEndian, ReadBytesExt}; // 1.2.7
use std::{
    fs::File,
    io::{self, Read},
};

struct Configuration {
    item1: u8,
    item2: u16,
    item3: i32,
}

impl Configuration {
    fn from_reader(mut rdr: impl Read) -> io::Result<Self> {
        let item1 = rdr.read_u8()?;
        let item2 = rdr.read_u16::<LittleEndian>()?;
        let item3 = rdr.read_i32::<LittleEndian>()?;

        Ok(Configuration {
            item1,
            item2,
            item3,
        })
    }
}

fn main() {
    let file = File::open("/dev/random").unwrap();

    let config = Configuration::from_reader(file);
    // How to read struct from file?
}

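I ignored the [char; 8] for a few reasons:

Rust's char is a 32-bit type, and it's unclear whether your file holds actual Unicode code points or C-style 8-bit values.
You can't easily parse an array with byteorder; you have to parse N values and then build the array yourself.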
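The second point, parsing N values and building the array yourself, could look roughly like the sketch below. It uses only the standard library (`read_exact` plus `u32::from_le_bytes`) instead of byteorder so that it compiles without extra dependencies. The helper name `read_u32_array`, the element type `u32`, and the little-endian order are all assumptions for illustration.

```rust
use std::io::{self, Read};

// Read eight little-endian u32 values one at a time and assemble the array.
// `read_u32_array` is a hypothetical helper; it uses only the standard library.
fn read_u32_array(mut rdr: impl Read) -> io::Result<[u32; 8]> {
    let mut items = [0u32; 8];
    for item in items.iter_mut() {
        let mut buf = [0u8; 4];
        rdr.read_exact(&mut buf)?;
        *item = u32::from_le_bytes(buf);
    }
    Ok(items)
}

fn main() -> io::Result<()> {
    // Sample input: the numbers 1 through 8 as little-endian u32 values.
    let bytes: Vec<u8> = (1u32..=8).flat_map(|n| n.to_le_bytes()).collect();
    let items = read_u32_array(&bytes[..])?;
    println!("{:?}", items);
    Ok(())
}
```

If the file really stores Unicode code points, each `u32` could then be checked with `char::from_u32` before being stored as a `char`.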

【Discussion】:

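I guess these read_u8 and other read_X calls may each invoke a system call, so this is probably not efficient. Can we read the whole struct with a given byte order, instead of one small integer at a time?
@VictorPolevoy That's a job for a buffered reader to fix. See What's the de-facto way of reading and writing files in Rust 1.x?, starting at "Buffered I/O". But yes, you can unsafely take any random blob of bytes and transmute it to any given type. That's the point of the other two answers here.
What if I want to read a 10 GB file? The performance penalty would be high. Using from_raw_parts is the only way, IMO.
@mishmashru I don't immediately see why it would perform worse than from_raw_parts. This isn't something you need opinions on. Write both and benchmark them; then you'll know for sure.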
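The buffered-reader suggestion can be checked with a small experiment. The sketch below wraps a reader in a hypothetical `CountingReader` that counts how often the underlying `read` is called (for a real file, each such call would be a system call) and compares many tiny reads with and without `std::io::BufReader`. The names `CountingReader` and `underlying_reads` are made up for this illustration, and it is not a benchmark.

```rust
use std::io::{self, BufReader, Read};

// Wraps a reader and counts how often the underlying `read` is called.
// `CountingReader` is a hypothetical stand-in for a file.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

// Performs `n` one-byte reads and returns how many calls reached the source.
fn underlying_reads(data: &[u8], n: usize, buffered: bool) -> io::Result<usize> {
    let mut counter = CountingReader { inner: data, calls: 0 };
    let mut byte = [0u8; 1];
    if buffered {
        // BufReader fetches a large chunk once and serves small reads from it.
        let mut rdr = BufReader::new(&mut counter);
        for _ in 0..n {
            rdr.read_exact(&mut byte)?;
        }
    } else {
        for _ in 0..n {
            counter.read_exact(&mut byte)?;
        }
    }
    Ok(counter.calls)
}

fn main() -> io::Result<()> {
    let data = [0u8; 64];
    println!("unbuffered: {}", underlying_reads(&data, 64, false)?);
    println!("buffered:   {}", underlying_reads(&data, 64, true)?);
    Ok(())
}
```

With buffering, the small `read_X`-style calls are served from memory, so far fewer calls reach the underlying source. Whether this matches `from_raw_parts` for a 10 GB file is still something to measure rather than assume.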

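【Solution 3】:

The following code does not take any endianness or padding issues into account and is intended to be used with POD types. struct Configuration should be safe in this case.

Here is a function that can read a struct (of a POD type) from a file: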

use std::io::{self, Read};
use std::slice;

fn read_struct<T, R: Read>(mut read: R) -> io::Result<T> {
    let num_bytes = ::std::mem::size_of::<T>();
    unsafe {
        let mut s = ::std::mem::uninitialized();
        let buffer = slice::from_raw_parts_mut(&mut s as *mut T as *mut u8, num_bytes);
        match read.read_exact(buffer) {
            Ok(()) => Ok(s),
            Err(e) => {
                ::std::mem::forget(s);
                Err(e)
            }
        }
    }
}

// use
// read_struct::<Configuration>(reader)

If you want to read a sequence of structs from a file, you can call read_struct multiple times or read the whole file at once:

use std::fs::{self, File};
use std::io::BufReader;
use std::path::Path;

fn read_structs<T, P: AsRef<Path>>(path: P) -> io::Result<Vec<T>> {
    let path = path.as_ref();
    let struct_size = ::std::mem::size_of::<T>();
    let num_bytes = fs::metadata(path)?.len() as usize;
    let num_structs = num_bytes / struct_size;
    let mut reader = BufReader::new(File::open(path)?);
    let mut r = Vec::<T>::with_capacity(num_structs);
    unsafe {
        // Cover only whole structs so the slice never exceeds the Vec's capacity.
        let buffer = slice::from_raw_parts_mut(r.as_mut_ptr() as *mut u8, num_structs * struct_size);
        reader.read_exact(buffer)?;
        r.set_len(num_structs);
    }
    Ok(r)
}

// use
// read_structs::<StructName, _>("path/to/file")

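【Discussion】:

Why ::std::mem... instead of std::mem? Is there any difference?
Paths starting with :: are absolute paths. If the function is placed inside a module, using an absolute path ensures the code still compiles. Search for "absolute" in doc.rust-lang.org/book/crates-and-modules.html to learn more.
Thanks, malbarbo.
@Knight To prevent the destructor from running on s (s is uninitialized). This is a use case described in the forget documentation.
While this answer hints at potential problems, it uses unsafe Rust incorrectly. The proposed function can introduce memory unsafety into safe Rust code. One example shows it causing a segfault. This code should not be used.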
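One possible way to address the objection in the last comment is to stop accepting arbitrary `T` and require an explicit opt-in instead. The sketch below is an assumption-laden illustration, not the answer author's method: `Pod` is a hypothetical unsafe marker trait (similar in spirit to what some crates provide), `Header` is a made-up example type, and the buffer is zero-initialized through `MaybeUninit::zeroed()` rather than created with the deprecated `mem::uninitialized()`. It still ignores endianness, as the answer does.

```rust
use std::io::{self, Read};
use std::mem::{self, MaybeUninit};
use std::slice;

/// Hypothetical marker trait: implementors promise that every bit pattern
/// is a valid value of the type. `Copy` rules out destructors.
unsafe trait Pod: Copy {}

// Made-up example type with integer fields only.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    item1: u32,
    item2: u16,
    item3: u16,
}

// SAFETY: `Header` has only integer fields, so any bit pattern is valid.
unsafe impl Pod for Header {}

fn read_pod<T: Pod, R: Read>(mut rdr: R) -> io::Result<T> {
    // Zeroed, so the byte slice below never exposes uninitialized memory.
    let mut value = MaybeUninit::<T>::zeroed();
    // SAFETY: the pointer is valid for size_of::<T>() bytes, all zero-initialized.
    let bytes = unsafe {
        slice::from_raw_parts_mut(value.as_mut_ptr() as *mut u8, mem::size_of::<T>())
    };
    rdr.read_exact(bytes)?;
    // SAFETY: `T: Pod` promises that any bit pattern is a valid `T`.
    Ok(unsafe { value.assume_init() })
}

fn main() -> io::Result<()> {
    // Build input in native byte order so the result is the same on any host.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&7u32.to_ne_bytes());
    bytes.extend_from_slice(&2u16.to_ne_bytes());
    bytes.extend_from_slice(&3u16.to_ne_bytes());
    let header: Header = read_pod(&bytes[..])?;
    println!("{:?}", header);
    Ok(())
}
```

Because only types that explicitly implement `Pod` are accepted, safe callers can no longer instantiate the function with a type such as `String` or a reference, which is the kind of misuse the comment warns about. The `unsafe impl` moves the proof obligation to whoever declares a type to be plain data.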