Yes. If you use a fixed-size format, with the index taking 10 characters and the length taking 3 characters, your example would be encoded as:
"         1  1c         2  2cc         3  3ccc".
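A minimal sketch of that fixed-width layout, assuming both numbers are right-aligned and padded with spaces. The names encodeFixed and decodeFixed are made up for illustration, and the decoder assumes well-formed numeric fields (std::stoul throws otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: 10-character index, 3-character length, then the data.
std::string encodeFixed(std::uint32_t index, const std::string &data) {
    assert(data.size() <= 999); // the length field has only 3 characters
    std::ostringstream out;
    out << std::setw(10) << index << std::setw(3) << data.size() << data;
    return out.str();
}

// Hypothetical helper: decodes one record at pos and advances pos past it.
// Returns false (leaving pos untouched) if the input is truncated.
bool decodeFixed(const std::string &text, std::size_t &pos,
                 std::uint32_t &index, std::string &data) {
    if (pos > text.size() || text.size() - pos < 13) return false;
    // std::stoul skips the leading padding spaces.
    const unsigned long idx = std::stoul(text.substr(pos, 10));
    const std::size_t length = std::stoul(text.substr(pos + 10, 3));
    if (text.size() - pos - 13 < length) return false;
    index = static_cast<std::uint32_t>(idx);
    data = text.substr(pos + 13, length);
    pos += 13 + length;
    return true;
}
```

Because every header is exactly 13 characters, the reader never has to guess where the length ends, at the cost of some padding per record.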
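You also mention fstream, but it looks like you are after text (human-readable) serialization rather than binary serialization. If that is the case, but you don't need a truly human-readable form, you can mark the first byte of the length with some bit. Digits in ASCII are encoded as the values 0x30 to 0x39, so you can for example set the 0x40 bit without destroying the digit itself. Your example would then look like this: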
1qc2rcc3sccc (q = 0x71 = 0x40|0x31 = 0x40|'1')
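For longer values it would look like: 113q00123456789 ... ARGH. I wanted to serialize the 10-character string "0123456789", and look what happened: the length reads as 100 instead of 10 (or even worse as 100123456789, if you don't limit its width). So both the start and the end of the length have to be marked somehow, perhaps by using bit 0x80 to mark the end of the length.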
1\361c2\362cc3\363ccc (\361 = 0xF1 = 0x40|0x80|0x31 = 0x40|0x80|'1')
Second attempt with the long value:
113q°0123456789 (index 113, length 10, data "0123456789", q = 0x40|'1', ° = 0xB0 = 0x80|'0').
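The scheme with both marker bits could be sketched roughly like this. encodeTagged and decodeTagged are hypothetical names, the index is assumed to fit in 32 bits, and the text is treated as raw 8-bit bytes (the marked bytes are no longer ASCII).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: plain index digits, then the length digits with 0x40
// set on the first digit and 0x80 set on the last one, then the data.
std::string encodeTagged(std::uint32_t index, const std::string &data) {
    std::string len = std::to_string(data.size());
    assert(!len.empty()); // to_string always yields at least one digit
    len.front() = static_cast<char>(len.front() | 0x40); // start of length
    len.back() = static_cast<char>(len.back() | 0x80);   // end of length
    return std::to_string(index) + len + data;
}

// Hypothetical helper: decodes one record at pos and advances pos past it.
// Returns false (leaving pos untouched) on malformed or truncated input.
bool decodeTagged(const std::string &text, std::size_t &pos,
                  std::uint32_t &index, std::string &data) {
    std::size_t p = pos;
    std::uint32_t idx = 0;
    // Unmarked ASCII digits belong to the index.
    while (p < text.size() && text[p] >= '0' && text[p] <= '9')
        idx = idx * 10 + static_cast<std::uint32_t>(text[p++] - '0');
    // At least one index digit, then a byte carrying the start-of-length bit.
    if (p == pos || p >= text.size()) return false;
    if (!(static_cast<unsigned char>(text[p]) & 0x40)) return false;
    std::size_t len = 0;
    for (;;) {
        if (p >= text.size()) return false;
        const unsigned char c = static_cast<unsigned char>(text[p++]);
        const unsigned digit = (c & 0x3Fu) - '0'; // strip both marker bits
        if (digit > 9) return false;
        len = len * 10 + digit;
        if (c & 0x80) break; // end-of-length bit
    }
    if (text.size() - p < len) return false;
    index = idx;
    data = text.substr(p, len);
    pos = p + len;
    return true;
}
```

Since the data is consumed by byte count, it may itself contain digits or marked bytes without confusing the reader.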
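Are you sure you don't want a binary form? It would be shorter.
By the way, if you don't mind the marked values but want to stay within 7-bit ASCII, you can mark the end of the index and the end of the length (instead of the start and end of the length), using only 0x40. So 11c (index 1, length 1, data "c") would become qqc, and 113 10 0123456789 would become 11s1p0123456789.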
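A possible sketch of this 7-bit variant, where the last digit of each number carries the 0x40 bit. All function names here are hypothetical, and both numbers are assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decimal digits of value with 0x40 set on the last digit.
std::string tagLastDigit(std::uint32_t value) {
    std::string digits = std::to_string(value);
    digits.back() = static_cast<char>(digits.back() | 0x40); // '0'..'9' -> 'p'..'y'
    assert((digits.back() & 0x80) == 0); // still 7-bit ASCII
    return digits;
}

std::string encodeAscii(std::uint32_t index, const std::string &data) {
    return tagLastDigit(index)
         + tagLastDigit(static_cast<std::uint32_t>(data.size())) + data;
}

// Hypothetical helper: reads digits up to and including the marked last digit.
bool readTagged(const std::string &text, std::size_t &pos, std::uint32_t &value) {
    value = 0;
    while (pos < text.size()) {
        const unsigned char c = static_cast<unsigned char>(text[pos++]);
        const unsigned digit = (c & ~0x40u) - '0'; // strip the marker bit
        if (digit > 9) return false;               // not a (marked) digit
        value = value * 10 + digit;
        if (c & 0x40) return true;                 // marked digit ends the number
    }
    return false; // ran out of input before the end marker
}

// Decodes one record at pos and advances pos past it; false on bad input.
bool decodeAscii(const std::string &text, std::size_t &pos,
                 std::uint32_t &index, std::string &data) {
    std::size_t p = pos;
    std::uint32_t idx = 0, len = 0;
    if (!readTagged(text, p, idx) || !readTagged(text, p, len)) return false;
    if (text.size() - p < len) return false;
    index = idx;
    data = text.substr(p, len);
    pos = p + len;
    return true;
}
```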
Binary write/read with platform-independent byte order (i.e. a file written on a little-endian platform will also work on another platform that is big-endian):
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

/**
 * Writes index+length+data in binary form to the "out" stream.
 *
 * Returns the number of bytes written to the out stream.
 *
 * Does no data validation (the variable types are the only limits on input data).
 *
 * writeData and readData work in an endianness-agnostic way,
 * so a file saved on a big-endian platform will be restored correctly
 * on a little-endian platform.
 **/
std::size_t writeData(std::ostream & out,
        const uint32_t index, const uint16_t length, const uint8_t *data) {
    // Write index and length byte by byte, least significant byte first,
    // so the output does not depend on the endianness of the host platform.
    out.put(static_cast<char>((index >> 0) & 0xFF));
    out.put(static_cast<char>((index >> 8) & 0xFF));
    out.put(static_cast<char>((index >> 16) & 0xFF));
    out.put(static_cast<char>((index >> 24) & 0xFF));
    out.put(static_cast<char>((length >> 0) & 0xFF));
    out.put(static_cast<char>((length >> 8) & 0xFF));
    // If there is any data, write it to the stream
    if (0 < length) out.write(reinterpret_cast<const char *>(data), length);
    return 4 + 2 + length;
}

/**
 * Reads data from the "in" stream into the variables index, length and data.
 *
 * If "in" doesn't contain enough bytes for the index+length header,
 * zero index/length is returned.
 *
 * If "in" contains more than the index+length header bytes, but the data are
 * shorter than length, then the "repaired" shorter data are returned with
 * a shorter "length" (not the one that was read).
 **/
void readData(std::istream & in,
        uint32_t & index, uint16_t & length, std::vector<uint8_t> & data) {
    // clear current values in index, length, data
    index = length = 0; data.clear();
    // read index+length header from stream
    uint8_t buffer[6];
    in.read(reinterpret_cast<char *>(buffer), 6);
    if (6 != in.gcount()) return;   // header data (index+length) not found
    // Reassemble the read bytes into index/length numbers in host endianness.
    // Cast before shifting: uint8_t promotes to int, and shifting a byte >= 0x80
    // left by 24 into the sign bit is not portable before C++20.
    index = (static_cast<uint32_t>(buffer[0]) << 0) | (static_cast<uint32_t>(buffer[1]) << 8)
          | (static_cast<uint32_t>(buffer[2]) << 16) | (static_cast<uint32_t>(buffer[3]) << 24);
    length = static_cast<uint16_t>((buffer[4] << 0) | (buffer[5] << 8));
    if (0 == length) return;        // zero length, nothing more to read
    // Read the binary data of the expected length
    data.resize(length);            // allocate memory for the read
    in.read(reinterpret_cast<char *>(data.data()), length);
    if (length != in.gcount()) {    // data read didn't have expected length, damaged file?
        // TODO you may want to handle damaged data in another way, like returning index 0
        // This code will simply accept the shorter data, and "repair" length
        length = static_cast<uint16_t>(in.gcount());
        data.resize(length);
    }
}
To see it in action, you can try it on cpp.sh.
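If you want to inspect the six-byte header layout on its own, the shift-and-mask idea can also be factored into small helpers. This is only a sketch: putLE, getLE and headerBytes are hypothetical names and not part of the code above, and they handle values of at most 4 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the low `bytes` bytes of value,
// least significant byte first, regardless of host endianness.
void putLE(std::ostream &out, std::uint32_t value, int bytes) {
    assert(bytes >= 1 && bytes <= 4);
    for (int i = 0; i < bytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: reads `bytes` bytes written by putLE.
// Returns false if the stream ends early.
bool getLE(std::istream &in, int bytes, std::uint32_t &value) {
    assert(bytes >= 1 && bytes <= 4);
    value = 0;
    for (int i = 0; i < bytes; ++i) {
        char ch;
        if (!in.get(ch)) return false;
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(ch)) << (8 * i);
    }
    return true;
}

// Hypothetical helper: the 4-byte index + 2-byte length header as a byte string.
std::string headerBytes(std::uint32_t index, std::uint16_t length) {
    std::ostringstream out;
    putLE(out, index, 4);
    putLE(out, length, 2);
    return out.str();
}
```

For index 113 and length 10 the header bytes are 71 00 00 00 0A 00 in hex, on any platform.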