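[Posted]: 2010-12-18 22:07:23
[Question]:
OK, I'm trying to work with UTF-8 text files. I keep fighting with the BOM that the writer inserts for UTF-8, which blows up almost anything I need to use to read the file, including serializers and other text readers.
These are the first six bytes of the data: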
0xEF
0xBB
0xBF
0xEF
0xBB
0xBF
(Now that I look at it, I realize there are two characters there. Is that the UTF-8 BOM? Am I double encoding?)
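One quick way to check that suspicion is to compare the start of the data against `Encoding.UTF8.GetPreamble()`, which returns the three bytes `EF BB BF`. The sketch below is only illustrative: `BomCheck` and `CountLeadingBoms` are hypothetical names, not part of my project, and the six bytes are hard-coded from the dump above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomCheck
{
    // Counts how many consecutive UTF-8 BOM sequences (EF BB BF) start the buffer.
    // Hypothetical helper, written only to illustrate the check.
    public static int CountLeadingBoms(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
        int count = 0;
        int offset = 0;
        while (offset + bom.Length <= data.Length &&
               data.Skip(offset).Take(bom.Length).SequenceEqual(bom))
        {
            count++;
            offset += bom.Length;
        }
        return count;
    }

    public static void Main()
    {
        // The six bytes listed above.
        byte[] firstSix = { 0xEF, 0xBB, 0xBF, 0xEF, 0xBB, 0xBF };
        Console.WriteLine(CountLeadingBoms(firstSix)); // prints 2
    }
}
```

A count of 2 would mean the BOM was written twice rather than the text being encoded twice.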
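Note that the serializer encodes as UTF-8, then the memory stream is turned into a string as UTF-8, then I write the string to a file as UTF-8... that seems like a lot of redundancy. Thoughts?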
// I'm storing this xml result in a database field. (This one includes the BOM chars.)
using (MemoryStream ms = new MemoryStream())
{
    Utility.SerializeXml(ms, root);
    xml = Encoding.UTF8.GetString(ms.ToArray());
}

// Later on, I take that xml and write it out to a file like this:
File.WriteAllText(path, xml, Encoding.UTF8);

public static void SerializeXml(Stream output, object data)
{
    XmlSerializer xs = new XmlSerializer(data.GetType());
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.IndentChars = "\t";
    settings.Encoding = Encoding.UTF8;
    XmlWriter writer = XmlTextWriter.Create(output, settings);
    xs.Serialize(writer, data);
    writer.Flush();
    writer.Close();
}
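The following is a minimal sketch of what I think is happening and one possible way around it, not a confirmed fix. It assumes the first BOM comes from the `XmlWriter` (because `Encoding.UTF8` has a preamble), that `Encoding.UTF8.GetString` keeps it in the string as U+FEFF, and that `File.WriteAllText` with `Encoding.UTF8` then adds a second one. `Root`, `BomDemo`, `SerializeToString` and `FirstBytes` are hypothetical names made up for the demo; `new UTF8Encoding(false)` is the standard way to get a UTF-8 encoding with no BOM.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type standing in for the real object graph.
public class Root { public string Name = "demo"; }

public static class BomDemo
{
    // Same shape as SerializeXml above, but the encoding is a parameter
    // and the result is returned as a string.
    public static string SerializeToString(object data, Encoding encoding)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "\t",
            Encoding = encoding
        };
        using (var ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                new XmlSerializer(data.GetType()).Serialize(writer, data);
            }
            return encoding.GetString(ms.ToArray());
        }
    }

    // Writes text to a temp file and returns the first `count` bytes as hex.
    public static string FirstBytes(string text, Encoding fileEncoding, int count)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, fileEncoding);
            return BitConverter.ToString(File.ReadAllBytes(path), 0, count);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        var root = new Root();
        var utf8NoBom = new UTF8Encoding(false); // UTF-8 without a preamble

        // Reproduces the problem: one BOM inside the string, one from the file writer.
        string withBom = SerializeToString(root, Encoding.UTF8);
        Console.WriteLine(FirstBytes(withBom, Encoding.UTF8, 6)); // EF-BB-BF-EF-BB-BF

        // No BOM in the string and none added by the file writer: file starts with "<?xml".
        string noBom = SerializeToString(root, utf8NoBom);
        Console.WriteLine(FirstBytes(noBom, utf8NoBom, 5)); // 3C-3F-78-6D-6C
    }
}
```

If a single BOM in the file is wanted, keeping the string BOM-free and passing `Encoding.UTF8` only to `File.WriteAllText` should produce exactly one.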
[Comments]:
Tags: c# unicode utf-8 xml-serialization