【Question title】: Why is the XmlWriter always outputting utf-16 encoding?
【Posted】: 2012-03-16 13:57:26
【Question description】:

I have this extension method:

    public static string SerializeObject<T>(this T value)
    {
        var serializer = new XmlSerializer(typeof(T));           
        var settings = new XmlWriterSettings
                       {
                        Encoding = new UTF8Encoding(true), 
                        Indent = false, 
                        OmitXmlDeclaration = false,
                        NewLineHandling = NewLineHandling.None
                       };

        using(var stringWriter = new StringWriter()) 
        {
            using(var xmlWriter = XmlWriter.Create(stringWriter, settings)) 
            {
                serializer.Serialize(xmlWriter, value);
            }

            return stringWriter.ToString();
        }
    }

But whenever I call it, the output declares an encoding of utf-16, i.e. <?xml version="1.0" encoding="utf-16"?>. What am I doing wrong?

【Discussion】:

Tags: c# xml-serialization xmlwriter


【Solution 1】:

A string is UTF-16, so writing to a StringWriter will always use UTF-16. If that is not what you want, use some other TextWriter-derived class with the encoding you prefer.

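【Discussion】:

  • Shaking my head. Then what is the point of having an Encoding property on XmlWriterSettings? Yes, strings are UTF-16, but if we serialize to a string it is because we are going to write it to a file or something, and we want the encoding attribute in the XML declaration to match the real encoding of the file we create, which is unlikely to be UTF-16.
  • @user Don't serialize to a string; go directly to a Stream.
  • OK. That makes more sense.
  • Yes, you are using a StringWriter, so by default it is Unicode (UTF-16). If I do using(var xmlWriter = XmlWriter.Create("MyFile.xml", settings)) and manually call xmlWriter.WriteStartElement("SomeRootElement"); xmlWriter.WriteEndElement();, then load it back with XmlDocument xml = new XmlDocument(); xml.Load("MyFile.xml"); byte[] bytes = Encoding.Default.GetBytes(xml.OuterXml); string xmlDoc = Encoding.Default.GetString(bytes);, it is UTF-8. FYI, you could dump it to a file, read it back, and then delete the file. Alternatively, use a StringWriter subclass whose Encoding property returns Encoding.UTF8.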
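
The point raised in these comments can be sketched as follows: when XmlWriter.Create is handed a TextWriter, the declaration reflects that writer's Encoding property, whereas XmlWriterSettings.Encoding takes effect when the writer targets a Stream. This is only an illustrative sketch; the names DeclarationDemo, ViaStringWriter, ViaStream, and the "root" element are made up for the example and do not come from the thread.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class DeclarationDemo
{
    // UTF-8 without a BOM, so decoded output starts directly with "<?xml".
    private static XmlWriterSettings Utf8Settings()
    {
        return new XmlWriterSettings { Encoding = new UTF8Encoding(false), Indent = false };
    }

    // Writes an empty <root /> element through a StringWriter.
    // The declaration follows StringWriter.Encoding, not the settings.
    public static string ViaStringWriter()
    {
        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, Utf8Settings()))
            {
                xmlWriter.WriteStartElement("root");
                xmlWriter.WriteEndElement();
            }
            return stringWriter.ToString();
        }
    }

    // Writes the same element to a MemoryStream.
    // Here the declaration follows XmlWriterSettings.Encoding.
    public static string ViaStream()
    {
        var settings = Utf8Settings();
        using (var stream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(stream, settings))
            {
                xmlWriter.WriteStartElement("root");
                xmlWriter.WriteEndElement();
            }
            // Decode after the writer is disposed so all bytes are in the stream.
            return settings.Encoding.GetString(stream.ToArray());
        }
    }
}
```

Calling DeclarationDemo.ViaStringWriter() should give a declaration with encoding="utf-16", while DeclarationDemo.ViaStream() should give encoding="utf-8", even though both use the same settings.
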
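【Solution 2】:

As far as I know, the StringWriter class always reports UTF-16 as its encoding when you serialize to a string. You can write your own derived class that accepts a different encoding: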

public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding _encoding;

    public StringWriterWithEncoding()
    {
    }

    public StringWriterWithEncoding(IFormatProvider formatProvider)
        : base(formatProvider)
    {
    }

    public StringWriterWithEncoding(StringBuilder sb)
        : base(sb)
    {
    }

    public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider)
        : base(sb, formatProvider)
    {
    }


    public StringWriterWithEncoding(Encoding encoding)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(IFormatProvider formatProvider, Encoding encoding)
        : base(formatProvider)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider, Encoding encoding)
        : base(sb, formatProvider)
    {
        _encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return (null == _encoding) ? base.Encoding : _encoding; }
    }
}

So you can use this instead:

using(var stringWriter = new StringWriterWithEncoding( Encoding.UTF8))
{
   ...
}

【Discussion】:

    【Solution 3】:

    As @john-saunders mentioned in his answer:

    StringWriter will always use UTF-16

    So I used a MemoryStream for this.

    In my case, I use the windows-1251 encoding.

    string xmlString;
    using (var ms = new MemoryStream())
    {
        var encoding = Encoding.GetEncoding(1251);
        var settings = new XmlWriterSettings
        {
            Indent = true,
            Encoding = encoding
        };

        // doc is the XML document being saved (declared elsewhere).
        using (var xmlTextWriter = XmlWriter.Create(ms, settings))
        {
            doc.Save(xmlTextWriter);
        }

        // Decode only after the writer is disposed, so everything has been flushed to the stream.
        xmlString = encoding.GetString(ms.ToArray());
    }
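
One caveat if the MemoryStream approach above is adapted to UTF-8 rather than windows-1251: an encoding that emits a byte order mark, such as new UTF8Encoding(true) or Encoding.UTF8, makes the writer put the BOM at the start of the stream, and Encoding.GetString keeps it as a leading U+FEFF character in the resulting string. The sketch below is illustrative only; BomDemo, Serialize, and the "root" element are hypothetical names, not part of the answer.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class BomDemo
{
    // Writes an empty <root /> element to a MemoryStream with the given
    // encoding and decodes the raw bytes back into a string.
    public static string Serialize(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var ms = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(ms, settings))
            {
                xmlWriter.WriteStartElement("root");
                xmlWriter.WriteEndElement();
            }
            // GetString does not strip a BOM; any preamble bytes are decoded too.
            return encoding.GetString(ms.ToArray());
        }
    }
}
```

With new UTF8Encoding(true) the returned string starts with the U+FEFF character before "<?xml"; with new UTF8Encoding(false) it starts with "<?xml" directly, which is usually what a caller parsing the string expects.
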
    

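    【Discussion】:

      【Solution 4】:

      You should derive a new class from StringWriter that overrides the Encoding property.

      【Discussion】:

        【Solution 5】:

        As the accepted answer says, StringWriter is UTF-16 (Unicode) by default and by design. If you want to end up with a string whose XML declaration says UTF-8, here are two ways to do it:

        Solution #1 (not very efficient and bad practice, but it gets the job done): dump it to a text file, read it back, and delete the file (probably only suitable for small files, if you even want to do this; I just wanted to show that it can be done!)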

        public static string SerializeObject<T>(this T value)
        {
            var serializer = new XmlSerializer(typeof(T));           
            var settings = new XmlWriterSettings
                           {
                            Encoding = new UTF8Encoding(true), 
                            Indent = false, 
                            OmitXmlDeclaration = false,
                            NewLineHandling = NewLineHandling.None
                           };
        
        
            using(var xmlWriter = XmlWriter.Create("MyFile.xml", settings)) 
            {
                serializer.Serialize(xmlWriter, value);
            }
        
            XmlDocument xml = new XmlDocument();
            xml.Load("MyFile.xml");
            byte[] bytes = Encoding.UTF8.GetBytes(xml.OuterXml);        
            File.Delete("MyFile.xml");
        
            return Encoding.UTF8.GetString(bytes);
        
        }
        

        Solution #2 (a better, simpler, more elegant solution!): use a StringWriter, but override its Encoding property so that it returns UTF-8:

        public static string SerializeObject<T>(this T value)
        {
            var serializer = new XmlSerializer(typeof(T));           
            var settings = new XmlWriterSettings
                           {
                            Encoding = new UTF8Encoding(true), 
                            Indent = false, 
                            OmitXmlDeclaration = false,
                            NewLineHandling = NewLineHandling.None
                           };
        
            using(var stringWriter = new UTF8StringWriter())
            {
                using(var xmlWriter = XmlWriter.Create(stringWriter, settings)) 
                {
                    serializer.Serialize(xmlWriter, value);
                }
        
                return stringWriter.ToString();
            }
        }
        
        public class UTF8StringWriter : StringWriter
        {
            public override Encoding Encoding
            {
                get
                {
                    return Encoding.UTF8;
                }
            }
        }
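
Once the string carries an encoding="utf-8" declaration, the bytes eventually written to disk should really be UTF-8, which was the concern raised in the comments on the first answer. A minimal sketch of that last step, assuming the XML text is already in hand; XmlFileSketch and SaveAsUtf8 are hypothetical names introduced only for this example:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class XmlFileSketch
{
    // Writes the XML text as UTF-8 bytes without a BOM, so the file's real
    // encoding agrees with an encoding="utf-8" declaration inside the text.
    public static void SaveAsUtf8(string xml, string path)
    {
        File.WriteAllText(path, xml, new UTF8Encoding(false));
    }
}
```

Passing the encoding explicitly keeps the choice visible next to the declaration it has to match.
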
        

        【Discussion】:

          【Solution 6】:

          If you don't want to use a class derived from StringWriter, then in your case you can simply set OmitXmlDeclaration to true and write your own declaration, as I do below:

          public static string Serialize<T>(this T value, string xmlDeclaration = "<?xml version=\"1.0\"?>") where T : class, new()
          {
              if (value == null) return string.Empty;

              using (var stringWriter = new StringWriter())
              {
                  var settings = new XmlWriterSettings
                  {
                      Indent = true,
                      // Suppress the writer's own declaration when a custom one is supplied.
                      OmitXmlDeclaration = xmlDeclaration != null,
                  };

                  using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
                  {
                      var xmlSerializer = new XmlSerializer(typeof(T));

                      xmlSerializer.Serialize(xmlWriter, value);
                  }

                  // Prepend the custom declaration on its own line.
                  var sb = new StringBuilder($"{Environment.NewLine}{stringWriter}");

                  sb.Insert(0, xmlDeclaration);

                  return sb.ToString();
              }
          }


          【Discussion】:
