【Question title】: Using UTF-8 identifier
【Posted】: 2025-12-07 03:10:01
【Question description】:

I receive the data from an HTTP request as a string stream. The stream looks like:

<?xml version="1.0" encoding="utf-8"?>

The first three characters indicate that the string is encoded as UTF-8.

I create files from these strings, and I get an error when reading them back.

This is the method I use to create a file from the string:

private void writeToFile(String data, String fileName) {
    try {
        String UTF8 = "UTF-8";
        int BUFFER_SIZE = 8192;

        String xmlCut = data.substring(3);

        File sdCard = Environment.getExternalStorageDirectory();
        File dir = new File (sdCard.getAbsolutePath()+"/example/Test");
        dir.mkdirs();
        File file = new File(dir,fileName);

        FileOutputStream f = new FileOutputStream(file);
        FileOutputStream fileOutputStream = openFileOutput(fileName, Context.MODE_PRIVATE);
        BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(fileOutputStream,UTF8),BUFFER_SIZE);
        bufferedWriter.write(String.valueOf(data.getBytes("UTF-8")));
        f.write(data.getBytes("UTF-8"));
        f.close();
        bufferedWriter.close();
    } catch (IOException e) {
        Log.e("writeToFile: ", "Datei-Erstellung fehlgeschlagen: " + e.toString());
    }

}

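As you can see, I added the substring call to cut off the first three characters, because they cause a crash. The problem is that the file then ends up encoded as ASCII.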

The method that reads the file:

 private String readFromFile(String fileName) {
    String ret = "";
    String UTF8 = "UTF-8";
    int BUFFER_SIZE = 8192;

    try {
        InputStream inputStream = openFileInput(fileName);

        if (inputStream != null) {


            BufferedReader bufferedReader1 = new BufferedReader(new InputStreamReader(inputStream,UTF8),BUFFER_SIZE);
            String receiveString = "";
            StringBuilder stringBuilder = new StringBuilder();

            while ((receiveString = bufferedReader1.readLine()) != null) {
                stringBuilder.append(receiveString);
            }

            inputStream.close();
            ret = stringBuilder.toString();
        }
    } catch (FileNotFoundException e) {
        Log.e("readFromFile: ", "Datei nicht gefunden: " + e.toString());
    } catch (IOException e) {
        Log.e("readFromFile: ", "Kann Datei nicht lesen: " + e.toString());
    }
    return ret;
}

If I do not cut off the UTF-8 marker, the stack trace shows this error:

Caused by: java.lang.NullPointerException: Attempt to invoke interface method 'org.w3c.dom.NodeList org.w3c.dom.Document.getElementsByTagName(java.lang.String)' on a null object reference
        at de.example.app.ListViewActivity.setListProjectData(ListViewActivity.java:226)

It is thrown here:

public void setListProjectData(String filename) {

    XMLParser parser = new XMLParser();
    String xmlData = readFromFile(filename);
    String xmlCut = xmlData.substring(3);
    Document doc = parser.getDomElement(filename);

    NodeList nodeListProject = doc.getElementsByTagName(KEY_PROJECT);


    for (int i = 0; i < nodeListProject.getLength(); i++) {

        HashMap<String, String> map = new HashMap<String, String>();
        Element e = (Element) nodeListProject.item(i);

        map.put(KEY_UUID, parser.getValue(e, KEY_UUID));
        map.put(KEY_NAME, parser.getValue(e, KEY_NAME));
        map.put(KEY_JOBTITLE, parser.getValue(e, KEY_JOBTITLE));
        map.put(KEY_JOBINFO, parser.getValue(e, KEY_JOBINFO));
        map.put(KEY_PROJECTIMAGE, parser.getValue(e, KEY_PROJECTIMAGE));


        projectItems.add(map);
    }
}

This is where I fetch the data over HTTP:

public String getXMLFromUrl(String url) {
    String xml = null;

    if (cd.isConnectingToInternet()) {
        try {
            //defaultHttpClient
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpPost httpPost = new HttpPost(url);

            HttpResponse httpResponse = httpClient.execute(httpPost);
            HttpEntity httpEntity = httpResponse.getEntity();
            /*
            final InputStream in = httpEntity.getContent();
            Reader reader = new InputStreamReader(in,"UTF-8");
            InputSource is = new InputSource(reader);
            is.setEncoding("UTF-8");

            */
            xml = EntityUtils.toString(httpEntity);

        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    } else {
        return null;
    }

    return xml;
}

So, how do I encode them as UTF-8? Am I doing this right?

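【Comments】:

  • It's ANSI, not ANSII. You are mixing up ANSI and ASCII.
  • What error exactly do you get? Can you post the stack trace from LogCat?
  • I think you should check whether the file starts with a BOM. If it does, cut it off. *.com/questions/1772321/…
  • If I cut off the marker, the file is encoded as ANSI.
  • You have to save it as Unicode UTF-8, without BOM.

Tags: java android encoding utf-8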


【Solution 1】:

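Your problem is not in the file-handling code you posted, but in the code that fetches the data from the HTTP request.

You are passing String data to the writeToFile method. Strings in Java are UTF-16 encoded. If the UTF-8 bytes from the server were turned into that string with the wrong charset, no amount of further encoding and decoding will repair the already broken data.

You should use xml = EntityUtils.toString(httpEntity, HTTP.UTF_8) to decode the data correctly. The second argument is the charset used when the response does not declare one; without it, EntityUtils falls back to ISO-8859-1.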
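
To make the effect of the charset visible, here is a minimal sketch that decodes the same UTF-8 bytes once as ISO-8859-1 and once as UTF-8. The class name, the helper names, and the sample element are made up for illustration, and StandardCharsets is assumed to be available (on Android that means API level 19 or later).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class CharsetDecodeDemo {

    // Decodes the bytes as ISO-8859-1: every byte becomes one char,
    // so a multi-byte UTF-8 sequence turns into several wrong chars.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8, which is what the server sent.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00FC (u with diaeresis) is the two bytes C3 BC in UTF-8.
        byte[] raw = "<name>M\u00FCller</name>".getBytes(StandardCharsets.UTF_8);

        String wrong = decodeAsLatin1(raw);
        String right = decodeAsUtf8(raw);

        // The wrongly decoded string is one char longer than the correct one.
        System.out.println(wrong.length() + " vs " + right.length());
    }
}
```

The wrongly decoded string contains the two chars U+00C3 and U+00BC where the single char U+00FC should be, and writing it to a file as UTF-8 afterwards just encodes those two wrong chars.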

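If the returned data contains a UTF-8 BOM, there is a second problem. EntityUtils.toString(httpEntity, HTTP.UTF_8) decodes the data correctly, but it leaves a superfluous (and wrong) BOM at the start of the string.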
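
As a rough illustration of what that leftover looks like: after a correct UTF-8 decode the three BOM bytes EF BB BF become the single char U+FEFF, so cutting three chars with substring(3) would remove too much. The sketch below uses a made-up class and helper name and removes that one char at the string level.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class BomStringDemo {

    // Drops a leading U+FEFF, the char a UTF-8 BOM turns into after decoding.
    static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // The UTF-8 BOM followed by the four ASCII bytes of "<a/>".
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};

        // Java's UTF-8 decoder keeps the BOM as an ordinary char.
        String decoded = new String(raw, StandardCharsets.UTF_8);

        System.out.println(decoded.length());                  // 5: U+FEFF plus 4 chars
        System.out.println(stripLeadingBom(decoded).length()); // 4
    }
}
```

An XML parser that receives the string with the U+FEFF still in front will typically reject it, because the XML declaration is no longer at the very start of the input.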

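To get rid of the BOM, either the server must return the data without it, or the BOM must be stripped on the client. For the latter, use the following code (or something similar):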

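import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the whole stream and decodes it as UTF-8,
// skipping a leading BOM (EF BB BF) if one is present.
public static String stripBOM(InputStream stream)
{
    try
    {
        byte[] buffer = new byte[1024];
        ByteArrayOutputStream os = new ByteArrayOutputStream(1024);
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1)
        {
            os.write(buffer, 0, bytesRead);
        }
        byte[] all = os.toByteArray();
        // Skip the first three bytes only when they really are the UTF-8 BOM
        boolean hasBom = all.length >= 3
                && (all[0] & 0xFF) == 0xEF
                && (all[1] & 0xFF) == 0xBB
                && (all[2] & 0xFF) == 0xBF;
        int offset = hasBom ? 3 : 0;
        return new String(all, offset, all.length - offset, "UTF-8");
    }
    catch (IOException e)
    {
        // Covers read failures as well as UnsupportedEncodingException
        return "";
    }
}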

So xml = EntityUtils.toString(httpEntity, HTTP.UTF_8) can be replaced with

 InputStream is = httpEntity.getContent();
 xml = stripBOM(is);
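
Once the string has been decoded correctly, writing it to a file and reading it back as UTF-8 only needs a Writer and a Reader with an explicit charset. The following is a minimal sketch of that round trip. It uses a plain java.io.File and a temporary file instead of Android's openFileOutput and openFileInput, and the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class Utf8FileRoundTrip {

    // Writes the string itself; the Writer does the UTF-8 encoding.
    static void writeUtf8(File file, String data) throws IOException {
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(data);
        }
    }

    // Reads the file in char chunks so that line breaks are preserved.
    static String readUtf8(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".xml");
        file.deleteOnExit();

        String xml = "<project>\n  <name>M\u00FCller</name>\n</project>";
        writeUtf8(file, xml);

        // True when the text survives the round trip unchanged.
        System.out.println(xml.equals(readUtf8(file)));
    }
}
```

Note that the sketch passes the string to the writer directly. A call such as String.valueOf(data.getBytes("UTF-8")) does not produce the text: it yields the array's identity string (something like [B@1a2b3c), so that is what would end up in the file.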

【Comments】: