Parsing large text files with Adobe AIR: answers

【Question title】: Parsing large text files with Adobe AIR
【Posted】: 2009-09-03 04:16:23
【Question】:

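I am trying to do the following in AIR:

1. Browse to a text file
2. Read the text file and store it in a string (ultimately in an array)
3. Split the string on the delimiter \n and put the resulting strings in an array
4. Manipulate that data before sending it to a website (MySQL database)

The text files I am dealing with are anywhere from 100 to 500 MB in size. So far I have been able to complete steps 1 and 2. Here is my code: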

<mx:Script>
    <![CDATA[
    import mx.collections.ArrayCollection;
    import flash.filesystem.*;
    import flash.events.*;
    import mx.controls.*;

    private var fileOpened:File = File.desktopDirectory;
    private var fileContents:String = ""; // start empty: += on an unset (null) String would prepend "null"
    private var stream:FileStream;

    private function selectFile(root:File):void {
        var filter:FileFilter = new FileFilter("Text", "*.txt");
        root.browseForOpen("Open", [filter]);
        root.addEventListener(Event.SELECT, fileSelected);
    }

    private function fileSelected(e:Event):void {
        var path:String = fileOpened.nativePath;
        filePath.text = path;

        stream = new FileStream();
        stream.addEventListener(ProgressEvent.PROGRESS, fileProgress);
        stream.addEventListener(Event.COMPLETE, fileComplete);
        stream.openAsync(fileOpened, FileMode.READ);
    }

    private function fileProgress(p_evt:ProgressEvent):void {
        fileContents += stream.readMultiByte(stream.bytesAvailable, File.systemCharset); 
        readProgress.text = ((p_evt.bytesLoaded/1048576).toFixed(2)) + "MB out of " + ((p_evt.bytesTotal/1048576).toFixed(2)) + "MB read";
    }

    private function fileComplete(p_evt:Event):void {
        stream.close();
        //fileText.text = fileContents;
    }

    private function process(c:String):void {
        if(c == null || c.length == 0) {
            Alert.show("File contents empty!", "Error");
        }
        //var array:Array = c.split(/\n/);

    }

    ]]>
</mx:Script>

Here is the MXML:

<mx:Text x="10" y="10" id="filePath" text="Select a file..." width="678" height="22" color="#FFFFFF"  fontWeight="bold"/>
<mx:Button x="10" y="40" label="Browse" click="selectFile(fileOpened)" color="#FFFFFF" fontWeight="bold" fillAlphas="[1.0, 1.0]" fillColors="[#E2E2E2, #484848]"/>
<mx:Button x="86" y="40" label="Process" click="process(fileContents)" color="#FFFFFF" fontWeight="bold"  fillAlphas="[1.0, 1.0]" fillColors="[#E2E2E2, #484848]"/>
<mx:TextArea x="10" y="70" id="fileText" width="678" height="333" editable="false"/>
<mx:Label x="10" y="411" id="readProgress" text="" width="678" height="19" color="#FFFFFF"/>

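Step 3 is where I am having some trouble. Two lines in my code are commented out, and both of them cause the program to freeze:

fileText.text = fileContents; attempts to put the contents of the string into the textarea
var array:Array = c.split(/\n/); attempts to split the string on the newline delimiter

I could really use some input at this point. Am I even going about this the right way? Can Flex/AIR handle files this large? (I would assume so.) This is my first attempt at any sort of Flex work, so if you see other things I have done wrong or that could be done better, I would appreciate the heads-up!

Thanks!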

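【Question comments】:

Tags: apache-flex actionscript-3 air parsing text-files

【Answer 1】:

Doing a split on a 500 MB file is probably not a good idea. You could write your own parser to work through the file, but it may not be very fast either: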

private function fileComplete(p_evt:Event):void 
{
    var array:Array = [];

    var char:String;
    var line:String = "";
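    // bytesAvailable shrinks as bytes are read, so compare it with 0
    // (comparing it with position would stop about halfway through)
    while(stream.bytesAvailable > 0)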
    {
        char = stream.readUTFBytes(1);
        if(char == "\n")
        {
            array.push(line);
            line = "";
        }
        else
        {
            line += char;
        }
    }

    // catch the last line if the file isn't terminated by a \n
    if(line != "")
    {
        array.push(line);
    }

    stream.close();
}

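I haven't tested it, but it should simply step through the file character by character. If the character is a newline, the old line is pushed into the array; otherwise the character is appended to the current line.

If you don't want this to block your UI while it runs, you will need to rework it into a timer-based approach: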

// sketch: array, line and timer are class members so that the state
// survives between chunks (needs flash.utils.Timer and flash.events.TimerEvent)
private var array:Array;
private var line:String;
private var timer:Timer;

private function fileComplete(p_evt:Event):void 
{
    array = [];
    line = "";
    processFileChunk();
}

private function processFileChunk(event:TimerEvent=null):void
{
    var MAX_PER_FRAME:int = 1024;
    var bytesThisFrame:int = 0;
    var char:String;
    // bytesAvailable shrinks as bytes are read, so compare it with 0
    while(   (stream.bytesAvailable > 0)
          && (bytesThisFrame < MAX_PER_FRAME))
    {
        char = stream.readUTFBytes(1);
        if(char == "\n")
        {
            array.push(line);
            line = "";
        }
        else
        {
            line += char;
        }
        bytesThisFrame++;
    }

    // if we aren't done
    if(stream.bytesAvailable > 0)
    {
        // one-shot timer: continue with the next chunk in 100 ms
        timer = new Timer(100, 1);
        timer.addEventListener(TimerEvent.TIMER_COMPLETE, processFileChunk);
        timer.start();
    }
    // we're done
    else
    {
        // catch the last line if the file isn't terminated by a \n
        if(line != "")
        {
            array.push(line);
        }

        stream.close();

        // maybe dispatchEvent(new Event(Event.COMPLETE)); here
        // or call an internal function to deal with the complete array
    }
}

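Basically, you choose how much of the file to process per frame (MAX_PER_FRAME) and then process that many bytes. If there is more data left after that, just create a timer that calls the processing function again a little later, and it should pick up where it left off. Once you are sure it has finished, you can dispatch an event or call another function. Note that the sample values (1024 bytes every 100 ms) are only illustrative: at that rate a 500 MB file would take roughly 14 hours, so use a much larger chunk size in practice.

【Comments】:

Thanks a lot for your insight. Out of curiosity, I have always wondered how other people learn Flex/AS3. Would you mind sharing your experience?
I started with Flex in a Week adobe.com/devnet/flex/videotraining, then Flex 3 in Action amazon.com/Flex-3-Action-Tariq-Ahmed/dp/1933988746, then Flex Examples blog.flexexamples.com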

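【Answer 2】:

I agree.

Try splitting the text into chunks while you are reading it from the stream.

That way you don't need to store the text in the fileContents string at all (which cuts memory usage by about 50%).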
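
A minimal, untested sketch of what this could look like, replacing the question's fileProgress and fileComplete handlers. It assumes the file is UTF-8 (or another ASCII-compatible encoding, so the byte 10 always means "\n") and that lines end in "\n"; a trailing "\r" from Windows line endings is left in place. The names pending, drain, appendToPending, flushPending and handleLine are made up for this illustration.

```actionscript
import flash.events.Event;
import flash.events.ProgressEvent;
import flash.filesystem.FileStream;
import flash.utils.ByteArray;

private var stream:FileStream;  // opened with openAsync(), as in the question
// Bytes of the current, not yet terminated line; carried across events.
private var pending:ByteArray = new ByteArray();

private function fileProgress(p_evt:ProgressEvent):void {
    drain();
}

private function fileComplete(p_evt:Event):void {
    drain();                    // pick up anything still buffered
    if (pending.length > 0) {   // last line, if the file does not end with "\n"
        flushPending();
    }
    stream.close();
}

// Reads whatever is currently buffered and emits every complete line in it.
private function drain():void {
    var chunk:ByteArray = new ByteArray();
    stream.readBytes(chunk, 0, stream.bytesAvailable);

    var lineStart:uint = 0;
    for (var i:uint = 0; i < chunk.length; i++) {
        if (chunk[i] == 10) {   // 10 is the byte value of "\n"
            appendToPending(chunk, lineStart, i);
            flushPending();
            lineStart = i + 1;
        }
    }
    // Keep the unterminated tail for the next event.
    appendToPending(chunk, lineStart, chunk.length);
}

// Copies chunk[start, end) into pending. The length check matters:
// writeBytes() treats a length of 0 as "everything from offset onward".
private function appendToPending(chunk:ByteArray, start:uint, end:uint):void {
    if (end > start) {
        pending.writeBytes(chunk, start, end - start);
    }
}

// Decodes the pending bytes as one line and hands it on.
private function flushPending():void {
    pending.position = 0;
    var line:String = pending.length > 0 ? pending.readUTFBytes(pending.length) : "";
    pending = new ByteArray();
    handleLine(line);
}

// Hypothetical hook: do the per-line work here instead of storing every line.
private function handleLine(line:String):void {
}
```

Because each line is handed to handleLine as soon as it is complete, only the current chunk and one partial line are held in memory at any time, rather than the whole file.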

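【Comments】:

【Answer 3】:

Try processing it in pieces.

【Comments】:

【Answer 4】:

Regarding James's homespun parser: it has a problem if the text file contains any multibyte UTF characters (I was trying to parse UTF files in a similar way when I came across this thread). Converting each byte into its own string breaks multibyte characters apart, so I made some modifications.

To make the parser multibyte-friendly, you can store the growing line in a ByteArray rather than a String. Then, when you reach the end of a line (or a chunk, or the file), you can parse it as a UTF string (if necessary) without any problems: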

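import flash.utils.ByteArray;

var out:ByteArray = new ByteArray();   // processed output
var line:ByteArray = new ByteArray();  // raw bytes of the current line
var line_out:String;
var char:int;

while( file_stream.bytesAvailable > 0 )
{
    char = file_stream.readByte();
    if( char == 10 ) // 10 is the byte value of "\n"
    {
        // Do some processing on a line-by-line basis.
        // ProcessLine is not defined in this answer; it stands for your own
        // function that takes the raw bytes of one line and returns a String.
        line_out = ProcessLine( line );
        line_out += "\n";
        out.writeUTFBytes( line_out );
        line = new ByteArray();
    }
    else
    {
        line.writeByte( char );
    }
}
// Get the last line in there (if the file doesn't end with "\n")
if( line.length > 0 )
{
    out.writeUTFBytes( ProcessLine( line ) );
}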
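
The snippet above calls ProcessLine without defining it. As a rough, untested sketch, assuming the file is UTF-8, such a function could decode the collected bytes in one go, so that multibyte characters are never split. The body shown here is purely illustrative; the real per-line work is up to you.

```actionscript
import flash.utils.ByteArray;

// Hypothetical implementation of ProcessLine: decodes the raw bytes of
// one line as UTF-8 and returns the resulting String.
function ProcessLine( line:ByteArray ):String
{
    // writeByte() left the position at the end, so rewind before reading.
    line.position = 0;
    var text:String = line.length > 0 ? line.readUTFBytes( line.length ) : "";

    // Drop the "\r" left over from Windows ("\r\n") line endings.
    if( text.charAt( text.length - 1 ) == "\r" )
    {
        text = text.substr( 0, text.length - 1 );
    }

    // Any per-line manipulation would go here.
    return text;
}
```

Splitting on the single byte 10 is safe in UTF-8 because every byte of a multibyte sequence has its high bit set, so a byte value of 10 can only ever be a real newline.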

【Comments】:

【Answer 5】:

stream.position

【Comments】: