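How to create a CSV file from a sequentially created data file in C: answers

【Question title】: how to create csv file from a sequentially created data file with c language
【Posted】: 2022-10-20 17:36:28
【Question】:

I have a question about how to create a CSV file from a sequentially created data file using C.

My C program prints a number of values with printf. The program's output is redirected to a file: ./myprog >> file.txt

So the file looks like this: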

0.8952
0.89647
0.3658
!!!
0.258633
0.233655
0.25475
!!!
0.5895
0.54785
0.695555
!!!

and so on.

The different dimensions are separated by "!!!".

The result I want is:

0.8952;0.258633;0.5895
0.89647;0.233655;0.54785
0.3658;0.25475;0.695555

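I tried to do this with a two-dimensional array, but because I have about 100,000 lines between every "!!!", I get a segmentation fault: double myTab[100000][100000].

If you have an idea, thank you very much. Best regards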

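【Comments】:

How much do you know about pointers and dynamic allocation with malloc and, most importantly, about reallocation with realloc?
@Gerardh - Ah, good spot!
By the way, how many !!!-delimited "sections" are there in the input file? Does it vary, or is it always the same? Is there a way to know in advance how many sections there may be? Same question for the number of "columns": is it always the same for all files? Always the same within a single file?
Talking about implementation: is there a reason you need to accumulate all the table elements in one array? I would simply write to the file as soon as I have data: a newline? --> write the number followed by a semicolon. A line containing !!!? --> a newline in the output file.
@RobertoCaboni If you look closely at the expected result, you will see that you can't do that. The lines between !!!...!!! go into the same column, not into the same row.

Tags: arrays c csv file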

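【Solution 1】:

Don't try to buffer everything. Just remember where each section starts and use fseek() judiciously.

Here I use a fixed array, assuming at most 10 sections. You may have to increase that, or perhaps make it dynamic and "growable".

(This is only a rough sketch, but it may lead you to a solution.)

EDIT: To clarify: the upper loop indexes the start of each "section", storing the offset of each section's first data entry. Using a combination of ftell() and fseek(), the lower loop drives the single input FILE as if many simultaneous FILE buffers were each reading single lines from the same file but at different positions. (It is like sequentially "popping the top item" off several stacks until the stacks are empty. All stacks are assumed to start out equally full.)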

long offsets[ 10 ] = { 0 }; // ftell() returns long
int nStored = 0;
bool atStart = true; // the next data line begins a new section
char buf[ 128 ]; // be generous; don't scrimp.

// first pass, just remember position of 1st number of each section
long pos = ftell( infp );
while( fgets( buf, sizeof buf, infp ) ) {
    if( strncmp( buf, "!!!", 3 ) == 0 ) // strncmp() returns 0 on a match
        atStart = true;
    else if( atStart && nStored < 10 ) {
        offsets[ nStored++ ] = pos; // offset of the line just read
        atStart = false;
    }
    pos = ftell( infp );
}

// Now, make 100,000 passes until exhaust (equal sized) sections.
while( nStored > 0 ) {
    for( int i = 0; i < nStored; i++ ) {
        fseek( infp, offsets[ i ], SEEK_SET );

        // Been 'chewing through' data lines so far.
        if( !fgets( buf, sizeof buf, infp ) || strncmp( buf, "!!!", 3 ) == 0 ) // EOF or section boundary?
            return; // finished with all equal sized data rows.

        buf[ strcspn( buf, "\r\n" ) ] = '\0'; // strip NL (safe even if it is missing)
        fprintf( outfp, "%s,", buf ); // field followed by ','
        offsets[ i ] = ftell( infp ); // update for next pass
    }
    fprintf( outfp, "\n" ); // Yeah, trailing comma and null field. Life, eh?
}

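If the "sections" between the "!!!" markers differ in size, then don't "update" the offset value of a short section... once a short section is exhausted, write "," to the output file to indicate an "empty column". A flag is needed to indicate "no new data was found during this sweep; all sections are exhausted", which is the cue that the job is done.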

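【Comments】:

You made the same wrong assumption that I made in the comments - rows and columns need to be swapped!
@Gerhardh Thanks... retracting the answer...
@Andrew Too hasty! Typical... Thanks... deleting this answer
@Gerhardh Comments? (My fingers are crossed...) :-)
What is the return supposed to do there?

【Solution 2】: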

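As far as I have tested, this works.
With the posted input file, I get a matching CSV file.
It is slow, and it uses quite a lot of disk space. Tested with a 100000 x 10000 data set, it created an 8 GB binary file and a CSV file of at least 12 GB.
This assumes that there is always the same number of double values between the !!! markers, and that all of those values are parsed correctly by sscanf.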

#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h> // off_t

int main ( void) {
    char *filename = "afile.txt";
    char *binname = "adouble.bin";
    char *outname = "adouble.csv";
    char in[100] = "";
    double dbl = 0.0;
    size_t lines = 0;
    size_t maxlines = 0;
    size_t epl = 0; // elements per line
    size_t maxepl = 0;
    size_t count = 0;
    FILE *pf = NULL;
    FILE *pfbin = NULL;
    FILE *pfout = NULL;

    if ( NULL == ( pf = fopen ( filename, "r"))) {
        perror ( filename);
        return 1;
    }

    epl = 0;
    lines = 0;
    count = 0;

    // read the file to get lines and elements per line
    while ( fgets ( in, sizeof in, pf)) {
        if ( ! strcmp ( in, "!!!\n")) {
            ++epl;
            if ( count > lines) {
                lines = count;
            }
            count = 0;
        }
        else {
            ++count;
        }
    }

    printf ( "first read complete\n");
    printf ( "lines %zu\n", lines);
    printf ( "elements per line %zu\n", epl);
    maxlines = lines;
    maxepl = epl;

    rewind ( pf); // to read the file a second time

    lines = 0;
    epl = 0;

    if ( NULL == ( pfbin = fopen ( binname, "wb"))) { // open bin file for writing
        perror ( binname);
        return 2;
    }

    while ( fgets ( in, sizeof in, pf)) {
        if ( ! strcmp ( in, "!!!\n")) {
            ++epl;
            lines = 0;
        }
        else {
            sscanf ( in, "%lf", &dbl);
            off_t pos = ( (off_t)lines * (off_t)maxepl + (off_t)epl) * (off_t)sizeof dbl; // offset into bin file, computed in off_t to avoid size_t overflow
            fseeko ( pfbin, pos, SEEK_SET); // seek to offset
            fwrite ( &dbl, 1, sizeof dbl, pfbin);
            ++lines;
        }
    }

    fclose ( pf);
    fclose ( pfbin);
    printf ( "second read and create binary file complete\n");

    if ( NULL != ( pfbin = fopen ( binname, "rb"))) { // open bin file for reading
        if ( NULL != ( pfout = fopen ( outname, "w"))) { // open out file for writing
            for ( size_t ln = 0; ln < maxlines; ++ln) {
                for ( size_t ele = 0; ele < maxepl; ++ele) {
                    fread ( &dbl, 1, sizeof dbl, pfbin); // read from bin file
                    if ( ele) {
                        fprintf ( pfout, ";");
                    }
                    fprintf ( pfout, "%f", dbl);
                }
                fprintf ( pfout, "\n");
            }
            fclose ( pfout);
        }
        else {
            perror ( outname);
            return 4;
        }
        fclose ( pfbin);
    }
    else {
        perror ( binname);
        return 4;
    }

    return 0;
}

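【Comments】:

【Solution 3】:

Thanks a lot for all your answers and suggestions; they helped a lot. This was more of a general design problem than something that needed a workaround. The solution I found was to write the values to the file already in the right order during the computation. Have a nice day.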

【Comments】: