Apache Commons CSV skip lines: answers

[Question title]: Apache Commons CSV skip lines
[Posted]: 2015-11-28 14:09:28
[Question]:

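How can I skip lines in an input file with Apache Commons CSV? In my file the first few lines are ~~garbage~~ useful meta information such as the date. I couldn't find any option for this.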

private void parse() throws Exception {
    Iterable<CSVRecord> records = CSVFormat.EXCEL
            .withQuote('"').withDelimiter(';').parse(new FileReader("example.csv"));
    for (CSVRecord csvRecord : records) {
        //do something            
    }
}

[Discussion]:

Tags: java csv

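[Solution 1]:

So CSVParser.iterator() really should not throw an exception from iterator.hasNext(), because that makes it nearly impossible to recover from an error.

But where there's a will there's a way, and I came up with a Terrible Idea™ that works quite well: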

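    public void runOnFile(Path file) {
        // fixHeaders(file) is the author's own helper (not shown here); it is
        // expected to return a reader positioned at the CSV header line.
        try (BufferedReader in = fixHeaders(file)) {
            // Read the header line directly instead of handing `in` to a CSVParser:
            // the parser buffers its input, so later in.readLine() calls would
            // silently miss the lines it had already pulled into its buffer.
            String headerLine = in.readLine();
            if (headerLine == null) {
                return; // empty input
            }
            CSVRecord header = CSVFormat.DEFAULT.parse(new StringReader(headerLine)).getRecords().get(0);
            String[] names = new String[header.size()];
            for (int i = 0; i < names.length; i++) {
                names[i] = header.get(i);
            }
            CSVFormat lineFormat = CSVFormat.DEFAULT.withHeader(names);

            String line;
            while ((line = in.readLine()) != null) {
                try {
                    // Parse each physical line on its own so one malformed line cannot
                    // abort the whole file (quoted fields spanning lines are not supported).
                    CSVRecord record = lineFormat.parse(new StringReader(line)).getRecords().get(0);
                    // do something with your record
                } catch (Exception e) {
                    System.out.println("ignoring line: " + line);
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }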

[Discussion]:

[Solution 2]:

Call readLine() on a BufferedReader before starting the for loop.

Applied to your example:

private void parse() throws Exception {
  // FileReader has no readLine(); wrap it in a BufferedReader.
  try (BufferedReader reader = new BufferedReader(new FileReader("example.csv"))) {
    reader.readLine(); // read and discard the first line

    Iterable<CSVRecord> records = CSVFormat.EXCEL.withQuote('"').withDelimiter(';').parse(reader);
    for (CSVRecord csvRecord : records) {
      // do something
    }
  }
}

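[Discussion]:

Thanks! Very simple. But what if I don't know how many lines to skip and have to check each line's content? I would "lose" the line I had already read from the stream.
This answer points in the right direction, but the code as written does not compile, because FileReader has no readLine(); you should use BufferedReader reader = new BufferedReader(new FileReader(fileName));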
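One way to handle the case raised in the first comment (the number of metadata lines is unknown and each line must be inspected) is BufferedReader.mark() plus reset(), so the first data line is rewound rather than lost. This is a sketch under assumptions: the header is taken to be recognizable by a predicate, and the names SkipPreamble, skipUntil, MAX_LINE_LENGTH and the sample text are hypothetical, not part of Commons CSV.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Predicate;

public class SkipPreamble {
    // Assumed upper bound on one line's length: reset() may fail if more
    // characters than this are read after mark().
    private static final int MAX_LINE_LENGTH = 64 * 1024;

    /**
     * Consumes lines until isData accepts one, then rewinds so that line is
     * still the next thing `in` returns. Returns how many lines were skipped.
     */
    static int skipUntil(BufferedReader in, Predicate<String> isData) throws IOException {
        int skipped = 0;
        while (true) {
            in.mark(MAX_LINE_LENGTH);      // remember the start of this line
            String line = in.readLine();
            if (line == null) {
                return skipped;            // end of input, no data line found
            }
            if (isData.test(line)) {
                in.reset();                // un-read the matching line
                return skipped;
            }
            skipped++;
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "Exported: 2015-11-28\nSource: foo\nname;value\na;1\n";
        BufferedReader in = new BufferedReader(new StringReader(text));
        System.out.println(skipUntil(in, line -> line.startsWith("name;"))); // 2
        System.out.println(in.readLine());                                   // name;value
    }
}
```

After skipUntil returns, the same reader can be handed to CSVFormat.EXCEL.withDelimiter(';').parse(in) exactly as in the answer above, so the header line reaches the parser intact.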

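[Solution 3]:

There is no built-in facility for skipping an unknown number of lines.

If you only want to skip the first line (the header line), you can call withSkipHeaderRecord() when building the CSVFormat.

A more general solution is to call next() on the iterator. Note that this skips records rather than physical lines, so a quoted field spanning several lines counts as one record: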

int amountToSkip = 3; // example value: number of leading records to discard
try (CSVParser parser = CSVFormat.DEFAULT.parse(new FileReader("example.csv"))) {
    Iterator<CSVRecord> iterator = parser.iterator();

    // Discard the first amountToSkip records (stops early if the file is shorter)
    for (int i = 0; i < amountToSkip && iterator.hasNext(); i++) {
        iterator.next();
    }

    while (iterator.hasNext()) {
        CSVRecord record = iterator.next();
        System.out.println(record);
    }
}
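The skip loop above can be pulled out into a small generic helper. Because CSVParser implements Iterable<CSVRecord>, the helper works on parser.iterator() unchanged; here it is exercised on a plain list so the sketch runs without Commons CSV. The names IteratorSkip and skipFirst are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

public class IteratorSkip {
    /**
     * Advances `it` past at most n elements and returns how many were
     * actually skipped (fewer than n if the iterator runs out first).
     */
    static <T> int skipFirst(Iterator<T> it, int n) {
        int skipped = 0;
        while (skipped < n && it.hasNext()) {
            it.next();
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        Iterator<String> it = List.of("meta1", "meta2", "a", "b").iterator();
        skipFirst(it, 2);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints a, then b
        }
    }
}
```

With Commons CSV the call would be skipFirst(parser.iterator(), amountToSkip) followed by the same while loop. Returning the count lets the caller notice a file that is shorter than expected.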

[Discussion]:

Setting withSkipHeaderRecord() on its own does not do the job. You must first call withFirstRecordAsHeader() to tell the parser that the first record is the header.
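A minimal sketch combining the answer and this comment, assuming the number of metadata lines is known in advance: discard them with readLine(), then let withFirstRecordAsHeader() consume the real header line so records can be read by column name. The helper namesAfterMeta, the sample text and the column names name/value are hypothetical. It requires Apache Commons CSV; the with... methods are deprecated in newer releases in favor of CSVFormat.Builder but are still present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class MetaThenHeader {
    /** Drops metaLines physical lines, then reads the "name" column of each record. */
    static List<String> namesAfterMeta(String text, int metaLines) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            for (int i = 0; i < metaLines; i++) {
                in.readLine(); // discard one metadata line
            }
            // The next line ("name;value") becomes the header and is not returned as data.
            CSVFormat format = CSVFormat.EXCEL.withDelimiter(';').withFirstRecordAsHeader();
            try (CSVParser parser = format.parse(in)) {
                for (CSVRecord record : parser) {
                    names.add(record.get("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String text = "Date;2015-11-28\nname;value\na;1\nb;2\n";
        System.out.println(namesAfterMeta(text, 1)); // [a, b]
    }
}
```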

[Solution 4]:

You can skip the header line with this:

        Reader excelInput = new FileReader("example.csv");

        CSVFormat csvFormat = CSVFormat.EXCEL.withSkipHeaderRecord(true).withHeader("Arm1", "Arm2", "Arm3", "Arm4",
            "Arm5", "Arm6");

        CSVParser csvParser = new CSVParser(excelInput, csvFormat);

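The key is to set withSkipHeaderRecord(true) and to list the column names in withHeader(); the first line of the file is then treated as the header record and skipped.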
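To see what withSkipHeaderRecord(true) changes when explicit names are passed to withHeader(), this sketch parses the same text with the flag on and off. The helper firstColumn, the sample data and the column names Arm1/Arm2 are illustrative; it requires Apache Commons CSV.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class ExplicitHeaderSkip {
    /** Returns the "Arm1" column of every record the parser yields. */
    static List<String> firstColumn(String text, boolean skipHeader) throws IOException {
        CSVFormat format = CSVFormat.EXCEL.withDelimiter(';')
                .withHeader("Arm1", "Arm2")        // names used by record.get("Arm1")
                .withSkipHeaderRecord(skipHeader); // true: drop the file's own first line
        List<String> values = new ArrayList<>();
        try (CSVParser parser = format.parse(new StringReader(text))) {
            for (CSVRecord record : parser) {
                values.add(record.get("Arm1"));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String text = "a;b\n1;2\n3;4\n";
        System.out.println(firstColumn(text, true));  // [1, 3]
        System.out.println(firstColumn(text, false)); // [a, 1, 3]
    }
}
```

Note that the file's first line ("a;b") is dropped even though it does not match the names given to withHeader(); the flag simply discards one record.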

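If you know the number of the record to skip, you can do the following:

for (CSVRecord csvRecord : csvParser) {
    if (csvRecord.getRecordNumber() == 1) {
        continue; // skip record number 1
    }
    // process the remaining records here
}

Here record 1 is the one being skipped. getRecordNumber() counts parsed records starting at 1, so when values span several lines it does not match the physical line number.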

[Discussion]: