【Question title】:Error: Data between close double quote (") and field separator
【Posted】:2021-03-14 22:39:47
【Question description】:

I am trying to use Google Apps Script to take a CSV from Google Drive and load it into BigQuery. The upload fails with this error:

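"Error while reading data, error message: Error detected while parsing row starting at position: 560550. Error: Data between close double quote (") and field separator."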

I tried looking at that byte position in the file, but it is outside the bounds of the CSV (the file is only ~501500 bytes long).

Here is a link to the CSV I am using, which is a crawl of a website: https://drive.google.com/file/d/1k3cGlTSA_zPQCtUkt20vn6XKiLPJ7mFB/view?usp=sharing

Here is my relevant code:

 function csvToBigQuery(exportFolder, csvName, bqDatasetId){
  try{
    //get most recent export from Screaming Frog
    var mostRecentFolder = [];    
    while(exportFolder.hasNext()){
      var folder = exportFolder.next();
      var lastUpdated = folder.getLastUpdated();
      if(mostRecentFolder.length == 0)
        mostRecentFolder = [folder.getLastUpdated(),folder.getId()];
      else if(lastUpdated > mostRecentFolder[0])
        mostRecentFolder = [lastUpdated, folder.getId()];
    }    
    var folderId = mostRecentFolder[1];
    var file = DriveApp.getFolderById(folderId).getFilesByName(csvName + '.csv').next();
    
    if(!file)
      throw "File doesn't exist";
    
    //get csv and add date column.
    //getBlob().getDataAsString().replace(/(["'])(?:(?=(\\?))\2[\s\S])*?\1/g, function(e){return e.replace(/\r?\n|\r/g, ' ')})
    var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
    Logger.log(rows);
    var numColumns = rows[0].length;    
    
    rows.forEach(function(row){
      row[numColumns] = date;
    });
    rows[0][numColumns] = 'Date';
    
    let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
    let csvData = csvRows.map(values => values.join(',')).join('\n');
    //log(csvData)
    var blob = Utilities.newBlob(csvData, 'application/octet-stream');
    
    
    //create job for inserting to BQ.
    var loadJob = {
      configuration: {
        load: {
          destinationTable: {
            projectId: bqProjectId,
            datasetId: bqDatasetId,
            tableId: csvName
          },
          autodetect: true,  // Infer schema from contents.
          writeDisposition: 'WRITE_APPEND',
        }
      }
    };
    
    //append to table in BQ.
    BigQuery.Jobs.insert(loadJob, bqProjectId, blob);
    
    
  }catch(e){
    Logger.log(e); 
  }
}

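【Comments】:

  • Based on your error message, I have proposed modification points for your script. Could you please check them? Unfortunately, I could not test them against BigQuery myself, so I apologize if they are not a direct solution to your issue.

Tags: javascript csv google-apps-script google-bigquery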


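【Solution 1】:

Modification points:

From your error message, I thought that there might be a part of the data that is not properly enclosed in double quotes, so I searched for it. I took the CSV data produced by your script and removed every double-quoted value with the script below. After that, row 711 still had leftover values.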

function sample() {
  var id = "###";  // File ID of your CSV file.

  // This is your script.
  var file = DriveApp.getFileById(id);
  var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
  var numColumns = rows[0].length;
  var date = "sample";
  rows.forEach(function(row){
    row[numColumns] = date;
  });
  rows[0][numColumns] = 'Date';
  let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
  let csvData = csvRows.map(values => values.join(',')).join('\n');
  
  // I added below script for checking your CSV data.
  var res = csvData.replace(/\"(|.+?)\"/g, "");
  DriveApp.createFile("sample.txt", res);
}
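
As a rough sketch, the same check can also be run in memory and return the suspect row numbers directly, instead of writing sample.txt to Drive. findSuspectRows is a hypothetical helper name, not part of the original script. It assumes that no field contains a raw line break, which should hold for the csvData built above because JSON.stringify writes line breaks as the two characters \n.

```javascript
// Hypothetical helper: strip every double-quoted value from each line,
// then report the lines where something other than commas is left over.
function findSuspectRows(csvData) {
  return csvData
    .split('\n')
    .map(function (line, index) {
      var residue = line.replace(/"(|.+?)"/g, '').replace(/,/g, '');
      return { row: index + 1, residue: residue }; // row numbers are 1-based
    })
    .filter(function (entry) {
      return entry.residue.length > 0;
    });
}

// Small made-up sample: the second line has a field that ends with \""
var sampleCsv = '"a","b"\n"c\\"","d"\n"e","f"';
var suspects = findSuspectRows(sampleCsv);
console.log(suspects.map(function (s) { return s.row; }));
```

In Apps Script, the result could be passed to Logger.log instead of console.log.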

Row 711 is as follows.

"https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello\","text/html; charset=utf-8","200","OK","Non-Indexable","Canonicalised","Lip Shield Trio - Restores, Protects + Water-resistant – Supergoop!","67","595","Moisturizing lip protection made from antioxidant-rich coconut, avocado, and grape seed oil.","92","576","","0","Lip Shield Trio","15","Lip Shield Trio","15","Why We Love It","14","Ingredients","11","","","","https://supergoop.com/products/lip-shield-trio","","","","","451488","754","1.686","5","","12","4","0.590","205","80","8","5","","","","","f6d1476960d22b1c5964581e161bdd49","0.064","","","","","HTTP/1.1","https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello%5C"

From this value, I found that \" is used at "https://supergoop.com/products/lip-shield-trio/?utm_source=Gorgias&utm_medium=CustomerCare&utm_campaign=crosssellhello\". I think this might be the cause of your issue.
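
To illustrate why a value ending in a backslash could trigger this, here is a small sketch in plain JavaScript. quoteOriginal reproduces the quoting expression from the question. isClosedField is a hypothetical helper, added only for this illustration, that checks whether a quoted field ends with an unescaped closing quote.

```javascript
// Quoting expression from the question: JSON.stringify, then \" becomes "".
function quoteOriginal(value) {
  return JSON.stringify(value).replace(/\\"/g, '""');
}

// Hypothetical check: does the field start with a quote and end with an
// unescaped closing quote, treating "" inside the field as an escaped quote?
function isClosedField(field) {
  if (field[0] !== '"') return false;
  var i = 1;
  while (i < field.length) {
    if (field[i] === '"') {
      if (field[i + 1] === '"') { i += 2; continue; } // escaped quote
      return i === field.length - 1; // closing quote must be the last character
    }
    i++;
  }
  return false; // ran out of characters without finding a closing quote
}

var ok = quoteOriginal('abc');      // "abc"
var broken = quoteOriginal('abc\\'); // the value is abc followed by one backslash

console.log(ok, isClosedField(ok));         // "abc" true
console.log(broken, isClosedField(broken)); // "abc\"" false
```

JSON.stringify escapes the trailing backslash as \\, so the regular expression matches the second backslash together with the closing quote and turns them into "". The field is then left without a closing quote, and the parser reads on into the following fields.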

So, in order to avoid this issue, how about the following modification?

Modified script:

From:
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""')));
To:
let csvRows = rows.map(values =>values.map(value => JSON.stringify(value).replace(/\\"/g, '""').replace(/\\"/g, '')));

From:
var rows = Utilities.parseCsv(file.getBlob().getDataAsString());
To:
var rows = Utilities.parseCsv(file.getBlob().getDataAsString().replace(/\\/g, ''));
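  • With this modification, I could confirm that the output of the modified script is 2 bytes smaller than the output of your script. Also, when the check script above was run on the CSV data from the modified script, I could confirm that no row had leftover values.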
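
Note that both modifications drop the backslash from the value. If the backslash needs to be kept, one possible alternative is to skip JSON.stringify and quote each field by doubling embedded double quotes only, which is the convention RFC 4180 describes. This is a sketch under assumptions and has not been tested against BigQuery; toCsvField and toCsv are hypothetical names.

```javascript
// Hypothetical helper: wrap a value in double quotes and double any
// embedded double quote. Backslashes are left untouched.
function toCsvField(value) {
  var text = (value === null || value === undefined) ? '' : String(value);
  return '"' + text.replace(/"/g, '""') + '"';
}

// Hypothetical helper: serialize a 2D array (as returned by
// Utilities.parseCsv) into CSV text.
function toCsv(rows) {
  return rows
    .map(function (row) { return row.map(toCsvField).join(','); })
    .join('\n');
}

console.log(toCsvField('abc\\'));     // "abc\"
console.log(toCsvField('say "hi"'));  // "say ""hi"""
console.log(toCsv([['URL', 'Date'], ['https://example.com/a\\', '2021-03-14']]));
```

One difference to verify before using this: JSON.stringify writes line breaks inside a value as the characters \n, whereas this version keeps real line breaks inside quoted fields, so the load configuration would probably also need allowQuotedNewlines: true.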

【Comments】:

  • @George Thank you for replying. I'm glad your issue was resolved.