[Posted]: 2019-02-06 03:35:33
[Question]:
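I frequently need to generate output in CSV format. The file has 25 columns. The column order must stay fixed because the file is used as input to an ETL process. Note that the ETL cannot be configured to look up column headers.
The data supplied to me is also in CSV format. The number of columns may change; it can vary between 15 and 50. The column order may change as well: one file may have Col A, Col B, Col C, and the next may have Col B, Col A, Col D.
The input data looks like this:
Header: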
"Employee","Date","Start","End","Brk","Ord","WEND","TRV","PH200","PH250","T1x5","T1x5P","T2","T2x5","SB2","AL","LL175","COM","LWOP","PERS","PHNW","WCOMP","MKUP","AS","NS","CI","NPAY","COC","KMS","LAFHA","LAFHI","MEAL","MPA","OMA","SMA","Allowances","Approval Status","Branch","Branch Cost Code","Branch Ref. No","Contract","Contract Hours Per Cycle","Detail","Detail Cost Code","Detail Ref. No","Employee Ref. No","Employment Type","Location","Location Cost Code","Location Ref. No","Max Hours Per Period","Pay Group","Pay Level","Role","Role Cost Code","Roster","Shift","Timesheet Comments","Total Bill","Total Cost","Total Hours","Work Type"
Data:
"Smith, John","04/11/2017","12:00","05:00","",10.00,"","","",,"",,"",,"",,"",,"",,"",,3.00,7.00,"",,"",,"",1.00,"",,"",,"MEAL","Approved","Melbourne","","","Admin Officer","70.00","JX526","","1469","948","AT","Melbourne","633","","70.00","","Base","Admin Officer Level 1","7847000","1900-0500","DS","","0.00","351.95","10.00",""
As a table:
| Employee | Date | Start | End | Brk | Ord | WEND | TRV | PH200 | PH250 | T1x5 | T1x5P | T2 | T2x5 | SB2 | AL | LL175 | COM | LWOP | PERS | PHNW | WCOMP | MKUP | AS | NS | CI | NPAY | COC | KMS | LAFHA | LAFHI | MEAL | MPA | OMA | SMA | Allowances | Approval Status | Branch | Branch Cost Code | Branch Ref. No | Contract | Contract Hours Per Cycle | Detail | Detail Cost Code | Detail Ref. No | Employee Ref. No | Employment Type | Location | Location Cost Code | Location Ref. No | Max Hours Per Period | Pay Group | Pay Level | Role | Role Cost Code | Roster | Shift | Timesheet Comments | Total Bill | Total Cost | Total Hours | Work Type |
|------------- |------------ |------- |------- |----- |------- |------ |----- |------- |------- |------ |------- |---- |------ |----- |---- |------- |----- |------ |------ |------ |------- |------ |------ |------ |---- |------ |----- |----- |------- |------- |------ |----- |----- |----- |------------ |----------------- |----------- |------------------ |---------------- |--------------- |-------------------------- |-------- |------------------ |---------------- |------------------ |----------------- |----------- |-------------------- |------------------ |---------------------- |----------- |----------- |----------------------- |---------------- |----------- |------- |-------------------- |------------ |------------ |------------- |----------- |
| Smith, John | 04/11/2017 | 12:00 | 05:00 | | 10.00 | | | | | | | | | | | | | | | | | | 3.00 | 7.00 | | | | | | 1.00 | | | | | MEAL | Approved | Melbourne | | | Admin Officer | 70.00 | JX526 | | 1469 | 948 | AT | Melbourne | 633 | | 70.00 | | Base | Admin Officer Level 1 | 7847000 | 1900-0500 | DS | | 0.00 | 351.95 | 10.00 | |
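Since I could not find an off-the-shelf solution, I am trying to build a small tool in Access VBA to do this.
On the import side, I tried the two standard methods:
1. DoCmd.TransferText
2. CurrentDb.Execute "INSERT INTO " & TableName & " SELECT * FROM " _
& "[TEXT;FMT=Delimited;HDR=YES;database=" & FolderOnly & "].[" & FileOnly & "]"
Neither worked well. Numeric values were rounded during import, and I found no way to force the data to be imported as text. So now I am generating the SQL myself, which gives me full control over the data types. I can read the file with the FileSystemObject. The first step is to loop through the file and generate a CREATE TABLE script. The VBA Split function works well here with a comma as the delimiter, because the header names contain no commas: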
    ' Pass 1: read the header row and build the CREATE TABLE statement.
    While Not objTextStream.AtEndOfStream
        strLine = objTextStream.ReadLine
        'regex.pattern =
        ' After the first ReadLine, .Line is 2, so this branch handles the header row only.
        If objTextStream.Line = 2 And Len(strLine) > 0 Then
            strSQL = "CREATE TABLE " & TableName & " ("
            header = Split(strLine, ",")
            For i = LBound(header) To UBound(header)
                ' Strip double quotes and periods from each name; every column is created as TEXT(255).
                strSQL = strSQL & "[" & Replace(Replace(header(i), Chr(34), ""), ".", "") & "] TEXT(255)"
                headerLine = headerLine & "[" & Replace(Replace(header(i), Chr(34), ""), ".", "") & "]"
                If i <> UBound(header) Then
                    strSQL = strSQL & ","
                    headerLine = headerLine & ","
                End If
            Next i
            strSQL = strSQL & ")"
            'Debug.Print strSql
            DBEngine(0)(0).Execute strSQL
        End If
    Wend
The second step is to generate the INSERT statements, along these lines:
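    ' Pass 2: build one INSERT statement per data row.
    While Not objTextStream.AtEndOfStream
        strLine = objTextStream.ReadLine   ' without this read the loop never advances
        If objTextStream.Line > 2 And Len(strLine) > 0 Then
            strSQL = "INSERT INTO " & TableName & " (" & headerLine & ") VALUES ("
            ' Splitting on "," misses the unquoted numeric fields (e.g. ,10.00,)
            line = Split(strLine, """,""") 'Regex??
            For i = LBound(line) To UBound(line)
                If Nz(line(i)) <> "" Then
                    ' Remove double quotes and escape single quotes for the SQL literal.
                    strSQL = strSQL & "'" & Replace(Replace(line(i), Chr(34), ""), "'", "''") & "'"
                Else
                    strSQL = strSQL & "''"
                End If
                If i <> UBound(line) Then strSQL = strSQL & ","
            Next i
            strSQL = strSQL & ")"
            'Debug.Print strSQL
            CurrentDb.Execute strSQL
        End If
    Wend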
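This is where I am stuck: I cannot use the Split function with a comma as the delimiter for the data rows. Some fields (Employee, for example) contain commas, because names are output in "Family_Name, First_Name" format. I thought about using a regular expression but am not sure how to use one in VBA. Can anyone suggest a solution?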
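One way to avoid both Split and a regular expression is to walk the line character by character and track whether the current position is inside double quotes. The following is a minimal sketch of that idea, not a tested drop-in. The function name SplitCsvLine is hypothetical, and the sketch assumes the common CSV convention that a doubled quote inside a quoted field stands for a literal quote, and that no field contains an embedded line break (ReadLine would split such a field).

```vba
' Hypothetical helper (not part of the original code): splits one CSV line
' into fields. Commas inside double quotes are kept as data, and "" inside
' a quoted field is read as a single literal quote. Returns a zero-based
' String array with the surrounding quotes already removed.
Public Function SplitCsvLine(ByVal strLine As String) As Variant
    Dim fields() As String
    Dim fieldCount As Long
    Dim buffer As String
    Dim inQuotes As Boolean
    Dim i As Long
    Dim ch As String

    ReDim fields(0 To 0)
    For i = 1 To Len(strLine)
        ch = Mid$(strLine, i, 1)
        If ch = """" Then
            If inQuotes And Mid$(strLine, i + 1, 1) = """" Then
                buffer = buffer & """"      ' doubled quote: keep one, skip the other
                i = i + 1
            Else
                inQuotes = Not inQuotes     ' opening or closing quote
            End If
        ElseIf ch = "," And Not inQuotes Then
            fields(fieldCount) = buffer     ' a comma outside quotes ends the field
            fieldCount = fieldCount + 1
            ReDim Preserve fields(0 To fieldCount)
            buffer = vbNullString
        Else
            buffer = buffer & ch
        End If
    Next i
    fields(fieldCount) = buffer             ' last field has no trailing comma
    SplitCsvLine = fields
End Function
```

In the INSERT loop, the call `line = SplitCsvLine(strLine)` would take the place of `Split(strLine, """,""")`. Because the helper already drops the surrounding quotes, the inner `Replace(line(i), Chr(34), "")` is no longer needed and would strip legitimate quote characters from the data. The same helper could also parse the header row, so that both passes split fields the same way.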
[Comments]:
-
An import file cannot have varying columns.
-
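If your code can use a dynamic import scheme to load the data into a table with a varying set of columns, how do you then intend to create a CSV file with a fixed set of columns?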
-
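@WolfgangKais Importing varying columns is the whole idea here. Once the data is in Access in the right format, exporting it is actually easy. There are several ways to do it; one is to use DoCmd.TransferText acExportDelim on a SELECT query that presents the columns in the correct order. The SELECT query can also supply dummy data for any missing columns.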
-
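So there is a "full set of columns"? Why can't your supplier adhere to that specification? If I were you, I would rather terminate the contract than keep tinkering with a solution that has to cope with dynamic output by dynamically changing objects that were designed around a static specification.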
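The export approach described in the comments above (a SELECT query that fixes the column order and fills in missing columns, exported with DoCmd.TransferText) could be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual code: the procedure names, the saved query name qryFixedExport, and the short column list are all hypothetical, and the real list would hold the 25 target columns in the order the ETL expects. It also assumes a reference to the DAO library and that the column names match those created at import time (the import code strips periods from header names).

```vba
' Hypothetical sketch: exports TableName with a fixed column order.
' Columns missing from the imported table are emitted as empty strings.
Public Sub ExportFixedLayout(ByVal TableName As String, ByVal OutputPath As String)
    Const QUERY_NAME As String = "qryFixedExport"   ' assumed query name
    Dim required As Variant
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Dim strSQL As String
    Dim i As Long

    ' Assumed subset of the target columns, in the order the ETL expects.
    required = Array("Employee", "Date", "Start", "End", "Ord", "Total Hours")

    Set db = CurrentDb
    Set tdf = db.TableDefs(TableName)

    For i = LBound(required) To UBound(required)
        If i > LBound(required) Then strSQL = strSQL & ", "
        If FieldExists(tdf, CStr(required(i))) Then
            strSQL = strSQL & "[" & required(i) & "]"
        Else
            strSQL = strSQL & "'' AS [" & required(i) & "]"   ' dummy value
        End If
    Next i
    strSQL = "SELECT " & strSQL & " FROM [" & TableName & "]"

    ' Replace any earlier version of the saved query.
    On Error Resume Next
    db.QueryDefs.Delete QUERY_NAME
    On Error GoTo 0
    db.CreateQueryDef QUERY_NAME, strSQL

    ' HasFieldNames is False because the ETL cannot use column headers.
    DoCmd.TransferText acExportDelim, , QUERY_NAME, OutputPath, False
End Sub

' Returns True if the table definition contains a field with the given name.
Private Function FieldExists(ByVal tdf As DAO.TableDef, ByVal fieldName As String) As Boolean
    Dim fld As DAO.Field
    For Each fld In tdf.Fields
        If StrComp(fld.Name, fieldName, vbTextCompare) = 0 Then
            FieldExists = True
            Exit Function
        End If
    Next fld
End Function
```

Building the SELECT from a fixed list keeps the output layout independent of whatever column order or column count the input file happened to have, which is the property the ETL depends on.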
Tags: regex vba csv ms-access import