【Question title】: How can I reformat a CSV file into a Google Calendar format?
【Posted】: 2011-09-01 12:30:23
【Question】:

After doing some research, I was able to find the format that the CSV file needs to be converted into:

Subject,Start Date,Start Time,End Date,End Time,All Day Event,Description,Location,Private

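The problem is that the CSV export I'm working with isn't in the right format or order. What's the best way to go about gathering that information? Here's a bit of my source.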

Name,User Name,Row Type,Start Date,Start Time,End Time,End Date,Segment Start Date,Type

"Smith, John J",jjs,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Regular

"Smith, John J",jjs,Shift,5/30/2011,13:30,17:30,5/30/2011,5/30/2011,Regular

    Dim Name As String = ""
    Dim UserName As String = ""

    Dim Data As String = """Smith, John J"",jj802b,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Transfer"

    For r As Integer = 1 To 10
        Name = Data.Substring(0, Data.LastIndexOf(""""))
        Data = Data.Remove(0, Data.LastIndexOf(""""))
        UserName = Data.Substring(Data.LastIndexOf(""""), ",")
    Next

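【Comments】:

  • Your input data doesn't seem to have all of the fields that the Google format requires. Where do you plan to get the missing data from?
  • Be careful when exporting the data, because your output has to be acceptable to the importer. For example, if a name you read is John "Smarty" Pants, you will probably need to write it out as "John ""Smarty"" Pants", ... .

Tags: vb.net parsing csv google-calendar-api text-parsing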
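
The quoting rule mentioned in the comment above can be sketched as a small helper. This is a minimal illustration, not code from the thread: the name `QuoteField` is hypothetical, and it assumes the usual CSV convention of wrapping a field in quotes and doubling any quotes inside it.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CsvQuoteSketch
    ' Hypothetical helper: quotes a field only when it needs it,
    ' doubling any embedded double quotes.
    Function QuoteField(ByVal value As String) As String
        Dim special As Char() = {","c, """"c, ControlChars.Cr, ControlChars.Lf}

        ' Plain values can be written as they are.
        If value.IndexOfAny(special) < 0 Then
            Return value
        End If

        ' Double each embedded quote, then wrap the whole field in quotes.
        Return """" & value.Replace("""", """""") & """"
    End Function

    Sub Main()
        Console.WriteLine(QuoteField("John ""Smarty"" Pants"))
        Console.WriteLine(QuoteField("Smith, John J"))
        Console.WriteLine(QuoteField("Shift"))
    End Sub
End Module
```

A helper like this would be applied to every value just before it is joined into an output line.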


【Answer 1】:

Here is one way to solve it:

Dim Name As String = ""
Dim UserName As String = ""

Dim Data As String = """Smith, John J"",jj802b,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Transfer"

For r As Integer = 1 To 10
    Dim DataArr() As String = DecodeCSV(Data) 'Use DecodeCSV function to regex split the string 
    Name = DataArr(0) 'Get First item of array as Name
    UserName = DataArr(1)  'Get Second item of array as UserName 
Next

DecodeCSV is a nice piece of code by Tim:

' Requires: Imports System.Text.RegularExpressions
Public Shared Function DecodeCSV(ByVal strLine As String) As String()

    Dim strPattern As String
    Dim objMatch As Match

    ' build a pattern
    strPattern = "^" ' anchor to start of the string
    strPattern += "(?:""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*))"
    strPattern += "(?:,(?:[ \t]*""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*)))*"
    strPattern += "$" ' anchor to the end of the string

    ' get the match
    objMatch = Regex.Match(strLine, strPattern)

    ' if RegEx match was ok
    If objMatch.Success Then
        Dim objGroup As Group = objMatch.Groups("value")
        Dim intCount As Integer = objGroup.Captures.Count
        Dim arrOutput(intCount - 1) As String

        ' transfer data to array
        For i As Integer = 0 To intCount - 1
            Dim objCapture As Capture = objGroup.Captures.Item(i)
            arrOutput(i) = objCapture.Value

            ' replace double-escaped quotes
            arrOutput(i) = arrOutput(i).Replace("""""", """")
        Next

        ' return the array
        Return arrOutput
    Else
        Throw New ApplicationException("Bad CSV line: " & strLine)
    End If

End Function

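【Comments】:

    【Answer 2】:

    Depending on the exact contents of your CSV file and what guarantees you have about its format, splitting on , is sometimes the simplest and fastest way to parse the file. Your Name column contains a , that is not a delimiter. If you can assume the name always contains exactly one ,, that case is still easy to handle, although it adds a little complexity.

    There are libraries that will parse CSV files, and they can be useful. If you don't need to handle every file that conforms to the CSV spec, though, I feel they are overkill. With all that said, you can parse your CSV file easily with the following regular expression, which uses named groups for convenience: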
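
The split approach described above can be sketched as follows. This is only an illustration under the stated assumption that the first field is a quoted name containing exactly one comma; the name `SplitLine` is hypothetical and does not come from the thread.

```vbnet
Imports System

Module SplitSketch
    ' Splits one input line on commas, assuming the first field is a
    ' quoted name that contains exactly one comma (as in the sample data).
    Function SplitLine(ByVal line As String) As String()
        Dim parts As String() = line.Split(","c)

        ' The quoted name was cut in two, so the result has one field fewer.
        Dim fields(parts.Length - 2) As String

        ' Rejoin the two halves of the name and strip the surrounding quotes.
        fields(0) = (parts(0) & "," & parts(1)).Trim(""""c)

        ' Copy the remaining fields, shifted down by one position.
        Array.Copy(parts, 2, fields, 1, parts.Length - 2)
        Return fields
    End Function

    Sub Main()
        Dim line As String = """Smith, John J"",jjs,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Regular"
        Dim fields As String() = SplitLine(line)
        Console.WriteLine(fields(0))
        Console.WriteLine(fields(1))
    End Sub
End Module
```

If a name ever arrives without a comma, or with two, this will silently misalign the columns, which is the trade-off for skipping a real CSV parser.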

    "(?<Name>[^"]+?)",(?<UserName>[^,]+?),(?<RowType>[^,]+?),(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?),(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)

    This creates named capture groups, which you can then use to write to a new CSV file, like so:

    Dim ResultList As StringCollection = New StringCollection()
    Try
        Dim RegexObj As New Regex("""(?<Name>[^""]+?)"",(?<UserName>[^,]+?),(?<RowType>[^,]+?),(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?),(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)", RegexOptions.IgnoreCase)
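        ' SubjectString is assumed to hold the text of the input CSV
        Dim MatchResult As Match = RegexObj.Match(SubjectString)
        While MatchResult.Success
            'Append to new CSV file - MatchResult.Groups("groupname").Value
    
            'Name = MatchResult.Groups("Name").Value
            'Start Time = MatchResult.Groups("StartTime").Value         
            'End Time = MatchResult.Groups("EndTime").Value
            'Etc...

            ' Move on to the next row; without this the loop never terminates
            MatchResult = MatchResult.NextMatch()
        End While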
    Catch ex As ArgumentException
        'Syntax error in the regular expression
    End Try
    

    For more information, see .NET Framework Regular Expressions on MSDN.

    【Comments】:

      【Answer 3】:

      There are a few points I'd like to make:
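
As a rough sketch of the "write to a new CSV file" step, the named groups can be mapped onto the Google Calendar columns listed in the question. The helper name `ToCalendarLine`, the "[Row Type]: [Name]" subject, and the `False` defaults for All Day Event and Private are assumptions made for illustration; pick whatever mapping suits your data.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexToCalendarSketch
    ' The same pattern as above, with one named group per input column.
    Private ReadOnly RowPattern As New Regex(
        """(?<Name>[^""]+?)"",(?<UserName>[^,]+?),(?<RowType>[^,]+?)," &
        "(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?)," &
        "(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)")

    ' Hypothetical helper: turns one input line into one output line.
    Function ToCalendarLine(ByVal inputLine As String) As String
        Dim m As Match = RowPattern.Match(inputLine)
        If Not m.Success Then
            Throw New FormatException("Unexpected line: " & inputLine)
        End If

        ' The subject is quoted because the name contains a comma.
        Dim subject As String =
            """" & m.Groups("RowType").Value & ": " & m.Groups("Name").Value & """"

        ' Column order: Subject,Start Date,Start Time,End Date,End Time,
        ' All Day Event,Description,Location,Private
        Return String.Join(",", New String() {
            subject,
            m.Groups("StartDate").Value,
            m.Groups("StartTime").Value,
            m.Groups("EndDate").Value,
            m.Groups("EndTime").Value,
            "False",
            m.Groups("Type").Value,
            "",
            "False"})
    End Function

    Sub Main()
        Dim line As String = """Smith, John J"",jjs,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Regular"
        Console.WriteLine(ToCalendarLine(line))
    End Sub
End Module
```

Note that the input lists End Time before End Date, while the output needs End Date first; reading by group name rather than by position is what makes that reordering painless.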

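      • One is that I'm using TextFieldParser, which you can find under the FileIO namespace, to work with the input CSV. It makes reading delimited files much easier than trying to deal with regular expressions and your own parsing.
      • The other is that I'm storing the data in a List(Of Dictionary(Of String, String)), that is, a list of dictionaries relating strings to other strings. This gives essentially the same access pattern as a DataTable, and if you're more comfortable with that construct you're welcome to use it instead. The list of dictionaries behaves exactly the same way and needs a lot less setup, so that's what I use here.

      I admit that some of this is hard-coded, but if you need to generalize the process you can move certain aspects out into application settings and/or factor the functionality better. The point here is to give you a general idea. The code is commented inline below: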

          ' Create a text parser object
          Dim theParser As New FileIO.TextFieldParser("C:\Path\To\theInput.csv")
      
          ' Specify that fields are delimited by commas
          theParser.Delimiters = {","}
      
          ' Specify that strings containing the delimiter are wrapped by quotes
          theParser.HasFieldsEnclosedInQuotes = True
      
          ' Dimension containers for the field names and the list of data rows
          ' Initialize the field names with the first row
          Dim theInputFields As String() = theParser.ReadFields(),
              theInputRows As New List(Of Dictionary(Of String, String))()
      
          ' While there is data to parse
          Do While Not theParser.EndOfData
      
              ' Dimension a counter and a row container
              Dim i As Integer = 0,
                  theRow As New Dictionary(Of String, String)()
      
              ' For each field
              For Each value In theParser.ReadFields()
      
                  ' Associate the value of that field for the row
                  theRow(theInputFields(i)) = value
      
                  ' Increment the count
                  i += 1
              Next
      
              ' Add the row to the list
              theInputRows.Add(theRow)
          Loop
      
          ' Close the input file for reading
          theParser.Close()
      
          ' Dimension the list of output field names and a container for the list of formatted output rows
          Dim theOutputFields As New List(Of String) From {"Subject", "Start Date", "Start Time", "End Date", "End Time", "All Day Event", "Description", "Location", "Private"},
              theOutputRows As New List(Of Dictionary(Of String, String))()
      
          ' For each data row we've extracted from the CSV
          For Each theRow In theInputRows
      
              ' Dimension a new formatted row for the output
              Dim thisRow As New Dictionary(Of String, String)()
      
              ' For each field name of the output rows
              For Each theField In theOutputFields
      
                  ' Dimension a container for the value of this field
                  Dim theValue As String = String.Empty
      
                  ' Specify ways to get the value of the field based on its name
                  ' These are just examples; choose your own method for formatting the output
                  Select Case theField
      
                      Case "Subject"
                          ' Output a subject "[Row Type]: [Name]"
                          theValue = theRow("Row Type") & ": " & theRow("Name")
      
                      Case "Description"
                          ' Output a description from the input field [Type]
                          theValue = theRow("Type")
      
                      Case "Start Date", "Start Time", "End Date", "End Time"
                          ' Output the value of the field with a correlated name
                          theValue = theRow(theField)
      
                      Case "All Day Event", "Private"
                          ' Output False by default (you might want to change the case for Private)
                          theValue = "False"
      
                      Case "Location"
                          ' Can probably be safely left empty unless you'd like a default value
                  End Select
      
                  ' Relate the value we've created to the column in this row
                  thisRow(theField) = theValue
              Next
      
              ' Add the formatted row to the output data
              theOutputRows.Add(thisRow)
          Next
      
          ' Start building the first line by retrieving the name of the first output field
          Dim theHeader As String = theOutputFields.First
      
          ' For each of the remaining output fields
          For Each theField In (From s In theOutputFields Skip 1)
      
              ' Append a comma and then the field name
              theHeader = theHeader & "," & theField
          Next
      
          ' Create a string builder to store the text for the output file, initialized with the header line and a line break
          Dim theOutput As New System.Text.StringBuilder(theHeader & vbNewLine)
      
          ' For each row in the formatted output rows
          For Each theRow In theOutputRows
      
              ' Dimension a container for this line of the file, beginning with the value of the column associated with the first output field
              Dim theLine As String = theRow(theOutputFields.First)
      
              ' Wrap the first value if necessary
              If theLine.Contains(",") Then theLine = """" & theLine & """"
      
              ' For each remaining output field
              For Each theField In (From s In theOutputFields Skip 1)
      
                  ' Dereference and store the associated column value
                  Dim theValue As String = theRow(theField)
      
                  ' Add a comma and the value to the line, wrapped in quotations as needed
                  theLine = theLine & "," & If(theValue.Contains(","), """" & theValue & """", theValue)
              Next
      
              ' Append the line to the output string
              theOutput.AppendLine(theLine)
          Next
      
          ' Write the formatted output to file
          IO.File.WriteAllText("C:\output.csv", theOutput.ToString)
      

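      For what it's worth, with your sample data the output file produced by this code seems to open just fine in OpenOffice.org Calc. The format you want for each output field is up to you, so modify the corresponding Case statements in the Select to suit, and happy coding!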
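
As one example of adjusting a field's format, here is a minimal sketch that rewrites the 24-hour times from the sample data (such as 17:30) in 12-hour form. Whether your import actually needs this is an assumption to check against your own calendar; the helper name `To12Hour` is hypothetical.

```vbnet
Imports System
Imports System.Globalization

Module TimeFormatSketch
    ' Hypothetical helper: converts "17:30" to "5:30 PM".
    Function To12Hour(ByVal time24 As String) As String
        ' "H:mm" accepts one- or two-digit hours on a 24-hour clock.
        Dim parsed As DateTime =
            DateTime.ParseExact(time24, "H:mm", CultureInfo.InvariantCulture)

        ' "h:mm tt" writes the hour on a 12-hour clock plus AM/PM.
        Return parsed.ToString("h:mm tt", CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Console.WriteLine(To12Hour("9:30"))
        Console.WriteLine(To12Hour("17:30"))
    End Sub
End Module
```

In the code above this would slot into the Case for "Start Time" and "End Time", for instance as theValue = To12Hour(theRow(theField)).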

      【Comments】:
