【Question title】: Parse the OUTLOOK.HOL file into CSV
【Posted】: 2021-02-22 19:51:52
【Question】:

The OUTLOOK.HOL (holidays) file is structured as follows:

[Portugal] 207
All Saints' Day,2021/11/1
All Saints' Day,2022/11/1
Assumption,2021/8/15
Assumption,2022/8/15
Carnival,2021/2/16
Carnival,2022/3/1

[Puerto Rico] 489
Birthday of Eugenio María de Hostos,2021/1/11
Birthday of Eugenio María de Hostos,2022/1/10
Birthday of José de Diego,2021/4/19
Birthday of José de Diego,2022/4/18
Birthday of Don Luis Muñoz Rivera,2021/7/19
Birthday of Don Luis Muñoz Rivera,2022/7/18

[Qatar] 118
...

How can I use PowerShell to parse this file into structured data and get a CSV file with the following headers:

Country;Number;Holiday name;Date

/Michael

【Question discussion】:

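  • What have you tried, and how did it fail? Ideally, you should provide a minimal reproducible example of your attempt, with specific information about how it fails, including any error messages and/or erroneous output. Stack Overflow is not a code-writing service; the best questions provide enough useful information that those answering can guide you toward your own correct solution. See How to Ask a good question.
  • We haven't heard back from you yet... Did my answer solve your problem? If so, please consider accepting it by clicking the ✓ icon on the left. This helps others with a similar question find it more easily.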

Tags: powershell csv parsing structure


【Solution 1】:

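Read the file line by line (here with switch -Regex -File) and use a regular expression for each kind of line to capture its "fields": a section header supplies the country and number, and every data line below it supplies a holiday name and date.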

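$result = switch -Regex -File 'D:\Test\outlook.hol' {
    # Section header such as "[Portugal] 207": remember country and number for the lines that follow
    '^\[([^\]]+)\]\s+(\d+)' { 
        $country = $matches[1]
        $number = $matches[2]
    }
    # Data line such as "Carnival,2021/2/16": holiday name, then a yyyy/M/d date
    # (blank lines match neither pattern and are skipped)
    '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$' { 
        # found a data line, output a PSObject
        [PsCustomObject]@{
            Country      = $country
            Number       = $number
            Holiday_name = $matches[1]
            Date         = $matches[2]
        }
    }
}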

# output on screen
$result | Format-Table -AutoSize

# output to CSV file
$result | Export-Csv -Path 'D:\Test\OutlookHolidays.csv' -NoTypeInformation -Encoding UTF8

Output (on screen):

Country     Number Holiday_name                        Date     
-------     ------ ------------                        ----     
Portugal    207    All Saints' Day                     2021/11/1
Portugal    207    All Saints' Day                     2022/11/1
Portugal    207    Assumption                          2021/8/15
Portugal    207    Assumption                          2022/8/15
Portugal    207    Carnival                            2021/2/16
Portugal    207    Carnival                            2022/3/1 
Puerto Rico 489    Birthday of Eugenio María de Hostos 2021/1/11
Puerto Rico 489    Birthday of Eugenio María de Hostos 2022/1/10
Puerto Rico 489    Birthday of José de Diego           2021/4/19
Puerto Rico 489    Birthday of José de Diego           2022/4/18
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2021/7/19
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2022/7/18
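The question asks for a semicolon-separated file with the header Country;Number;Holiday name;Date, whereas Export-Csv writes commas by default and the property above is named Holiday_name. A minimal sketch, assuming rows shaped like $result above (the two sample rows are copied from the output table), renames the column with a calculated property and passes -Delimiter ';':

```powershell
# Two sample rows copied from the output table above; in practice use $result from the switch
$result = @(
    [PsCustomObject]@{ Country = 'Portugal'; Number = '207'; Holiday_name = "All Saints' Day"; Date = '2021/11/1' }
    [PsCustomObject]@{ Country = 'Portugal'; Number = '207'; Holiday_name = 'Carnival'; Date = '2021/2/16' }
)

# Rename Holiday_name to "Holiday name", keep the order Country;Number;Holiday name;Date,
# and separate the fields with semicolons instead of commas
$csvLines = $result |
    Select-Object Country, Number, @{ Name = 'Holiday name'; Expression = { $_.Holiday_name } }, Date |
    ConvertTo-Csv -Delimiter ';' -NoTypeInformation

# Show the semicolon-separated lines (every field is quoted by default)
$csvLines
```

To write the file directly, keep the same Select-Object step and end the pipeline with Export-Csv -Path 'D:\Test\OutlookHolidays.csv' -Delimiter ';' -NoTypeInformation -Encoding UTF8 instead of ConvertTo-Csv.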

Regex 1 details:

^                  Assert position at the beginning of the string
\[                 Match the character “[” literally
(                  Match the regular expression below and capture its match into backreference number 1
   [^\]]           Match any character that is NOT a “]”
      +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)                 
\]                 Match the character “]” literally
\s                 Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   +               Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  Match the regular expression below and capture its match into backreference number 2
   \d              Match a single digit 0..9
      +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
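To see what the two capture groups of the first pattern return, a minimal check runs it with the -match operator on one header line from the question; -match fills the automatic $Matches variable in the same way the switch cases do:

```powershell
# Section header line copied from the sample file in the question
$line = '[Puerto Rico] 489'

if ($line -match '^\[([^\]]+)\]\s+(\d+)') {
    $country = $Matches[1]   # everything between the brackets, spaces included
    $number  = $Matches[2]   # the digits after the whitespace
}

"Country: $country, Number: $number"   # → Country: Puerto Rico, Number: 489
```

Because [^\]]+ stops only at the closing bracket, country names containing spaces are captured whole.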

Regex 2 details:

^                 Assert position at the beginning of the string
(                 Match the regular expression below and capture its match into backreference number 1
   [^,]           Match any character that is NOT a “,”
      +           Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)                
,                 Match the character “,” literally
(                 Match the regular expression below and capture its match into backreference number 2
   \d             Match a single digit 0..9
      {4}         Exactly 4 times
   /              Match the character “/” literally
   \d             Match a single digit 0..9
      {1,2}       Between one and 2 times, as many times as possible, giving back as needed (greedy)
   /              Match the character “/” literally
   \d             Match a single digit 0..9
      {1,2}       Between one and 2 times, as many times as possible, giving back as needed (greedy)
)                
$                 Assert position at the end of the string (or before the line break at the end of the string, if any)
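The second pattern can be checked the same way against the three kinds of lines found in the sample file; only the data line yields an object, while section headers and blank separator lines fall through. Converting the captured text with [datetime]::ParseExact and the format 'yyyy/M/d' is an addition not present in the answer above, sketched here in case real dates are wanted instead of strings:

```powershell
$pattern = '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$'

# One data line, one section header and one blank line, taken from the question's sample
$lines = @(
    'Carnival,2022/3/1'
    '[Qatar] 118'
    ''
)

$parsed = @(foreach ($l in $lines) {
    if ($l -match $pattern) {
        [PsCustomObject]@{
            Holiday_name = $Matches[1]
            Date         = $Matches[2]
            # Optional: parse the yyyy/M/d text (one- or two-digit month and day) into a DateTime
            DateValue    = [datetime]::ParseExact($Matches[2], 'yyyy/M/d', [cultureinfo]::InvariantCulture)
        }
    }
})

# Only the data line produced an object
$parsed | Format-Table -AutoSize
```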

【Discussion】:
