【Question title】: Power Query. Merge duplicated lines in a row with collapsing values
【Posted】: 2021-09-12 14:59:00
【Question description】:

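I have a bus schedule with Stop, Time_in and Time_out columns. Sometimes a Stop is repeated in consecutive rows of my data, and I need to merge those rows, keeping only the first Time_in and the last Time_out.

Here is an example: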

Stop          Time_in   Time_out
23rd Street   15:23     15:27
42nd Street   15:35     15:40
42nd Street   15:42     15:48
47th Street   15:56     16:10
42nd Street   16:14     16:19

Desired result:

Stop          Time_in   Time_out
23rd Street   15:23     15:27
42nd Street   15:35     15:48
47th Street   15:56     16:10
42nd Street   16:14     16:19

Any help is much appreciated. Thanks in advance.

【Question comments】:

    Tags: excel merge duplicates concatenation powerquery


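    【Answer 1】:

    In Power Query, right-click the Stop column and choose Group By...

    Choose Advanced, then Add aggregation so that there are two aggregation rows

    For the first row, pick the operation Min on the Time_in column

    For the second row, pick the operation Max on the Time_out column

    If needed, change type number to type time in the formula bar or in Home > Advanced Editor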

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"Stop"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}})
    in
        #"Grouped Rows"
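
    As a quick check of what this grouping does, here is a minimal, self-contained sketch that runs the same Group By on the sample rows from the question. The inline #table is an assumption that stands in for Table1. Because every row with the same Stop lands in one group, both visits to 42nd Street end up in a single row:

    ```m
    let
        // Sample rows from the question, built inline (assumption: stands in for Table1)
        Source = #table(
            type table [Stop = text, Time_in = time, Time_out = time],
            {
                {"23rd Street", #time(15, 23, 0), #time(15, 27, 0)},
                {"42nd Street", #time(15, 35, 0), #time(15, 40, 0)},
                {"42nd Street", #time(15, 42, 0), #time(15, 48, 0)},
                {"47th Street", #time(15, 56, 0), #time(16, 10, 0)},
                {"42nd Street", #time(16, 14, 0), #time(16, 19, 0)}
            }
        ),
        // Same grouping as above: one output row per distinct Stop
        #"Grouped Rows" = Table.Group(
            Source,
            {"Stop"},
            {
                {"Time_in", each List.Min([Time_in]), type time},
                {"Time_out", each List.Max([Time_out]), type time}
            }
        )
    in
        // Three rows; 42nd Street spans 15:35 to 16:19, so the separate 16:14 visit is lost
        #"Grouped Rows"
    ```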
    

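    For the new requirement that a Stop can appear more than once, we first create a group number, so that rows are merged only when the same Stop sits in adjacent rows

    Add column > Index column

    Add column > Custom column, with the formula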

    = try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]
    

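    Right-click the new column and choose Fill > Down

    Select the Stop and Custom columns and group on them

    Choose Add aggregation

    For the first row, pick the operation Min on the Time_in column

    For the second row, pick the operation Max on the Time_out column.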
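
    To see what the helper column from the steps above contains before grouping, here is a small sketch on the sample rows from the question. The inline #table is an assumption that stands in for Table1; the custom column formula is the one given above.

    ```m
    let
        // Sample rows from the question, built inline (assumption: stands in for Table1)
        Source = #table(
            type table [Stop = text, Time_in = time, Time_out = time],
            {
                {"23rd Street", #time(15, 23, 0), #time(15, 27, 0)},
                {"42nd Street", #time(15, 35, 0), #time(15, 40, 0)},
                {"42nd Street", #time(15, 42, 0), #time(15, 48, 0)},
                {"47th Street", #time(15, 56, 0), #time(16, 10, 0)},
                {"42nd Street", #time(16, 14, 0), #time(16, 19, 0)}
            }
        ),
        #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
        // null when the previous row has the same Stop, the row's own Index otherwise.
        // On the first row the lookup at position -1 raises an error, so "otherwise" returns the Index.
        #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]),
        // Fill down replaces each null with the group number of the run it belongs to
        #"Filled Down" = Table.FillDown(#"Added Custom", {"Custom"})
    in
        // Group number per row: {0, 1, 1, 3, 4}
        #"Filled Down"[Custom]
    ```

    The two consecutive 42nd Street rows share group number 1, while the later visit gets its own number 4, so grouping on Stop and Custom keeps the visits apart.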

    Sample code:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
        #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]),
        #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
        #"Grouped Rows" = Table.Group(#"Filled Down", {"Stop", "Custom"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}}),
        #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Custom"})
    in
        #"Removed Columns"
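
    A possible alternative, not part of the original answer: Table.Group accepts an optional fourth argument, and GroupKind.Local makes it group only consecutive rows with the same key, so the index and fill-down steps may not be needed. The sketch below assumes the sample rows from the question, built inline with #table in place of Table1.

    ```m
    let
        // Sample rows from the question, built inline (assumption: stands in for Table1)
        Source = #table(
            type table [Stop = text, Time_in = time, Time_out = time],
            {
                {"23rd Street", #time(15, 23, 0), #time(15, 27, 0)},
                {"42nd Street", #time(15, 35, 0), #time(15, 40, 0)},
                {"42nd Street", #time(15, 42, 0), #time(15, 48, 0)},
                {"47th Street", #time(15, 56, 0), #time(16, 10, 0)},
                {"42nd Street", #time(16, 14, 0), #time(16, 19, 0)}
            }
        ),
        // GroupKind.Local starts a new group whenever the Stop value changes,
        // so only adjacent duplicates are merged
        #"Grouped Rows" = Table.Group(
            Source,
            {"Stop"},
            {
                {"Time_in", each List.Min([Time_in]), type time},
                {"Time_out", each List.Max([Time_out]), type time}
            },
            GroupKind.Local
        )
    in
        // Four rows: the adjacent 42nd Street rows become 15:35 to 15:48,
        // and the later 42nd Street visit (16:14 to 16:19) stays separate
        #"Grouped Rows"
    ```

    If times within a run could wrap past midnight, List.First([Time_in]) and List.Last([Time_out]) would match the "first in, last out" wording of the question more literally than Min and Max.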
    

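    【Comments】:

    • Hi horseyride, thank you very much! Could you suggest a solution where values in the Stop column can repeat (so we can't use a plain Group By, otherwise the later stops would disappear)? I need to collapse values only for Stops that repeat consecutively. The bus can visit the same stop more than once, and I don't want to lose those visits.
    • Sorry for not clarifying this right away. I have fixed the original post.
    • This is exactly what I needed!! Very elegant, thank you very much!
    • Thanks. Then please toggle the check mark between the arrows to accept the answer.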
    【Answer 2】:

    Power Query

    let
        Source = Web.BrowserContents("https://stackoverflow.com/questions/68194967/power-query-merge-duplicated-rows-with-collapsing-values"),
        #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(1)"}, {"Column2", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(2)"}, {"Column3", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(3)"}}, [RowSelector="DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR"]),
        #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
        #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Time_out"}),
        #"Grouped Rows" = Table.Group(#"Removed Columns", {"Stop"}, {{"ad_1", each _, type table [Stop=nullable text, Time_in=nullable time]}}),
        #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each
            let
                x = [ad_1],
                #"Removed Columns1" = Table.RemoveColumns(x,{"Stop"}),
                #"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Time_in", Order.Ascending}}),
                #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
                #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
                #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
            in
                #"Removed Columns2"),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"ad_1"}),
        #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Time_in"}, {"Time_in"}),
        Custom1 = Table.RemoveColumns(#"Changed Type",{"Time_in"}),
        #"Grouped Rows1" = Table.Group(Custom1, {"Stop"}, {{"ad_2", each _, type table [Stop=nullable text, Time_out=nullable time]}}),
        Custom2 = Table.AddColumn(#"Grouped Rows1", "Custom", each
            let
                x = [ad_2],
                #"Removed Columns1" = Table.RemoveColumns(x,{"Stop"}),
                #"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Time_out", Order.Descending}}),
                #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
                #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
                #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
            in
                #"Removed Columns2"),
        #"Removed Columns2" = Table.RemoveColumns(Custom2,{"ad_2"}),
        #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"Time_out"}, {"Time_out"}),
        #"Merged Queries" = Table.NestedJoin(#"Expanded Custom", {"Stop"}, #"Expanded Custom1", {"Stop"}, "Expanded Custom1", JoinKind.LeftOuter),
        #"Expanded Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Expanded Custom1", {"Time_out"}, {"Time_out"})
    in
        #"Expanded Expanded Custom1"
    

    DAX

    min:= MIN('Table 1'[Time_in])
    max:= MAX('Table 1'[Time_out])
    

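    [Image: DAX result]

    【Comments】:

    • Hi smpa01, thank you very much! I tried your solution and realized that it drops all repeated stops, whereas I need to collapse values only for stops that repeat consecutively. The bus can visit the same stop more than once, and I don't want to lose those visits; I only need to collapse some of them. Sorry for not clarifying this right away.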