[Title]: How do I merge multiple rows with partially duplicated data into one row but keep the non-duplicated data?
[Posted]: 2018-05-27 03:33:15
[Question]:

I have a large DataTable (more than 300K rows, 40 columns) containing data like the snippet below (all values are strings):

colA  colB  colC  ColD  ColE  ColF  ColG  ColH
----------------------------------------------
A01   B01   C01   DA1   EA1   FA1   GA1   HA1
A01   B01   C01   DA2   EA2   FA2   GA2   HA2
A02   B02   C02   DA3   EA3   FA3   GA3   HA3
A02   B02   C02   DA4   EA4   FA4   GA4   HA4
A03   B03   C03   DA5   EA5   FA5   GA5   HA5
A04   B04   C04   DA6   EA6   FA6   GA6   HA6

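Some of the data is partially duplicated. I want to merge the duplicates using colA + colB + colC as the key, keeping every ColD, ColE, ColF value (the expected result below also carries ColG) and taking the other columns from the first row. The expected result is: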

colA   colB   colC   ColD1  colE1  colF1  colG1  ColD2  colE2  colF2  colG2  ColH
---------------------------------------------------------------------------------
A01    B01    C01    DA1    EA1    FA1    GA1    DA2    EA2    FA2    GA2    HA1
A02    B02    C02    DA3    EA3    FA3    GA3    DA4    EA4    FA4    GA4    HA3
A03    B03    C03    DA5    EA5    FA5    GA5    null   null   null   null   HA5
A04    B04    C04    DA6    EA6    FA6    GA6    null   null   null   null   HA6

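This is similar to a pivot, with some differences. I tried T-SQL and LINQ in C#, but I could not work out how to do it. Could someone please help? Thanks very much.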

[Comments]:

    Tags: linq datatable merge


    [Solution 1]:

    Note that this is not a general solution, but it works for the given example.

    List<string[]> input = new List<string[]>()
    {
        new string[] {"A01","B01","CA1","DA1"},
        new string[] {"A01","B01","CA2","DA2"},
        new string[] {"A02","B02","CA3","DA3"},
        new string[] {"A02","B02","CA4","DA4"},
        new string[] {"A03","B03","CA5","DA5"},
        new string[] {"A04","B04","CA6","DA6"},
    };
    
    var grouped = input.GroupBy(x => new { key1 = x[0], key2 = x[1] }, (keys, group) => new
    {
        Key1 = keys.key1,
        Key2 = keys.key2,
        // Skip(2) keeps the two key columns out of the merged list
        Result = group.SelectMany(x => x.Skip(2)).ToList()
    });
    

    Output:

    { Key1 = "A01", Key2 = "B01", Result = ["CA1", "DA1", "CA2", "DA2"] }

    { Key1 = "A02", Key2 = "B02", Result = ["CA3", "DA3", "CA4", "DA4"] }

    { Key1 = "A03", Key2 = "B03", Result = ["CA5", "DA5"] }

    { Key1 = "A04", Key2 = "B04", Result = ["CA6", "DA6"] }
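
The grouped lists above have different lengths, while the table the question asks for needs the same number of columns in every row. The following is a minimal sketch of one way to pad the shorter groups with null; it is not part of the original answer. The `Widen` helper and its `keyCount` parameter are hypothetical names, the key cells are joined with a `\u0001` separator on the assumption that no value contains that character, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same sample rows as in the answer above: two key cells, then two value cells.
var rows = new List<string[]>
{
    new[] { "A01", "B01", "CA1", "DA1" },
    new[] { "A01", "B01", "CA2", "DA2" },
    new[] { "A02", "B02", "CA3", "DA3" },
    new[] { "A02", "B02", "CA4", "DA4" },
    new[] { "A03", "B03", "CA5", "DA5" },
    new[] { "A04", "B04", "CA6", "DA6" },
};

var wide = Widen(rows, 2);
foreach (var line in wide)
{
    Console.WriteLine(string.Join(" ", line.Select(v => v ?? "null")));
}

// Hypothetical helper: groups rows by their first keyCount cells, lays the
// remaining cells of each group side by side, and pads short groups with null.
static List<string[]> Widen(IEnumerable<string[]> source, int keyCount)
{
    var groups = source
        // Assumption: no cell contains '\u0001', so the joined key is unambiguous.
        .GroupBy(r => string.Join("\u0001", r.Take(keyCount)))
        .Select(g => new
        {
            Keys = g.First().Take(keyCount).ToArray(),
            Values = g.SelectMany(r => r.Skip(keyCount)).ToList()
        })
        .ToList();

    // Every output row is as wide as the largest group.
    int width = groups.Count == 0 ? 0 : groups.Max(g => g.Values.Count);

    return groups
        .Select(g => g.Keys
            .Concat(g.Values)
            .Concat(Enumerable.Repeat<string>(null, width - g.Values.Count))
            .ToArray())
        .ToList();
}
```

With the sample rows, each of the four output arrays has six cells; the A03 and A04 rows end in two null cells because those keys occur only once.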

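    [Discussion]:

    • Thanks for your help. I have updated my question to make it clearer; could you take another look?
    [Solution 2]:

    Sounds like a job for ExpandoObject.

    Based on the input records you provided: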

    var input = new DataTable();
    input.Columns.Add("ColA");
    input.Columns.Add("ColB");
    input.Columns.Add("ColC");
    input.Columns.Add("ColD");
    input.Rows.Add("A01", "B01", "CA1", "DA1");
    input.Rows.Add("A01", "B01", "CA2", "DA2");
    input.Rows.Add("A02", "B02", "CA3", "DA3");
    input.Rows.Add("A02", "B02", "CA4", "DA4");
    input.Rows.Add("A03", "B03", "CA5", "DA5");
    input.Rows.Add("A04", "B04", "CA6", "DA6");
    

    You can convert each record into a dynamic, expandable object:

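    // Requires: using System.Collections.Generic; using System.Data; using System.Dynamic;
    public IDictionary<string, Object> Map(DataRow row)
    {
        var columns = row.Table.Columns;
        var result = new ExpandoObject() as IDictionary<string, Object>;
        // columns.Count avoids copying the row into a new array on every iteration
        for (var index = 0; index < columns.Count; index++)
        {
            // Key each cell by its column name so the names survive the conversion
            result.Add(columns[index].ColumnName, row[index]);
        }
        return result;
    }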
    

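    Then add some logic that groups the input by the marker element (the first value of each record) and expands the matching result where needed: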

    var seed = new List<IDictionary<string, Object>>();
    var output = input
        .AsEnumerable()
        .Select(Map)
        .Aggregate(seed, (results, current)=>
        {
            // Check if the current values match any of the first element in the results
            var query = from result in results
                        let marker = result
                            .Select(p => p.Value)
                            .FirstOrDefault()
                        where current.Values.Contains(marker)
                        select result;
    
            var found = query.SingleOrDefault();
            if (found == null)
            {
                // None were found then simply append the current values
                results.Add(current);
            }
            else
            {
                // Some were found then isolate the new ones
                var others = from value in current.Values
                             where !found.Values.Contains(value)
                             select value;
    
                // Append the new ones to the found result
                foreach (var value in others)
                {
                    var index = found.Values.Count;
                    found.Add($"Col{index}", value);
                }
            }
    
            return results;
        });
    

    The final result would look like this:

    See the gist for the full picture.
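
Since the question asks for a DataTable that keeps the original column names, here is a hedged sketch of a DataTable-to-DataTable variant; it is an alternative approach, not the answer author's code. The `Widen` helper and its `keys` and `repeated` parameters are hypothetical names. It assumes the key values never contain the `\u0001` separator, that the generated names (`ColD1`, `ColD2`, ...) do not collide with existing columns, and that C# 9 or later is available for top-level statements.

```csharp
using System;
using System.Data;
using System.Linq;

// Sample input mirroring the table in the question.
var input = new DataTable();
foreach (var name in new[] { "ColA", "ColB", "ColC", "ColD", "ColE", "ColF", "ColG", "ColH" })
{
    input.Columns.Add(name);
}
input.Rows.Add("A01", "B01", "C01", "DA1", "EA1", "FA1", "GA1", "HA1");
input.Rows.Add("A01", "B01", "C01", "DA2", "EA2", "FA2", "GA2", "HA2");
input.Rows.Add("A02", "B02", "C02", "DA3", "EA3", "FA3", "GA3", "HA3");
input.Rows.Add("A02", "B02", "C02", "DA4", "EA4", "FA4", "GA4", "HA4");
input.Rows.Add("A03", "B03", "C03", "DA5", "EA5", "FA5", "GA5", "HA5");
input.Rows.Add("A04", "B04", "C04", "DA6", "EA6", "FA6", "GA6", "HA6");

var output = Widen(input,
    new[] { "ColA", "ColB", "ColC" },
    new[] { "ColD", "ColE", "ColF", "ColG" });

// Hypothetical helper: one output row per key; repeated columns are numbered
// per source row, and all other columns are taken from the first row of the group.
static DataTable Widen(DataTable source, string[] keys, string[] repeated)
{
    var firstOnly = source.Columns.Cast<DataColumn>()
        .Select(c => c.ColumnName)
        .Except(keys).Except(repeated)
        .ToArray();

    var groups = source.Rows.Cast<DataRow>()
        // Assumption: key values never contain '\u0001'.
        .GroupBy(r => string.Join("\u0001", keys.Select(k => r[k].ToString())))
        .Select(g => g.ToList())
        .ToList();
    int maxRows = groups.Count == 0 ? 0 : groups.Max(g => g.Count);

    // Column order: keys, then ColD1..ColG1, ColD2..ColG2, ..., then the rest.
    var result = new DataTable();
    foreach (var k in keys) result.Columns.Add(k);
    for (int i = 1; i <= maxRows; i++)
        foreach (var c in repeated) result.Columns.Add(c + i);
    foreach (var c in firstOnly) result.Columns.Add(c);

    foreach (var g in groups)
    {
        var merged = result.NewRow(); // cells left unset stay DBNull
        foreach (var k in keys) merged[k] = g[0][k];
        for (int i = 0; i < g.Count; i++)
            foreach (var c in repeated) merged[c + (i + 1)] = g[i][c];
        foreach (var c in firstOnly) merged[c] = g[0][c];
        result.Rows.Add(merged);
    }
    return result;
}
```

For the sample input this yields four rows and twelve columns in the order the question expects (keys, `ColD1`..`ColG1`, `ColD2`..`ColG2`, `ColH`); keys that occur only once leave the second set of columns as DBNull. The sketch makes a single grouping pass, but it has not been measured on a 300K-row table.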

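    [Discussion]:

    • Thanks for your help. I have updated my question to try to make it clearer; could you take another look?
    • Could you be more specific? The solution produces the expected result, doesn't it?
    • I can't replace the sample input with a DataTable; I get an error like "cannot be inferred from the usage. Try specifying the type arguments explicitly". I also need to keep the column names so I can convert the result back to a DataTable. Thanks for your help.
    • You have hijacked the scope of the question and shifted it to DataTable with your edit :) stackoverflow.com/posts/47789723/revisions
    • Nevertheless, please see my updated answer, which uses a DataTable as the input.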