【Question title】: Python ordered dict issue
【Posted】: 2015-08-07 08:40:35
【Question】:

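I have a CSV file in which each row holds the values for one record (the columns are ["Location"], ["MovieDate"], ["Formatted_Address"], ["Lat"], ["Lng"]). I want to group by Location and combine the MovieDate values of all rows that share the same Location, and I was told to use an OrderedDict.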

Data before:

Location,MovieDate,Formatted_Address,Lat,Lng
    "Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
    "Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

For every set of rows with the same Location (like the two above), I want output like this, so that no Location is repeated:

 "Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

What is wrong with my code that uses OrderedDict to do this?

from collections import OrderedDict

od = OrderedDict()
import csv
with open("MovieDictFormatted.csv") as f,open("MoviesCombined.csv" ,"w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc,rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc]+vals)

What I end up with is this:

['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']

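The problem is that the other columns do not show up in this case. What is the best way to get them? I would also like the MovieDate values to be one long string, like this: 'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers ' instead of: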

'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '

Thanks, everyone, much appreciated. I'm a Python newbie.

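Unfortunately, changing row[0], row[1] to row[0], row[1:] doesn't give me what I need. I only want to combine the values in the second column (MovieDate), not repeat all the other columns like this: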

['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]

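【Comments】:

  • What exactly is going wrong? Are you getting incorrect output? Are you getting an error message? We need more detail.
  • Hey @user2357112, I've updated it. Sorry the question was incomplete.
  • Is rest supposed to be the rest of the whole row? Because row[1] is only the second column.
  • Yes, that's a misleading title, I'll change it. row[1] is correct and is the only thing we want to append.
  • If you only stored row[0] and row[1], why would you expect data from any other column to show up in the output?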

Tags: python dictionary ordereddictionary


【Solution 1】:

You only need a few changes. The remaining columns have to be joined, and to avoid repeating the lat and long we also use them as part of the key:

import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv") as f,open("new.csv" ,"w") as out:
    r = csv.reader(f)
    wr= csv.writer(out)
    header = next(r)
    for row in r:
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc[0]] + vals+list(loc[1:]))

Output:

Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

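A League of Their Own comes first because its row appears before the Mad, Mad row. row[1:-2] takes everything except the location, lat and long; we store lat and long in the key tuple so that they are not written repeatedly at the end of each row.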
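
To see what the tuple key does, here is a minimal sketch of the grouping step on its own, run against two in-memory rows instead of a file. The rows are the sample data from the question; the names `rows` and `grouped` are only illustrative and not part of the answer's code.

```python
from collections import OrderedDict

# Two sample rows from the question:
# Location, MovieDate, Formatted_Address, Lat, Lng
rows = [
    ["Edgebrook Park, Chicago ", "Jun-7 A League of Their Own",
     "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
     "41.9998876", "-87.7627672"],
    ["Edgebrook Park, Chicago ", "Jun-9 It's a Mad, Mad, Mad, Mad World",
     "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
     "41.9998876", "-87.7627672"],
]

grouped = OrderedDict()
for row in rows:
    key = (row[0], row[-2], row[-1])  # location, lat, lng: stored once per group
    grouped.setdefault(key, []).append(" ".join(row[1:-2]))  # movie + address

# Both rows share one key, so there is a single group with two values,
# kept in the order the rows were read.
for key, vals in grouped.items():
    print([key[0]] + vals + list(key[1:]))
```

Because the key is hashed as a whole, two rows fall into the same group only when location, lat and lng all match.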

Using names and unpacking may make it easier to follow:

with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, mov, form, lat, long = row
        od.setdefault((loc, lat, long), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))

Using csv.DictWriter to keep the five columns:

import csv
from collections import OrderedDict

od = OrderedDict()

with open("data.csv") as f, open("new.csv", "w") as out:
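    # fieldnames is given explicitly, so the header line is read as an ordinary row
    # and ends up written back out as the first row of new.csv
    r = csv.DictReader(f,fieldnames=['Location', 'MovieDate', 'Formatted_Address', 'Lat', 'Lng'])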
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    for row in r:
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                        MovieDate=[], Formatted_Address=row["Formatted_Address"]))

        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    for loc, vals in od.items():
        od[loc]["MovieDate"]= ", ".join(od[loc]["MovieDate"])
        wr.writerow(vals)

Output:

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

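So the five columns stay as they are and the "MovieDate" values are joined into a single string. Formatted_Address is always the same for a given Location, so we don't need to update it.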

It turns out that all we needed to do was concatenate the MovieDate values and drop the duplicate entries for Location, Lat, Lng and Formatted_Address.
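
For reference, the same idea can be sketched for Python 3 as one self-contained function. This is an illustration, not the answer's exact code: the function name `combine_movies` is hypothetical, the field names are taken from the header row instead of being passed in, and the MovieDate values are joined with a single space as the question asked. It runs on in-memory text so it can be tried without any files.

```python
import csv
import io
from collections import OrderedDict


def combine_movies(src, dst):
    """Group rows by Location and join their MovieDate values with a space."""
    reader = csv.DictReader(src)  # field names are read from the header row
    grouped = OrderedDict()
    for row in reader:
        # The first row seen for a location supplies address and coordinates.
        entry = grouped.setdefault(row["Location"], dict(row, MovieDate=[]))
        entry["MovieDate"].append(row["MovieDate"].strip())
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for entry in grouped.values():
        writer.writerow(dict(entry, MovieDate=" ".join(entry["MovieDate"])))


# Sample data from the question, held in memory for the demonstration.
sample = (
    "Location,MovieDate,Formatted_Address,Lat,Lng\n"
    '"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,'
    '"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",'
    "41.9998876,-87.7627672\n"
    '"Edgebrook Park, Chicago ","Jun-9 It\'s a Mad, Mad, Mad, Mad World",'
    '"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",'
    "41.9998876,-87.7627672\n"
)
out = io.StringIO()
combine_movies(io.StringIO(sample), out)
print(out.getvalue())
```

With real files on Python 3, both would normally be opened with newline="" as the csv module documentation recommends, for example open("data.csv", newline="").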

【Comments】:

    【Solution 2】:

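    Let's try changing

    od.setdefault(loc, []).append(rest)

    to

    # rest is assumed to be a list of strings here (e.g. row[1:]);
    # if rest is a single string such as row[1], use rest directly instead of ' '.join(rest)
    od[loc] = ' '.join([od.get(loc, ''), ' '.join(rest)]).strip()

    and then writing each row like this (vals is now a single string, so it goes in as one list item instead of being concatenated to the list):

    wr.writerow([loc, vals])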
    

    【Comments】:

    • Even so, the other columns are duplicated as well: ['Jefferson Memorial Park, Chicago', ['Jun-12 Monsters University', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]
    • I've updated my answer with what I think you're asking for. Let me know how it turns out. Thanks!
    • Hey @Misandrist, unfortunately that doesn't work. It throws: TypeError: sequence item 0: expected string, list found
    • I changed the answer again to do the join when first adding to the dictionary. Let's see how that works.
    • Damn, same error, Misandrist. TypeError: sequence item 1: expected string, list found
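
The TypeError quoted in these comments is what str.join raises when one of the items it receives is a list instead of a string. Below is a minimal sketch of that failure, assuming the stored values were whole row slices (row[1:]) and not the single MovieDate string (row[1]). The data is shortened from the thread, the variable names are illustrative, and the exact wording of the message differs between Python 2 and Python 3.

```python
# Values built from row[1:] form a list of lists, which str.join rejects.
nested = [
    ["Jun-12 Monsters University ", "address", "lat", "lng"],
    ["Jul-11 Frozen ", "address", "lat", "lng"],
]
try:
    " ".join(nested)
    error = None
except TypeError as exc:
    error = exc  # join found a list where it expected a string
print(type(error).__name__)

# Joining only the MovieDate strings (what row[1] holds) works.
movie_dates = [item[0].strip() for item in nested]
joined = " ".join(movie_dates)
print(joined)
```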
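    【Solution 3】:

    Assuming the location is the first item of each row:

    import csv

    grouped = {}  # not named dict, which would shadow the built-in
    for line in csv.reader(f):  # parse each line into a list of column values
        if line[0] not in grouped:
            grouped[line[0]] = []
        grouped[line[0]].append(line[1:])
    

    For each location, you now have the rest of every row that shares it:

    wr = csv.writer(out)
    for key, value in grouped.items():
        for rest in value:  # one output row per stored remainder
            wr.writerow([key] + rest)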
    

    【Comments】:
