[Posted]: 2018-05-04 21:14:34
[Question]:
I have a relatively large DataFrame that looks like this:
(I have uploaded the csv file here - ufile.io/526t4)
value
0 [[1,92,"D"],[93,93,"C"],[94,113,"S"],[114,120,"C"],[121,181,"S"],[182,187,"C"],[188,292,"S"],[319,319,"S"],[320,353,"C"],[354,393,"D"]]
1 [[18,23,"D"],[24,27,"C"],[28,186,"S"],[187,198,"C"],[199,246,"S"]]
2 [[18,23,"D"],[24,27,"C"],[28,186,"S"],[187,198,"C"],[199,246,"S"]]
3 [[20,79,"D"]]
...
12352 [[25,36,"S"],[37,89,"C"],[90,115,"S"]]
12353 [[1,16,"D"],[17,407,"C"],[408,416,"D"]]
12354 [[16,21,"D"],[22,108,"C"],[109,123,"D"],[124,164,"C"],[165,421,"S"]]
12355 rows × 1 columns
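I want to create a new column that holds, for each row, the sum of (end - start) over all "D" entries.
Taking the first row as an example: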
x = [[1,92,"D"],[93,93,"C"],[94,113,"S"],[114,120,"C"],[121,181,"S"],[182,187,"C"],[188,292,"S"],[319,319,"S"],[320,353,"C"],[354,393,"D"]]
new_colum_D = (sum([y[1]-y[0] for y in x if y[2]=="D"])) # applied for all rows
new_colum_D = 130 for the first row
I tried the following:
df['Column_D']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))
But I get the following message: IndexError: string index out of range
IndexError Traceback (most recent call last)
<ipython-input-7-f7f23d42d4e5> in <module>()
----> 1 df['sum']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))
~\AppData\Local\conda\conda\envs\my_root\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2549 else:
2550 values = self.asobject
-> 2551 mapped = lib.map_infer(values, f, convert=convert_dtype)
2552
2553 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-7-f7f23d42d4e5> in <lambda>(x)
----> 1 df['sum']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))
<ipython-input-7-f7f23d42d4e5> in <listcomp>(.0)
----> 1 df['sum']=df["value"].apply(lambda x:sum([y[1]-y[0] for y in x if y[2]=="D"]))
IndexError: string index out of range
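One plausible reading of this traceback, assuming the column came from a csv file and its cells are therefore still strings rather than lists: iterating over a string yields single characters, so `y[2]` fails on the very first one. The sketch below reproduces that on a shortened, hypothetical cell and then parses the string with `json.loads` before summing. The helper name `d_total` is made up for illustration and is not part of the original post.

```python
import json

# Hypothetical, shortened cell, held as a string the way a csv load would leave it.
cell = '[[1,92,"D"],[93,93,"C"],[354,393,"D"]]'

# Iterating over a string yields one character at a time, so y[2] cannot exist.
try:
    sum(y[1] - y[0] for y in cell if y[2] == "D")
    error = None
except IndexError as exc:
    error = str(exc)


def d_total(value):
    """Sum end - start over the segments labelled "D" (hypothetical helper)."""
    # Parse only when the cell is still text; pass real lists through unchanged.
    segments = json.loads(value) if isinstance(value, str) else value
    return sum(end - start for start, end, label in segments if label == "D")


print(error)          # string index out of range
print(d_total(cell))  # 130
```

If that assumption holds for the real data, the same helper could be applied as `df["Column_D"] = df["value"].apply(d_total)`; cells that are not valid JSON would still need separate handling.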
[Comments]:
- In the first row, should [114,120,"C"][121,181,"S"],182,187,"C"], be [114,120,"C"],[121,181,"S"],[182,187,"C"], instead?
- Yes! Thanks, I'll update the code.
Tags: python python-3.x pandas dataframe lambda