zip(,) string to float? [closed]

【Question title】: zip(,) string to float? [closed]
【Posted】: 2012-12-17 10:05:09
【Question】:

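I am trying to calculate a daily PnL from 10-minute prices in a .csv file (42 bars per date), where the number of buys and the number of sells on a given day may be unequal. If they are unequal, the program should use that date's closing price, df["price"][t], as the price to subtract from (or to subtract), depending on whether the unmatched trade is a buy or a sell.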

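import pandas as pd

df=pd.read_csv("file.csv", names="date time price mag signal".split())

# Boolean masks: sells, buys, their complements, and the 16:20 closing bar
s=df["signal"]=="S"
b=df["signal"]=="B"
ns=df["signal"]!="S"
nb=df["signal"]!="B"
t=df["time"]=="1620"

# Buy prices and dates, plus the closing bar when it is not itself a buy
a1=df["price"][b|(nb & t)]
b1=df["date"][b|(nb & t)]

# Sell prices and dates, plus the closing bar when it is not itself a sell
h=df["price"][s|(ns & t)]
g=df["date"][s|(ns & t)]

c1=zip(b1,a1)
c=zip(g,h)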

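c1 and c are lists containing the buy and sell prices together with their respective dates. The problem is that the values in c1 and c are strings once they have been zipped, so they cannot be subtracted. Is it possible to make a1 and h floats so that I can take their difference?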

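I want to match the dates in c and c1, compute the price difference S_i - B_i for every i on a given date, then sum over all dates and return a single value. In other words, I want to take the difference h - a1 only where the dates match.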
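The matching described above can be sketched in plain Python. This is only an illustration under assumptions: the pairs below are made-up stand-ins for c (sells) and c1 (buys), each day has equal numbers of buys and sells, and the closing-price rule for unbalanced days is not covered.

```python
from collections import defaultdict

# Hypothetical (date, price) pairs shaped like the zipped lists in the question.
c1 = [("1/3/2007", "1424.2"), ("1/3/2007", "1423.5"), ("1/4/2007", "1407.2")]  # buys
c = [("1/3/2007", "1426.2"), ("1/3/2007", "1425.8"), ("1/4/2007", "1412.2")]   # sells

# Accumulate sells minus buys per date, converting each price string to float.
daily = defaultdict(float)
for date, price in c:
    daily[date] += float(price)
for date, price in c1:
    daily[date] -= float(price)

# Sum the per-date results into a single value.
total = sum(daily.values())
```

Because addition is order-independent, summing sells and subtracting buys per date gives the same result as summing the individual S_i - B_i differences for that date.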

Some sample data:

date time price mag signal

1/3/2007 930 1422.8
1/3/2007 940 1423.2 0
1/3/2007 950 1422.8 0
1/3/2007 1000 1420.5 0
1/3/2007 1010 1422.8 0
1/3/2007 1020 1426.2 1 S
. . .
1/3/2007 1230 1424.2 -1 B
1/3/2007 1240 1424.8 0
1/3/2007 1250 1425.8 1 S
1/3/2007 1300 1426 0
1/3/2007 1310 1425 0
1/3/2007 1320 1423.5 -1 B
1/3/2007 1330 1421.8 0
1/3/2007 1340 1421.5 0
1/3/2007 1350 1420.5 0
1/3/2007 1400 1421 0
1/3/2007 1410 1417.2 -1 B
1/3/2007 1420 1412.8 -1 B
1/3/2007 1430 1414.8 0
1/3/2007 1440 1413.5 0
1/3/2007 1450 1410 0
1/3/2007 1500 1407.2 -1 B
1/3/2007 1510 1410.2 1 S
1/3/2007 1520 1409.5 -1 B
1/3/2007 1530 1410.5 1 S
1/3/2007 1540 1412.5 0
...
1/3/2007 1610 1415.5 1 S
1/3/2007 1620 1414 -1 B
1/4/2007 930 1412.2 0
1/4/2007 940 1411 0
1/4/2007 950 1413 0
1/4/2007 1000 1412.2 0
1/4/2007 1010 1407.2 -1 B

The result of the zip, for example c1, should look like this:

[('1/3/2007', '1424.2'),
('1/3/2007', '1423.5'),
('1/3/2007', '1417.2'),
('1/3/2007', '1412.8'),
('1/3/2007', '1407.2'),
('1/3/2007', '1409.5'),
('1/3/2007', '1414'),

 etc - all dates in between

 ('8/30/2012','1324')]

Many thanks.

【Comments】:

map before the zip (or even after it)?
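A minimal sketch of what this comment suggests, using made-up sample lists in place of b1 and a1: apply float to the prices before zipping, so each pair already holds a number.

```python
# Hypothetical stand-ins for the date and price columns from the question.
dates = ["1/3/2007", "1/3/2007", "1/4/2007"]
prices = ["1424.2", "1423.5", "1407.2"]

# Convert each price string to float, then pair it with its date.
# list() is needed on Python 3, where zip returns a lazy iterator.
pairs = list(zip(dates, map(float, prices)))
```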

Tags: python pandas time-series

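【Answer 1】:

Rather than using zip, you can keep your data in pandas' native data structures.
The prices should already be read into the DataFrame correctly as floats.

You can do something like sub and then groupby on 'date':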

df['dif'] = a1.sub(h, fill_value=0)
g = df.groupby('date')['dif'].sum()
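As a self-contained illustration of the groupby idea, here is a sketch that takes a slightly different route from the snippet above: it signs each price (+ for a sell, - for a buy) and sums per date, which yields sells minus buys as the question asks. The small DataFrame is invented sample data, and the closing-price adjustment for days with unequal buys and sells is left out.

```python
import pandas as pd

# Hypothetical sample in the shape of the question's file (float prices).
df = pd.DataFrame({
    "date": ["1/3/2007"] * 4 + ["1/4/2007"] * 2,
    "price": [1426.2, 1424.2, 1425.8, 1423.5, 1412.2, 1407.2],
    "signal": ["S", "B", "S", "B", "S", "B"],
})

# +1 for a sell, -1 for a buy, 0 for rows with any other signal value.
sign = df["signal"].map({"S": 1.0, "B": -1.0}).fillna(0.0)
df["dif"] = sign * df["price"]

# Sells minus buys per date, then a single grand total.
daily = df.groupby("date")["dif"].sum()
total = daily.sum()
```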

Note that you can use the read_csv keyword parse_dates to read the dates in as datetime objects:

df = pd.read_csv("file.csv",
                 names="date time price mag signal".split(),
                 parse_dates=[['date','time']])
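Recent pandas releases warn that the nested-list form of parse_dates is deprecated and suggest combining the columns with pd.to_datetime after reading instead. A sketch of that alternative, assuming a comma-separated file and times written as in the sample (930, 1020, ...); the in-memory text stands in for file.csv:

```python
import io
import pandas as pd

# Hypothetical comma-separated content standing in for file.csv.
raw = io.StringIO("1/3/2007,930,1422.8,0,\n1/3/2007,1020,1426.2,1,S\n")

# Read the time column as text so it can be zero-padded to four digits.
df = pd.read_csv(raw, names="date time price mag signal".split(),
                 dtype={"time": str})

# Build one timestamp column from the date and the padded HHMM time.
df["timestamp"] = pd.to_datetime(
    df["date"] + " " + df["time"].str.zfill(4),
    format="%m/%d/%Y %H%M",
)
```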

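【Discussion】:

After running this, the traceback gives: TypeError: unsupported operand type(s) for -: 'int' and 'str', which refers to sub(). I don't know why, since a1 and h (the prices) should both be floats.
@user1374969 Are they all floats? read_csv should pick up floats correctly, but perhaps you can force it: df['price'] = df['price'].apply(float)
I ran a formula over the price column of the csv file, and the values are indeed all "numbers". apply(float) is rejected for some reason with ValueError: could not convert string to float: price. The traceback points to: -> mapped = lib.map_infer(self.values, f, convert=convert_dtype) as the problem. It seems strange.
If you use an interactive debugger (pdb), you can find the value that causes the error (which might explain it).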
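One plausible reading of "could not convert string to float: price", not confirmed anywhere in this thread, is that the file has a header row. When names= is passed without header=0, pandas treats that first line as data, so the literal text "price" ends up in the price column and the whole column stays as strings. The sketch below reproduces this on invented in-memory data and shows one way to locate the offending values without a debugger:

```python
import io
import pandas as pd

# Hypothetical file content WITH a header line (an assumption about file.csv).
raw = ("date,time,price,mag,signal\n"
       "1/3/2007,930,1422.8,0,\n"
       "1/3/2007,1230,1424.2,-1,B\n")
cols = "date time price mag signal".split()

# names= alone: the header line is read as the first data row.
bad = pd.read_csv(io.StringIO(raw), names=cols)

# Coerce to numbers; anything unparseable becomes NaN and can be listed.
as_num = pd.to_numeric(bad["price"], errors="coerce")
offenders = bad.loc[as_num.isna(), "price"].tolist()

# header=0 tells pandas to replace the existing header line with names=.
good = pd.read_csv(io.StringIO(raw), names=cols, header=0)
```

If offenders turns out to contain something other than a column name, the same pd.to_numeric check still points at the rows that need cleaning.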