[Posted]: 2016-09-06 18:33:08
[Problem description]:
I wrote this very simple piece of code:
def recordset_2_list(cursor):
    data = []
    columns = [column[0] for column in cursor.description]
    for row in cursor.fetchall():
        data.append(dict(zip(columns, row)))
    return data
However, it crashes here almost every time (a memory problem). Row count = 530,002 records.
How can I optimize this?
engine = create_engine(CONN_STR, echo=True)
connection = engine.raw_connection()
with connection.cursor() as cursor:
    cursor.callproc("sp_create_training_data", [id])
    return list_2_dataframe(recordset_2_list(cursor))
def list_2_dataframe(data):
    data = pd.DataFrame(data)
    data.columns = map(str.upper, data.columns)
    return data
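One note on the crash: recordset_2_list() calls fetchall() and then builds a dict per row, so all 530,002 rows plus 530,002 dicts with string keys sit in memory at once. A minimal sketch of a lower-memory variant, streaming batches with fetchmany() and passing plain tuples to pandas (the batch size of 10000 and the helper name iter_rows are my assumptions, not the author's code):

import pandas as pd

def iter_rows(cursor, batch_size=10000):
    # Pull rows in fixed-size batches instead of all at once.
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row

columns = [column[0] for column in cursor.description]
# from_records consumes the iterator directly; the per-row dicts are never built.
df = pd.DataFrame.from_records(iter_rows(cursor), columns=columns)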
Update 1: The following approach also ends in Segmentation fault (core dumped):
df = pd.read_sql(sa.text('SELECT * FROM sp_create_training_data(:arg1)'),
                 engine, params={'arg1': id})
return df
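One possible cause of the segfault worth checking: with many DB-API drivers, read_sql buffers the entire result set client-side before pandas sees any of it, even when chunksize is set. A sketch under the assumption of a PostgreSQL-style driver (e.g. psycopg2), where SQLAlchemy's stream_results execution option switches to a server-side cursor (handle_chunk is a hypothetical per-chunk handler):

import pandas as pd
import sqlalchemy as sa
from sqlalchemy import create_engine

engine = create_engine(CONN_STR)
# stream_results=True asks the driver for a server-side cursor, so rows are
# fetched incrementally instead of being buffered in one big client allocation.
conn = engine.connect().execution_options(stream_results=True)
for chunk in pd.read_sql(sa.text('SELECT * FROM sp_create_training_data(:arg1)'),
                         conn, params={'arg1': id}, chunksize=5000):
    handle_chunk(chunk)  # hypothetical: process each 5000-row DataFrame
conn.close()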
Update 2: This works. Any ideas on how to speed it up?
print 'Start loading data ...'
try:
    engine = create_engine(CONN_STR, echo=True)
    df = pd.read_sql(sa.text('SELECT * FROM sp_create_training_data(:arg1)'),
                     engine, params={'arg1': id}, chunksize=5000)
    print "Executed Store Proc"
    lst = []
    for chun in df:
        lst.append(chun)
    print "Prepared chunks"
    df_big = pd.concat(lst)
    return df_big
except Exception, err:
    print err.message
    return None
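Two things likely dominate the runtime here rather than the list-then-concat pattern itself. First, echo=True makes SQLAlchemy log every statement, which is costly for half a million rows. Second, chunksize=5000 means pandas assembles over a hundred small DataFrames. A sketch with logging off, a larger chunk size (50000 is a guess to tune), and the chunk iterator fed straight to pd.concat:

import pandas as pd
import sqlalchemy as sa
from sqlalchemy import create_engine

engine = create_engine(CONN_STR)  # echo disabled: no per-statement logging
chunks = pd.read_sql(sa.text('SELECT * FROM sp_create_training_data(:arg1)'),
                     engine, params={'arg1': id}, chunksize=50000)
# pd.concat accepts the iterator directly; ignore_index rebuilds a clean index.
df_big = pd.concat(chunks, ignore_index=True)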
[Discussion]:
Tags: python performance python-2.7 pandas sqlalchemy