Converting pandas columns to PostgreSQL list? Answers

【Question title】: Converting pandas columns to PostgreSQL list?
【Posted】: 2015-03-08 07:27:15
【Question description】:

I'm working with a CSV that has a few hundred columns, many of which are just enumerations, i.e.:

[
['code_1', 'code_2', 'code_3', ..., 'code_50'],
[1, 2, 3, ..., 50],
[2, 3, 4, ..., 51],
...
[400000, 400001, 400002, ..., 400049]
]

I'm importing this data into PostgreSQL and want to concatenate these columns into a single array, e.g.:

[
['codes'],
['{1, 2, 3, ..., 50}']
]

and so on.

I know I could do this in a roundabout way, for example:

df['codes'] = pd.DataFrame(["{" + df['code_1'] + ", " + df['code_2'] + "}"]).T

But given the size of this CSV, that is a lot of redundant code to write and maintain.

What I basically have to work with is a list of columns; I've already extracted the enumerated columns, e.g.:

codes = [
    'code_1',
    'code_2',
    'code_3',
    ...
]

Before I start writing my own custom "implode_columns(arr)" function, is there anything in pandas that already solves this, or any special way to conveniently accommodate PostgreSQL arrays?

【Question discussion】:

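I've worked with Python, csv, and postgres, but not with pandas, so I have some idea about this. What are the source .csv fields? Which fields exist in the postgres table? Why not use a nested for loop to insert each field? It depends on your RAM, though.
@colintobing Here are the source fields: pastebin.com/eXxxTtwN. I'm starting with the taxonomies because they're flat; healthcare_provider_taxonomy_codes and other_provider_identifiers will be nested, but by then I'll have a better angle on handling them. I'm using pandas to avoid having to handle 300 columns individually, but if I can reduce this to a more logical table, creating the models won't be too much work. Memory won't be an issue. Maybe I'm creating more work for myself by using pandas; this CSV is a real mess.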

Tags: python postgresql pandas

【Solution 1】:

This assumes you are already connected to PostgreSQL and the table already exists there. If not, see https://wiki.postgresql.org/wiki/Psycopg2_Tutorial

import psycopg2

try:
    conn = psycopg2.connect("host='localhost' dbname='template1' user='dbuser' password='dbpass'")
    cur = conn.cursor()  # cursor used by the INSERT statements further down
except psycopg2.OperationalError:
    print("I am unable to connect to the database")

First, open the .csv file.

>>> import csv
>>> with open('names.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...

This is the example from https://docs.python.org/2/library/csv.html. Replace the print line with an insert into PostgreSQL.

>>> import psycopg2
>>> cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)",
...             (100, "abc'def"))

You can replace (100, "abc'def") with (variable1, variable2); see http://initd.org/psycopg/docs/usage.html. Or, as a complete example:

>>> import csv
>>> import psycopg2
>>> with open('names.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
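...         # variable1 and variable2 are placeholders for values taken from row
...         cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (variable1, variable2))
...
>>> conn.commit()  # persist the inserted rows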
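
To tie the row-by-row approach above back to the array column the question asks for, here is a minimal sketch that collapses the enumerated columns of each CSV row into one Python list. The names `CODE_COLUMNS`, `implode_columns` (borrowed from the question's wording), and `to_pg_array_literal` are hypothetical, the in-memory sample stands in for the real file, and the values are assumed to be integers.

```python
import csv
import io

# Hypothetical list of enumerated column names, as in the question.
CODE_COLUMNS = ["code_1", "code_2", "code_3"]


def implode_columns(row, columns):
    """Collect the values of `columns` from one CSV row into a list of ints.

    Empty cells are skipped so they do not end up in the array.
    """
    return [int(row[name]) for name in columns if row.get(name)]


def to_pg_array_literal(values):
    """Format a list of ints as a PostgreSQL array literal, e.g. '{1,2,3}'."""
    return "{" + ",".join(str(v) for v in values) + "}"


# Small in-memory CSV standing in for the real file (assumed data).
sample = io.StringIO("code_1,code_2,code_3\n1,2,3\n2,3,\n")
rows = [implode_columns(row, CODE_COLUMNS) for row in csv.DictReader(sample)]
literals = [to_pg_array_literal(r) for r in rows]
```

psycopg2 adapts Python lists to PostgreSQL arrays, so each list in `rows` could probably be passed directly as a query parameter to an array column (for example an assumed `codes integer[]` column) without building the literal by hand. The literal form is mainly useful when writing an intermediate file for bulk loading.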

Hope this helps...

【Discussion】: