How to fix - ArrowInvalid: ("Could not convert (x, y) with type tuple)?

【Question title】: How to fix - ArrowInvalid: ("Could not convert (x, y) with type tuple)?
【Posted】: 2021-10-19 04:15:25
【Question description】:

I am trying to build a Streamlit app. Below is the working example I am trying:

import pandas as pd
import pm4py
import streamlit as st
from pm4py.objects.conversion.log import converter as log_converter
df = pm4py.format_dataframe(
    pd.read_csv('https://raw.githubusercontent.com/pm4py/pm4py-core/release/notebooks/data/running_example.csv', sep=';'),
    case_id='case_id',
    activity_key='activity',
    timestamp_key='timestamp',
)

log = log_converter.apply(df)

precedence_dict = pm4py.discover_dfg(log)[0]

precedence_dict is a dictionary that maps each (antecedent, consequent) pair to its count:

precedence_dict  = {('check ticket', 'decide'): 6,
         ('check ticket', 'examine casually'): 2,
         ('check ticket', 'examine thoroughly'): 1,
         ('decide', 'pay compensation'): 3,
         ('decide', 'reinitiate request'): 3,
         ('decide', 'reject request'): 3,
         ('examine casually', 'check ticket'): 4,
         ('examine casually', 'decide'): 2,
         ('examine thoroughly', 'check ticket'): 2,
         ('examine thoroughly', 'decide'): 1,
         ('register request', 'check ticket'): 2,
         ('register request', 'examine casually'): 3,
         ('register request', 'examine thoroughly'): 1,
         ('reinitiate request', 'check ticket'): 1,
         ('reinitiate request', 'examine casually'): 1,
         ('reinitiate request', 'examine thoroughly'): 1
}

Converting the above dict to a pandas DataFrame:

precedence_df = pd.DataFrame.from_dict(precedence_dict, orient='index').reset_index()


rename_map = {"index" : "Antecedent,Consequent", 0 : "Count"}
precedence_df = precedence_df.rename(columns=rename_map)

precedence_df['Antecedent'], precedence_df['Consequent'] = zip(*precedence_df["Antecedent,Consequent"])
# precedence_df.assign(**dict(zip(['Antecedent', 'Consequent'], precedence_df["Antecedent,Consequent"].str)))
# precedence_df['Antecedent'], precedence_df['Consequent'] = precedence_df["Antecedent,Consequent"].str
precedence_mat = precedence_df[['Antecedent', 'Consequent', 'Count']]

st.dataframe(precedence_df)

I get an ArrowInvalid error at this line when I run the app.

Full error traceback:

ArrowInvalid: ("Could not convert (x, y) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column Antecedent, Consequent with type object')

Traceback:
File "C:\Users\zz\Documents\Streamlit\preced\app.py", line 1353, in <module>
    st.dataframe(precedence_df)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\elements\dataframe_selector.py", line 85, in dataframe
    return self.dg._arrow_dataframe(data, width, height)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\elements\arrow.py", line 82, in _arrow_dataframe
    marshall(proto, data, default_uuid)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\elements\arrow.py", line 160, in marshall
    proto.data = type_util.data_frame_to_bytes(df)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\type_util.py", line 371, in data_frame_to_bytes
    table = pa.Table.from_pandas(df)
File "pyarrow\table.pxi", line 1561, in pyarrow.lib.Table.from_pandas
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 595, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 595, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 581, in convert_column
    raise e
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 575, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status

Current pyarrow version: 5.0.0.

When I run the same code in Colab (except for st.dataframe), I get no errors. What does the ArrowInvalid error mean, and how do I resolve it?

【Comments】:

Tags: python pandas traceback pyarrow streamlit

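【Solution 1】:

I'm not very familiar with streamlit or st.dataframe, but it looks like it is trying to convert precedence_df to a pyarrow.Table.

In pandas, a column can hold arbitrary Python objects, but pyarrow needs a concrete Arrow data type for each column and, as the error message says, could not infer one here. The Antecedent,Consequent column causes the problem because its values are tuples.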

|    | Antecedent,Consequent                        |   Count |
|---:|:---------------------------------------------|--------:|
|  0 | ('check ticket', 'decide')                   |       6 |
|  1 | ('check ticket', 'examine casually')         |       2 |
|  2 | ('check ticket', 'examine thoroughly')       |       1 |
|  3 | ('decide', 'pay compensation')               |       3 |
|  4 | ('decide', 'reinitiate request')             |       3 |
|  5 | ('decide', 'reject request')                 |       3 |
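
To check which columns would trip up the conversion, one can scan for object columns that hold tuples. The helper below is a minimal sketch in plain pandas; `tuple_columns` and the small `example` frame are hypothetical names made up for illustration and are not part of pandas, pyarrow, or streamlit.

```python
import pandas as pd


def tuple_columns(df: pd.DataFrame) -> list:
    """Return the names of object-dtype columns that contain at least one tuple."""
    found = []
    for name in df.columns:
        col = df[name]
        # Only object columns can hold arbitrary Python values such as tuples.
        if col.dtype == object and col.map(lambda v: isinstance(v, tuple)).any():
            found.append(name)
    return found


# Illustrative two-row stand-in for precedence_df.
example = pd.DataFrame(
    {
        "Antecedent,Consequent": [("check ticket", "decide"), ("decide", "reject request")],
        "Count": [6, 3],
    }
)

print(tuple_columns(example))  # ['Antecedent,Consequent']
```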

It is easier and more idiomatic to work with a dataframe like precedence_mat, because it uses flat string columns (instead of tuples).

|    | Antecedent         | Consequent         |   Count |
|---:|:-------------------|:-------------------|--------:|
|  0 | check ticket       | decide             |       6 |
|  1 | check ticket       | examine casually   |       2 |
|  2 | check ticket       | examine thoroughly |       1 |
|  3 | decide             | pay compensation   |       3 |
|  4 | decide             | reinitiate request |       3 |
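
One way to get this flat layout, sketched here under the assumption that every dict key is a 2-tuple, is to unpack the keys while building the frame. A shortened `precedence_dict` is used to keep the example small.

```python
import pandas as pd

# Shortened version of the dict from the question.
precedence_dict = {
    ("check ticket", "decide"): 6,
    ("check ticket", "examine casually"): 2,
    ("decide", "pay compensation"): 3,
}

# Unpack each (antecedent, consequent) key next to its count.
rows = [(antecedent, consequent, count) for (antecedent, consequent), count in precedence_dict.items()]
precedence_mat = pd.DataFrame(rows, columns=["Antecedent", "Consequent", "Count"])

print(precedence_mat)
```

In the question, st.dataframe is called with precedence_df, which still carries the tuple column; passing precedence_mat instead sidesteps the tuple column altogether.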

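That said, if you really need to pass tuples to pyarrow/streamlit, you have two options:

Option 1: Create a schema for your tuples and use it to convert the dataframe to pyarrow before passing it to streamlit.

This is a bit tricky: you need to provide a schema that describes what your tuples contain: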

import pyarrow as pa


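# Describe each tuple as a struct with two string fields.
schema = pa.schema(
    [
        pa.field(
            "Antecedent,Consequent",
            pa.struct(
                [
                    pa.field("antecedent", pa.string()),
                    pa.field("consequent", pa.string()),
                ]
            ),
        ),
        pa.field("Count", pa.int32()),
    ]
)

# Convert with the explicit schema so pyarrow does not have to infer a type for the tuples.
table = pa.Table.from_pandas(precedence_df, schema=schema)
st.dataframe(table)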

Option 2: Convert the tuples to lists, which makes it easier for pyarrow to infer the type:

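# Work on a copy so precedence_df keeps its original tuples.
copy_df = precedence_df.copy()
copy_df["Antecedent,Consequent"] = precedence_df["Antecedent,Consequent"].apply(list)
# Round-trip through pyarrow; after to_pandas(), `table` is a pandas DataFrame again.
table = pa.Table.from_pandas(copy_df).to_pandas()
st.dataframe(table)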

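Note that in this case the "Antecedent,Consequent" data is converted from tuples of strings to lists of strings.

【Comments】:

This works, thanks. But the solution does not work in Colab.
The desired output is also flat string columns. Use table1 = pd.DataFrame(table["Antecedent,Consequent"].to_list(), columns=['Antecedent','Consequent']) to split the lists of strings into separate columns.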
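
The split mentioned in the last comment drops the Count column. A small sketch of the same idea that keeps it, assuming a frame whose "Antecedent,Consequent" column holds two-element lists (the sample data below is illustrative):

```python
import pandas as pd

# Illustrative stand-in for the `table` DataFrame from the answer.
table = pd.DataFrame(
    {
        "Antecedent,Consequent": [["check ticket", "decide"], ["decide", "reject request"]],
        "Count": [6, 3],
    }
)

# Expand each two-element list into two columns, reusing the original index
# so the counts line up when joined back.
pairs = pd.DataFrame(
    table["Antecedent,Consequent"].to_list(),
    columns=["Antecedent", "Consequent"],
    index=table.index,
)
table1 = pairs.join(table["Count"])

print(table1)
```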