Why do I have this problem with the index range? Why doesn't it work? (Answers)

[Question title]: Why do I have this problem with the index range? Why doesn't it work?
[Posted]: 2020-11-04 15:48:59
[Question description]:

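I get this error when I try to split one of my columns into several. The split only works for one or two columns. If I try to split into 3, 4, or 5 columns, it reports: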

ValueError                                Traceback (most recent call last)
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    349             try:
--> 350                 return self._range.index(new_key)
    351             except ValueError:

ValueError: 2 is not in range

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-d4e6a4d03e69> in <module>
     22 data_old[Col_1_Label] = newz[0]
     23 data_old[Col_2_Label] = newz[1]
---> 24 data_old[Col_3_Label] = newz[2]
     25 #data_old[Col_4_Label] = newz[3]
     26 #data_old[Col_5_Label] = newz[4]

/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    350                 return self._range.index(new_key)
    351             except ValueError:
--> 352                 raise KeyError(key)
    353         return super().get_loc(key, method=method, tolerance=tolerance)
    354 

KeyError: 2

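Here is my code. I have a CSV file. When pandas reads it, it creates a single column named "Контракт". Then I split that column into other columns, but it only splits into two. I want 7 columns! Please help me understand the logic!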

import pandas as pd
from pandas import Series, DataFrame
import re

dframe1 = pd.read_csv('po.csv')
columns = ['Контракт']
data_old = pd.read_csv('po.csv', header=None, names=columns)
data_old
# The thing you want to split the column on
SplitOn = ':'

# Name of Column you want to split
Split_Col = 'Контракт'



newz = data_old[Split_Col].str.split(pat=SplitOn, n=-1, expand=True)

# Column Labels (you can add more if you will have more)
Col_1_Label = 'Номер телефону'
Col_2_Label = 'Тарифний пакет'
Col_3_Label = 'Вихідні дзвінки з України за кордон'
Col_4_Label = 'ВАРТІСТЬ ПАКЕТА/ЩОМІСЯЧНА ПЛАТА'
Col_5_Label = 'ЗАМОВЛЕНІ ДОДАТКОВІ ПОСЛУГИ ЗА МЕЖАМИ ПАКЕТА'
Col_6_Label = 'Вартість послуги "Корпоративна мережа'
Col_7_Label = 'ЗАГАЛОМ ЗА КОНТРАКТОМ (БЕЗ ПДВ ТА ПФ)'
data_old[Col_1_Label] = newz[0]
data_old[Col_2_Label] = newz[1]
data_old[Col_3_Label] = newz[2]
#data_old[Col_4_Label] = newz[3]
#data_old[Col_5_Label] = newz[4]
#data_old[Col_6_Label] = newz[5]
#data_old[Col_7_Label] = newz[6]


data_old

[Question comments]:

Tags: python pandas numpy jupyter-notebook

[Solution 1]:

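Pandas does not handle "unstructured text". You should first convert it to a standard format or to Python objects, and then create a DataFrame from those.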
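As background for the `KeyError: 2` in the question: `str.split(..., expand=True)` only creates as many columns as the largest number of pieces found in any row, so if every row contains a single `:` the result has columns `0` and `1` and `newz[2]` does not exist. The following is a minimal sketch of that behaviour, with two made-up sample rows standing in for the real `po.csv` (whose contents are not shown in the question), plus one possible way to pad the result to a fixed width with `reindex`.

```python
import pandas as pd

# Hypothetical sample rows (assumption): each one contains a single ':'.
data_old = pd.DataFrame(
    {"Контракт": ["Number of phone: +7984563774", "Total price: 10.0000"]}
)

newz = data_old["Контракт"].str.split(pat=":", n=-1, expand=True)

# Only columns 0 and 1 exist here, so newz[2] would raise KeyError: 2.
print(list(newz.columns))

# Pad to a fixed number of columns; pieces that do not exist become NaN.
expected_parts = 7  # the number of columns the asker wants
newz = newz.reindex(columns=range(expected_parts))
print(newz.shape)
```

Padding only avoids the exception. It does not make the data appear, so the extra columns stay empty unless the rows really contain more separators.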

Suppose you have a file named data.txt that contains this text:

Contract № 12345679 Number of phone: +7984563774
Total price for month : 00.00000
Total price: 10.0000

You can load and process it in Python like this:

import re

with open('data.txt') as f:
    content = f.readlines()

# The first line contains the contract number and the phone number
contract, phone = content[0].split(':')
# Find the contract number using a regex
contract = re.findall(r'\d+', contract)[0]
# The phone number is straightforward
phone = phone.strip()

# The second line is the monthly price, the third line is the total price
total_month_price = float(content[1].split(':')[1].strip())
total_price = float(content[2].split(':')[1].strip())

Then, using these variables, you can create a DataFrame:

df = pd.DataFrame([dict(N_of_contract=contract, total_price=total_price, total_month_price=total_month_price)])

Repeat the same steps for all files.
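One way to repeat this for many files is to wrap the parsing in a function and collect one record per file. The sketch below assumes every file has the same three-line layout as data.txt, with the monthly price on the second line and the total on the third. The names `parse_contract` and `build_frame` and the extra `phone` column are illustrative and not part of the original answer.

```python
import re
from pathlib import Path

import pandas as pd


def parse_contract(lines):
    """Turn the three lines of one contract file into a flat record."""
    # Split only on the first ':' so the phone number stays in one piece.
    header, phone = lines[0].split(":", 1)
    return {
        "N_of_contract": re.findall(r"\d+", header)[0],
        "phone": phone.strip(),
        "total_month_price": float(lines[1].split(":", 1)[1]),
        "total_price": float(lines[2].split(":", 1)[1]),
    }


def build_frame(paths):
    """Parse every file in `paths` and return one DataFrame row per file."""
    records = [
        parse_contract(Path(p).read_text(encoding="utf-8").splitlines())
        for p in paths
    ]
    return pd.DataFrame(records)
```

For example, `build_frame(Path("contracts").glob("*.txt"))` would produce one row per file, where `contracts` is a hypothetical folder name.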

[Comments]:

Thank you!!! OK. But what if I originally have a .CSV file containing this string?