【Question title】:Why do I get this index range error? Why doesn't it work?
【Posted】:2020-11-04 15:48:59
【Question】:

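I get this error when I try to split one of my columns into several columns. It only works when I split into one or two columns. If I try to split into 3, 4, or 5 columns, it reports: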

ValueError                                Traceback (most recent call last)
/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    349             try:
--> 350                 return self._range.index(new_key)
    351             except ValueError:

ValueError: 2 is not in range

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-d4e6a4d03e69> in <module>
     22 data_old[Col_1_Label] = newz[0]
     23 data_old[Col_2_Label] = newz[1]
---> 24 data_old[Col_3_Label] = newz[2]
     25 #data_old[Col_4_Label] = newz[3]
     26 #data_old[Col_5_Label] = newz[4]

/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/usr/local/Cellar/jupyterlab/2.1.5/libexec/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    350                 return self._range.index(new_key)
    351             except ValueError:
--> 352                 raise KeyError(key)
    353         return super().get_loc(key, method=method, tolerance=tolerance)
    354 

KeyError: 2

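Here is my code. I have a CSV file. When pandas reads it, it creates a single column named 'Контракт'. I then split that column into other columns, but it only splits into two. I want 7 columns! Please help me understand the logic.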

import pandas as pd
from pandas import Series, DataFrame
import re

dframe1 = pd.read_csv('po.csv')
columns = ['Контракт']
data_old = pd.read_csv('po.csv', header=None, names=columns)
data_old
# The thing you want to split the column on
SplitOn = ':'

# Name of Column you want to split
Split_Col = 'Контракт'



newz = data_old[Split_Col].str.split(pat=SplitOn, n=-1, expand=True)

# Column Labels (you can add more if you will have more)
Col_1_Label = 'Номер телефону'
Col_2_Label = 'Тарифний пакет'
Col_3_Label = 'Вихідні дзвінки з України за кордон'
Col_4_Label = 'ВАРТІСТЬ ПАКЕТА/ЩОМІСЯЧНА ПЛАТА'
Col_5_Label = 'ЗАМОВЛЕНІ ДОДАТКОВІ ПОСЛУГИ ЗА МЕЖАМИ ПАКЕТА'
Col_6_Label = 'Вартість послуги "Корпоративна мережа'
Col_7_Label = 'ЗАГАЛОМ ЗА КОНТРАКТОМ (БЕЗ ПДВ ТА ПФ)'
data_old[Col_1_Label] = newz[0]
data_old[Col_2_Label] = newz[1]
data_old[Col_3_Label] = newz[2]
#data_old[Col_4_Label] = newz[3]
#data_old[Col_5_Label] = newz[4]
#data_old[Col_6_Label] = newz[5]
#data_old[Col_7_Label] = newz[6]


data_old

【Question comments】:

    Tags: python pandas numpy jupyter-notebook


    【Solution 1】:

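    Pandas does not parse "unstructured text" directly. You should first convert it to a standard format or to Python objects, and then build a DataFrame from those.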
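
    As for where the KeyError: 2 in the question comes from: `Series.str.split(..., expand=True)` produces only as many columns as the largest number of pieces found in any row. If no row contains at least two ':' separators, `newz` has only columns 0 and 1, so `newz[2]` does not exist. A minimal sketch of this, assuming made-up sample rows (the real contents of po.csv are not shown in the question), which also pads the missing column with `reindex`:

```python
import pandas as pd

# Hypothetical sample rows; the real contents of po.csv are not shown in the question.
data_old = pd.DataFrame(
    {"Контракт": ["+380501234567:Tariff A", "+380507654321:Tariff B"]}
)

newz = data_old["Контракт"].str.split(pat=":", n=-1, expand=True)
print(newz.shape)  # every row has a single ':' so only columns 0 and 1 exist

# Pad to the number of columns you expect. Columns that the split did not
# produce are filled with NaN instead of raising KeyError.
expected_cols = 3
padded = newz.reindex(columns=range(expected_cols))
data_old["Вихідні дзвінки з України за кордон"] = padded[2]
```

    Padding only hides the symptom. If the third column is all NaN, the data simply has no second ':' to split on, which is why the text needs to be parsed differently.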

    Suppose you have a file named data.txt containing this text:

    Contract № 12345679 Number of phone: +7984563774
    Total price for month : 00.00000
    Total price: 10.0000
    

    You can load and process it in Python like this:

    import re
    import pandas as pd

    with open('data.txt') as f:
        content = f.readlines()

    # First line contains the contract number and the phone number
    contract, phone = content[0].split(':')
    # Find the contract number using a regex
    contract = re.findall(r'\d+', contract)[0]
    # The phone number is straightforward
    phone = phone.strip()

    # Second line is the monthly price, third line is the total price
    total_month_price = float(content[1].split(':')[1].strip())
    total_price = float(content[2].split(':')[1].strip())
    

    Then, using these variables, you can create a DataFrame:

    df = pd.DataFrame([dict(N_of_contract=contract, total_price=total_price, total_month_price=total_month_price)])
    

    Repeat the same steps for every file.
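
    One way to repeat this per file is to wrap the parsing in a function and collect one dict per record. A rough sketch, assuming every file has the same three-line layout as data.txt above. The helper `parse_contract` and the second sample record are hypothetical, and in-memory strings stand in for file contents:

```python
import re
import pandas as pd


def parse_contract(text):
    """Parse one three-line contract record into a dict (hypothetical helper)."""
    lines = text.strip().splitlines()
    # Line 1: contract number before the colon, phone number after it
    contract_part, phone = lines[0].split(":", 1)
    return {
        "N_of_contract": re.findall(r"\d+", contract_part)[0],
        "phone": phone.strip(),
        # Lines 2 and 3: the numeric value follows the colon
        "total_month_price": float(lines[1].split(":", 1)[1]),
        "total_price": float(lines[2].split(":", 1)[1]),
    }


# Stand-ins for the contents of several files. With real files you would
# read each one with open(path).read() instead.
records = [
    "Contract № 12345679 Number of phone: +7984563774\n"
    "Total price for month : 00.00000\n"
    "Total price: 10.0000\n",
    "Contract № 22222222 Number of phone: +7984560000\n"
    "Total price for month : 5.5\n"
    "Total price: 12.0\n",
]

df = pd.DataFrame([parse_contract(text) for text in records])
print(df)
```

    Building the list of dicts first and calling `pd.DataFrame` once is usually simpler than appending rows to an existing DataFrame one at a time.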

    【Comments】:

    • Thanks!!! OK. But what if what I have originally is a .CSV file containing this string?