Posted: 2018-06-02 03:39:32
Question:
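I'm new to Python and to programming, so please forgive me if my question seems silly or unclear. I have done some research, but frankly I find some of the explanations I've read hard to understand.
I have a DataFrame containing a large amount of appointment booking data from a hospital. It needs to be evaluated and modified so that it can be imported into their new scheduling application. Unfortunately, the vendor's import tool is garbage and does zero checking, so I have to write something that checks the old data and converts it into upload data for the new system. Here is a sample of the format: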
```
start appointment  department  procedure  resource
20171020131500     MAM         BDXMAMUNI  BDIAG2
20171020133000     MAM         BDXMAMUNI  BDIAG1
20171020141500     MAM         BDXMAMUNI  BDIAG2
20171020143000     MAM         BDXMAMUNI  BDIAG1
20171020144500     MAM         BDXMAMBIL  BDIAG2
20171020150000     MAM         BDXMAMBIL  BDIAG1
20171020151500     MAM         BDXMAMUNI  BDIAG2
20171023080000     MAM         BDXMAMBIL  BDIAG1
20171023081500     MAM         BDXMAMBIL  BDIAG2
```
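One way to avoid string slicing entirely is to parse the 14-digit values into real timestamps and read the hour and minute off as integers. This is a sketch under an assumption: every value in "start appointment" is a YYYYMMDDHHMMSS number or string, as in the sample above. A column that read_excel returns as float would first need converting to int.

```python
import pandas as pd

# Three rows copied from the sample; in the real script df comes from read_excel.
df = pd.DataFrame({
    "start appointment": [20171020131500, 20171020133000, 20171023080000],
    "department": ["MAM", "MAM", "MAM"],
    "procedure": ["BDXMAMUNI", "BDXMAMUNI", "BDXMAMBIL"],
    "resource": ["BDIAG2", "BDIAG1", "BDIAG1"],
})

# Assumption: values are 14-digit YYYYMMDDHHMMSS integers (or strings).
start = pd.to_datetime(df["start appointment"].astype(str), format="%Y%m%d%H%M%S")
df["hour"] = start.dt.hour      # integers 0-23, so 8 is 8 rather than "08"
df["minute"] = start.dt.minute  # integers 0-59

print(df[["start appointment", "hour", "minute"]])
```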
I'm trying to update the resource based on a set of criteria. Below is what I came up with, but I can't get it to update the field. These are my criteria:
If the start appointment at index X has minute = 15 and hour = 8, 9, 10, 11, 13, 14 or 15, and the resource is BDIAG1, BDIAG2 or BDIAG3, then the resource at index X becomes ZBMDX3.
If the start appointment at index X has minute = 00 and hour = 8, 9, 10, 11, 13, 14 or 15, then the resource at index X becomes ZBMDX2.
If the start appointment at index X has minute = 45 and hour = 7, 8, 9, 10, 12, 13 or 14, then the resource at index X becomes ZBMDX1.
If the start appointment at index X has minute = 30 and hour = 8, 9, 10, 13 or 14, then the resource at index X becomes ZBMDX4.
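The four criteria can also be written as a small rule table and applied to the whole column at once with numpy.select, instead of looping over indexes. This is a sketch, not the code from the question. It assumes hour and minute are parsed as integers, and that the BDIAG1/2/3 restriction applies to all four rules (as it does in the code below, although the written criteria only state it for the first rule). The last sample row (12:00) is hypothetical, added to show a row that matches no rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start appointment": [20171020131500, 20171020133000, 20171020144500,
                          20171023080000, 20171020120000],  # last row is hypothetical
    "resource": ["BDIAG2", "BDIAG1", "BDIAG2", "BDIAG1", "BDIAG1"],
})

start = pd.to_datetime(df["start appointment"].astype(str), format="%Y%m%d%H%M%S")
hour, minute = start.dt.hour, start.dt.minute
# Assumption: every rule only applies to the BDIAG1-3 resources.
is_diag = df["resource"].isin(["BDIAG1", "BDIAG2", "BDIAG3"])

# One (minute, allowed hours, new resource) entry per criterion.
rules = [
    (15, [8, 9, 10, 11, 13, 14, 15], "ZBMDX3"),
    (0, [8, 9, 10, 11, 13, 14, 15], "ZBMDX2"),
    (45, [7, 8, 9, 10, 12, 13, 14], "ZBMDX1"),
    (30, [8, 9, 10, 13, 14], "ZBMDX4"),
]
conditions = [is_diag & (minute == m) & hour.isin(hours) for m, hours, _ in rules]
choices = [new for _, _, new in rules]

# Rows that match no rule keep their original resource (the default).
df["resource"] = np.select(conditions, choices, default=df["resource"].to_numpy())
print(df)
```

Because each rule has a different minute value, at most one condition can be true per row, so the order of the rules does not change the result.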
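When I create the output file, none of the updates are in it. I did some research on StackOverflow, but none of the threads I've read seem to work. Some suggested doing something with loc, ix and df.update.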
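For reference: df.update is a method that copies values in from another DataFrame, so subscripting it (df.update['resource'][i]) raises a TypeError instead of assigning anything. A single cell is normally set with df.loc[row_label, column]. The .ix indexer was deprecated and has since been removed from pandas, so .loc (labels) or .iloc (positions) are the options today. A small sketch of both calls on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"resource": ["BDIAG2", "BDIAG1", "BDIAG2"]})

# df.update is a bound method, so indexing it fails before anything is assigned.
try:
    df.update["resource"]
except TypeError as exc:
    print("TypeError:", exc)

# Label-based assignment of one cell: row label 0, column 'resource'.
df.loc[0, "resource"] = "ZBMDX3"

# df.update(other) overwrites cells with the non-NA values of another frame,
# aligned on index and columns (here only row label 1 is touched).
patch = pd.DataFrame({"resource": ["ZBMDX4"]}, index=[1])
df.update(patch)

print(df["resource"].tolist())
```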
```python
import pandas as pd

df = pd.read_excel(my_file, sheet_name='Sheet1')
dept = df['department']
resource = df['resource']
start_appointment = df['start appointment']

def diagnostic():  # Check Diagnostic Breast scheduled appointments
    for i in range(10):
        minutes = str(start_appointment[i])[14:16]
        hour = str(start_appointment[i])[11:13]
        if minutes == '15' and (
                hour == '8' or hour == '9' or hour == '10' or hour == '11'
                or hour == '13' or hour == '14' or hour == '15') and (
                resource[i] == 'BIDAG1' or resource[i] == 'BDIAG2' or
                resource[i] == 'BDIAG3'):
            df.update['resource'][i] = 'ZBMDX3'
        elif minutes == '00' and (
                hour == '8' or hour == '9' or hour == '10' or
                hour == '11' or hour == '13' or hour == '14' or hour == '15') and (
                resource[i] == 'BIDAG1' or resource[i] == 'BDIAG2' or
                resource[i] == 'BDIAG2'):
            df.update['resource'][i] = 'ZBMDX2'
        elif minutes == '45' and (
                hour == '7' or hour == '8' or hour == '9' or hour == '10' or
                hour == '12' or hour == '13' or hour == '14') and (
                resource[i] == 'BIDAG1' or resource[i] == 'BDIAG2' or
                resource[i] == 'BDIAG1'):
            df.update['resource'][i] = 'ZBMDX1'
        elif minutes == '30' and (
                hour == '8' or hour == '9' or hour == '10' or
                hour == '13' or hour == '14') and (
                resource[i] == 'BIDAG1' or resource[i] == 'BDIAG2' or
                resource[i] == 'BDIAG1'):
            df.update['resource'][i] = 'ZBMDX4'

diagnostic()

# Specify a writer (raw string, so the backslashes in the Windows path are kept literally)
writer = pd.ExcelWriter(r'C:\Users\user_name\Desktop\Python 3\Python_Output.xlsx', engine='xlsxwriter')
# Write your DataFrame to a file
df.to_excel(writer, 'Sheet1')
# Save the result
writer.save()
```
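A likely reason nothing changes, offered as a guess since the real workbook isn't available: if read_excel returns the start time as the 14-digit number shown in the sample, str() produces '20171020131500' with no separators. The slices [14:16] and [11:13] then point past the minutes, no condition is ever true, and the invalid df.update[...] line is never reached, which would explain why there is no error either. The slices would only line up with a datetime-style string, and even then morning hours come back zero-padded:

```python
import pandas as pd

# First sample row, assuming read_excel returns it as a plain integer.
text = str(20171020131500)

print(len(text))                              # 14
print(repr(text[14:16]), repr(text[11:13]))   # '' '50'   (slices used in the question)
print(repr(text[10:12]), repr(text[8:10]))    # '15' '13' (actual minute and hour)

# If the column had been parsed as datetimes, str() includes separators, so
# [11:13] would be the hour, but hours before 10 are zero-padded.
ts = pd.Timestamp("2017-10-23 08:00:00")
print(repr(str(ts)), repr(str(ts)[11:13]))    # '2017-10-23 08:00:00' '08'
```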
I made the suggested changes:
```python
df2 = diagnostic(df)

# Specify a writer
writer = pd.ExcelWriter(r'C:\Users\cboutsikos\Desktop\Python 3\Python_Output.xlsx', engine='xlsxwriter')
# Write your DataFrame to a file
df2.to_excel(writer, 'Sheet1')
# Save the result
writer.save()
```
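Now I'm getting an error:

```
Traceback (most recent call last):
  File "Excel Parse.py", line 55, in <module>
    df2.to_excel(writer, 'Sheet1')
AttributeError: 'NoneType' object has no attribute 'to_excel'
Exception ignored in: >
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\xlsxwriter\workbook.py", line 153, in __del__
Exception: Exception caught in workbook destructor. Explicit close() may be required for workbook.
```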
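The AttributeError says diagnostic(df) returned None: a Python function without a return statement returns None, so df2 is None and has no to_excel. Returning the frame fixes that. The xlsxwriter message looks like a side effect: the script crashed before writer.save() ran, so the workbook was never closed. A minimal sketch; the two function bodies are hypothetical stand-ins for the real diagnostic logic:

```python
import pandas as pd

def diagnostic_no_return(df):
    # Modifies df but has no return statement, so the call evaluates to None.
    df.loc[0, "resource"] = "ZBMDX3"

def diagnostic(df):
    df.loc[0, "resource"] = "ZBMDX3"
    return df  # hand the modified frame back to the caller

df = pd.DataFrame({"resource": ["BDIAG2", "BDIAG1"]})
result = diagnostic_no_return(df)
print(result)  # None, so result.to_excel(...) would raise AttributeError

df2 = diagnostic(df)
print(df2["resource"].tolist())  # ['ZBMDX3', 'BDIAG1']
```

When writing the file, putting the writer in a with pd.ExcelWriter(path, engine='xlsxwriter') as writer: block closes the workbook even if an exception is raised. Note also that writer.save() was deprecated in pandas 1.5 and removed in 2.0 in favour of writer.close().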
Seiji, I've completely updated my code to reflect your changes. Let's look at solution 2, since it runs faster:
```python
import pandas as pd

my_file = r'C:\Users\user_name\Desktop\Python 3\schdocexprt10_Bob - Copy.xlsx'
df = pd.read_excel(my_file, sheet_name='Sheet3')

def update_val(row):
    minutes = str(row['start appointment'])[14:16]
    hour = str(row['start appointment'])[11:13]
    resource = row['resource']
    # cond1, cond2, cond3, cond4 = True, False, False, False
    # Condition 1
    if minutes == '00' and hour in ['8', '9', '10', '11', '13', '14', '15'] \
            and resource in ['BDIAG1', 'BDIAG2', 'BDIAG3'] == True:
        row['resource'] = 'ZBMDX2'
    # Condition 2
    elif minutes == '15' and hour in ['9', '10', '11', '13', '14', '15'] \
            and resource in ['BDIAG1', 'BDIAG2', 'BDIAG3'] == True:
        row['resource'] = 'ZBMDX3'
    # Condition 3
    elif minutes == '45' and hour in ['7', '8', '9', '10', '12', '13', '14'] \
            and resource in ['BDIAG1', 'BDIAG2', 'BDIAG3'] == True:
        row['resource'] = 'ZBMDX1'
    # Condition 4
    elif minutes == '30' and hour in ['8', '9', '10', '13', '14'] \
            and resource in ['BDIAG1', 'BDIAG2', 'BDIAG3'] == True:
        row['resource'] = 'ZBMDX4'
    return row

df2 = df.apply(update_val, axis='columns')

# Specify a writer
writer = pd.ExcelWriter(r'C:\Users\user_name\Desktop\Python 3\Python_Output.xlsx', engine='xlsxwriter')
# Write your DataFrame to a file
df2.to_excel(writer, 'Sheet1')
# Save the result
writer.save()
```
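In this version, resource in [...] == True has a subtle problem of its own: in and == are both comparison operators, and Python chains comparisons, so the expression means (resource in [...]) and ([...] == True). A list is never equal to True, so every condition is False regardless of the data. Dropping == True (or parenthesizing the membership test) is enough:

```python
resource = "BDIAG2"
allowed = ["BDIAG1", "BDIAG2", "BDIAG3"]

# 'in' and '==' are both comparison operators, so Python chains them:
# resource in allowed == True  means  (resource in allowed) and (allowed == True)
chained = resource in allowed == True
explicit = (resource in allowed) and (allowed == True)
plain = resource in allowed  # the membership test on its own

print(chained, explicit, plain)  # False False True
```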
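After creating the output file, I still don't see any updates in the resource field. I manually evaluated the first 10 rows to rule out the possibility that the code was working and simply no rows met the conditions. Several of them should change: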
```
start appointment  dept  procedure  resource  should change to
20171020131500     MAM   BDXMAMUNI  BDIAG2    ZBMDX3
20171020133000     MAM   BDXMAMUNI  BDIAG1    ZBMDX4
20171020141500     MAM   BDXMAMUNI  BDIAG2    ZBMDX3
20171020143000     MAM   BDXMAMUNI  BDIAG1    ZBMDX4
20171020144500     MAM   BDXMAMBIL  BDIAG2    ZBMDX1
```
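To check the rules against these expected values without involving Excel, here is a row-wise sketch. It assumes the start time is a 14-digit YYYYMMDDHHMMSS value, converts hour and minute to int so that '08' and 8 compare equal, and uses the hour lists from the written criteria (which include hour 8 for the :15 rule, unlike solution 2). new_resource is a hypothetical helper name.

```python
import pandas as pd

DIAG = {"BDIAG1", "BDIAG2", "BDIAG3"}

def new_resource(start, resource):
    """Return the new resource for one appointment, or the old one if no rule matches.

    Assumes `start` is a 14-digit YYYYMMDDHHMMSS value (int or str).
    """
    text = str(start)
    hour, minute = int(text[8:10]), int(text[10:12])  # int() turns '08' into 8
    if resource not in DIAG:
        return resource
    if minute == 0 and hour in (8, 9, 10, 11, 13, 14, 15):
        return "ZBMDX2"
    if minute == 15 and hour in (8, 9, 10, 11, 13, 14, 15):
        return "ZBMDX3"
    if minute == 45 and hour in (7, 8, 9, 10, 12, 13, 14):
        return "ZBMDX1"
    if minute == 30 and hour in (8, 9, 10, 13, 14):
        return "ZBMDX4"
    return resource

# The five rows from the table above.
df = pd.DataFrame({
    "start appointment": [20171020131500, 20171020133000, 20171020141500,
                          20171020143000, 20171020144500],
    "resource": ["BDIAG2", "BDIAG1", "BDIAG2", "BDIAG1", "BDIAG2"],
})
df["resource"] = [new_resource(s, r)
                  for s, r in zip(df["start appointment"], df["resource"])]
print(df)
```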
Seiji's solution 1:
```python
import pandas as pd

df = pd.read_excel(my_file, sheet_name='Sheet3')

# Pull Columns as a Variable
dept = df['department']
resource = df['resource']
start_appointment = df['start appointment']

def diagnostic(df):
    for i in range(1, 100):
        minutes = str(start_appointment[i])[14:16]
        hour = str(start_appointment[i])[11:13]
        if minutes == '15' and hour in ['9', '10', '11', '13', '14', '15'] and resource[i] in ['BDIAG1', 'BDIAG2', 'BDIAG3']:
            df.loc[i, 'resource'] = 'ZBMDX3'
        elif minutes == '00' and hour in ['8', '9', '10', '11', '13', '14', '15'] and resource[i] in ['BDIAG1', 'BDIAG2', 'BDIAG3']:
            df.loc[i, 'resource'] = 'ZBMDX2'
        elif minutes == '45' and hour in ['7', '8', '9', '10', '12', '13', '14'] and resource[i] in ['BIDAG1', 'BDIAG2', 'BDIAG3']:
            df.loc[i, 'resource'] = 'ZBMDX1'
        elif minutes == '30' and hour in ['8', '9', '10', '13', '14'] and resource[i] in ['BIDAG1', 'BDIAG2', 'BDIAG3']:
            df.loc[i, 'resource'] = 'ZBMDX4'
    return df

df2 = diagnostic(df)

# Specify a writer
writer = pd.ExcelWriter(r'C:\Users\cboutsikos\Desktop\Python 3\Python_Output.xlsx', engine='xlsxwriter')
# Write your DataFrame to a file
df2.to_excel(writer, 'Sheet1')
# Save the result
writer.save()
```
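Two details of this loop are worth checking separately from the slicing: range(1, 100) starts at 1, so row 0 is never examined, and it stops at 99 no matter how many rows the sheet has (a sheet with fewer rows raises KeyError at the first missing label). Iterating over df.index visits every existing row once. A sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"resource": ["BDIAG2", "BDIAG1", "BDIAG2"]})

print(0 in range(1, 100))  # False: the first row is skipped

try:
    df["resource"][3]  # label 3 does not exist in a 3-row frame
except KeyError:
    print("KeyError: 3")

# Visit every existing row label exactly once.
for i in df.index:
    if df.loc[i, "resource"] == "BDIAG2":
        df.loc[i, "resource"] = "ZBMDX3"

print(df["resource"].tolist())  # ['ZBMDX3', 'BDIAG1', 'ZBMDX3']
```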
Same problem. The output file is not updated.
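Update: changed the hour and minute slicing
It still doesn't show any updates in the output. At this point I'm wondering whether I should save the xlsx file as CSV and not use any libraries, or build the DataFrame from scratch by iterating over each column (start appointment, resource) into its own list. What do you think?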
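Before dropping pandas, it may help to look at exactly what the slices receive. pandas can keep the column as text if you pass dtype, and read_excel accepts the same dtype argument as read_csv. The CSV text below is a stand-in built from two of the sample rows, since the real workbook isn't available:

```python
import io

import pandas as pd

# Stand-in for the spreadsheet; with the real file the same argument could be
# passed as pd.read_excel(my_file, sheet_name='Sheet3', dtype={...}).
csv_text = """start appointment,department,procedure,resource
20171020131500,MAM,BDXMAMUNI,BDIAG2
20171023080000,MAM,BDXMAMBIL,BDIAG1
"""
df = pd.read_csv(io.StringIO(csv_text), dtype={"start appointment": str})

# Print exactly what the hour and minute slices see before writing any conditions.
for value in df["start appointment"]:
    print(repr(value), repr(value[8:10]), repr(value[10:12]))
```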
```python
import pandas as pd

my_file = r'C:\Users\cboutsikos\Desktop\Python 3\schdocexprt10_Bob - Copy.xlsx'
df = pd.read_excel(my_file, sheet_name='Sheet3')

def update_val(row):
    minutes = str(row['start appointment'])[10:12]
    hour = str(row['start appointment'])[8:10]
    resource = row['resource']
    # Condition 1
    if (minutes == '00') and (hour in ['8', '9', '10', '11', '13', '14', '15']) \
            and (resource in ['BDIAG1', 'BDIAG2', 'BDIAG3']) == True:
        row['resource'] = 'ZBMDX2'
    # Condition 2
    elif (minutes == '15') and (hour in ['9', '10', '11', '13', '14', '15']) \
            and (resource in ['BDIAG1', 'BDIAG2', 'BDIAG3']):
        row['resource'] = 'ZBMDX3'
    # Condition 3
    elif (minutes == '45') and (hour in ['7', '8', '9', '10', '12', '13', '14']) \
            and (resource in ['BDIAG1', 'BDIAG2', 'BDIAG3']):
        row['resource'] = 'ZBMDX1'
    # Condition 4
    elif (minutes == '30') and (hour in ['8', '9', '10', '13', '14']) \
            and (resource in ['BDIAG1', 'BDIAG2', 'BDIAG3']):
        row['resource'] = 'ZBMDX4'
    return row

df2 = df.apply(update_val, axis='columns')
print(df2.head())
```
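With the [8:10] and [10:12] slices the positions are right for a 14-digit value, but the hour is still a two-character string, so morning hours come back as '08' and '09' and never equal '8' or '9' in the lists. Afternoon rows such as 20171020131500 should match in this version, so if even those stay unchanged, a reasonable guess is that the column isn't the plain 14-digit value assumed here. Comparing integers (or zero-padded strings) sidesteps the padding issue:

```python
hour = str(20171023080000)[8:10]

print(repr(hour))                    # '08'
print(hour in ['8', '9', '10'])      # False: '08' != '8'
print(int(hour) in [8, 9, 10])       # True once converted to int
print(hour in ['08', '09', '10'])    # True with zero-padded strings
```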
Comments:
-
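Maybe this is a lexical scoping issue? You do something to df inside the function but don't return anything, so df stays unchanged outside the scope of the function call. Try defining your function with def diagnostic(df): and then calling it with df = diagnostic(df)
-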
I'll try df2 = diagnostic(df)
-
Hey, if you're still stuck on this, send me a small snippet of the Excel file and I'll take a look at where the problem is. No sensitive information, though. seiji dot armstrong at g mail
-
Yes, I am. Will do! Thanks!
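The scoping theory in the first comment can be tested directly. Python passes the DataFrame object itself into the function, so an in-place .loc assignment inside diagnostic(df) is visible to the caller even without a return; only working on a copy (or rebinding the local name to a new object) leaves the caller's df unchanged. A small experiment with hypothetical function names:

```python
import pandas as pd

def modify_in_place(frame):
    # .loc assignment mutates the object the caller passed in.
    frame.loc[0, "resource"] = "ZBMDX3"

def modify_copy(frame):
    # .copy() creates a new object; changes to it never reach the caller.
    frame = frame.copy()
    frame.loc[0, "resource"] = "ZBMDX2"

df = pd.DataFrame({"resource": ["BDIAG2"]})
modify_copy(df)
after_copy = df.loc[0, "resource"]
print(after_copy)        # BDIAG2
modify_in_place(df)
after_in_place = df.loc[0, "resource"]
print(after_in_place)    # ZBMDX3
```

So returning df mainly matters when the caller assigns the result (df2 = diagnostic(df)); without a return, that assignment yields None, which matches the AttributeError in the question.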