【Question title】:VBA To Pandas - Having trouble matching VBA Logic With Pandas
【Posted】:2019-07-17 02:51:14
【Question】:

Hello, please see my working VBA code below. I am trying to rewrite it in Pandas, but my Pandas script does not work correctly (my Pandas attempt is below the VBA). If possible, can anyone help me complete this?

Sub mymacro()
Columns(19).Replace "DFHD", "SFD"
Columns(19).Replace "DFBG", "SFD"
Columns(19).Replace "DFVD", "SFD"
Columns(19).Replace "MFUB", "BFD"
Columns(19).Replace "MFBD", "BFD"
Columns(19).Replace "DFBD", "BFD"
Columns(19).Replace "UFNC", "CFD"
Columns(19).Replace "UFNC", "CFD"
Columns(19).Replace "BFYD", "BFD"
'Having trouble starting below here'
Columns("T:AC").Select
    Selection.EntireColumn.Hidden = True
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:=Array( _
        "U*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-100
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=30, Criteria1:=Array( _
        "350", "B*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-100
    Range("S3").Select
    ActiveCell.FormulaR1C1 = "BD"
    Range("S3").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S3").Select
    Application.CutCopyMode = False
    ActiveSheet.ShowAllData
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:="=UND", Operator:=xlOr, Criteria2:="=UNH"
    ActiveWindow.SmallScroll Down:=-21
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=30, Criteria1:=Array( _
     "DR9", "DV0", "DV5", "DV8", "DV9", "DVG", "DV*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-36
    Range("S11").Select
    ActiveCell.FormulaR1C1 = "SD"
    Range("S11").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S11").Select
    Application.CutCopyMode = False
    ActiveSheet.ShowAllData
    ActiveWindow.SmallScroll Down:=-10
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:="UNH"
    ActiveWindow.SmallScroll Down:=-27
    Range("S1815").Select
    ActiveCell.FormulaR1C1 = "FUHD"
    Range("S1815").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S1815").Select
    Application.CutCopyMode = False
    ActiveWindow.SmallScroll Down:=-30
    ActiveSheet.ShowAllData
    ActiveWindow.SmallScroll Down:=-240
End Sub

Below is my Pandas script. Note that the place where I start having trouble is marked with a comment; the first 12 lines of code work well.

import pandas as pd
import numpy as np
data = pd.read_excel("orsthrufirstarticledeltion.xlsx", encoding = "ISO-8859-1", dtype=object)
data.loc[data.Format == 'DFHD', 'Format'] = 'SFD'
data.loc[data.Format == 'DFBG', 'Format'] = 'SFD'
data.loc[data.Format == 'DFVD', 'Format'] = 'SFD'
data.loc[data.Format == 'MFUB', 'Format'] = 'BFD'
data.loc[data.Format == 'MFBD', 'Format'] = 'BFD'
data.loc[data.Format == 'DFBD', 'Format'] = 'BFD'
data.loc[data.Format == 'UFNC', 'Format'] = 'CFD'
data.loc[data.Format == 'BFYD', 'Format'] = 'BFD'

# Trouble starts below
data.loc[(data["Fmt"] != str) & (data["Format"] == "UN*"), "Format"] = 'BD' # the UN* did not work 
#data.loc[(data["Fmt"] == '350') & (data["Format"] == "UNB"), "Format"] = 'BD'
#data.loc[(data["Fmt"] != str) & (data[data.Format.str.startswith('UN',na=False)]), "Format"] = 'BD'
#
writer = pd.ExcelWriter('mstrplc2.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='Sheet1')
writer.save()

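----- New attempt at finding a solution -----

Please see the sample dataframe below, which holds the raw data we start with. I have included code to export it to Excel if you like.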

import pandas as pd

startdf = pd.DataFrame({'Column_A':['DFHD', 'DFBG', 'DFVD', 'MFUB', 'MFBD', 'DFBD', 'UFNC', 'UFNC', 'BFYD',
                                    'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],

'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})



writer = pd.ExcelWriter('testdf.xlsx', engine='xlsxwriter')
startdf.to_excel(writer, sheet_name='Sheet1')
writer.close()  # finalize the workbook so the file is written

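The first step is to take all the values in column A and replace the existing values with the new values listed below (so only column A is edited).

  • "DFHD" -> "SFD"
  • "DFBG" -> "SFD"
  • "DFVD" -> "SFD"
  • "MFUB" -> "BFD"
  • "MFBD" -> "BFD"
  • "DFBD" -> "BFD"
  • "UFNC" -> "CFD"
  • "BFYD" -> "BFD"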
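
The replacement list above can also be written as a single dictionary passed to `Series.replace`. This is only a minimal sketch: the small frame `df` and the name `code_map` are made up for illustration, and it assumes the codes must match the whole cell value exactly.

```python
import pandas as pd

# Hypothetical sample frame mirroring Column_A from the question.
df = pd.DataFrame({"Column_A": ["DFHD", "DFBG", "DFVD", "MFUB", "MFBD",
                                "DFBD", "UFNC", "BFYD", "UNFZ"]})

# One dictionary holds every old -> new pair from the list above.
code_map = {
    "DFHD": "SFD", "DFBG": "SFD", "DFVD": "SFD",
    "MFUB": "BFD", "MFBD": "BFD", "DFBD": "BFD", "BFYD": "BFD",
    "UFNC": "CFD",
}

# Series.replace swaps exact matches and leaves other values (e.g. UNFZ) alone.
df["Column_A"] = df["Column_A"].replace(code_map)
print(df["Column_A"].tolist())
```

Keeping the pairs in one dictionary means a new code only needs one new entry rather than another `.loc` line.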

After this logic has run, the data should look like this:

df2 = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                            'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

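Now we continue to edit only column A, but use the values in column B to decide what the value in column A should be, so each row is considered on its own. First, filter SFD, BFD and CFD out of column A, so that the remaining values are 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX'. For these remaining values, we look at column B to decide how to change column A. The logic is as follows:

  1. Values in column B starting with B, or numbers, should mean the matching row value in column A is changed to BFD
  2. Values in column B starting with D or OPT should mean the matching row value in column A is changed to SFD
  3. Values in column B starting with U, or numbers, should mean the matching row value in column A is changed to UHFD

After this logic, the final output dataframe should be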

resultdf = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                                     'BFD', 'SFD', 'SFD', 'SFD', 'SFD', 'BFD','UHFD', 'UHFD', 'BFD', 'BFD', 'BFD', 'SFD','BFD', 'UHFD', 'SFD'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

writer = pd.ExcelWriter('finalresult.xlsx', engine='xlsxwriter')
resultdf.to_excel(writer, sheet_name='Sheet1')
writer.close()  # finalize the workbook so the file is written

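【Comments】:

  • Most of your VBA code can be removed. I assume you used the macro recorder, since you have a lot of redundant actions. You will also want to see this to remove your use of .Select
  • @QHarr, I believe that, but how do I translate the second part into Pandas? I am trying to stop using that VBA, so there is no need to worry about the redundant VBA.
  • Removing the unneeded code from the VBA script in the question would make it easier for people here to understand. That would increase your chances of getting a useful answer.
  • I take it that by "did not work" you do not mean a runtime error, but that the expected data modification did not happen, right?
  • Yes @im_chc. I am having a hard time reading the pandas docs, I guess because of the conditional logic.

Tags: python excel vba pandas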


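【Solution 1】:

One issue remains: when I use this on live data from Excel, my Column_B is imported into the dataframe as "object". It mostly contains strings but also some numeric values such as "350", and the logic does not work for those int values... Is there a reason for this?

This code can be used: data.loc[data.Fmt.astype(str) == '350', 'Fm'] = 'test'. All, below is an answer that seems to work (the order of the lines matters).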
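
A likely reason, shown as a small sketch: an "object" column can hold real integers next to strings, and the integer 350 is not equal to the string '350'. The series `fmt` below is a made-up stand-in for the imported column, not the actual data.

```python
import pandas as pd

# Hypothetical column mixing strings with a real integer, as read_excel
# can produce when a cell holds a number.
fmt = pd.Series(["B50", 350, "DV9"], dtype=object)

# Comparing against the string '350' misses the integer 350.
plain_match = (fmt == "350").tolist()
print(plain_match)

# Casting every value to str first lets the comparison match.
as_text = fmt.astype(str)
text_match = (as_text == "350").tolist()
print(text_match)
```

Casting once and reusing the result (here `as_text`) avoids repeating `.astype(str)` in every condition.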

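But is there a more Pythonic way to achieve this, i.e. using wildcards? The other answer proposing a wildcard solution did not work for me, so please see the lengthy solution below: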

import pandas as pd

startdf = pd.DataFrame({'Column_A':['DFHD', 'DFBG', 'DFVD', 'MFUB', 'MFBD', 'DFBD', 'UFNC', 'UFNC', 'BFYD',
                                    'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],

'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})
#writer = pd.ExcelWriter('testdf.xlsx', engine='xlsxwriter')
#df.to_excel(writer, sheet_name='Sheet1')
#writer.save()

df2 = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                            'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})


resultdf = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                                 'BFD', 'SFD', 'SFD', 'SFD', 'SFD', 'BFD','UHFD', 'UHFD', 'BFD', 'BFD', 'BFD', 'SFD','BFD', 'UHFD', 'SFD'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

test = startdf

test.loc[test.Column_A == 'DFHD', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'DFBG', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'DFVD', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'MFUB', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'MFBD', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'DFBD', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'UFNC', 'Column_A'] = 'CFD'
test.loc[test.Column_A == 'BFYD', 'Column_A'] = 'BFD'

test.loc[test.Column_B == '357', 'Column_A'] = 'BFD'
test.loc[test.Column_B == '350', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'B50', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'B25', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'BVG', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'BUG', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'DVG', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV9', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV5', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV8', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV0', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DBG', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'U66', 'Column_A'] = 'UHFD'
test.loc[test.Column_B == 'U1C', 'Column_A'] = 'UHFD'
test.loc[test.Column_B == 'UVG', 'Column_A'] = 'UHFD'

finaldf = test 
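
As a shorter alternative to listing every Column_B value, the prefix rules can be sketched with `Series.str.startswith` and `numpy.select`. This is a sketch under assumptions, not a drop-in replacement: the frame `df` is a small made-up sample, only rows whose Column_A starts with "UN" are touched, and purely numeric Column_B values are sent to BFD because that is what the expected output in the question shows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: column A already cleaned, column B mixes text and an int.
df = pd.DataFrame({
    "Column_A": ["SFD", "CFD", "UNFZ", "UNT", "UNIX", "UNFZ", "UNT", "UNIX"],
    "Column_B": ["test", "test", "B50", "DVG", "U66", 350, "357", "UVG"],
})

b = df["Column_B"].astype(str)                       # treat numeric cells as text
is_un = df["Column_A"].astype(str).str.startswith("UN")

# Conditions are checked in order; the first one that is True wins.
conditions = [
    is_un & (b.str.startswith("B") | b.str.isdigit()),          # B* or digits
    is_un & (b.str.startswith("D") | b.str.startswith("OPT")),  # D* or OPT*
    is_un & b.str.startswith("U"),                              # U*
]
choices = ["BFD", "SFD", "UHFD"]

# Rows matching no condition keep their current Column_A value.
df["Column_A"] = np.select(conditions, choices, default=df["Column_A"])
print(df)
```

Because `np.select` takes the first matching condition, the order of `conditions` plays the same role as the order of the `.loc` lines above.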

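【Comments】:

【Solution 2】:

Right now, your conditional filter is looking for the literal "UN*" in the "Format" column. To use the asterisk as a wildcard, you can use the fnmatch module.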

    import fnmatch
    
    data.loc[(data["Fmt"] != str) & (data["Format"].apply(lambda x: fnmatch.fnmatch(x, 'UN*'))), "Format"] = 'BD'
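
One caveat with the fnmatch approach is that it expects strings, so a column read from Excel with empty or numeric cells can raise an error. The sketch below guards against that; the helper `matches` and the sample frame are made up for illustration, and `fnmatchcase` is used so that matching is case-sensitive on every platform.

```python
import fnmatch
import pandas as pd

# Hypothetical "Format" column with a missing value and a number mixed in.
data = pd.DataFrame({"Format": ["UNB", "UND", "SFD", None, 350]}, dtype=object)

def matches(value, pattern):
    """Return True only for strings that match the shell-style pattern."""
    return isinstance(value, str) and fnmatch.fnmatchcase(value, pattern)

# Build a boolean mask row by row, then assign through .loc.
mask = data["Format"].apply(lambda v: matches(v, "UN*"))
data.loc[mask, "Format"] = "BD"
print(data)
```

For a plain prefix test like this one, `data["Format"].str.startswith("UN", na=False)` should produce the same mask without a Python-level function call per row; fnmatch is mainly useful when the pattern has wildcards in the middle.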
    

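【Comments】:

  • This does not work. If you copy the code into IDLE, it throws an error when you try to run it...
  • ] -- 'invalid syntax'
  • @pes04 A closing parenthesis was missing after 'UN*'). I have added it.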