【Question title】: How do I iterate over a column in a pandas DataFrame and append the result of a function to the same row?
【Posted】: 2026-01-25 23:35:01
【Question description】:

I have a pandas.DataFrame generated from the following CSV:

Category,Brand,Product Name,Price,Expiration Date, Package ID,Quantity
Cat1,Brand1,Product1,$1000,07/14/2020,XXXXXX,34

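I am trying to append a column to the CSV with an integer in each row that corresponds to how far away the expiration date is (4 means more than 6 months, 3 means between 3 and 6 months, and so on).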

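My problem is that when I convert the Expiration Date column to datetime (using pandas.to_datetime(df['Expiration Date'])) and then apply my classify_expiration() function, either the types do not match what the function expects, or the function is applied to index 0, which I believe is the header (and therefore does not match the %m/%d/%Y format). I have tried converting the column to datetime both inside the classification function and before the .apply() call. I have also tried using timedelta to compare the expiration date with today's date, but that does not work with datetime.date.today().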

Here is the first approach I tried:

def classify_expiration(row):    
    one_week = timedelta(weeks=1, days=0, hours=0, minutes=0, seconds=0)

    if ((one_week * 0) <= (date.today() - row['Expiration Date']) <= (one_week * 4)):
        return 4

This approach gives me errors about an incorrect type at index 0, or about not being able to apply the function to a Series.

Here is what I tried most recently, which gives me an AssertionError:

def days_between(date1, date2):
    """Calculates the number of days between two dates

    Keyword arguments:
    date1 -- The first date in the subtraction.
    date2 -- The second date in the subtraction.
    """
    date1 = datetime.strptime(date1, '%m/%d/%Y')
    date2 = datetime.strptime(date2, '%m/%d/%Y')
    return abs((date2 - date1).days)


def classify_expiration(row):
    """Calculate days/weeks to expiration. Assign quartile based on value.

    Keyword arguments:
    row -- row in a `pandas.core.frame.DataFrame` object. e.g. `df['A']`
    """

    date_today = datetime.strptime(
        date.today().strftime('%m/%d/%Y'), '%m/%d/%Y')

    if (days_between(row, date_today) <= 30):
        return 4
    if (31 <= days_between(row, date_today) <= 90):
        return 3
    if (91 <= days_between(row, date_today) <= 120):
        return 2
    if (days_between(row, date_today) >= 121):
        return 1

Here is where I try to apply the function:

# Convert column to `datetime` if its current type is str
pd.to_datetime(product_sales['Expiration Date'])

# Applying the `classify_expiration()` function
product_sales['Expiration Quartile'] = product_sales.apply(
    lambda row: classify_expiration(row), axis=1
)

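I expect the function to add a new column to the DataFrame containing the quartile computed from the expiration date in each row. Instead I get an AssertionError, "argument 1 must be str, not Series", and various other errors related to index 0.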

【Question comments】:

    Tags: python pandas csv dataframe datetime


    【Solution 1】:

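    If you assign the converted column back with product_sales['Expiration Date'] = pd.to_datetime(product_sales['Expiration Date']), you need to remove the conversion to datetime inside the days_between function. Then call product_sales['Expiration Date'].apply(classify_expiration), which loops over the column one scalar at a time: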

    def days_between(date1, date2):
        """Calculates the number of days between two dates
    
        Keyword arguments:
        date1 -- The first date in the subtraction.
        date2 -- The second date in the subtraction.
        """
        return abs((date2 - date1).days)
    
    
    product_sales['Expiration Date'] = pd.to_datetime(product_sales['Expiration Date'])
    
    product_sales['Expiration Quartile'] = (product_sales['Expiration Date']
                                                   .apply(classify_expiration))
    print (product_sales)
      Category   Brand Product Name  Price Expiration Date Package ID  Quantity  \
    0     Cat1  Brand1     Product1  $1000      2020-07-14     XXXXXX        34   
    
       Expiration Quartile  
    0                    1  
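
The steps above can be put together as a minimal, self-contained sketch. The sample row is adapted from the question's CSV, and REFERENCE_DATE is a hypothetical fixed date that stands in for "today" so the result is reproducible. The classify_expiration shown here is a condensed variant with the same thresholds as the question's, not the asker's exact code.

```python
import io

import pandas as pd

# Sample row adapted from the question's CSV.
CSV_TEXT = """Category,Brand,Product Name,Price,Expiration Date,Package ID,Quantity
Cat1,Brand1,Product1,$1000,07/14/2020,XXXXXX,34
"""

# Assumption: a fixed reference date instead of the real current date.
REFERENCE_DATE = pd.Timestamp('2019-10-15')


def days_between(date1, date2):
    """Absolute number of days between two Timestamp values."""
    return abs((date2 - date1).days)


def classify_expiration(expiration, today=REFERENCE_DATE):
    """Map one expiration Timestamp to the buckets 4, 3, 2, 1."""
    days = days_between(expiration, today)
    if days <= 30:
        return 4
    if days <= 90:
        return 3
    if days <= 120:
        return 2
    return 1


product_sales = pd.read_csv(io.StringIO(CSV_TEXT))

# pd.to_datetime returns a new Series; unless it is assigned back,
# the column keeps holding plain strings.
product_sales['Expiration Date'] = pd.to_datetime(
    product_sales['Expiration Date'], format='%m/%d/%Y')

# Series.apply passes one scalar (a Timestamp) per call, not a whole row.
product_sales['Expiration Quartile'] = (
    product_sales['Expiration Date'].apply(classify_expiration))
```

Calling apply on the single column, rather than on the whole DataFrame with axis=1, is what keeps the argument a scalar, so no string parsing is needed inside the function.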
    

    pandas has a dedicated function for binning, so you can use cut:

    import numpy as np

    product_sales['Expiration Date'] = pd.to_datetime(product_sales['Expiration Date'])
    
    product_sales['Expiration Quartile'] = (product_sales['Expiration Date']
                                                 .apply(classify_expiration))
    
    s = product_sales['Expiration Date'].sub(pd.to_datetime('today').floor('d')).dt.days
    
    product_sales['Expiration Quartile1'] = pd.cut(s, 
                                                   bins=[0, 30, 90,120, np.inf], 
                                                   labels=[4,3,2,1])
    print (product_sales)
      Category   Brand Product Name  Price Expiration Date Package ID  Quantity  \
    0     Cat1  Brand1     Product1  $1000      2020-07-14     XXXXXX        34   
    1     Cat1  Brand1     Product1  $1000      2020-01-13     XXXXXX        34   
    2     Cat1  Brand1     Product1  $1000      2019-11-01     XXXXXX        34   
    3     Cat1  Brand1     Product1  $1000      2020-01-15     XXXXXX        34   
    
       Expiration Quartile Expiration Quartile1  
    0                    1                    1  
    1                    3                    3  
    2                    4                    4  
    3                    2                    2  
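
The binning approach can also be tried in isolation. The sketch below assumes a hypothetical fixed reference date and a small hand-made Series of expiration dates. It adds include_lowest=True, which is not in the answer above, so that an item expiring exactly on the reference date (0 days) lands in the first bin instead of becoming NaN.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed reference date standing in for "today".
reference_date = pd.Timestamp('2019-10-15')

# Hypothetical expiration dates in the question's %m/%d/%Y format.
expiration = pd.to_datetime(
    pd.Series(['07/14/2020', '01/13/2020', '11/01/2019',
               '01/15/2020', '10/15/2019']),
    format='%m/%d/%Y')

# Signed number of whole days until expiration.
days_left = (expiration - reference_date).dt.days

# Bins are right-closed: [0, 30], (30, 90], (90, 120], (120, inf).
quartile = pd.cut(days_left,
                  bins=[0, 30, 90, 120, np.inf],
                  labels=[4, 3, 2, 1],
                  include_lowest=True)
```

Note that days_left is signed here, so a date before the reference date falls outside every bin and yields NaN, whereas days_between takes the absolute value. Which behavior is wanted for already expired products is a design choice the original post does not settle.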
    

    【Comments】:
