【Question title】:TestDome Data Science: Not getting correct answer
【Posted】:2018-12-08 21:58:23
【Question description】:

I tried to answer this question from TestDome and got 250877.19298245612 instead of the suggested 250000. Please let me know what went wrong. Thanks.

import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
        X = [[i] for i in units_sold]
        reg = linear_model.LinearRegression()
        reg.fit(X, marketing_expenditure)
        # predict expects a 2D array of shape (n_samples, n_features)
        return float(reg.predict([[desired_units_sold]])[0])

#For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

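【Comments】:

  • I think 250877 is close enough to 250000. The difference is less than 0.5%. The code seems correct.
  • Maybe you could post-process the output and round it.
  • Your answer passes 1 test. If you round it to 250000, you pass another test but fail the one you passed before. Not sure how to pass the last test. It seems broken.

Tags: python python-3.x scikit-learn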


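【Solution 1】:

I think this is the solution: the label (the y variable) in this problem is units_sold, and we are asked to recover X (the expenditure) from a given y.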

import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
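        # The inputs are plain lists, so convert them to arrays before reshaping.
        marketing_expenditure = np.asarray(marketing_expenditure).reshape(-1, 1)
        units_sold = np.asarray(units_sold).reshape(-1, 1)
        reg = linear_model.LinearRegression()
        # Fit units_sold as a function of marketing_expenditure, then invert the line.
        reg.fit(marketing_expenditure, units_sold)
        return ((desired_units_sold - reg.intercept_) / reg.coef_).item()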

#For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

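【Comments】:

  • This is the first correct answer. The key is X = marketing_expenditure, Y = units_sold. However, you do not use the predict function (regressing the other way round and calling predict gives roughly 250877). You are given desired_units_sold, which is the Y variable, and asked to find the X variable. That means a*x + b = 60000. Now solve for x.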
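
The "solve for x" step described in the comment can be sketched without scikit-learn. This is only an illustration: it assumes an ordinary least-squares line fitted with np.polyfit, and the function name invert_fitted_line is hypothetical, not part of the TestDome template.

```python
import numpy as np

def invert_fitted_line(marketing_expenditure, units_sold, desired_units_sold):
    # Fit units_sold = a * marketing_expenditure + b by ordinary least squares.
    a, b = np.polyfit(marketing_expenditure, units_sold, deg=1)
    # Solve a * x + b = desired_units_sold for x.
    return float((desired_units_sold - b) / a)

estimate = invert_fitted_line(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000)
print(estimate)  # approximately 250000 for the example data
```

For the example data the fitted line is units_sold = 0.2 * marketing_expenditure + 10000, so (60000 - 10000) / 0.2 = 250000, up to floating-point error.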
【Solution 2】:
import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
        marketing_expenditure = np.asarray(marketing_expenditure).reshape(-1, 1)
        units_sold = np.asarray(units_sold).reshape(-1, 1)
        reg = linear_model.LinearRegression()
        reg.fit(marketing_expenditure , units_sold)
        # np.float was removed in NumPy 1.24; .item() extracts the single value as a Python float.
        return ((desired_units_sold - reg.intercept_) / reg.coef_).item()

#For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

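【Comments】:

【Solution 3】:

I had the same problem: I was rounding to get the first test case right, and that made the second test case fail. This is a small-sample, single-variable regression, so it actually looks like you cannot use ordinary regression and should use Theil-Sen regression instead. I checked the result, which was 250000.00003619, and then you just round it.

Source: https://gist.github.com/mfakbar/f97949299171c75e868a37f3f578fa54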

    import numpy as np
    from sklearn import linear_model
    
    class MarketingCosts:
    
        # param marketing_expenditure list. Expenditure for each previous campaign.
        # param units_sold list. The number of units sold for each previous campaign.
        # param desired_units_sold int. Target number of units to sell in the new campaign.
        # returns float. Required amount of money to be invested.
        @staticmethod
        def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
            y, x = np.array(marketing_expenditure), np.array(units_sold).reshape(-1, 1)
            regressor = linear_model.TheilSenRegressor(max_subpopulation=10)
            regressor.fit(x, y)
            desired_units_sold = np.array([desired_units_sold]).reshape(-1, 1)
            return float(round(regressor.predict(desired_units_sold).item()))
    
    # For example, with the parameters below the function should return 250000.0.
    print(MarketingCosts.desired_marketing_expenditure(
        [300000, 200000, 400000, 300000, 100000],
        [60000, 50000, 90000, 80000, 30000],
        60000))
    

【Comments】:

  • Could you elaborate on why sklearn's default LinearRegression() cannot be used?
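
One possible way to look at this question, offered as a sketch and not as the answer author's explanation: ordinary least squares is not symmetric in its two variables. Regressing expenditure on units and predicting directly gives a different number than regressing units on expenditure and inverting the line, which is what the other answers in this thread do with LinearRegression. The variable names below are illustrative, and np.polyfit stands in for any ordinary least-squares fit.

```python
import numpy as np

expenditure = np.array([300000, 200000, 400000, 300000, 100000], dtype=float)
units = np.array([60000, 50000, 90000, 80000, 30000], dtype=float)
target_units = 60000

# Direction A: regress expenditure on units, then predict at target_units.
slope_a, intercept_a = np.polyfit(units, expenditure, deg=1)
direct_prediction = slope_a * target_units + intercept_a

# Direction B: regress units on expenditure, then invert the fitted line.
slope_b, intercept_b = np.polyfit(expenditure, units, deg=1)
inverted_prediction = (target_units - intercept_b) / slope_b

print(direct_prediction)    # about 250877.19, the value reported in the question
print(inverted_prediction)  # about 250000, the value TestDome expects
```

The two directions only agree when the points lie exactly on a line, which is not the case for this data.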
【Solution 4】:

Here is my answer, which passes all the tests:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
    
        x = np.array(marketing_expenditure).reshape(-1, 1)
        y = np.array(units_sold).reshape(-1, 1)
        model = LinearRegression()
        model.fit(x , y)
    
        return (desired_units_sold - model.intercept_)/model.coef_    
    

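【Comments】:

  • Thank you for this code snippet, which might provide some limited, short-term help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you have made.

【Solution 5】:

Here is my answer, which passes all the test cases.

You can find the simple steps for performing linear regression here.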

    import numpy as np
    from sklearn.linear_model import LinearRegression
    
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
        s_x = sum(marketing_expenditure)
        s_y = sum(units_sold)
        xy = [] 
        for i in range (len(marketing_expenditure)):
            z= marketing_expenditure[i]*units_sold[i]
            xy.append(z)
        s_xy = sum(xy)
        sq_x = [number ** 2 for number in marketing_expenditure]
        s_sq_x = sum(sq_x)
        sq_y = [number ** 2 for number in units_sold]
        s_sq_y = sum(sq_y)   
        
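        # calculating coefficients a (intercept) and b (slope) for linear regression
        a=((s_y*s_sq_x) - (s_x*s_xy))/(len(marketing_expenditure)*s_sq_x - (s_x**2))
        # parentheses allow the expression to continue across two lines
        b=((len(marketing_expenditure)*s_xy - (s_x*s_y)) /
           (len(marketing_expenditure)*s_sq_x - (s_x**2)))
        # invert units_sold = a + b * expenditure to get the required expenditure
        return (desired_units_sold-a)/b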
    
    
    #For example, with the parameters below, the function should return 250000.0
    print(desired_marketing_expenditure(
        [300000, 200000, 400000, 300000, 100000],
        [60000, 50000, 90000, 80000, 30000],
        60000))
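
For reference, the sums in the code above correspond to the standard closed-form least-squares coefficients for the line y = a + b x, with x the marketing expenditure and y the units sold. The worked numbers below use the example data (n = 5) and are added as a check; they are not part of the original answer.

```latex
\begin{aligned}
a &= \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - \left(\sum x\right)^2}
   = \frac{(3.1\times10^{5})(3.9\times10^{11}) - (1.3\times10^{6})(9.1\times10^{10})}
          {5\,(3.9\times10^{11}) - (1.3\times10^{6})^2}
   = \frac{2.6\times10^{15}}{2.6\times10^{11}} = 10000, \\
b &= \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
   = \frac{5\,(9.1\times10^{10}) - (1.3\times10^{6})(3.1\times10^{5})}{2.6\times10^{11}}
   = \frac{5.2\times10^{10}}{2.6\times10^{11}} = 0.2, \\
x^{*} &= \frac{y^{*} - a}{b} = \frac{60000 - 10000}{0.2} = 250000.
\end{aligned}
```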
    