TestDome Data Science: Not getting the correct answer

【Question title】: TestDome Data Science: Not getting correct answer
【Posted】: 2018-12-08 21:58:23
【Question】:

I tried to answer this question from TestDome and got 250877.19298245612 instead of the suggested 250000. Please let me know what went wrong. Thanks.

import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
        X = [[i] for i in units_sold]
        reg = linear_model.LinearRegression()
        reg.fit(X, marketing_expenditure)
        # predict() expects a 2D array of samples, so wrap the scalar target.
        return float(reg.predict([[desired_units_sold]])[0])

#For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

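【Comments】:

I think 250877 is close enough to 250000. The difference is less than 0.5%. The code looks correct.
Maybe you can post-process the output to round it.
Your answer passes 1 test. If you round it to 250000, you pass another test but fail the one you passed before. I don't know how to pass the last test. It seems broken.

Tags: python python-3.x scikit-learn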

【Solution 1】:

I think this is the solution: we are trying to recover X from a given y, and the label (the y variable) in this problem is units_sold.

import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
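        # The inputs are plain lists, which have no reshape(); convert them first.
        marketing_expenditure = np.asarray(marketing_expenditure).reshape(-1, 1)
        units_sold = np.asarray(units_sold).reshape(-1, 1)
        reg = linear_model.LinearRegression()
        # Fit units_sold as a function of marketing_expenditure.
        reg.fit(marketing_expenditure, units_sold)
        # Invert units = coef * expenditure + intercept and return a plain float.
        return ((desired_units_sold - reg.intercept_) / reg.coef_).item()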

#For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

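【Comments】:

This is the first correct answer. The key is X = marketing_expenditure, Y = units_sold. However, you don't use the predict function (which gives 250870). You are given 'desired_units', which is the Y variable, and asked to find the X variable. That means a*x + b = 60000. Now solve for x.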
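
The inversion described in this comment can be checked numerically. The sketch below is an illustration rather than part of the original answers: it uses `np.polyfit` in place of scikit-learn, and the helper names `expenditure_by_inversion` and `expenditure_by_direct_fit` are hypothetical.

```python
import numpy as np

def expenditure_by_inversion(marketing_expenditure, units_sold, desired_units_sold):
    # Fit units_sold = a * expenditure + b, then solve a * x + b = desired for x.
    a, b = np.polyfit(marketing_expenditure, units_sold, 1)
    return (desired_units_sold - b) / a

def expenditure_by_direct_fit(marketing_expenditure, units_sold, desired_units_sold):
    # Fit expenditure = c * units_sold + d and evaluate it at the target
    # (the direction used in the question's code).
    c, d = np.polyfit(units_sold, marketing_expenditure, 1)
    return c * desired_units_sold + d

spend = [300000, 200000, 400000, 300000, 100000]
units = [60000, 50000, 90000, 80000, 30000]
print(round(expenditure_by_inversion(spend, units, 60000), 2))   # 250000.0
print(round(expenditure_by_direct_fit(spend, units, 60000), 2))  # 250877.19
```

The two fits disagree because least squares minimizes the error in whichever variable is treated as the response, so the two regression lines coincide only when the data are perfectly correlated.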

【Solution 2】:

import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
        marketing_expenditure = np.asarray(marketing_expenditure).reshape(-1, 1)
        units_sold = np.asarray(units_sold).reshape(-1, 1)
        reg = linear_model.LinearRegression()
        reg.fit(marketing_expenditure , units_sold)
        # np.float was removed in NumPy 1.24; .item() returns a plain Python float.
        return ((desired_units_sold - reg.intercept_) / reg.coef_).item()

#For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

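【Comments】:

【Solution 3】:

I ran into the same problem: rounding solved the first test case, but then the second test case failed. This is a small-sample, single-variable regression, so it seems you cannot use ordinary regression here and need Theil-Sen regression instead. I checked the result, which is 250000.00003619, and then you only need to round it.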
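
For intuition about the Theil-Sen approach mentioned here, the textbook single-variable estimator takes the median of all pairwise slopes. The sketch below is an illustration under that assumption; the function `theil_sen_predict` is hypothetical, and scikit-learn's `TheilSenRegressor` uses a different (spatial-median) formulation, so its output need not match this exactly.

```python
from itertools import combinations

import numpy as np

def theil_sen_predict(x, y, x_new):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slopes of the lines through every pair of points with distinct x values.
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = np.median(slopes)
    # Intercept: median of the offsets y - slope * x for that slope.
    intercept = np.median(y - slope * x)
    return float(slope * x_new + intercept)

units = [60000, 50000, 90000, 80000, 30000]
spend = [300000, 200000, 400000, 300000, 100000]
print(theil_sen_predict(units, spend, 60000))  # 250000.0
```

On this data set the median pairwise slope is 5 and the median intercept is -50000, which gives 250000 for 60000 units. That agreement may be specific to these five points and is not evidence that the grader expects a robust estimator.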

Source: https://gist.github.com/mfakbar/f97949299171c75e868a37f3f578fa54

import numpy as np
from sklearn import linear_model

class MarketingCosts:

    # param marketing_expenditure list. Expenditure for each previous campaign.
    # param units_sold list. The number of units sold for each previous campaign.
    # param desired_units_sold int. Target number of units to sell in the new campaign.
    # returns float. Required amount of money to be invested.
    @staticmethod
    def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
        y, x = np.array(marketing_expenditure), np.array(units_sold).reshape(-1, 1)
        regressor = linear_model.TheilSenRegressor(max_subpopulation=10)
        regressor.fit(x, y)
        desired_units_sold = np.array([desired_units_sold]).reshape(-1, 1)
        return float(round(regressor.predict(desired_units_sold).item()))

# For example, with the parameters below the function should return 250000.0.
print(MarketingCosts.desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

【Comments】:

Could you elaborate on why sklearn's default LinearRegression() can't be used?

【Solution 4】:

Here is my answer, which passes all tests:

import numpy as np
from sklearn.linear_model import LinearRegression

def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):

    x = np.array(marketing_expenditure).reshape(-1, 1)
    y = np.array(units_sold).reshape(-1, 1)
    model = LinearRegression()
    model.fit(x , y)

    return (desired_units_sold - model.intercept_)/model.coef_

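【Comments】:

Thank you for this code snippet, which might provide some limited short-term help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made.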

【Solution 5】:

Here is my answer, which passes all test cases.

You can find simple steps for performing linear regression here.
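
For reference, the closed-form least-squares expressions that this answer's code evaluates can be written as follows. The symbols are introduced here for illustration: $x_i$ is the marketing expenditure, $y_i$ the units sold, $n$ the number of campaigns, and $y^*$ the desired units sold; the code calls the intercept `a` and the slope `b`.

```latex
a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
x^* = \frac{y^* - a}{b}
```

With the example data these give $a = 10000$ and $b = 0.2$, so $x^* = (60000 - 10000) / 0.2 = 250000$.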

import numpy as np
from sklearn.linear_model import LinearRegression

def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
    s_x = sum(marketing_expenditure)
    s_y = sum(units_sold)
    xy = [] 
    for i in range (len(marketing_expenditure)):
        z= marketing_expenditure[i]*units_sold[i]
        xy.append(z)
    s_xy = sum(xy)
    sq_x = [number ** 2 for number in marketing_expenditure]
    s_sq_x = sum(sq_x)
    sq_y = [number ** 2 for number in units_sold]
    s_sq_y = sum(sq_y)   
    
    # calculating coefficients a (intercept) and b (slope) for linear regression
    n = len(marketing_expenditure)
    a = ((s_y*s_sq_x) - (s_x*s_xy)) / (n*s_sq_x - (s_x**2))
    b = (n*s_xy - (s_x*s_y)) / (n*s_sq_x - (s_x**2))
    # invert units = a + b * expenditure to get the required expenditure
    return (desired_units_sold-a)/b


#For example, with the parameters below, the function should return 250000.0
print(desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000))

【Comments】:

statisticshowto.com/probability-and-statistics/…