【Question title】:How do I scrape an argument of a JavaScript function inside an HTML <script> tag?
【Posted】:2017-09-14 05:37:11
【Question】:

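I want to scrape the arguments of the Dygraph function (mainly a long string of dates), because they are the points on the chart. So far I have been scraping other kinds of tags, which are easy to get with the findAll function, but it looks like this one requires digging deeper.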

<script type="text/javascript">     
g = new Dygraph(

// containing div
document.getElementById('DailySubscribers'),
// CSV or path to a CSV file.
"Date,Daily Subs\n" + "2016-07-31,1\n" + "2016-08-01,1\n" + "2016-08-02,0\n" + "2016-08-03,1\n" + "2016-08-04,0\n" + "2016-08-05,2\n" + "2016-08-06,10\n" + "2016-08-07,5\n" + "2016-08-08,1\n" + "2016-08-09,1\n" + "2016-08-10,2\n" + "2016-08-11,0\n" + "2016-08-12,0\n" + "2016-08-13,0\n" + "2016-08-14,0\n" + "2016-08-15,1\n" + "2016-08-16,1\n" + "2016-08-17,0\n" + "2016-08-18,0\n" + "2016-08-19,1\n" + "2016-08-20,0\n" + "2016-08-21,1\n" + "2016-08-22,0\n" + "2016-08-23,0\n" + "2016-08-24,7\n" + "2016-08-25,2\n" + "2016-08-26,0\n" + "2016-08-27,1\n" + "2016-08-28,1\n" + "2016-08-29,0\n" + "2016-08-30,0\n" + "2016-08-31,0\n" + "2016-09-01,0\n" + "2016-09-02,0\n" + "2016-09-03,0\n" + "2016-09-04,0\n" + "2016-09-05,1\n" + "2016-09-06,0\n" + "2016-09-07,0\n" + "2016-09-08,0\n", {
        title: 'Daily Subs Gained for UCZx2vmIsQQLwzqwGWUbfqQA ',
        legend: 'always',
        ylabel: 'Daily Subs',
        titleHeight: 20,
        labelsDivStyles: {
                        'background': 'none',
                        'margin-top': '-10px',
                        'text-align': 'right',
                      },
        strokeWidth: 1,
        colors: ["#dd2323",
                 "#dd2323",
                 "#dd2323",
                 "#dd2323"],
        labelsKMB: true,
        maxNumberWidth: 10
        }
);
</script>
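If the page has several <script> tags, it can help to isolate the one that builds the chart before parsing anything. A minimal sketch, assuming BeautifulSoup 4 with the built-in `html.parser`; the helper name `find_dygraph_script` and the shortened sample HTML are hypothetical, for illustration only:

```python
from bs4 import BeautifulSoup


def find_dygraph_script(html):
    """Return the source of the first <script> that calls `new Dygraph(`, or None.

    Hypothetical helper for illustration; assumes bs4 is installed.
    """
    soup = BeautifulSoup(html, 'html.parser')
    for script in soup.find_all('script'):
        text = script.string or ''  # .string is None when the tag has no single text child
        if 'new Dygraph(' in text:
            return str(text)
    return None


# Shortened sample page; \\n keeps a literal backslash-n, as in the real HTML.
sample = '''<html><body>
<script type="text/javascript">var x = 1;</script>
<script type="text/javascript">
g = new Dygraph(
document.getElementById('DailySubscribers'),
"Date,Daily Subs\\n" + "2016-07-31,1\\n" + "2016-08-01,1\\n", {
    title: 'Daily Subs'
});
</script>
</body></html>'''

script_text = find_dygraph_script(sample)
print(script_text.strip().splitlines()[0])  # g = new Dygraph(
```

The returned text can then be parsed on its own instead of `str(bs)`, so unrelated markup elsewhere on the page cannot interfere with the string splitting.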

【Discussion】:

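  • Which values do you want?
  • I want to extract these values: Date,Daily Subs\n" + "2016-07-31,1\n" + "2016-08-01,1\n" ....
  • So you want to extract that line up to 2016-09-08,0\n", {, right?
  • Yes, exactly. All the dates and their values.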
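Since the goal is just every date with its value, a regular expression over the script text can pick out each `"YYYY-MM-DD,N\n"` fragment directly. A rough sketch, assuming the dates are always ISO `YYYY-MM-DD` and the counts are non-negative integers; `ROW_RE` and `extract_rows` are hypothetical names:

```python
import re

# Matches one concatenated fragment such as "2016-07-31,1\n" in the JavaScript source.
# Assumes ISO dates and non-negative integer counts. The \\n in the pattern matches a
# literal backslash followed by "n", exactly as it is written in the page source.
ROW_RE = re.compile(r'"(\d{4}-\d{2}-\d{2}),(\d+)\\n"')


def extract_rows(js_source):
    """Return (date, daily_subs) tuples for every date fragment in the script."""
    return [(date, int(subs)) for date, subs in ROW_RE.findall(js_source)]


js = r'"Date,Daily Subs\n" + "2016-07-31,1\n" + "2016-08-01,1\n" + "2016-08-02,0\n", {'
rows = extract_rows(js)
print(rows)  # [('2016-07-31', 1), ('2016-08-01', 1), ('2016-08-02', 0)]
```

The header fragment `"Date,Daily Subs\n"` is skipped automatically because it does not start with a date.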

Tags: javascript python html web-scraping beautifulsoup


【Solution 1】:

Here is a quick way to solve it (brute force, but it works):

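from bs4 import BeautifulSoup

# `data` is the page HTML as a string (e.g. the text of the HTTP response)
bs = BeautifulSoup(data, 'html.parser')
print(bs)
# Cut out the CSV text between the header and the options object, then split it into "date,value" entries
values = str(bs).split('"Date,Daily Subs\\n" +')[1].split(', {')[0].replace('\\n" + "', " ").replace('\\n', " ").replace("\"", "").split(" ")[1:-1]
print(values)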

Output:

<script type="text/javascript">g = new Dygraph(// containing divdocument.getElementById('DailySubscribers'),// CSV or path to a CSV file."Date,Daily Subs\n" + "2016-07-31,1\n" + "2016-08-01,1\n" + "2016-08-02,0\n" + "2016-08-03,1\n" + "2016-08-04,0\n" + "2016-08-05,2\n" + "2016-08-06,10\n" + "2016-08-07,5\n" + "2016-08-08,1\n" + "2016-08-09,1\n" + "2016-08-10,2\n" + "2016-08-11,0\n" + "2016-08-12,0\n" + "2016-08-13,0\n" + "2016-08-14,0\n" + "2016-08-15,1\n" + "2016-08-16,1\n" + "2016-08-17,0\n" + "2016-08-18,0\n" + "2016-08-19,1\n" + "2016-08-20,0\n" + "2016-08-21,1\n" + "2016-08-22,0\n" + "2016-08-23,0\n" + "2016-08-24,7\n" + "2016-08-25,2\n" + "2016-08-26,0\n" + "2016-08-27,1\n" + "2016-08-28,1\n" + "2016-08-29,0\n" + "2016-08-30,0\n" + "2016-08-31,0\n" + "2016-09-01,0\n" + "2016-09-02,0\n" + "2016-09-03,0\n" + "2016-09-04,0\n" + "2016-09-05,1\n" + "2016-09-06,0\n" + "2016-09-07,0\n" + "2016-09-08,0\n", {        title: 'Daily Subs Gained for UCZx2vmIsQQLwzqwGWUbfqQA ',        legend: 'always',        ylabel: 'Daily Subs',        titleHeight: 20,        labelsDivStyles: {                        'background': 'none',                        'margin-top': '-10px',                        'text-align': 'right',                      },        strokeWidth: 1,        colors: ["#dd2323",                 "#dd2323",                 "#dd2323",                 "#dd2323"],        labelsKMB: true,        maxNumberWidth: 10        });</script>
['2016-07-31,1', '2016-08-01,1', '2016-08-02,0', '2016-08-03,1', '2016-08-04,0', '2016-08-05,2', '2016-08-06,10', '2016-08-07,5', '2016-08-08,1', '2016-08-09,1', '2016-08-10,2', '2016-08-11,0', '2016-08-12,0', '2016-08-13,0', '2016-08-14,0', '2016-08-15,1', '2016-08-16,1', '2016-08-17,0', '2016-08-18,0', '2016-08-19,1', '2016-08-20,0', '2016-08-21,1', '2016-08-22,0', '2016-08-23,0', '2016-08-24,7', '2016-08-25,2', '2016-08-26,0', '2016-08-27,1', '2016-08-28,1', '2016-08-29,0', '2016-08-30,0', '2016-08-31,0', '2016-09-01,0', '2016-09-02,0', '2016-09-03,0', '2016-09-04,0', '2016-09-05,1', '2016-09-06,0', '2016-09-07,0', '2016-09-08,0']
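The split chain above depends on the exact spacing around `+` and `, {`. A somewhat less brittle alternative (a different technique, not the answerer's code) rebuilds the whole CSV argument by concatenating the string literals the way JavaScript would, then parses it with Python's `csv` module so the header row is handled too. This sketch assumes the literals contain no escaped quotes and that `\n` is the only escape sequence used; `dygraph_csv_rows` is a hypothetical helper:

```python
import csv
import io
import re

# A chain of "..." literals joined with +, immediately followed by ", {" (the Dygraph
# options object). Assumes the literals contain no escaped quotes.
CHAIN_RE = re.compile(r'((?:"[^"]*"\s*\+\s*)*"[^"]*")\s*,\s*\{')
LITERAL_RE = re.compile(r'"([^"]*)"')


def dygraph_csv_rows(js_source):
    """Rebuild the CSV argument passed to Dygraph and parse it into dicts."""
    match = CHAIN_RE.search(js_source)
    if match is None:
        return []
    # Emulate JavaScript's string concatenation, then decode the \n escapes (the only
    # escape sequence this sketch handles).
    csv_text = ''.join(LITERAL_RE.findall(match.group(1))).replace('\\n', '\n')
    return list(csv.DictReader(io.StringIO(csv_text)))


js = r'''g = new Dygraph(
document.getElementById('DailySubscribers'),
"Date,Daily Subs\n" + "2016-07-31,1\n" + "2016-08-06,10\n", {
    title: 'Daily Subs',
    colors: ["#dd2323"]
});'''

for row in dygraph_csv_rows(js):
    print(row['Date'], int(row['Daily Subs']))
```

Each row comes back as a dict keyed by the header (`Date`, `Daily Subs`), so the counts are still strings until converted with `int()`.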
