【Question title】: Issue in plotting a graph
【Posted】: 2023-03-30 12:22:02
【Question】:

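I have been trying to analyse news websites and compare how many articles each site has written about covid. I managed to extract the article titles from the news sites and ran a word search that tells us how many of the extracted articles contain the word "COVID". Now I want to show the word search results in a bar chart. I used matplotlib, but I am getting an error I cannot understand. Please help.

Here is the code (the last part of the code is the chart, which is where I get the error):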

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
from newspaper import Article
import requests
import matplotlib.pyplot as plt 
URL=["https://www.timesnownews.com/coronavirus","https://www.indiatoday.in/coronavirus"]
Url_count = []
for url in URL:
    parser = 'html.parser'  
    resp = requests.get(url)
    http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
    html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
    encoding = html_encoding or http_encoding
    soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
    
    links = []
    for link in soup.find_all('a', href=True):
        if "javascript" in link["href"]:
            continue
        links.append(link['href'])
            
    count = 0
     
            
    for link in links:
        try:
            article = Article(link)
            article.download()
            article.parse()
            print(article.title)
            if "COVID" in article.title or "coronavirus" in article.title or "Coronavirus"in article.title or "Covid-19" in article.title or "COVID-19" in article.title :
                    count += 1
    
        except:
            pass
    Url_count.append(count)
    
for url, count in zip(URL, Url_count):
    print("Site:", url, "Count:", count)
    # x-coordinates of left sides of bars  
    left = [URL]
    # heights of bars 
    height=[Url_count]
    # labels for bars 
    tick_label=['timesnow', 'India today']
    # plotting a bar chart 
    plt.bar(left, height, tick_label = tick_label, 
        width = 0.8, color = ['red', 'green']) 
    # naming the x-axis 
    plt.xlabel('News websites') 
    # naming the y-axis 
    plt.ylabel('Number of articles') 
    # plot title 
    plt.title('Media analysis') 
  
    # function to show the plot 
    plt.show() 

Here is the error:

--------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in bar(self, x, height, width, bottom, align, **kwargs)
   2369                 try:
-> 2370                     left = x - width / 2
   2371                 except TypeError as e:

UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U40'), dtype('<U40')) -> dtype('<U40')

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-59393af79dcd> in <module>
     45     tick_label=['timesnow', 'India today']
     46     # plotting a bar chart
---> 47     plt.bar(left, height, tick_label = tick_label, 
     48         width = 0.8, color = ['red', 'green']) 
     49     # naming the x-axis

~\anaconda3\lib\site-packages\matplotlib\pyplot.py in bar(x, height, width, bottom, align, data, **kwargs)
   2405         x, height, width=0.8, bottom=None, *, align='center',
   2406         data=None, **kwargs):
-> 2407     return gca().bar(
   2408         x, height, width=width, bottom=bottom, align=align,
   2409         **({"data": data} if data is not None else {}), **kwargs)

~\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1563     def inner(ax, *args, data=None, **kwargs):
   1564         if data is None:
-> 1565             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1566 
   1567         bound = new_sig.bind(ax, *args, **kwargs)

~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in bar(self, x, height, width, bottom, align, **kwargs)
   2370                     left = x - width / 2
   2371                 except TypeError as e:
-> 2372                     raise TypeError(f'the dtypes of parameters x ({x.dtype}) '
   2373                                     f'and width ({width.dtype}) '
   2374                                     f'are incompatible') from e

TypeError: the dtypes of parameters x (<U40) and width (float64) are incompatible

【Comments on the question】:

    Tags: python matplotlib graph beautifulsoup


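    【Solution 1】:

    You are using the bar plot incorrectly. You do not need a for loop to draw multiple bars. Call plt.bar(x, y) once, where x is a list of bar positions and y is a list of bar heights. I suggest having a look at the documentation on barplot.

    If you are going to do this for many websites, it would be better to generate a list from the URL list that contains only the site names, i.e. strip the "www." and the ".com/...." parts, so that you only have to give your program a list of URLs and it does the rest for you.

    To fix your problem, replace the last for loop with the following code: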
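The label extraction suggested above could look roughly like the following. This is only a sketch: the helper name site_label is made up for illustration, and it assumes every host has the form "www.<name>.<tld>" or "<name>.<tld>". A host such as "news.example.co.uk" would yield "news", so adjust it for other URL shapes.

```python
from urllib.parse import urlparse


def site_label(url):
    """Return a short site name taken from the host part of a URL."""
    host = urlparse(url).netloc            # e.g. 'www.timesnownews.com'
    if host.startswith("www."):
        host = host[len("www."):]          # drop the leading 'www.'
    return host.split(".")[0]              # keep the part before the first dot


URL = ["https://www.timesnownews.com/coronavirus",
       "https://www.indiatoday.in/coronavirus"]

labels = [site_label(u) for u in URL]
print(labels)  # ['timesnownews', 'indiatoday']
```

The resulting labels list can then be passed to ax.set_xticklabels instead of a hand-written list.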

    import numpy as np 
    
    # make an array for the x-axis from the URL List 
    nx = np.arange(len(URL)) 
    
    labels = ['timesnownews', 'indiatoday']
    
    fig = plt.figure()
    ax = fig.gca()
    ax.set_title('Media analysis')
    ax.bar(nx, Url_count)
    ax.set_xticks(nx)
    ax.set_xticklabels(labels)
    ax.set_ylabel('Number of articles')
    ax.set_xlabel('News Websites')
    # display the figure (needed when running as a script)
    plt.show()
    

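    【Comments】:

      【Solution 2】:

      The left argument needs to be an array of x-axis values. You are passing a list of strings. I am also not sure why you do this inside a loop when each call plots the whole set anyway.

      Also, height needs to be a list of bar heights, but you are making it a list of lists. Remember that Url_count is already a list.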
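A minimal sketch of the two points above, with the plotting done once outside any loop. The counts in Url_count are placeholder numbers invented for illustration, not real results, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

URL = ["https://www.timesnownews.com/coronavirus",
       "https://www.indiatoday.in/coronavirus"]
Url_count = [12, 7]  # placeholder counts, made up for this example

# Wrapping an existing list in brackets nests it: [[12, 7]], not [12, 7].
nested = [Url_count]

# Numeric x positions, one per site, instead of the URL strings.
positions = list(range(len(URL)))  # [0, 1]

fig, ax = plt.subplots()
bars = ax.bar(positions, Url_count,          # flat list of heights
              tick_label=['timesnow', 'India today'],
              width=0.8, color=['red', 'green'])
ax.set_xlabel('News websites')
ax.set_ylabel('Number of articles')
ax.set_title('Media analysis')

print([bar.get_height() for bar in bars])
```

In an interactive session, drop the matplotlib.use line and call plt.show() at the end to display the chart.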

      【Comments】:
