【Question Title】: Requesting clarification on TypeError: unhashable type: 'list'
【Posted】: 2021-05-28 15:06:22
【Question Description】:

I need some clarification on an error I am facing.

corpus is a Python dictionary mapping a page name to a set of all pages linked to by that page.

page is a string representing a page.

When I try linkouts = corpus[page], I get:

TypeError: unhashable type: 'list'

When I print corpus[page], the output is {'3.html', '1.html'} (corpus is a dictionary of sets), and print(type(corpus[page])) reports that it is a set.

I can iterate over corpus[page], but len(corpus[page]) raises the same error. Isn't corpus[page] a set? How should I resolve this error? Making a corpus[page].copy() runs into the same problem. Any advice and help is greatly appreciated, thank you all!
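For context, this error is normally raised when a dict is indexed with a list rather than a string; a minimal sketch (using a hypothetical two-page corpus, not the asker's data) that reproduces it:

```python
corpus = {"1.html": {"2.html"}, "2.html": {"1.html", "3.html"}}

# Indexing with a string key works as expected:
print(corpus["1.html"])  # {'2.html'}

# Indexing with a list raises the error, because lists are
# mutable and therefore unhashable (cannot be dict keys):
try:
    corpus[["1.html"]]
except TypeError as e:
    print(e)  # unhashable type: 'list'
```

So the error points at the *key* being a list, not at the set stored as the value.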

Code for pagerank.py

import os
import random
import re
import sys

DAMPING = 0.85
SAMPLES = 10000


def main():
    if len(sys.argv) != 2:
        sys.exit("Usage: python pagerank.py corpus")
    corpus = crawl(sys.argv[1])
    ranks = sample_pagerank(corpus, DAMPING, SAMPLES)
    print(f"PageRank Results from Sampling (n = {SAMPLES})")
    for page in sorted(ranks):
        print(f"  {page}: {ranks[page]:.4f}")
    #ranks = iterate_pagerank(corpus, DAMPING)
    #print(f"PageRank Results from Iteration")
    for page in sorted(ranks):
        print(f"  {page}: {ranks[page]:.4f}")


def crawl(directory):
    """
    Parse a directory of HTML pages and check for links to other pages.
    Return a dictionary where each key is a page, and values are
    a list of all other pages in the corpus that are linked to by the page.
    """
    pages = dict()

    # Extract all links from HTML files
    for filename in os.listdir(directory):
        if not filename.endswith(".html"):
            continue
        with open(os.path.join(directory, filename)) as f:
            contents = f.read()
            links = re.findall(r"<a\s+(?:[^>]*?)href=\"([^\"]*)\"", contents)
            pages[filename] = set(links) - {filename}

    # Only include links to other pages in the corpus
    for filename in pages:
        pages[filename] = set(
            link for link in pages[filename]
            if link in pages
        )

    return pages


def transition_model(corpus, page, damping_factor):
    """
    Return a probability distribution over which page to visit next,
    given a current page.

    With probability `damping_factor`, choose a link at random
    linked to by `page`. With probability `1 - damping_factor`, choose
    a link at random chosen from all pages in the corpus.
    """
    linkouts =  set(corpus[page])
    output = {}
    for key in corpus:
        output[key] = 0.00
    dampvalue = damping_factor / len(linkouts)
    for link in linkouts:
        output[link] += dampvalue
    if linkouts:
        dampvalue = 1 - damping_factor
        dampvalue = dampvalue / len(corpus)
        for key in corpus:
            output[key] += dampvalue
    else:
        dampvalue = 1 / len(corpus)
        for key in corpus:
            output[key] += dampvalue
    return output



def sample_pagerank(corpus, damping_factor, n):
    """
    Return PageRank values for each page by sampling `n` pages
    according to transition model, starting with a page at random.

    Return a dictionary where keys are page names, and values are
    their estimated PageRank value (a value between 0 and 1). All
    PageRank values should sum to 1.
    """
    samples = []
    first = random.choice(list(corpus))
    samples.append(first)
    for i in range(n-1):
        output = transition_model(corpus, first, damping_factor)
        second = random.choices(list(output), weights=(output.values()))
        samples.append(second)
        first = second

    output = {}
    for link in corpus:
        num = 0
        for sample in samples:
            if sample == link:
                num += 1
        output[link] = num / n

    return output





def iterate_pagerank(corpus, damping_factor):
    """
    Return PageRank values for each page by iteratively updating
    PageRank values until convergence.

    Return a dictionary where keys are page names, and values are
    their estimated PageRank value (a value between 0 and 1). All
    PageRank values should sum to 1.
    """
    raise NotImplementedError


if __name__ == "__main__":
    main()

The code for 1.html and 2.html, which are in a folder (corpus0) in the same folder as pagerank.py:

1.html

<html lang="en">
    <head>
        <title>1</title>
    </head>
    <body>
        <h1>1</h1>

        <div>Links:</div>
        <ul>
            <li><a href="2.html">2</a></li>
        </ul>
    </body>
</html>

2.html

<!DOCTYPE html>
<html lang="en">
    <head>
        <title>2</title>
    </head>
    <body>
        <h1>2</h1>

        <div>Links:</div>
        <ul>
            <li><a href="1.html">1</a></li>
            <li><a href="3.html">3</a></li>
        </ul>
    </body>
</html>

The program is run with python pagerank.py corpus0

EDIT

linkouts = []
for i in corpus[page]:
    linkouts.append(i)

gives the same type of error, but if I replace linkouts.append(i) with print(i) there is no error, and i is of type str.

【Question Comments】:

  • It depends on the type of page. I suspect you are indexing with a different page object in different places and getting different results. Consider posting a minimal reproducible example (stackoverflow.com/help/minimal-reproducible-example) for better feedback.
  • @nneonneo Yes, added. But page is a string
  • The problem seems to be in sample_pagerank rather than transition_model

Tags: python set typeerror


【Solution 1】:

random.choices returns a list even when k=1, so from the second iteration onward I was passing a list as page into transition_model.

The fix, inside sample_pagerank:

for i in range(n - 1):
    output = transition_model(corpus, first, damping_factor)
    second = random.choices(list(output), weights=output.values(), k=1)[0]
    samples.append(second)
    first = second
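To illustrate the root cause: random.choices always returns a list of k elements, even when k=1, so the single sampled page has to be unpacked with [0]. A small sketch with a hypothetical weight distribution:

```python
import random

pages = ["1.html", "2.html", "3.html"]
weights = [0.5, 0.3, 0.2]

# Without unpacking, the result is a one-element list, not a string:
picked = random.choices(pages, weights=weights, k=1)
print(type(picked))  # <class 'list'>

# Indexing [0] extracts the sampled page as a plain string,
# which is a hashable dict key:
page = random.choices(pages, weights=weights, k=1)[0]
print(type(page))  # <class 'str'>
```

Passing the un-unpacked list back into corpus[page] is exactly what produced the unhashable type: 'list' error.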

【Discussion】:
