Getting relations with Association Object right: answers

【Question title】: Getting relations with Association Object right
【Posted】: 2022-11-09 22:01:20
【Question description】:

While scraping a website with Scrapy, I create a database of the following form (defined in models.py, following the tutorial project structure):

    from sqlalchemy import create_engine, Column, Table, ForeignKey, MetaData
    from sqlalchemy.orm import relationship
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import (Integer, String, Date, DateTime, Float, Boolean, Text)
    from scrapy.utils.project import get_project_settings
    
    Base = declarative_base()
    
    def db_connect():
        return create_engine(get_project_settings().get("CONNECTION_STRING"))
    
    def create_table(engine):
        Base.metadata.create_all(engine)

    Article_author = Table('article_author', Base.metadata,
      Column('article_id', Integer, ForeignKey('article.article_id'), primary_key=True),
      Column('author_id', Integer, ForeignKey('author.author_id'), primary_key=True),
      Column('author_number', Integer)
    )

    class Article(Base):
      __tablename__ = "article"

      article_id    = Column(Integer, primary_key=True)
      article_title = Column('name', String(50), unique=True)
      authors = relationship('Author', secondary='article_author',lazy='dynamic', backref="article") 

    class Author(Base):
      __tablename__ = "author"

      author_id        = Column(Integer, primary_key=True)
      author_name     = Column('name', String(50), unique=True)
      articles = relationship('Article', secondary='article_author',lazy='dynamic', backref="article")

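An error occurs when I try to add the author number (e.g. first or second author) to the automatically created association table "article_author", because I don't know how to access that table from the pipelines.py script. The article and author tables have a many-to-many relationship, since an author can write several articles and an article can have several authors. The article table has a unique article_id and the author table has a unique author_id. The association table is keyed by the unique pair (article_id, author_id). The pipelines.py script has a function process_item in which an instance of the article can be created, after which the author and association tables are updated accordingly. The question is how to insert the author number.

Should a relationship be added in models.py?

The pipelines.py script reads as follows: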

    from sqlalchemy.orm import sessionmaker
    from scrapy.exceptions import DropItem
    from tutorial.models import Article, Author, Article_author, Article_author, db_connect, create_table
    
    class SavePipeline(object):
        
        def __init__(self):
            """
            Initializes database connection and sessionmaker
            Creates tables
            """
            engine = db_connect()
            create_table(engine)
            self.Session = sessionmaker(bind=engine)
    
    
        def process_item(self, item, spider):
            session = self.Session()
            article = Article()
            #article_author = Article_author()
    
            #check whether the current article has authors or not
            if 'author' in item:
                for author,n in zip(item["author"],item["n"]):
                    writer = Author(author_name=author)
                    # check whether author already exists in the database
                    exist = session.query(Author).filter_by(author_name=writer.author_name).first()
                    if exist is not None:
                    # the current author exists
                        writer = exist
                    article.authors.append(writer)
                    nr = article_author(author_number =n)
                    article.article_author.append(nr)
                    #article_author.append(nr)
                    #article.authors.append(pag) 
                    #article_author.author_number = n               
    
            try:
                session.add(article)
                session.commit()
    
            except:
                session.rollback()
                raise
    
            finally:
                session.close()
    
            return item

The error shown in the terminal is an integrity error, because the inserted row has no author_id associated with it:

sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: article_author.author_id
[SQL: INSERT INTO proverb_source (article_id, author_number) VALUES (?, ?)]
[parameters: (30, 2)]
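
The failing statement can be reproduced outside the ORM, which may make the cause easier to see. The sketch below uses only the standard-library sqlite3 module and a hand-written table that mirrors article_author; the explicit NOT NULL on the key columns is an assumption matching the DDL SQLAlchemy typically emits for primary key columns, and the author_id value 7 is made up for illustration. It is not the asker's actual schema.

```python
import sqlite3

# In-memory database with a hand-written stand-in for the association table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE article_author (
        article_id    INTEGER NOT NULL,
        author_id     INTEGER NOT NULL,
        author_number INTEGER,
        PRIMARY KEY (article_id, author_id)
    )
    """
)

# Same shape as the failing INSERT in the traceback: author_id is missing.
try:
    conn.execute(
        "INSERT INTO article_author (article_id, author_number) VALUES (?, ?)",
        (30, 2),
    )
    error_message = None
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

# Supplying both halves of the composite key lets the row through.
conn.execute(
    "INSERT INTO article_author (article_id, author_id, author_number) VALUES (?, ?, ?)",
    (30, 7, 2),  # author_id 7 is a hypothetical value
)
stored = conn.execute(
    "SELECT article_id, author_id, author_number FROM article_author"
).fetchall()

print(error_message)
print(stored)
```

In other words, an association row that carries author_number has to be tied to both an article and an author before it is flushed; a row created with only author_number set has nothing to fill author_id with.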

Defining an Article_author instance in process_item and appending through it,

    nr = Article_author(author_number =n)
    article_author.append(nr)

results in an attribute error:

article_author.append(nr)
AttributeError: 'Article_author' object has no attribute 'append'

When appending through the article's authors member,

    article.authors.append(pag)

it raises a ValueError:

ValueError: Bidirectional attribute conflict detected: Passing object <Article_author at 0x7f9007276c70> to attribute "Article.authors" triggers a modify event on attribute "Article.article_author" via the backref "Article_author.article".

Setting the attribute directly raises no error, but leaves the column empty:

    article_author.author_number = n

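【Question discussion】:

A NameError suggests there is a logic error in the code. Could you edit the question to include the full error traceback?
@snakecharmerb, thanks for your reply, I have added the error traceback. Perhaps it can be solved by accessing it as a member of the article, article.article_authors, but that may need to be defined in the relationship. Do you happen to know how to do that?
Could you include the code for process_item? The code in the traceback doesn't match any of the code in the question.
@snakecharmerb, thanks for the reply, I have added process_item; I had forgotten to import the association table class. Now it gives an integrity error. Do you know how to call it correctly?
Instead of nr = article_author(author_number =n) article.article_author.append(nr)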

Tags: sqlalchemy scrapy

【Solution 1】:

I solved this by defining the relationships on the association table itself and appending through that table; see https://docs.sqlalchemy.org/en/14/glossary.html#term-association-relationship
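
The answer does not include its code, so the following is only a minimal sketch of what an association-object mapping along those lines might look like, not the poster's actual solution. The class name ArticleAuthor, the relationship names author_links and article_links, the in-memory SQLite URL and the sample names are all assumptions made for illustration; the table and column names follow the question.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class ArticleAuthor(Base):
    """Association object: one row per (article, author) pair plus extra data."""
    __tablename__ = "article_author"

    article_id = Column(Integer, ForeignKey("article.article_id"), primary_key=True)
    author_id = Column(Integer, ForeignKey("author.author_id"), primary_key=True)
    author_number = Column(Integer)

    # Relationships live on the association class itself (hypothetical names).
    article = relationship("Article", back_populates="author_links")
    author = relationship("Author", back_populates="article_links")


class Article(Base):
    __tablename__ = "article"

    article_id = Column(Integer, primary_key=True)
    article_title = Column("name", String(50), unique=True)
    author_links = relationship(
        "ArticleAuthor", back_populates="article", cascade="all, delete-orphan"
    )


class Author(Base):
    __tablename__ = "author"

    author_id = Column(Integer, primary_key=True)
    author_name = Column("name", String(50), unique=True)
    article_links = relationship("ArticleAuthor", back_populates="author")


engine = create_engine("sqlite://")  # in-memory database, for the sketch only
Base.metadata.create_all(engine)

with Session(engine) as session:
    article = Article(article_title="Example article")
    for number, name in enumerate(["Alice", "Bob"], start=1):
        # Reuse an existing author row if there is one, otherwise create it.
        author = session.query(Author).filter_by(author_name=name).first()
        if author is None:
            author = Author(author_name=name)
        # The link row gets its author and its number at construction time;
        # appending it to the article supplies the other half of the key.
        article.author_links.append(ArticleAuthor(author=author, author_number=number))
    session.add(article)
    session.commit()
    rows = sorted(
        (link.author_number, link.author.author_name) for link in article.author_links
    )

print(rows)
```

With this layout the extra column is set on the link object rather than on Article.authors, so each association row has both foreign keys and the author number when it is flushed. In a Scrapy pipeline the loop body would sit inside process_item, with item["author"] and item["n"] taking the place of the hard-coded names.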

【Discussion】: