【Posted】: 2021-09-10 16:32:18
【Question】:
Code
from __future__ import unicode_literals
import youtube_dl
import pandas as pd
import csv
import re

# read the csv file
number_of_rows = pd.read_csv('single.csv')

# Initialize YouTube-DL Array
ydl_opts = {}
all_scrapes = []
twitter_list = []

# Scrape Online Product
def run_scraper():
    # Read CSV to List
    with open("single.csv", "r") as f:
        csv_reader = csv.reader(f)
        next(csv_reader)

        # Scrape Data From Store
        for csv_line_entry in csv_reader:
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                meta = ydl.extract_info(csv_line_entry[0], download=False)
                channel = meta['channel']
                title = meta['title']
                description = meta['description']

                print('Channel :', channel)
                print('Title :', title)
                #print('description :', description)

                get_links(description)
                print("-" * 120)
                print()

                print('Demo:', twitter_list)

                # Make a tuple with the relevant info of the current YouTube Scrapes
                current_scrapes = (channel, title, twitter_list)
                all_scrapes.append(current_scrapes)

    print('All Scrapes:', all_scrapes)
    print()

def get_links(description):
    # Find URLs in description
    description_urls = re.findall(r'(https?://[^\s]+)', description)
    #print('List Before :', description_urls, '\n')

    # Twitter Resources
    if 'twitter.com' in description:
        for item in description_urls:
            #print('Print All URLs:', item)
            if 'twitter.com' in item:
                print('- Twitter URL Found:', item)
                twitter_list.append(item)

run_scraper()
CSV file
Videos
https://www.youtube.com/watch?v=kqtD5dpn9C8
https://www.youtube.com/watch?v=rfscVS0vtbw
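The code above reads YouTube URLs from a CSV file, then prints the channel and title of each video.
It also extracts Twitter URLs from each video's description through the get_links function.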
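To show the extraction step in isolation, here is a minimal sketch that applies the same regular expression to a made-up description string. The sample text and the helper name find_twitter_urls are hypothetical and not part of the script above; the example needs no network access.

```python
import re

def find_twitter_urls(description):
    """Return the Twitter URLs found in a description (hypothetical helper)."""
    # Same pattern as in get_links: "http://" or "https://" followed by non-whitespace
    urls = re.findall(r'(https?://[^\s]+)', description)
    return [url for url in urls if 'twitter.com' in url]

# Made-up sample text, for illustration only
sample = "Follow me: https://twitter.com/example_user\nSite: https://example.com/about"
print(find_twitter_urls(sample))
```

Note that the pattern stops only at whitespace, so trailing punctuation such as a closing parenthesis would be kept as part of the URL.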
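Problem
When I print the captured Twitter URLs inside the get_links function (line 61 of my script)
print('- Twitter URL Found:', item)
the output correctly shows each user's own Twitter entries.
However, I cannot get this information into the tuple current_scrapes without every tuple entry being filled with all of the Twitter URLs captured so far.
Any help would be greatly appreciated.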
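A likely cause, offered as an assumption rather than a confirmed diagnosis: twitter_list is a single module-level list, so every tuple stores a reference to the same object, and get_links keeps appending to it across videos. The sketch below shows one possible fix, in which get_links returns a fresh list on each call. The fake_videos records are made up and stand in for the metadata that ydl.extract_info() returns in the script above, so the example runs offline.

```python
import re

def get_links(description):
    """Return a new list of Twitter URLs for this one description."""
    description_urls = re.findall(r'(https?://[^\s]+)', description)
    return [url for url in description_urls if 'twitter.com' in url]

# Made-up (channel, title, description) records, for illustration only;
# the real script would take these values from ydl.extract_info().
fake_videos = [
    ("Channel A", "Title A", "Twitter: https://twitter.com/user_a"),
    ("Channel B", "Title B", "Twitter: https://twitter.com/user_b and https://example.com"),
]

all_scrapes = []
for channel, title, description in fake_videos:
    twitter_list = get_links(description)  # fresh list for each video
    all_scrapes.append((channel, title, twitter_list))

for entry in all_scrapes:
    print(entry)
```

Because each tuple now holds its own list, appending URLs for a later video no longer changes the entries stored for earlier ones.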
【Comments】:
Tags: python list function web-scraping tuples