Determine whether a static file site contains a CSS inheritance rule (answers)

【Question title】: Determine whether static file site contains css inheritance rule
【Posted】: 2017-10-27 14:38:29
【Question description】:

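I am working on a site that renders static HTML files, and I want to determine which pages in the site contain a particular CSS inheritance rule, e.g. .parent .child (a .child element nested inside a .parent element).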

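I could imagine a web crawler that visits each of these pages, runs a test to see whether the given page has that style, and returns a report, but is there any tool that already does this well for static file sites (i.e. not webpack's css-tree-shake-plugin)? I would be grateful for any insight others can offer on this question.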

Tags: css, tree-shaking

【Solution 1】:

Here is what I came up with:

#!/usr/bin/env python

'''
setup:

pip install 'selenium<4'  # webdriver.PhantomJS was removed in Selenium 4
pip install beautifulsoup4
brew install phantomjs

usage: python shake_trees.py '_site' '.parent .child'
'''

from bs4 import BeautifulSoup
from selenium import webdriver
import sys, copy, multiprocessing, os, fnmatch

def clean_href(href):
  # strip the fragment and query string; <a> tags without an href yield None
  return (href or '').split('#')[0].split('?')[0]

def get_web_urls(all_links, visited):
  recurse = False
  for link in copy.copy(all_links):
    if link not in visited:
      visited.add(link)
      driver.get(link)
      for tag in driver.find_elements_by_tag_name('a'):
        href = clean_href(tag.get_attribute('href'))
        if domain in href and href not in visited:
          recurse = True
          all_links.add(href)
  if recurse:
    return get_web_urls(all_links, visited)
  else:
    print(all_links, visited)
    return all_links

def get_static_site_urls():
  matches = []
  for root, dirnames, filenames in os.walk(root_url):
    for filename in fnmatch.filter(filenames, file_match):
      matches.append(os.path.join(root, filename))
  return matches

if __name__ == '__main__':

  # parse command line arguments
  root_url = sys.argv[1]
  css_selector = sys.argv[2]

  # globals
  domain = root_url
  # treat the argument as a live site only if it is an http(s) URL
  static_site = not root_url.startswith(('http://', 'https://'))
  file_match = '*.html'

  # initialize the phantom driver
  driver = webdriver.PhantomJS()
  driver.set_window_size(1000, 1000)

  if static_site:
    urls = get_static_site_urls()
  else:
    driver.get( root_url )
    urls = get_web_urls( set([root_url]), set() )

  for url in urls:
    if static_site:
      url = 'file://' + os.path.abspath(url)
    driver.get(url)
    soup = BeautifulSoup( driver.page_source, 'html.parser' )
    if soup.select_one(css_selector):
      print('match on', url)

  # shut down the PhantomJS process once every page has been checked
  driver.quit()
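
For a purely static site whose markup does not depend on JavaScript, the same check can be sketched without a browser. The following is a minimal standard-library sketch, not the script above: the names DescendantClassMatcher, page_matches and matching_files are hypothetical, it only handles a two-class descendant selector such as .parent .child (passed as the bare class names), and it assumes reasonably well-formed HTML.

```python
import os
import sys
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class DescendantClassMatcher(HTMLParser):
    """Hypothetical helper: flags an element carrying `child_class` that is
    nested inside an element carrying `parent_class`."""

    def __init__(self, parent_class, child_class):
        super().__init__()
        self.parent_class = parent_class
        self.child_class = child_class
        self.stack = []       # (tag, has_parent_class) for each open element
        self.matched = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        # A match needs the child class here and the parent class on an ancestor.
        if self.child_class in classes and any(flag for _, flag in self.stack):
            self.matched = True
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.parent_class in classes))

    def handle_endtag(self, tag):
        # Pop back to the most recent open tag of this name; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def page_matches(html, parent_class, child_class):
    """Return True if the HTML text contains a `.parent .child` style nesting."""
    matcher = DescendantClassMatcher(parent_class, child_class)
    matcher.feed(html)
    matcher.close()
    return matcher.matched


def matching_files(site_root, parent_class, child_class):
    """Walk site_root and return the sorted .html paths that contain a match."""
    hits = []
    for root, _dirs, files in os.walk(site_root):
        for name in files:
            if name.endswith('.html'):
                path = os.path.join(root, name)
                with open(path, encoding='utf-8', errors='replace') as fh:
                    if page_matches(fh.read(), parent_class, child_class):
                        hits.append(path)
    return sorted(hits)


# Assumed invocation: python find_descendant.py _site parent child
if __name__ == '__main__' and len(sys.argv) == 4:
    for path in matching_files(*sys.argv[1:4]):
        print('match on', path)
```

Unlike the PhantomJS version, this sketch never executes scripts, so elements added to the page at runtime are not seen. Unclosed tags such as a bare li can also leave stale entries on the stack, so a full selector engine is the safer choice for messy markup.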
