【Posted】: 2018-01-28 06:15:40
【Question】:
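I am reading an HTML document and want to store the HTML nested inside a div tag with a particular name, while preserving its structure (spacing). The goal is to convert the HTML document into React components. I am struggling with how to store the structure of the nested HTML and how to find the correct closing tag for the div, since everything nested inside it will become a React component (div class='rc-componentname' is the opening tag). Any help would be greatly appreciated. Thanks!
Edit: I think regular expressions are the best way to solve this. I have not used regular expressions before, so if that is correct, could someone point me in the right direction on which expression to use in this case?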
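As a starting point for the regex idea, here is a minimal sketch that only locates the opening tags and pulls out the component names. The pattern and the names RC_DIV and find_component_names are illustrative assumptions, not part of the original code. Python's built-in re module has no recursive patterns, so a regex on its own will not reliably pair a nested div with its closing tag; this covers only the "find the opening tag" half of the problem.

```python
import re

# Hypothetical pattern: matches <div class='rc-name' or <div class="rc-name"
# and captures the component name that follows the rc- prefix.
# It assumes class is the first attribute and holds a single class name.
RC_DIV = re.compile(r"""<div\s+class=(['"])rc-([\w-]+)\1""")


def find_component_names(html):
    """Return the names of all rc- divs, in document order."""
    return [match.group(2) for match in RC_DIV.finditer(html)]


sample = "<div class='rc-header'><div class=\"rc-nav\"></div></div>"
print(find_component_names(sample))  # ['header', 'nav']
```

Both quote styles are accepted through the backreference \1, which requires the closing quote to match the opening one.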
import os

components = []


class react_template():
    def __init__(self, component_name):  # add nested html as second element
        # Each attribute holds one line of the generated JS file, in order.
        self.Import = "import React, { Component } from 'react';"
        self.Class = "class " + component_name + " extends Component {"
        self.Render = "render() {"
        self.Return = "return "
        self.Export = "export default " + component_name + ";"


def react(component):
    r = react_template(component)
    # Build components/<component>/ without changing the working directory,
    # so repeated calls do not nest folders inside each other.
    component_dir = os.path.join('components', component)
    os.makedirs(component_dir, exist_ok=True)
    js_path = os.path.join(component_dir, component + '.js')
    with open(js_path, 'w', encoding='utf-8') as f:  # create js component file
        for j_key, j_code in r.__dict__.items():
            f.write(j_code + '\n')


def process_html():
    with open('file.html', 'r') as f:
        for line in f:
            if 'rc-' in line:
                char_soup = list(line)
                for index, char in enumerate(char_soup):
                    # Slicing avoids an IndexError near the end of the line.
                    if line[index:index + 3] == 'rc-':
                        sliced_soup = char_soup[index + 3:]
                        # The component name ends at the closing single quote.
                        c_slice_index = sliced_soup.index("'")
                        component = "".join(sliced_soup[:c_slice_index])
                        components.append(component)
                        innerHTML(sliced_soup)
                        # react(component)


def innerHTML(sliced_soup):  # work in progress
    first_closing = sliced_soup.index(">")
    sliced_soup = "".join(sliced_soup[first_closing:]).split(" ")


def generate_components(components):
    for c in components:
        react(c)


if __name__ == "__main__":
    process_html()
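For the part the question is stuck on, finding the matching closing tag while keeping the spacing, one possible approach is to let the standard library's html.parser track open div tags and then slice the original string by position. The sketch below is only that, a sketch: ComponentExtractor and extract_components are hypothetical names, it assumes the div tags in the document are properly nested, and it stores only the last occurrence if a component name repeats.

```python
from html.parser import HTMLParser


class ComponentExtractor(HTMLParser):
    """Collect the raw inner HTML of every <div class="rc-..."> element."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        # Absolute index of the first character of each line, so the
        # (line, column) pairs from getpos() can become string offsets.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.open_divs = []   # one entry per open <div>: None or (name, start)
        self.components = {}  # component name -> inner HTML, whitespace intact

    def _offset(self):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        name = next((c[3:] for c in classes if c.startswith("rc-")), None)
        if name is None:
            self.open_divs.append(None)
        else:
            # Inner HTML begins right after the opening tag's closing ">".
            start = self._offset() + len(self.get_starttag_text())
            self.open_divs.append((name, start))

    def handle_endtag(self, tag):
        if tag != "div" or not self.open_divs:
            return
        entry = self.open_divs.pop()  # the div this </div> closes
        if entry is not None:
            name, start = entry
            self.components[name] = self.source[start:self._offset()]


def extract_components(html):
    parser = ComponentExtractor(html)
    parser.feed(html)
    parser.close()
    return parser.components


html = "<body>\n<div class='rc-card'>\n  <div>\n    <p>Hi</p>\n  </div>\n</div>\n</body>"
print(repr(extract_components(html)["card"]))
```

Because the inner HTML is sliced out of the original string rather than rebuilt from parser events, indentation and line breaks come through unchanged. Counting only div tags also sidesteps void elements such as br or img, which have no closing tag.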
【Discussion】:
Tags: python parsing html-parsing