【Question title】:How to generate JSON from a pandas data frame for a dynamic d3.js tree
【Posted】:2021-06-29 15:13:02
【Question】:

I am new to pandas and have a complex requirement.

I have a data frame with several columns, as shown below:

Parent                            Child      Color
IT;Programming;Trending;Demand    Python     #6200ea
IT;Programming;Trending;Demand    Docker     #7c4dff
IT;Programming;Testing            Selenium   #b388ff
IT;Programming;Old                C/C++/C#   #ff1744
IT-Tools;Testing                  HP UFT     #aa00ff
IT-Tools;IDE                      PyCharm    #9c27b0

I have used str.split(';') to generate multiple Parent columns in the data frame:

df = df.join(df.Parent.str.split(";", expand=True).add_prefix('branch'))
df.drop(columns=['Parent'], inplace=True)
print(df)

Output:

branch0    branch1        branch2    branch3   Child      Color
IT         Programming    Trending   Demand    Python     #6200ea
IT         Programming    Trending   Demand    Docker     #7c4dff
IT         Programming    Testing    None      Selenium   #b388ff
IT         Programming    Old        None      C/C++/C#   #ff1744
IT-Tools   Testing        None       None      HP UFT     #aa00ff
IT-Tools   IDE            None       None      PyCharm    #9c27b0

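I need to build a classification tree, and for that I need to generate JSON (including the color values) in the format shown at the following site: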

https://bl.ocks.org/swayvil/b86f8d4941bdfcbfff8f69619cd2f460#data-example.json

Can anyone help me? Thanks!

【Comments】:

    Tags: python django pandas dataframe d3.js


    【Answer 1】:

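    Your data needs two changes before it can drive a d3 tree:

    • A d3 tree needs a single root node - IT and IT-Tools need a common parent
    • The Testing node is ambiguous because it is a child of both Programming and IT-Tools - so you need to rename it, e.g. Testing1 and Testing2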
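
Both conditions can be checked before any JSON is generated. The following is a minimal sketch, not part of the original answer: `check_paths` is a hypothetical helper that takes the raw Parent strings from the question, reports the distinct top-level names, and lists every intermediate name that appears under more than one parent. It only looks at the Parent paths, so duplicate names in the Child column would need a similar check.

```python
from collections import defaultdict

def check_paths(paths, sep=";"):
    """Return (roots, ambiguous) for a list of 'A;B;C' parent paths."""
    roots = set()
    parents_of = defaultdict(set)  # node name -> set of parent names seen
    for path in paths:
        parts = path.split(sep)
        roots.add(parts[0])
        for parent, child in zip(parts, parts[1:]):
            parents_of[child].add(parent)
    # a name with more than one distinct parent cannot be a unique tree id
    ambiguous = {name: sorted(ps) for name, ps in parents_of.items() if len(ps) > 1}
    return sorted(roots), ambiguous

# Parent column from the question, before any renaming
paths = [
    "IT;Programming;Trending;Demand",
    "IT;Programming;Trending;Demand",
    "IT;Programming;Testing",
    "IT;Programming;Old",
    "IT-Tools;Testing",
    "IT-Tools;IDE",
]
roots, ambiguous = check_paths(paths)
print(roots)      # more than one entry means a common root must be added
print(ambiguous)  # names that need renaming, with the parents they appear under
```

With the question's data this reports two top-level names and flags Testing, which matches the two bullet points above.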

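    Because of the Nones in branch2 and branch3, your second data frame shows that your hierarchy is unbalanced (leaves at different depths). So your output JSON should consist of individual branch definitions rather than multiple definitions per row, like this: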

    parent,child,color
    IT,Programming,None
    Programming,Trending,None
    Trending,Demand,None
    Demand,Python,#6200ea
    etc
    

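    This is more efficient than keeping multiple branches per row, which is redundant. For example, in the per-row layout IT is defined as the parent of Programming several times, whereas in the parent/child structure it is defined once.

    The code below converts your input into that output, which you can send to the client as the response and then use with d3 to build the tree.

    We can use a set to store a unique string for each parent pair, then add one more item for the last parent and its child. We then create another data frame from that set (splitting on ; into columns, similar to your OP) and export it as JSON: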

    import io
    import pandas as pd
    
    # data as a string
    text = '''Parent Child Color
    IT;Programming;Trending;Demand    Python     #6200ea
    IT;Programming;Trending;Demand    Docker     #7c4dff
    IT;Programming;Testing1           Selenium   #b388ff
    IT;Programming;Old                C/C++/C#   #ff1744
    IT-Tools;Testing2                 HP-UFT     #aa00ff
    IT-Tools;IDE                      PyCharm    #9c27b0'''
    
    # your original data frame
    df = pd.read_csv(io.StringIO(text), sep=r'\s+')
    
    # prepend a Root to Parent column
    df.Parent = 'Root;' + df.Parent
    
    # collect unique branches in a set
    # start with just root in the set
    branch_strings = set([';Root;'])
    for index, row in df.iterrows():
      parents = row.Parent.split(';')
      for parent, child in zip(parents, parents[1:]):
        branch_strings.add(';'.join([parent, child, '']))
      # the last parent in the path owns the leaf, which carries the color
      branch_strings.add(';'.join([parents[-1], row.Child, row.Color]))
    
    # set to list of [parent, child, color]
    branches = [branch.split(';') for branch in branch_strings]
    
    # new dataframe from relations
    df2 = pd.DataFrame(branches, columns=['parent', 'child', 'color'])
    
    # JSON (named json_text so it does not shadow the json module)
    json_text = df2.to_json(orient='records')
    print(json_text)
    

    This produces the following JSON output (the row order can differ between runs, because a set is unordered):

    const data = [
      {"parent":"Programming","child":"Old","color":""},
      {"parent":"Testing1","child":"Selenium","color":"#b388ff"},
      {"parent":"","child":"Root","color":""},{"parent":"IT-Tools","child":"IDE","color":""},
      {"parent":"IDE","child":"PyCharm","color":"#9c27b0"},
      {"parent":"Programming","child":"Trending","color":""},
      {"parent":"IT","child":"Programming","color":""},
      {"parent":"Trending","child":"Demand","color":""},
      {"parent":"Root","child":"IT","color":""},
      {"parent":"Old","child":"C\/C++\/C#","color":"#ff1744"},
      {"parent":"Programming","child":"Testing1","color":""},
      {"parent":"Root","child":"IT-Tools","color":""},{"parent":"IT-Tools","child":"Testing2","color":""},
      {"parent":"Demand","child":"Docker","color":"#7c4dff"},
      {"parent":"Demand","child":"Python","color":"#6200ea"},
      {"parent":"Testing2","child":"HP-UFT","color":"#aa00ff"}
    ];
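
If the client expects a nested tree rather than a flat list, the same records could also be nested on the server. The sketch below is an illustration only and is not from the original answer: `nest` is a hypothetical helper, and the `name` / `children` key names are an assumption about what the consuming d3 code reads, so adjust them to the target format. Like `d3.stratify`, it assumes every `child` value is unique and that every `parent` also appears as a `child`.

```python
import json

def nest(records):
    """Build a nested {name, color, children} tree from flat parent/child records."""
    # first pass: one node per record, keyed by its (assumed unique) child name
    nodes = {
        rec["child"]: {"name": rec["child"], "color": rec["color"], "children": []}
        for rec in records
    }
    root = None
    # second pass: attach each node to its parent; the empty parent marks the root
    for rec in records:
        node = nodes[rec["child"]]
        if rec["parent"] == "":
            root = node
        else:
            nodes[rec["parent"]]["children"].append(node)
    return root

# a subset of the records shown above
records = [
    {"parent": "", "child": "Root", "color": ""},
    {"parent": "Root", "child": "IT", "color": ""},
    {"parent": "IT", "child": "Programming", "color": ""},
    {"parent": "Programming", "child": "Old", "color": ""},
    {"parent": "Old", "child": "C/C++/C#", "color": "#ff1744"},
]
tree = nest(records)
print(json.dumps(tree, indent=2))
```

A record whose parent is missing from the list raises a `KeyError` here, which is a reasonable signal that the input is not a single connected tree.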
    

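    For a D3 example that consumes the flat parent/child JSON, see the accepted answer here. The adaptation below is just a proof of concept for your unbalanced hierarchy input. Converting it to the block in your OP is beyond the scope of a Stack Overflow answer, but this should get you moving in the right direction: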

    const data = [
      {"parent":"Programming","child":"Old","color":""},
      {"parent":"Testing1","child":"Selenium","color":"#b388ff"},
      {"parent":"","child":"Root","color":""},{"parent":"IT-Tools","child":"IDE","color":""},
      {"parent":"IDE","child":"PyCharm","color":"#9c27b0"},
      {"parent":"Programming","child":"Trending","color":""},
      {"parent":"IT","child":"Programming","color":""},
      {"parent":"Trending","child":"Demand","color":""},
      {"parent":"Root","child":"IT","color":""},
      {"parent":"Old","child":"C\/C++\/C#","color":"#ff1744"},
      {"parent":"Programming","child":"Testing1","color":""},
      {"parent":"Root","child":"IT-Tools","color":""},{"parent":"IT-Tools","child":"Testing2","color":""},
      {"parent":"Demand","child":"Docker","color":"#7c4dff"},
      {"parent":"Demand","child":"Python","color":"#6200ea"},
      {"parent":"Testing2","child":"HP-UFT","color":"#aa00ff"}
    ];
    
    const root = d3.stratify()
      .id(d => d["child"])
      .parentId(d => d["parent"])
      (data);
    
    const margin = {left: 40, top: 40, right: 40, bottom: 40}
    const width = 500;
    const height = 200;
    const svg = d3.select("body")
      .append("svg")
      .attr("width", width)
      .attr("height", height);
          
    const g = svg.append("g")
      .attr('transform', `translate(${margin.left},${margin.top})`);
    
    const tree = d3.tree()
      .size([height-margin.top-margin.bottom,width-margin.left-margin.right]);
    
    const link = g.selectAll(".link")
      .data(tree(root).links())
      .enter()
      .append("path")
      .attr("class", "link")
      .attr("d", d3.linkHorizontal()
        .x(function(d) { return d.y; })
        .y(function(d) { return d.x; })
      );
    
    const node = g.selectAll(".node")
      .data(root.descendants())
      .enter()
      .append("g")
      .attr("transform", function(d) { 
        return `translate(${d.y},${d.x})`; 
      });
    
    node.append("circle")
      .attr("r", 5)
      .style("fill", function(d) {
        return d.data.color ? d.data.color : "#000";
      });
          
    node.append("text")
      .attr("class", "label")
      .text(function(d) { return d.data.child; })
      .attr('y', -4)
      .attr('x', 0)
      .attr('text-anchor','middle');

    /* CSS */
    path {
      fill: none;
      stroke: steelblue;
      stroke-width: 1px;
    }
    
    .label {
      font-size: smaller;
    }

    <!-- HTML -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.0/d3.min.js"></script>

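    【Comments】:

    • Thanks for the detailed answer. Yes, IT and IT-Tools have a common parent. However, in my requirement the number of columns is not fixed, which leaves None in many of the columns.
    • Yes, I am assuming that your input will sometimes (if not always) have an unbalanced hierarchy with leaves at different depths. You also say the columns are not fixed, so you have the additional factor of an unknown maximum depth. In that case it is best to describe the hierarchy as a parent/child list (with additional attributes such as color), pass that to D3, and use the .stratify function to build the hierarchy for visualization. The python code accepts as many levels of depth as there are semicolons in the Parent column, and sidesteps the None problem.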
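
If only the already-split branch columns are available, the same parent/child list can be built by dropping the padding per row. This is a sketch under assumptions rather than part of the answer: `row_to_edges` is a hypothetical helper, the rows are written as plain Python lists instead of a data frame, and missing levels are assumed to be `None` as printed in the question (a NaN filler would need a different test).

```python
def row_to_edges(branches, child, color, root="Root"):
    """Turn one wide row (branch values padded with None) into (parent, child, color) edges."""
    # keep only the levels that exist, whatever the number of branch columns
    path = [root] + [b for b in branches if b is not None] + [child]
    last = len(path) - 2  # index of the edge that ends at the leaf
    return [
        (parent, node, color if i == last else "")
        for i, (parent, node) in enumerate(zip(path, path[1:]))
    ]

# two rows of different depth, padded to four branch columns
rows = [
    (["IT", "Programming", "Trending", "Demand"], "Python", "#6200ea"),
    (["IT-Tools", "IDE", None, None], "PyCharm", "#9c27b0"),
]

# dict.fromkeys removes duplicate edges while keeping first-seen order
edges = list(dict.fromkeys(e for row in rows for e in row_to_edges(*row)))
for edge in edges:
    print(edge)
```

Using `dict.fromkeys` instead of a set keeps the edges in a stable order, which makes the generated JSON easier to compare between runs.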