【Question title】: How to summarize an array with group and rollup from d3-array?
【Posted】: 2021-04-20 08:48:06
【Question】:

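I'm trying to use d3-array to generate two summaries of an array of objects:

  • What actions did each teacher perform?
  • Which posts did each teacher edit?

This is my current approach: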

const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

const teacherSummary = [
  ...d3.rollup(
    data,
    (x) => x.length,
    (d) => d.teacher_username,
    (d) => d.action
  ),
]
  .map((x) => {
    return {
      teacher_username: x[0],
      num_edits: x[1].get("edit") || 0,
      num_replies: x[1].get("reply") || 0,
    };
  })
  .sort((a, b) => d3.descending(a.num_edits, b.num_edits));
// [
//   { "teacher_username": "Mary", "num_edits": 2, "num_replies": 1 },
//   { "teacher_username": "John", "num_edits": 1, "num_replies": 2 },
//   { "teacher_username": "Paul", "num_edits": 0, "num_replies": 1 }
// ]

const postIdsByTeacher = d3.rollups(
  data.filter((x) => x.action === "edit"),
  (v) => [...new Set(v.map((d) => d.post_id))].join(", "), // Set() is used to get rid of duplicate post_ids
  (d) => d.teacher_username
);
// [
//  ["John","47469"],
//  ["Mary","47465, 47467"]
// ]

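I'm flexible about the output format. What I'd like to optimize is efficiency and clarity:

  • Can I get both summaries in a single rollup call? Perhaps by adding edited_post_ids to teacherSummary.
  • It seems there should be a more elegant way to replace the [...Map/Set] calls.

Edit: Out of curiosity, I also tried this with alasql. It almost works, except for the null values in edited_post_ids.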

const sql = alasql(`
select
  teacher_username,
  count(case when action = 'reply' then 1 end) num_replies,
  count(case when action = 'edit' then 1 end) num_edits,
  array(case when action = 'edit' then post_id end) as edited_post_ids
from ?
group by teacher_username
`, [data])
// [ 
//   { teacher_username: "John", num_replies: 2, num_edits: 1, edited_post_ids: [null, 47469, null], }, 
//   { teacher_username: "Mary", num_replies: 1, num_edits: 2, edited_post_ids: [null, 47465, 47467], }, 
//   { teacher_username: "Paul", num_replies: 1, num_edits: 0, edited_post_ids: [null], },
// ];
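
One possible workaround for the nulls, sketched in plain JavaScript rather than in SQL, is to post-process the alasql rows and drop the null entries. The `rows` array below is copied from the output above; `dropNullPostIds` is a hypothetical helper name, not part of alasql.

```javascript
// Rows as returned by the alasql query above (copied from its output).
const rows = [
  { teacher_username: "John", num_replies: 2, num_edits: 1, edited_post_ids: [null, 47469, null] },
  { teacher_username: "Mary", num_replies: 1, num_edits: 2, edited_post_ids: [null, 47465, 47467] },
  { teacher_username: "Paul", num_replies: 1, num_edits: 0, edited_post_ids: [null] },
];

// Hypothetical helper: returns copies of the rows with null/undefined ids removed.
// The input rows are left untouched.
const dropNullPostIds = (inputRows) =>
  inputRows.map((row) => ({
    ...row,
    edited_post_ids: row.edited_post_ids.filter((id) => id != null),
  }));

const cleaned = dropNullPostIds(rows);
console.log(cleaned.map((row) => row.edited_post_ids));
// John: [47469], Mary: [47465, 47467], Paul: []
```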

【Comments】:

    Tags: javascript arrays d3.js alasql


    【Answer 1】:

    I ended up simplifying @Robin Mackenzie's last suggestion:

    const uniq = require('lodash.uniq');
    const teacherSummary = d3
      .groups(data, (d) => d.teacher_username)
      .map(([teacher_username, actions]) => {
        const edits = actions.filter((x) => x.action === "edit").map((x) => x.post_id);
        const replies = actions.filter((x) => x.action === "reply").map((x) => x.post_id);
        return {
          teacher_username,
          num_edits: edits.length,
          num_replies: replies.length,
          edited_post_ids: uniq(edits),
          replied_post_ids: uniq(replies),
        };
      })
    

    【Comments】:

      【Answer 2】:

      The function signature of d3.rollup is:

      d3.rollup(iterable, reduce, ...keys)

      On the face of it, reduce lets you supply one operation, such as a count, a sum, or something else, but only one.
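
To make the signature concrete, here is a rough plain-JavaScript sketch of what a rollup with a single key does: group the rows by key, then call reduce once per group. `rollupOneKey` is a hypothetical name, and this is not d3's actual implementation.

```javascript
// Hypothetical stand-in for d3.rollup with exactly one key function.
function rollupOneKey(iterable, reduce, key) {
  // Step 1: collect the rows that share a key into arrays.
  const groups = new Map();
  for (const item of iterable) {
    const k = key(item);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(item);
  }
  // Step 2: replace each group's array of rows with the reduced value.
  const result = new Map();
  for (const [k, groupRows] of groups) result.set(k, reduce(groupRows));
  return result;
}

const sample = [
  { action: "reply", teacher_username: "John" },
  { action: "edit", teacher_username: "John" },
  { action: "reply", teacher_username: "Paul" },
];

const counts = rollupOneKey(sample, (v) => v.length, (d) => d.teacher_username);
console.log([...counts]); // [ [ 'John', 2 ], [ 'Paul', 1 ] ]
```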

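      For your output, you're looking for two different operations:

      • counting the replies and edits, and
      • an array operation to collect the post_ids where action == "edit"

      Once you choose (x) => x.length, you've lost the chance to apply a different reduce operation. If you have multiple operations, perhaps d3.rollup isn't the function you need?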
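
That said, the reduce callback can return any value, so one possible workaround is to return an object that bundles several summaries. The sketch below defines such a reducer (`summarizeTeacher` is a hypothetical name) and calls it directly on one teacher's rows. The assumption is that it could then be passed as the reduce argument, for example `d3.rollups(data, summarizeTeacher, d => d.teacher_username)`.

```javascript
// Hypothetical reducer: receives all rows for one teacher and returns
// several summaries at once.
function summarizeTeacher(teacherRows) {
  const edits = teacherRows.filter((d) => d.action === "edit");
  const replies = teacherRows.filter((d) => d.action === "reply");
  return {
    num_edits: edits.length,
    num_replies: replies.length,
    // Set drops duplicate post_ids while keeping first-seen order.
    edited_post_ids: [...new Set(edits.map((d) => d.post_id))],
  };
}

// John's rows from the question, as a rollup would hand them to the reducer.
const johnRows = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
];

console.log(summarizeTeacher(johnRows));
// { num_edits: 1, num_replies: 2, edited_post_ids: [ 47469 ] }
```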

      You can still add edited_post_ids to teacherSummary by going back to the original data and applying filter followed by map:

      const data = [
        { post_id: 47469, action: "reply", teacher_username: "John" },
        { post_id: 47469, action: "edit", teacher_username: "John" },
        { post_id: 47468, action: "reply", teacher_username: "John" },
        { post_id: 47465, action: "reply", teacher_username: "Mary" },
        { post_id: 47465, action: "edit", teacher_username: "Mary" },
        { post_id: 47467, action: "edit", teacher_username: "Mary" },
        { post_id: 46638, action: "reply", teacher_username: "Paul" },
      ];
      
      const teacherSummary = [...d3.rollup(
        data,
        v => v.length,
        d => d.teacher_username,
        d => d.action
      )].map(d => {
        return {
          teacher_username: d[0],
          num_edits: d[1].get("edit") || 0,
          num_replies: d[1].get("reply") || 0,
          edited_post_ids: data
            .filter(x => x.action === "edit" && x.teacher_username === d[0])
            .map(x => x.post_id)
        }
      });
        
      console.log(teacherSummary);
      <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
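
On the `[...Map]` question: `Array.from` accepts a mapping function as its second argument, so the spread-then-`map` step can be written as one call. In this sketch the nested `Map` is built by hand to stand in for what a two-key `d3.rollup` would return for the sample data; it is an illustration, not output captured from d3.

```javascript
// Hand-built stand-in for the nested Map from
// d3.rollup(data, v => v.length, d => d.teacher_username, d => d.action).
const countsByTeacher = new Map([
  ["John", new Map([["reply", 2], ["edit", 1]])],
  ["Mary", new Map([["reply", 1], ["edit", 2]])],
  ["Paul", new Map([["reply", 1]])],
]);

// Array.from(iterable, mapFn) converts and maps in one step.
const summaryRows = Array.from(countsByTeacher, ([teacher_username, byAction]) => ({
  teacher_username,
  // ?? falls back to 0 only when the action is missing from the inner Map.
  num_edits: byAction.get("edit") ?? 0,
  num_replies: byAction.get("reply") ?? 0,
}));

console.log(summaryRows);
```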

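      Another approach is to skip d3.rollup/d3.rollups and use d3.groups instead (in the d3-array source, rollup and groups are both calls to nest). You lose the counting that rollup does for you and have to implement it yourself. This example reads a bit like the SQL example: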

      const data = [
        { post_id: 47469, action: "reply", teacher_username: "John" },
        { post_id: 47469, action: "edit", teacher_username: "John" },
        { post_id: 47468, action: "reply", teacher_username: "John" },
        { post_id: 47465, action: "reply", teacher_username: "Mary" },
        { post_id: 47465, action: "edit", teacher_username: "Mary" },
        { post_id: 47467, action: "edit", teacher_username: "Mary" },
        { post_id: 46638, action: "reply", teacher_username: "Paul" },
      ];
      
      // compare with
      // select
      //   teacher_username,
      //   count(case when action = 'reply' then 1 end) num_replies,
      //   count(case when action = 'edit' then 1 end) num_edits,
      //   array(case when action = 'edit' then post_id end) as edited_post_ids
      // from ?
      // group by teacher_username
      
      const teacherSummary = d3.groups(data, d => d.teacher_username)
        .map(k => {
          return {
            teacher_username: k[0],
            num_edits: k[1].filter(k2 => k2.action == "edit").length,
            num_replies: k[1].filter(k2 => k2.action == "reply").length,
            edited_post_ids: k[1].filter(k2 => k2.action == "edit").map(k3 => k3.post_id)
          }
        });
        
      console.log(teacherSummary);
      <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
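
The version above filters each group three times. If that matters for larger inputs, one option is a single pass over the data that accumulates into a `Map`, sketched here in plain JavaScript without d3. `summarizeInOnePass`, `postData`, and `onePassSummary` are hypothetical names, and like the snippet above this sketch does not remove duplicate post_ids.

```javascript
// Hypothetical single-pass summary: one accumulator object per teacher.
function summarizeInOnePass(records) {
  const byTeacher = new Map();
  for (const { teacher_username, action, post_id } of records) {
    let acc = byTeacher.get(teacher_username);
    if (!acc) {
      // First time this teacher is seen: start with empty counters.
      acc = { teacher_username, num_edits: 0, num_replies: 0, edited_post_ids: [] };
      byTeacher.set(teacher_username, acc);
    }
    if (action === "edit") {
      acc.num_edits += 1;
      acc.edited_post_ids.push(post_id);
    } else if (action === "reply") {
      acc.num_replies += 1;
    }
  }
  // Map preserves insertion order, so teachers appear in first-seen order.
  return [...byTeacher.values()];
}

const postData = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

const onePassSummary = summarizeInOnePass(postData);
console.log(onePassSummary);
```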

      As a side note, you can boil postIdsByTeacher down to the following and avoid the new Set(etc) business:

      const data = [
        { post_id: 47469, action: "reply", teacher_username: "John" },
        { post_id: 47469, action: "edit", teacher_username: "John" },
        { post_id: 47468, action: "reply", teacher_username: "John" },
        { post_id: 47465, action: "reply", teacher_username: "Mary" },
        { post_id: 47465, action: "edit", teacher_username: "Mary" },
        { post_id: 47467, action: "edit", teacher_username: "Mary" },
        { post_id: 46638, action: "reply", teacher_username: "Paul" },
      ];
      
      const postIdsByTeacher = d3.rollups(
        data.filter(d => d.action === "edit"),
        v => [].concat(v.map(k => k.post_id)),
        d => d.teacher_username
      );
      
      console.log(postIdsByTeacher);
      <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
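
Note that the snippet above keeps repeated post_ids if a teacher edits the same post more than once. A sketch of a reducer that keeps the de-duplication, using `Set`, follows. `uniquePostIds` is a hypothetical name, and the duplicate and extra edit rows are made up only to show the effect.

```javascript
// Hypothetical reducer: unique post_ids in first-seen order.
const uniquePostIds = (editRows) => Array.from(new Set(editRows.map((d) => d.post_id)));

const johnEdits = [
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" }, // made-up duplicate
  { post_id: 47468, action: "edit", teacher_username: "John" }, // made-up row
];

console.log(uniquePostIds(johnEdits)); // [ 47469, 47468 ]
```

The assumption is that this function could be passed as the reduce argument of d3.rollups in place of the `[].concat(...)` callback.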

      But my gut feeling is that d3.rollup is most valuable when you want standard things like sums and counts.

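      【Comments】:

      • Thanks for the excellent answer. This is exactly what I was looking for. I forgot to mention that the new Set() part is needed to avoid duplicate post_ids.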