【Question title】: D3.js - What is the best way to filter a d3.csv object to only include the top n (count) results?
【Posted】: 2021-10-06 23:32:33
【Question】:

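I'm not sure I have phrased my question very well, which may explain why I haven't been able to find a good answer, but I'd really appreciate some advice! I'm new to d3 (and a JavaScript beginner) and am plotting some data on a bar chart. Here are a few rows of my CSV file: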

Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
2012,London,Gymnastics,Trampoline,"HUANG, Shanshan",CHN,Women,Individual,Silver
2012,London,Gymnastics,Trampoline,"HE, Wenna",CHN,Women,Individual,Bronze
2012,London,Handball,Handball,"ABALO, Luc",FRA,Men,Handball,Gold
2012,London,Handball,Handball,"ACCAMBRAY, William",FRA,Men,Handball,Gold
2012,London,Handball,Handball,"BARACHET, Xavier",FRA,Men,Handball,Gold

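I plan to plot the countries along the x-axis, but showing all of them makes the chart too wide. After filtering by year, I still have 87 different countries: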

const startData = await d3.csv('olympics.csv');
console.log(startData);
const data = startData.filter(function (d) { return d.Year == year; });
console.log(data);
console.log(d3.map(data, function (d) { return d.Country; }).keys());  // Here I have 86 Countries remaining

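I would like to filter this down to the top 15 countries.

The only approach I can think of is to count the rows for each country, like this: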

var countries = d3.map(data, function (d) { return d.Country; }).keys();
countries.forEach(function (country) {
    console.log(data.filter(function (d) { return d.Country == country; }).length);
});

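...then store that count instead of logging it, sort the results, and somehow use the top 15.

This seems sloppy, and I'm hoping someone has a suggestion they are willing to share.

Thanks in advance. If anything is unclear, I'm happy to try to explain further.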
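The counting idea described in the question can be sketched in plain JavaScript, without any D3 helpers. This is only an illustration under stated assumptions: the sample rows are made up to mirror the CSV columns, and `countByCountry` and `topCountries` are hypothetical helper names, not part of D3.

```javascript
// Made-up sample rows shaped like the parsed CSV (only the fields used here).
const rows = [
  { Year: "2012", Country: "CHN" },
  { Year: "2012", Country: "CHN" },
  { Year: "2012", Country: "FRA" },
  { Year: "2012", Country: "FRA" },
  { Year: "2012", Country: "FRA" },
  { Year: "2012", Country: "AUS" },
  { Year: "2008", Country: "USA" },
];

// Hypothetical helper: count rows per country in a single pass.
function countByCountry(data) {
  const counts = new Map();
  for (const d of data) {
    counts.set(d.Country, (counts.get(d.Country) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: sort by count (descending) and keep the first n codes.
function topCountries(data, n) {
  return Array.from(countByCountry(data)) // [[country, count], ...]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([country]) => country);
}

const rows2012 = rows.filter(d => d.Year === "2012");
console.log(topCountries(rows2012, 2)); // ["FRA", "CHN"]
```

A single pass over the rows avoids re-filtering the whole array once per country, which is what the `forEach` version above does.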

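【Comments】:

  • Do you need to use a specific version of D3?
  • So far I've been using v5 and don't want to break anything I already have. Other than that, I'm not tied to a specific version.
  • If you do this kind of thing often, take a look at the open fork of crossfilter, which is designed for exactly this sort of task - github.com/crossfilter/crossfilter

Tags: javascript csv d3.js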


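【Solution 1】:

For d3 v5, you could consider the nest function, which can group and rollup your data. In your case, you are looking for the number of rows (the length) per country.

First, filter the data by year, similar to what you already do: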

// use Year as filter and set N for top countries
const yearFilter = "2012";
const n = 2;

// filter data
const filtered = data.filter(function(d) { return d.Year === yearFilter; });

Then do the grouping and roll-up:

// group and roll-up
const topN = d3.nest()
  .key(function(d) { return d.Country; })    // groups countries into arrays
  .rollup(function(d) { return d.length; })  // sums lengths of array (i.e. medal count)
  .entries(filtered)                         // passing filtered data to the nest

This results in:

[
  {"key": "CHN", "value": 2}, 
  {"key": "FRA", "value": 3 }, 
  {"key": "AUS", "value": 1}
]

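But you can keep chaining operations: use the standard array sort to put the countries with the most medals first, slice off the top N, and return just the country names (map):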

// group, roll-up, sort, slice and map !
const topN = d3.nest()
  .key(function(d) { return d.Country; })                           // groups countries into arrays
  .rollup(function(d) { return d.length; })                         // sums lengths of array (i.e. medal count)
  .entries(filtered)                                                // passing filtered data to nest
  .sort(function(a, b) { return d3.descending(a.value, b.value); }) // then use normal JS sort 
  .slice(0, n)                                                      // and take top N countries
  .map(function(d) { return d.key; });                              // and just return the country name
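For a bar chart, the counts are usually needed as well as the names. A minimal sketch, in plain JavaScript and assuming entries in the `{key, value}` shape shown earlier, keeps the sorted entries and derives the names from them. The variable names `topEntries` and `topNames` are illustrative only.

```javascript
// Entries in the { key, value } shape produced by the nest/rollup step.
const entries = [
  { key: "CHN", value: 2 },
  { key: "FRA", value: 3 },
  { key: "AUS", value: 1 },
];
const n = 2;

// Copy before sorting: Array.prototype.sort reorders the array in place.
const topEntries = entries
  .slice()
  .sort((a, b) => b.value - a.value) // most medals first
  .slice(0, n);                      // keep the top n entries, counts included

// The names alone, if only the x-axis labels are needed.
const topNames = topEntries.map(d => d.key);

console.log(topEntries); // [{ key: "FRA", value: 3 }, { key: "CHN", value: 2 }]
console.log(topNames);   // ["FRA", "CHN"]
```

Keeping `topEntries` around means the bar heights can be read from `value` without counting the rows a second time.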

Here is a working example. I added one medal for Australia so that there is something to drop from the ranking:

// adding a single entry for Australia
// i.e. France = 3, China = 2, Australia = 1
const csv = `Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
2012,London,Gymnastics,Trampoline,"HUANG, Shanshan",CHN,Women,Individual,Silver
2012,London,Gymnastics,Trampoline,"HE, Wenna",CHN,Women,Individual,Bronze
2012,London,Handball,Handball,"ABALO, Luc",FRA,Men,Handball,Gold
2012,London,Handball,Handball,"ACCAMBRAY, William",FRA,Men,Handball,Gold
2012,London,Handball,Handball,"BARACHET, Xavier",FRA,Men,Handball,Gold
2012,London,Swimming,Swimming,"FAST, Freddie",AUS,Men,Swimming,Gold`;

// parse csv
const data = d3.csvParse(csv);

// use Year as filter and set N for top countries
const yearFilter = "2012";
const n = 2;

// filter data
const filtered = data.filter(function(d) { return d.Year === yearFilter; });

// group, roll-up, sort, slice and map !
const topN = d3.nest()
  .key(function(d) { return d.Country; }) // groups countries into arrays
  .rollup(function(d) { return d.length; })  // sums lengths of array (i.e. medal count)
  .entries(filtered) // passing filtered data to nest
  .sort(function(a, b) { return d3.descending(a.value, b.value); }) // then use normal JS sort 
  .slice(0, n) // and take top N countries
  .map(function(d) { return d.key; }); // and just return the country name
  
console.log(topN);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>

In v6 and later, the nest function is deprecated and, for this use case, replaced by rollups, which works very similarly to nest. Here is an example:

// adding a single entry for Australia
// i.e. France = 3, China = 2, Australia = 1
const csv = `Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
2012,London,Gymnastics,Trampoline,"HUANG, Shanshan",CHN,Women,Individual,Silver
2012,London,Gymnastics,Trampoline,"HE, Wenna",CHN,Women,Individual,Bronze
2012,London,Handball,Handball,"ABALO, Luc",FRA,Men,Handball,Gold
2012,London,Handball,Handball,"ACCAMBRAY, William",FRA,Men,Handball,Gold
2012,London,Handball,Handball,"BARACHET, Xavier",FRA,Men,Handball,Gold
2012,London,Swimming,Swimming,"FAST, Freddie",AUS,Men,Swimming,Gold`;

// parse csv
const data = d3.csvParse(csv);

// use Year as filter and set N for top countries
const yearFilter = "2012";
const n = 2;

// filter data
const filtered = data.filter(d => d.Year === yearFilter);

// group, roll-up, sort, slice and map !
const topN = d3.rollups(
  filtered,
  v => v.length,
  d => d.Country
)
.sort((a, b) => d3.descending(a[1], b[1]))  
.slice(0, n) 
.map(d => d[0]);
 
console.log(topN);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/7.0.0/d3.min.js"></script>
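Since the original question asks for the data itself to be restricted to the top countries, one possible last step is to filter the rows by membership in `topN`. The sketch below is plain JavaScript and uses made-up stand-ins for `filtered` and `topN` so that it runs on its own; `topSet` and `chartRows` are hypothetical names.

```javascript
// Made-up stand-ins for the values computed in the examples above.
const filtered = [
  { Country: "CHN", Medal: "Silver" },
  { Country: "CHN", Medal: "Bronze" },
  { Country: "FRA", Medal: "Gold" },
  { Country: "FRA", Medal: "Gold" },
  { Country: "FRA", Medal: "Gold" },
  { Country: "AUS", Medal: "Gold" },
];
const topN = ["FRA", "CHN"];

// A Set gives constant-time membership checks while filtering.
const topSet = new Set(topN);

// Keep only the rows whose country is among the top N.
const chartRows = filtered.filter(d => topSet.has(d.Country));

console.log(chartRows.length); // 5
```

The `topN` array is already in descending order of medal count, so it could also serve as the domain of a band scale (for example `d3.scaleBand().domain(topN)`) for the x-axis.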
