【Question Title】: Convert a table with subset to a JSON format
【Posted】: 2026-01-19 17:00:01
【Question Description】:

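I am trying to process financial data for publicly listed companies. I have downloaded the data and am now trying to convert it to JSON format.

The table contains subsections, marked by leading tildes: 4 ~'s represent one indent and 8 represent two indents, as follows:

  • A single indent means one level down
  • A double indent means two levels down

For example, Cost of Goods Sold (COGS) incl. D&A is a section heading, and COGS Growth should be captured as a child element of Cost of Goods Sold (COGS) incl. D&A.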
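The indentation rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the question: the `labels` series is a hypothetical miniature of the label column, and the helper name `tilde_depth` and its `width` parameter are assumptions introduced for the example.

```python
import pandas as pd

# Hypothetical miniature of the label column from the table.
labels = pd.Series([
    "Cost of Goods Sold (COGS) incl. D&A",
    "~~~~COGS Growth",
    "~~~~Depreciation & Amortization Expense",
    "~~~~~~~~Depreciation",
])


def tilde_depth(label: str, width: int = 4) -> int:
    """Return the nesting level implied by the leading '~' characters."""
    stripped = label.lstrip("~")
    return (len(label) - len(stripped)) // width


depths = labels.map(tilde_depth)   # 0 = section heading, 1 = child, 2 = grandchild
names = labels.str.lstrip("~")     # labels with the tilde prefix removed

for name, depth in zip(names, depths):
    print(depth, name)
```

With the depth known for every row, each label can be placed in the column for its level, which is the layout of the second table in the question.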

Could you help me work out how to convert this DataFrame to a JSON file?

Table representing the DataFrame

|                                       Item  Item|      2016|     2017 |    2018 |    2019   |     2020 |  5-year trend|
|                                     :---------: |    :----:|   :----: |  :----: |  :----:   |   :----: |:------------:|
| Sales/Revenue                                   |-         |-         |-        | -         |615.82K   | NaN          |
| ~~~~Sales Growth                                |-         |-         |-        | -         |-         | NaN          |
| Cost of Goods Sold (COGS) incl. D&A             |684       |5.44K     |3.14K    | 32.5K     |-         | NaN          |
| ~~~~COGS Growth                                 |-         |694.59%   |-42.19%  | 934.31%   |-         | NaN          |
| ~~~~COGS excluding D&A                          |-         |-         |-        | -         |-         | NaN          |
| ~~~~Depreciation & Amortization Expense         |684       |5.44K     |3.14K    | 32.5K     |41.83K    | NaN          |
| ~~~~~~~~Depreciation                            |684       |5.44K     |3.14K    | 32.5K     |41.83K    | NaN          |
| ~~~~~~~~Amortization of Intangibles             |-         |-         |-        | -         |-         | NaN          |
| Gross Income                                    |(684)     |(5.44K)   |(3.14K)  | (32.5K)   |-         | NaN          |
| ~~~~Gross Income Growth                         |-         |-694.59%  |42.19%   | -934.31%  |-         | NaN          |
| ~~~~Gross Profit Margin                         |-         |-         |-        | -         |-         | NaN          |
| SG&A Expense                                    |1.91M     |4.79M     |5.88M    | 9.5M      |9.63M     | NaN          |
| ~~~~SGA Growth                                  |-         |151.12%   |22.61%   | 61.51%    |1.37%     | NaN          |
| ~~~~Research & Development                      |-         |-         |-        | -         |-         | NaN          |
| ~~~~Other SG&A                                  |1.91M     |4.79M     |5.88M    | 9.5M      |9.63M     | NaN          |
| ~~~~Other Operating Expense                     |-         |-         |-        | -         |-         | NaN          |
| Unusual Expense                                 |-         |-         |-        | -         |-         | NaN          |
| EBIT after Unusual Expense                      |-         |-         |-        | -         |-         | NaN          |
| Non Operating Income/Expense                    |-         |-         |(52.76K) | 60.09K    |(2.2K)    | NaN          |
| Non-Operating Interest Income                   |8.9K      |170.93K   |59.8K    | 50.79K    |19.15K    | NaN          |
| Equity in Affiliates (Pretax)                   |-         |-         |-        | -         |-         | NaN          |
| Interest Expense                                |-         |-         |-        | -         |115.55K   | NaN          |
| ~~~~Interest Expense Growth                     |-         |-         |-        | -         |-         | NaN          |
| ~~~~Gross Interest Expense                      |-         |-         |-        | -         |115.55K   | NaN          |
| ~~~~Interest Capitalized                        |-         |-         |-        | -         |-         | NaN          |

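Table organized by subsection

| Item  Item | Subsection1 | Subsection2 | 2016 | 2017 | 2018 | 2019 | 2020 | 5-year trend |
| :--------- | :---------- | :---------- | :--: | :--: | :--: | :--: | :--: | :----------: |
| Sales/Revenue | | | - | - | - | - | 615.82K | NaN |
| | Sales Growth | | - | - | - | - | - | NaN |
| Cost of Goods Sold (COGS) incl. D&A | | | 684 | 5.44K | 3.14K | 32.5K | - | NaN |
| | COGS Growth | | - | 694.59% | -42.19% | 934.31% | - | NaN |
| | COGS excluding D&A | | - | - | - | - | - | NaN |
| | Depreciation & Amortization Expense | | 684 | 5.44K | 3.14K | 32.5K | 41.83K | NaN |
| | | Depreciation | 684 | 5.44K | 3.14K | 32.5K | 41.83K | NaN |
| | | Amortization of Intangibles | - | - | - | - | - | NaN |
| Gross Income | | | (684) | (5.44K) | (3.14K) | (32.5K) | - | NaN |
| | Gross Income Growth | | - | -694.59% | 42.19% | -934.31% | - | NaN |
| | Gross Profit Mar | | - | - | - | - | - | NaN |
| SG&A Expense | | | 1.91M | 4.79M | 5.88M | 9.5M | 9.63M | NaN |
| | SGA Growth | | - | 151.12% | 22.61% | 61.51% | 1.37% | NaN |
| | Research & Development | | - | - | - | - | - | NaN |
| | Other SG&A | | 1.91M | 4.79M | 5.88M | 9.5M | 9.63M | NaN |
| | Other Operating Expense | | - | - | - | - | - | NaN |
| Unusual Expense | | | - | - | - | - | - | NaN |
| EBIT after Unusual Expense | | | - | - | - | - | - | NaN |
| Non Operating Income/Expense | | | - | - | (52.76K) | 60.09K | (2.2K) | NaN |
| Non-Operating Interest Income | | | 8.9K | 170.93K | 59.8K | 50.79K | 19.15K | NaN |
| Equity in Affiliates (Pretax) | | | - | - | - | - | - | NaN |
| Interest Expense | | | - | - | - | - | 115.55K | NaN |
| | Interest Expense Growth | | - | - | - | - | - | NaN |
| | Gross Interest Expense | | - | - | - | - | 115.55K | NaN |
| | Interest Capitalized | | - | - | - | - | - | NaN |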

【Comments】:

  • What do you mean by "table representing the DataFrame"? What exactly is that file?
  • It describes the data held in a pandas DataFrame.

Tags: python json python-3.x pandas formatting


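【Solution 1】:

I was able to solve this by filling in the missing cells and then grouping on the 3 label columns; the code is shown below. I built it with the help of a reference.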

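# dframe is the DataFrame from the question, with the three label columns already split out
years = ['2016', '2017', '2018', '2019', '2020']

d = (dframe.fillna("-")
       # innermost level: one list of year records per (item, subsection1, subsection2)
       .groupby(['Item  Item', 'ItemSubsection1', 'ItemSubsection2'])[years]
       .apply(lambda x: x.to_dict('records'))   # 'records' replaces the removed shorthand 'r'
       .reset_index(name='data')
       # middle level: nest the Subsection2 records under each Subsection1
       .groupby(['Item  Item', 'ItemSubsection1'])[['ItemSubsection2', 'data']]
       .apply(lambda x: x.to_dict('records'))
       .reset_index(name='data')
       # top level: one dict per item, keyed by Subsection1
       .groupby('Item  Item')[['ItemSubsection1', 'data']]
       .apply(lambda x: x.set_index('ItemSubsection1')['data'].to_dict())
       .to_json()
       )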
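As an alternative to the groupby chain, the nesting can also be built straight from the tilde-prefixed labels with a small stack. The following is only a sketch of that different technique, not the answerer's method: `rows` is a hypothetical sample using values from the question's table, `build_tree` and the `"data"`/`"children"` keys are names invented for the illustration, and it assumes a row is never indented more than one level deeper than the row before it.

```python
import json

# Hypothetical sample: (label with '~' prefix, {year: value}).
rows = [
    ("Cost of Goods Sold (COGS) incl. D&A", {"2016": "684", "2017": "5.44K"}),
    ("~~~~COGS Growth", {"2016": "-", "2017": "694.59%"}),
    ("~~~~Depreciation & Amortization Expense", {"2016": "684", "2017": "5.44K"}),
    ("~~~~~~~~Depreciation", {"2016": "684", "2017": "5.44K"}),
    ("Gross Income", {"2016": "(684)", "2017": "(5.44K)"}),
]


def build_tree(rows, width=4):
    """Nest each row under the nearest preceding row that is one level shallower."""
    root = {"children": {}}
    stack = [root]  # stack[d] is the current parent for rows at depth d
    for label, values in rows:
        name = label.lstrip("~")
        depth = (len(label) - len(name)) // width
        node = {"data": values, "children": {}}
        del stack[depth + 1:]                 # leave the previous, deeper branch
        stack[depth]["children"][name] = node
        stack.append(node)                    # this row may parent the next one
    return root["children"]


tree = build_tree(rows)
print(json.dumps(tree, indent=2))
```

This keeps the hierarchy in one pass and does not need the intermediate subsection columns, at the cost of producing a different JSON shape from the groupby version.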

【Comments】: