Jupyter notebook display two pandas tables side by side (answers)

【Question title】: Jupyter notebook display two pandas tables side by side
【Posted】: 2016-12-11 12:04:06
【Question description】:

I have two pandas dataframes and I would like to display them in a Jupyter notebook.

Doing something like:

display(df1)
display(df2)

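displays them one below the other.

I would like the second dataframe to appear to the right of the first one. There is a similar question, but there the asker seems satisfied with either merging them into one dataframe or showing the difference between them.

That does not work for me. In my case the dataframes can represent completely different (non-comparable) elements, and they can have different sizes. So my main goal is to save space.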

【Question discussion】:

I posted Jake Vanderplas's solution. Nice, clean code.

Tags: pandas ipython-notebook jupyter-notebook

【Solution 1】:

I ended up writing a function that can do this: [update: added titles based on suggestions (thanks @Antony_Hatchkins et al.)]

from IPython.display import display_html
from itertools import chain,cycle
def display_side_by_side(*args,titles=cycle([''])):
    html_str=''
    for df,title in zip(args, chain(titles,cycle(['</br>'])) ):
        html_str+='<th style="text-align:center"><td style="vertical-align:top">'
        html_str+=f'<h2>{title}</h2>'
        html_str+=df.to_html().replace('table','table style="display:inline"')
        html_str+='</td></th>'
    display_html(html_str,raw=True)

Example usage:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['A','B','C','D',])
df2 = pd.DataFrame(np.arange(16).reshape((4,4)),columns=['A','B','C','D',])
display_side_by_side(df1,df2,df1, titles=['Foo','Foo Bar']) # the third title is left empty
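
The side-by-side effect in the function above comes from the display:inline style that is written into each <table> tag; *args simply lets the function accept any number of dataframes. The following is a minimal sketch of that one step on a plain HTML string, so it runs without pandas. The helper name inline_tables is hypothetical, and it uses a regular expression instead of str.replace so that only opening tags are rewritten (the plain replace also rewrites </table>, which browsers appear to tolerate).

```python
import re

def inline_tables(html: str) -> str:
    """Add style="display:inline" to every opening <table> tag (hypothetical helper)."""
    # The pattern starts with "<table", so closing tags (</table>) are left alone;
    # \b stops it from matching longer tag names that merely begin with "table".
    return re.sub(r'<table\b', '<table style="display:inline"', html)

# A stand-in for the string returned by df.to_html()
sample = '<table border="1" class="dataframe"><tr><td>1</td></tr></table>'
print(inline_tables(sample))
```

In a notebook the same helper could be applied to df.to_html() before the fragments are concatenated and passed to display_html(..., raw=True).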

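【Discussion】:

This is really great, thanks. How easy do you think it would be to add the dataframe name above each output?
Thanks for your answer, I've added headers to it in a manner similar to what you described in your last comment.
Amazing answer. This is what I was looking for as well. I'm still learning my way around it, so I'd like to know: 1) Why did you use *args instead of just df? Is it because *args lets you pass multiple inputs? 2) Which part of your function makes the second and subsequent dfs appear to the right of the first one instead of below it? Is it the 'table style="display:inline"' part? Thanks again
Thanks for your great solution! If you want to style your dataframes before displaying them, the input will be Stylers, not DataFrames. In this case, use html_str+=df.render() instead of html_str+=df.to_html().
@RichLysakowskiPhD I can't say why, but this variation without titles works in JupyterLab (v3.1.11 tried): newbedev.com/…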

【Solution 2】:

You can override the CSS of the output code. It uses flex-direction: column by default. Try changing it to row instead. Here is an example:

import pandas as pd
import numpy as np
from IPython.display import display, HTML

CSS = """
.output {
    flex-direction: row;
}
"""

HTML('<style>{}</style>'.format(CSS))

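You can, of course, customize the CSS further as you wish.

If you want to target only one cell's output, try using the :nth-child() selector. For example, this code will modify the CSS of the output of only the 5th cell in the notebook: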

CSS = """
div.cell:nth-child(5) .output {
    flex-direction: row;
}
"""

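【Discussion】:

This solution affects all cells. How can I do this for one cell only?
@jrovegno I updated my answer to include the information you requested.
@ntg You need to make sure that the line HTML('<style>{}</style>'.format(CSS)) is the last line in the cell (and don't forget to use the nth-child selector). However, this may cause issues with the formatting, so your solution is better. (+1)
@zarak Thanks for the kind words :) In your solution, you can use display(HTML('<style>{}</style>'.format(CSS))) instead of HTML('<style>{}</style>'.format(CSS)). Then it can be placed anywhere in the cell. I still had a problem with the nth cell, though (meaning, if I copy-paste, n might change)
HTML('<style>.output {flex-direction: row;}</style>') for simplicity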
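
The CSS override discussed above can be wrapped in a small helper so the selector is built in one place. This is only a sketch: row_layout_css is a hypothetical name, and the .output and div.cell selectors are the ones quoted in the answer, which assume the classic Notebook page structure and may not match other front ends.

```python
def row_layout_css(cell_index=None):
    """Return a <style> block that lays cell outputs out in a row.

    cell_index: 1-based position passed to :nth-child(), or None to target every cell.
    """
    if cell_index is None:
        selector = ".output"
    else:
        selector = f"div.cell:nth-child({cell_index}) .output"
    # Doubled braces produce literal { and } inside an f-string.
    return f"<style>{selector} {{ flex-direction: row; }}</style>"

print(row_layout_css())
print(row_layout_css(5))
```

In a notebook the returned string would be passed to display(HTML(...)), which, as noted in the comments, works from any line of the cell rather than only the last one.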

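【Solution 3】:

Starting from pandas 0.17.1, the visualization of DataFrames can be modified directly with pandas styling methods.

To display two DataFrames side by side you must use set_table_attributes with the argument "style='display:inline'", as suggested in ntg's answer. This returns two Styler objects. To display the aligned dataframes, just pass their joined HTML representation through the display_html method from IPython.

With this method it is also easier to add other styling options. Here is how to add a caption, as requested here: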

import numpy as np
import pandas as pd   
from IPython.display import display_html 

df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['A','B','C','D',])
df2 = pd.DataFrame(np.arange(16).reshape((4,4)),columns=['A','B','C','D',])

df1_styler = df1.style.set_table_attributes("style='display:inline'").set_caption('Caption table 1')
df2_styler = df2.style.set_table_attributes("style='display:inline'").set_caption('Caption table 2')

display_html(df1_styler._repr_html_()+df2_styler._repr_html_(), raw=True)

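【Discussion】:

Hadn't noticed this. It looks quite nice and can probably help in more situations, such as adding color etc. (+1)
@gibbone Is there a way to specify the spacing between the tables?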

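【Solution 4】:

Combining the approaches of gibbone (to set styles and captions) and stevi (adding space), I made my own version of the function, which outputs pandas dataframes as tables side by side: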

from IPython.display import display, HTML  # IPython.core.display is the deprecated import path

def display_side_by_side(dfs:list, captions:list):
    """Display tables side by side to save vertical space
    Input:
        dfs: list of pandas.DataFrame
        captions: list of table captions
    """
    output = ""
    # Iterate over the pairs directly: building a dict keyed by caption
    # would silently drop dataframes that share a caption.
    for caption, df in zip(captions, dfs):
        output += df.style.set_table_attributes("style='display:inline'").set_caption(caption)._repr_html_()
        output += "\xa0\xa0\xa0"
    display(HTML(output))

Usage:

display_side_by_side([df1, df2, df3], ['caption1', 'caption2', 'caption3'])

【Discussion】:

【Solution 5】:

My solution just builds a table in HTML, without any CSS hacks, and outputs it:

import pandas as pd
from IPython.display import display,HTML

def multi_column_df_display(list_dfs, cols=3):
    html_table = "<table style='width:100%; border:0px'>{content}</table>"
    html_row = "<tr style='border:0px'>{content}</tr>"
    html_cell = "<td style='width:{width}%;vertical-align:top;border:0px'>{{content}}</td>"
    html_cell = html_cell.format(width=100/cols)

    cells = [ html_cell.format(content=df.to_html()) for df in list_dfs ]
    cells += (-len(list_dfs) % cols) * [html_cell.format(content="")] # pad the last row only (0 cells when it is already full)
    rows = [ html_row.format(content="".join(cells[i:i+cols])) for i in range(0,len(cells),cols)]
    display(HTML(html_table.format(content="".join(rows))))

list_dfs = []
list_dfs.append( pd.DataFrame(2*[{"x":"hello"}]) )
list_dfs.append( pd.DataFrame(2*[{"x":"world"}]) )
multi_column_df_display(2*list_dfs)
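
Padding the last row of such a table is a small piece of modular arithmetic: for n items and cols columns, -n % cols empty cells are needed, which is 0 when the last row is already full (whereas cols - n % cols would add a whole empty row in that case). A minimal sketch, using a hypothetical helper chunk_rows on plain lists so that it runs without pandas:

```python
def chunk_rows(items, cols, filler=""):
    """Split items into rows of `cols` entries, padding only the last row (hypothetical helper)."""
    # -len(items) % cols is the number of cells missing from the last row.
    padded = list(items) + [filler] * (-len(items) % cols)
    return [padded[i:i + cols] for i in range(0, len(padded), cols)]

print(chunk_rows(["a", "b", "c", "d"], cols=3))  # last row is padded with two fillers
print(chunk_rows(["a", "b", "c"], cols=3))       # already full: no extra row is added
```

The items here stand in for the HTML cells; in the function above each item would be a formatted <td> string.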

【Discussion】:

【Solution 6】:

This adds (optional) headers, index and Series support to @nts's answer:

import pandas as pd
from IPython.display import display_html

def mydisplay(dfs, names=[], index=False):
    def to_df(x):
        if isinstance(x, pd.Series):
            return pd.DataFrame(x)
        else:
            return x
    html_str = ''
    if names:
        html_str += ('<tr>' + 
                     ''.join(f'<td style="text-align:center">{name}</td>' for name in names) + 
                     '</tr>')
    html_str += ('<tr>' + 
                 ''.join(f'<td style="vertical-align:top"> {to_df(df).to_html(index=index)}</td>' 
                         for df in dfs) + 
                 '</tr>')
    html_str = f'<table>{html_str}</table>'
    html_str = html_str.replace('table','table style="display:inline"')
    display_html(html_str, raw=True)

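【Discussion】:

This looks very useful, but it gives me a problem. For mydisplay((df1,df2)) it only gives df.to_html(index=False) df.to_html(index=False) instead of the dataframe contents. Also, there is an extra '}' sign in the f'string'.
Somewhat unrelated, but is it possible to modify your function so that it hides the code of the cell output?
@alpenmilch411 see the "Hide Input" extension
Any idea how to add 'max_rows' to this?
This also loses the multi-index when multi-indexed dataframes are used.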

【Solution 7】:

Here is Jake Vanderplas's solution, which I came across just the other day:

import numpy as np
import pandas as pd

class display(object):
    """Display HTML representation of multiple objects"""
    template = """<div style="float: left; padding: 10px;">
    <p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
    </div>"""

    def __init__(self, *args):
        self.args = args

    def _repr_html_(self):
        return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                     for a in self.args)

    def __repr__(self):
       return '\n\n'.join(a + '\n' + repr(eval(a))
                       for a in self.args)

Credit: https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.08-Aggregation-and-Grouping.ipynb
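
As a rough explanation of the class above: it is called with variable names as strings, e.g. display('df1', 'df2'), looks each name up with eval, and relies on Jupyter calling _repr_html_ on the object that the cell returns. The sketch below shows the same mechanism without eval. The names SideBySide and FakeTable are hypothetical, and FakeTable only stands in for a DataFrame so that the example runs without pandas.

```python
class SideBySide:
    """Hypothetical eval-free variant: objects are passed by keyword, e.g. SideBySide(df1=df1)."""
    template = ('<div style="float: left; padding: 10px;">'
                '<p style="font-family: monospace">{0}</p>{1}</div>')

    def __init__(self, **named):
        self.named = named

    def _repr_html_(self):
        # Jupyter calls this method when the object is the value of a cell.
        # Each object is wrapped in a left-floated <div>, so the blocks line up in a row.
        return '\n'.join(self.template.format(name, obj._repr_html_())
                         for name, obj in self.named.items())


class FakeTable:
    """Stand-in for a DataFrame: anything with a _repr_html_ method works."""
    def _repr_html_(self):
        return '<table><tr><td>1</td></tr></table>'


html = SideBySide(a=FakeTable(), b=FakeTable())._repr_html_()
print(html)
```

With real dataframes, SideBySide(df1=df1, df2=df2) as the last expression of a cell should render both tables with their names on top, under the assumptions stated above.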

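【Discussion】:

Could you please explain this answer? Jake VanderPlas does not explain it on his website. This is the only solution that prints the dataset name on top.
What do you want to know?
Maybe a description of all the functions: how they work, how they are called and so on, so that newbie Python programmers can understand it properly.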

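【Solution 8】:

This is another variation of the display_side_by_side() function introduced by @Anton Golubev, which combines gibbone (to set styles and captions) and stevi (adding space). I added an extra argument to change the spacing between tables at run-time.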

from IPython.display import display, HTML  # IPython.core.display is the deprecated import path

def display_side_by_side(dfs:list, captions:list, tablespacing=5):
    """Display tables side by side to save vertical space
    Input:
        dfs: list of pandas.DataFrame
        captions: list of table captions
    """
    output = ""
    for (caption, df) in zip(captions, dfs):
        output += df.style.set_table_attributes("style='display:inline'").set_caption(caption)._repr_html_()
        output += tablespacing * "\xa0"
    display(HTML(output))
    
display_side_by_side([df1, df2, df3], ['caption1', 'caption2', 'caption3'])

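The tablespacing argument (default value 5, as shown here) is the number of non-breaking spaces inserted after each table, so it determines the horizontal spacing between the tables.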

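【Discussion】:

Very handy, thanks.

【Solution 9】:

Gibbone's answer worked for me! If you want extra space between the tables, go to the code he proposed and add "\xa0\xa0\xa0" to the following line of code.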

display_html(df1_styler._repr_html_()+"\xa0\xa0\xa0"+df2_styler._repr_html_(), raw=True)

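【Discussion】:

【Solution 10】:

I decided to add some extra functionality to Yasin's elegant answer, so that one can choose both the number of cols and rows; any extra dfs are then added at the bottom. Additionally, one can choose the order in which the grid is filled (simply change the fill keyword to 'cols' or 'rows' as needed)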

import pandas as pd
from IPython.display import display,HTML

def grid_df_display(list_dfs, rows = 2, cols=3, fill = 'cols'):
    html_table = "<table style='width:100%; border:0px'>{content}</table>"
    html_row = "<tr style='border:0px'>{content}</tr>"
    html_cell = "<td style='width:{width}%;vertical-align:top;border:0px'>{{content}}</td>"
    html_cell = html_cell.format(width=100/cols)

    cells = [ html_cell.format(content=df.to_html()) for df in list_dfs[:rows*cols] ]
    cells += (rows*cols - len(cells)) * [html_cell.format(content="")] # pad up to a full rows x cols grid

    if fill == 'rows': #fill in rows first (first row: 0,1,2,... col-1)
        grid = [ html_row.format(content="".join(cells[i:i+cols])) for i in range(0,rows*cols,cols)]

    if fill == 'cols': #fill columns first (first column: 0,1,2,..., rows-1)
        grid = [ html_row.format(content="".join(cells[i:rows*cols:rows])) for i in range(0,rows)]

    display(HTML(html_table.format(content="".join(grid))))

    # display any extra dfs below the grid
    for df in list_dfs[rows*cols:]:
        display(df)

list_dfs = []
list_dfs.extend((pd.DataFrame(2*[{"x":"hello"}]), 
             pd.DataFrame(2*[{"x":"world"}]), 
             pd.DataFrame(2*[{"x":"gdbye"}])))

grid_df_display(3*list_dfs)
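
The two fill orders come down to index arithmetic. The following sketch (grid_indices is a hypothetical helper, not part of the answer) returns, row by row, which item index lands in each grid cell, which can help when checking the layout before any HTML is built.

```python
def grid_indices(rows: int, cols: int, fill: str = "cols"):
    """Return, row by row, the index of the item placed in each grid cell."""
    if fill == "rows":
        # Row-major: the first row holds items 0, 1, ..., cols-1.
        return [[r * cols + c for c in range(cols)] for r in range(rows)]
    if fill == "cols":
        # Column-major: the first column holds items 0, 1, ..., rows-1.
        return [[c * rows + r for c in range(cols)] for r in range(rows)]
    raise ValueError("fill must be 'rows' or 'cols'")

print(grid_indices(2, 3, "rows"))
print(grid_indices(2, 3, "cols"))
```

For fill='cols' each row is the same set of indices as the slice cells[i:rows*cols:rows] used in the function above.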

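【Discussion】:

【Solution 11】:

@zarak's code is pretty small, but it affects the layout of the whole notebook. The other options are a bit messy for me.

I've added some clear CSS to this answer that affects only the current cell output. You can also add anything below or above the dataframes.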

from ipywidgets import widgets, Layout
from IPython import display
import pandas as pd
import numpy as np

# sample data
df1 = pd.DataFrame(np.random.randn(8, 3))
df2 = pd.DataFrame(np.random.randn(8, 3))

# create output widgets
widget1 = widgets.Output()
widget2 = widgets.Output()

# render in output widgets
with widget1:
    display.display(df1.style.set_caption('First dataframe'))
    df1.info()
with widget2:
    display.display(df2.style.set_caption('Second dataframe'))
    df2.info()


# add some CSS styles to distribute free space
box_layout = Layout(display='flex',
                    flex_flow='row',
                    justify_content='space-around',
                    width='auto'
                   )
    
# create Horizontal Box container
hbox = widgets.HBox([widget1, widget2], layout=box_layout)

# render hbox
hbox

【Discussion】:

This is great. I love the option to provide additional metadata about the dataframe.

【Solution 12】:

I ended up using HBox

import ipywidgets as ipyw

def get_html_table(target_df, title):
    df_style = target_df.style.set_table_attributes("style='border:2px solid;font-size:10px;margin:10px'").set_caption(title)
    return df_style._repr_html_()

df_2_html_table = get_html_table(df_2, 'Data from Google Sheet')
df_4_html_table = get_html_table(df_4, 'Data from Jira')
ipyw.HBox((ipyw.HTML(df_2_html_table),ipyw.HTML(df_4_html_table)))

【Discussion】:

【Solution 13】:

An extension of Antony's answer: if you want to limit the number of tables shown per row, use the maxTables variable.

from IPython.display import display_html

def mydisplay(dfs, names=[]):

    count = 0
    maxTables = 6  # maximum number of tables per row

    # Default titles are the positional indices of the dataframes.
    if not names:
        names = [x for x in range(len(dfs))]

    html_str = ''
    html_th = ''
    html_td = ''

    for df, name in zip(dfs, names):
        # Once the current row holds maxTables tables, flush it and start a new one.
        if count == maxTables:
            html_str += f'<tr>{html_th}</tr><tr>{html_td}</tr>'
            html_th = ''
            html_td = ''
            count = 0
        html_th += f'<th style="text-align:center">{name}</th>'
        html_td += f'<td style="vertical-align:top"> {df.to_html(index=False)}</td>'
        count += 1

    # Flush the last, possibly partial, row.
    if count != 0:
        html_str += f'<tr>{html_th}</tr><tr>{html_td}</tr>'

    # Wrap the rows once (the original appended a second copy of them).
    html_str = f'<table>{html_str}</table>'
    html_str = html_str.replace('table','table style="display:inline"')
    display_html(html_str, raw=True)

【Discussion】:

This loses the multi-index when applied to multi-indexed dataframes