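【Question title】: Get data frame in shape of table in word document
【Posted】: 2021-01-02 21:21:48
【Question description】:

I am reading an Excel file, extracting a specific df from it, and putting that df into a Word document. The problem I am facing is:

  1. Once the DF is added to a paragraph, it becomes completely useless.

The complete code is below.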

#importing required libraries
from datetime import date

import docx
import numpy as np
import pandas as pd

eod = pd.read_excel('df.xlsx')
legal = docx.Document('legal.docx')

#Calculating No. days from SCN
eod['SCN Days'] = (pd.Timestamp('now').floor('d') - eod['SCN Date']).dt.days

#Generation list of EFE for Final Showcause Notice to be issued today
FSCN_today = eod.where(eod['SCN Days']>20)
#Dropping Null from generated list
FSCN_today = FSCN_today.dropna(how ="all")
FSCN_today = FSCN_today[['Exporter Name','EFE','DESTINATION','VALUE']]

#Getting Unique Values in the list generated
s_values = FSCN_today['Exporter Name'].unique()

#Iterating through List
for c in s_values:
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1.iloc[10:1]
    legal.paragraphs[15].text = str(df1)
    notice_name = str(c)+ ".docx"
    legal.save(notice_name)

#Update Date & Status of FSCN Issued today
eod['FSCN Date'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),date.today(),eod['FSCN Date'])
eod['Status'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),"FSCN ISSUED",eod['Status'])

#In progress
name = "EOD "+ str(date.today())+ ".xlsx"
#eod.to_excel(name,index =False)  

The problem is in the following line.

legal.paragraphs[15].text = str(df1)

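【Question discussion】:

  • Could you share the sample files used in your code? You can replace them with dummy data. The main reason is to understand the types. Also check whether the document might have fewer than 15 paragraphs.
  • Share the error together with dummy data; the question needs to be more descriptive.
  • You can look at the dummy data at github.com/iqbalhusnain/Export-Overdue
  • The code in your GitHub repository does not raise an error for me. The df table is in the document (it does not look good, but it is there)? Apart from the paragraphs[8] statement that S Mayer pointed out, your code works here too?

Tags: python pandas numpy docx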


【Solution 1】:

You can make this work by creating a table, transferring the data frame into that table (as explained in this post), and then placing the table where legal.paragraphs[15] is:

#importing required libraries
from datetime import date

import docx
import numpy as np
import pandas as pd

eod = pd.read_excel('df.xlsx')

#Calculating No. days from SCN
eod['SCN Days'] = (pd.Timestamp('now').floor('d') - eod['SCN Date']).dt.days

#Generation list of EFE for Final Showcause Notice to be issued today
FSCN_today = eod.where(eod['SCN Days']>20)
#Dropping Null from generated list
FSCN_today = FSCN_today.dropna(how ="all")
FSCN_today = FSCN_today[['Exporter Name','EFE','DESTINATION','VALUE']]

#Getting Unique Values in the list generated
s_values = FSCN_today['Exporter Name'].unique()

#Iterating through List
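for c in s_values:
    # Reload the template for every exporter so no earlier values carry over
    legal = docx.Document('legal.docx')
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1.iloc[10:1].iloc
    # Clear the placeholder paragraph; the table goes right after it
    legal.paragraphs[15].text = ""
    # One extra row for the header
    t = legal.add_table(df1.shape[0]+1, df1.shape[1])
    # Header row: column labels
    for j in range(df1.shape[-1]):
        t.cell(0,j).text = str(df1.columns[j])
    # Body rows: cell text must be a string
    for i in range(df1.shape[0]):
        for j in range(df1.shape[-1]):
            t.cell(i+1,j).text = str(df1.values[i,j])
    # add_table appends at the end of the document; move it after paragraph 15
    legal.paragraphs[15]._p.addnext(t._tbl)
    notice_name = str(c)+ ".docx"
    legal.save(notice_name)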

#Update Date & Status of FSCN Issued today
eod['FSCN Date'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),date.today(),eod['FSCN Date'])
eod['Status'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),"FSCN ISSUED",eod['Status'])

#In progress
name = "EOD "+ str(date.today())+ ".xlsx"
#eod.to_excel(name,index =False) 

(I moved legal = docx.Document('legal.docx') into the loop because successive .docx files otherwise kept the values of earlier exporters.)
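
The cell-by-cell transfer in this answer can also be pulled into a small helper. The following is only a minimal sketch under stated assumptions: `fill_table` is a hypothetical name (not part of python-docx), the table is assumed to expose a python-docx-style `cell(row, col)` whose result has a writable `text`, and `FakeTable` with its dummy row exists only so the sketch runs without python-docx or the original files.

```python
def fill_table(table, columns, rows):
    """Write a header row and the data rows into a table, cell by cell.

    `fill_table` is a hypothetical helper. `table` only needs a
    python-docx-style `cell(row, col)` method returning an object
    with a writable `text` attribute.
    """
    # Row 0 holds the column labels.
    for j, label in enumerate(columns):
        table.cell(0, j).text = str(label)
    # Data rows start at row 1; every value is converted to a string.
    for i, row in enumerate(rows, start=1):
        for j, value in enumerate(row):
            table.cell(i, j).text = str(value)
    return table


class FakeCell:
    def __init__(self):
        self.text = ""


class FakeTable:
    """Stand-in for a docx table so the sketch runs without python-docx."""

    def __init__(self, n_rows, n_cols):
        self._cells = [[FakeCell() for _ in range(n_cols)] for _ in range(n_rows)]

    def cell(self, i, j):
        return self._cells[i][j]

    def as_lists(self):
        return [[c.text for c in row] for row in self._cells]


columns = ["Exporter Name", "EFE", "DESTINATION", "VALUE"]
rows = [["ACME", "EFE-001", "Dubai", 1250.5]]  # dummy data, not from the question
demo = fill_table(FakeTable(len(rows) + 1, len(columns)), columns, rows)
print(demo.as_lists())
```

With python-docx, the same helper would presumably be called with `legal.add_table(len(df1) + 1, len(df1.columns))` as the table, `df1.columns` as the labels and `df1.values.tolist()` as the rows, followed by the `addnext` call shown above to move the table after paragraph 15.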

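【Discussion】:

    【Solution 2】:

    I have never worked with python-docx, so I am fairly sure my attempt is not ideal. The following does work with the sample data.

    Essentially, I add a table to the document and insert the DataFrame's column labels and contents into it. There are some nasty parts that I could not resolve (the parts where I access underscore-prefixed attributes of the paragraph and the table).

    I replaced the following part of the code above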

    #Iterating through List
    for c in s_values:
        df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
        legal.paragraphs[7].text = c
        legal.paragraphs[8].text = df1.iloc[10:1]
        legal.paragraphs[15].text = str(df1)
        notice_name = str(c)+ ".docx"
        legal.save(notice_name)
    

    with this (the comments highlight what I did; line breaks are added for readability):

    for c in s_values:
        df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
        legal.paragraphs[7].text = c
        legal.paragraphs[8].text = df1[10:1].iloc # <- Changed
    
        # Add a table with the same amount of columns as the DataFrame
        table = legal.add_table(0, len(df1.columns))
        table.autofit = True
    
        # Create the header line (= column labels of the DataFrame)
        header = table.add_row()
        for col, cell in enumerate(header.cells):
            cell.text = str(df1.columns[col])
    
        # Insert the content of DataFrame in the table
        for ind in df1.index:
            row = table.add_row()
            for pos, col in enumerate(df1.columns):
                row.cells[pos].text = str(df1.loc[ind, col])  # cell text must be a string
    
        # Add a break in paragraph 15 (before the table)
        legal.paragraphs[15].add_run().add_break()
    
        # Add the table to paragraph 15
        legal.paragraphs[15]._p.addnext(table._tbl)
    
        notice_name = str(c)+ ".docx"
        legal.save(notice_name)
    
        # Remove the table
        table._element.getparent().remove(table._element)
    
    

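    【Discussion】:

      【Solution 3】:

      I noticed that legal.paragraphs[8].text = df1.iloc[10:1] looks odd.

      If you change it to legal.paragraphs[8].text = df1[10:1].iloc, the resulting .docx file looks more reasonable to me.

      I don't know what your desired output is, so this is my best guess based on what was presented.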
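
For reference, a slice written as 10:1 selects nothing, because its start lies past its stop; positional slicing with df.iloc uses the same start:stop convention as plain Python sequences. A minimal sketch, with a plain list of dummy rows standing in for the DataFrame (the data is made up, and which rows the question actually wanted is not known):

```python
rows = [f"row {n}" for n in range(12)]  # 12 dummy rows

empty = rows[10:1]    # start is past stop, so nothing is selected
middle = rows[1:10]   # positions 1 through 9
tail = rows[10:]      # positions 10 and 11

print(empty)
print(len(middle))
print(tail)
```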

      【Discussion】:

      • Paragraph 8 is already fine. The problem is paragraph 15. I want to display the df there as a table, and I don't know how to do that.