This article covers the basic operations for working with a corpus.

    1.  Loading your own corpus with nltk

>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = r'D:/00001/2002/Annual_txt'
>>> reader = PlaintextCorpusReader(corpus_root, '.*')
>>> reader.fileids()
['2001 Business Highlights .txt', 'Back Cover.txt', 'Balance Sheet.txt',
 'Cheung Kong Infrastructure Holdings Limited.txt', 'Consolidated Balance Sheet.txt',
 'Consolidated Cash Flow Statement.txt', 'Consolidated Profit & Loss Account .txt',
 'Consolidated Statement of Recognised Gains and Losses.txt', 'Contents.txt',
 'Corporate Information.txt', 'Cover.txt', 'Development Projects.txt',
 "Directors' Biographical Information.txt",
 'Extracts from Hutchison Whampoa Limited Financial Statements.txt',
 'Financial Highlights.txt', 'Group Financial Summary.txt', 'Group Structure.txt',
 'Hongkong Electric Holdings Limited.txt', 'Hutchison Whampoa Limited.txt',
 'Management Discussion and Analysis.txt', 'Notes to Financial Statements.txt',
 'Notice of Annual General Meeting.txt', 'Overseas Properties.txt',
 'Rental Properties.txt', 'Report of the Auditors.txt',
 'Report of the Chairman and the Managing Director.txt',
 'Report of the Directors.txt', 'Schedule of Major Properties.txt']
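The session above depends on a local directory, so it cannot be reproduced as written. The sketch below follows the same steps against a throwaway corpus built in a temporary directory. The file names and their contents are made up for illustration, and the pattern `r".*\.txt"` is an assumption that narrows the original `'.*'` so that only `.txt` files are picked up.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Build a tiny throwaway corpus (hypothetical files, not the author's data).
corpus_root = tempfile.mkdtemp()
samples = {
    "Cover.txt": "Annual Report 2002",
    "Contents.txt": "Chairman's statement. Financial highlights.",
    "notes.md": "This file should not be picked up by the reader.",
}
for name, text in samples.items():
    with open(os.path.join(corpus_root, name), "w", encoding="utf-8") as f:
        f.write(text)

# The second argument is a regular expression matched against each file's
# path relative to corpus_root; here it keeps only the .txt files.
reader = PlaintextCorpusReader(corpus_root, r".*\.txt", encoding="utf-8")

file_ids = reader.fileids()                       # file names in the corpus
cover_raw = reader.raw("Cover.txt")               # whole file as one string
cover_words = list(reader.words("Cover.txt"))     # tokens from the default word tokenizer

print(file_ids)
print(cover_raw)
print(cover_words)
```

`raw()` and `words()` are enough for a first look at the loaded files. `sents()` is left out here because it relies on a sentence tokenizer model that may need to be downloaded separately.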