本文重点介绍预料库的一般操作。
1. 使用nltk加载自己的预料库
1 >>> from nltk.corpus import PlaintextCorpusReader 2 >>> corpus_root=r'D:/00001/2002/Annual_txt' 3 >>> reader=PlaintextCorpusReader(corpus_root, '.*') 4 >>> reader.fileids() 5 ['2001 Business Highlights .txt', 'Back Cover.txt', 'Balance Sheet.txt', 'Cheung Kong Infrastructure Holdings Limited.txt', 'Consolidated Balance Sheet.txt', 'Consolidated Cash Flow Statement.txt', 'C 6 onsolidated Profit & Loss Account .txt', 'Consolidated Statement of Recognised Gains and Losses.txt', 'Contents.txt', 'Corporate Information.txt', 'Cover.txt', 'Development Projects.txt', "Directors' 7 Biographical Information.txt", 'Extracts from Hutchison Whampoa Limited Financial Statements.txt', 'Financial Highlights.txt', 'Group Financial Summary.txt', 'Group Structure.txt', 'Hongkong Electric 8 Holdings Limited.txt', 'Hutchison Whampoa Limited.txt', 'Management Discussion and Analysis.txt', 'Notes to Financial Statements.txt', 'Notice of Annual General Meeting.txt', 'Overseas Properties.txt' 9 , 'Rental Properties.txt', 'Report of the Auditors.txt', 'Report of the Chairman and the Managing Director.txt', 'Report of the Directors.txt', 'Schedule of Major Properties.txt'] 10 >>>