【Posted】: 2016-09-22 21:01:28
【Question】:
I want to build a corpus from the bodies of several articles stored in JSON format. They are spread across files named by year, for example:
import json

with open('Scot_2005.json') as f:
    data = [json.loads(line) for line in f]
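A minimal sketch of that per-file step as a reusable function, assuming each file holds one JSON object per line (as the list comprehension above implies). The name `load_json_lines` is hypothetical; it also skips blank lines, which `json.loads` would otherwise reject.

```python
import json


def load_json_lines(path):
    """Read a file with one JSON object per line and return a list of dicts."""
    articles = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                articles.append(json.loads(line))
    return articles
```

Usage: `data = load_json_lines('Scot_2005.json')`.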
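This file corresponds to The Scotsman for 2005. The paper's remaining files are named APJ_2006 ... APJ2015. I also have a second newspaper, the Scottish Daily Mail, which only covers 2014 to 2015: SDM_2014, SDM_2015. I want to build a single list containing the bodies of all these articles: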
doc_set = [d['body'] for d in data]
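My problem is looping over the first part of the code I posted so that `data` covers all articles, not just those of one newspaper for one year. Any ideas on how to do this? In my attempt, I tried using pandas: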
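One way to loop over both newspapers, sketched under the assumption that every file is named `PREFIX_YEAR.json` and that each paper's year range is listed by hand. The `SOURCES` mapping and the `collect_bodies` helper are hypothetical; since the question names the 2005 file `Scot_2005.json` but the later ones `APJ_...`, the prefixes below are placeholders to adapt. The key point is calling `extend` on one shared list instead of reassigning a variable on each pass.

```python
import json
import os

# Hypothetical mapping of file-name prefix -> years available.
# Placeholder values: adjust the prefixes and ranges to match your files.
SOURCES = {
    'Scot': range(2005, 2016),  # The Scotsman, 2005-2015
    'SDM': range(2014, 2016),   # Scottish Daily Mail, 2014-2015
}


def collect_bodies(sources, folder='.'):
    """Return the 'body' field of every article in every PREFIX_YEAR.json file."""
    doc_set = []
    for prefix, years in sources.items():
        for year in years:
            path = os.path.join(folder, f'{prefix}_{year}.json')
            with open(path, encoding='utf-8') as f:
                # extend (not assign) so bodies from earlier files are kept
                doc_set.extend(json.loads(line)['body'] for line in f if line.strip())
    return doc_set
```

Usage: `doc_set = collect_bodies(SOURCES)`. Listing the years explicitly means a missing file raises `FileNotFoundError` instead of being silently skipped.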
import json
import pandas

for i in range(2005, 2016):
    df = pandas.DataFrame([json.loads(l) for l in open('Scot_%d.json' % i)])
    doc_set = df.body
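As I see it, this approach has two problems: it does not accumulate the articles from all years, and I don't know how to include the other newspaper, whose files cover a different span than 2005-15. The result of this method looks like this: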
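The pandas attempt keeps only the last year because `df` and `doc_set` are reassigned on every iteration. A sketch of the usual fix, assuming the same `PREFIX_YEAR.json` naming: collect each file's DataFrame in a list, concatenate once with `pd.concat`, then turn the `body` column into a plain list with `tolist()`. The helper name `collect_bodies_pandas` and the `sources` mapping it takes are hypothetical, and `pd.read_json(..., lines=True)` needs a pandas version with JSON Lines support.

```python
import os

import pandas as pd


def collect_bodies_pandas(sources, folder='.'):
    """Concatenate the 'body' column of every PREFIX_YEAR.json file into one list."""
    frames = []
    for prefix, years in sources.items():
        for year in years:
            path = os.path.join(folder, f'{prefix}_{year}.json')
            # lines=True parses one JSON object per line (JSON Lines)
            frames.append(pd.read_json(path, lines=True))
    # Gather all frames first and concatenate once, instead of
    # overwriting df on each pass of the loop.
    combined = pd.concat(frames, ignore_index=True)
    return combined['body'].tolist()
```

Because newspapers are just keys in `sources`, a paper with a shorter span (such as 2014-2015) only needs its own year range.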
date
2015-12-31 The Institute of Directors (IoD) has added its...
2015-12-31 It is startling to see how much the Holyrood l...
2015-12-31 A hike in interest rates in the new year will ...
2015-12-31 The First Minister has resolved to make 2016 a...
2015-12-30 The Scottish Government announced yesterday th...
2015-12-30 The Footsie closed lower amid falling oil pric...
2015-12-28 BEFORE we start the guessing game for 2016, a ...
2015-12-27 AS WE ushered in 2015, few would have predicte...
2015-12-23 No matter how hard Derek McInnes and his Aberd...
2015-12-21 THE HEAD of a Scottish Government task force s...
2015-12-17 A Scottish local authority has fought off a le...
2015-12-17 Markets lifted after the Federal Reserve hiked...
2015-12-17 Significant increases in UK quotas for fish in...
2015-12-17 WAR of words with Donald Trump suggests its ti...
2015-12-16 SCOTLAND'S national performance companies have...
2015-12-15 Markets jumped ahead of what investors expect ...
2015-12-14 Political uncertainty in back seat as transpor...
2015-12-11 The International Monetary Fund (IMF) has warn...
2015-12-08 Scotland has a "spring in its step" with the j...
2015-12-07 London's leading share index struggled for dir...
2015-12-03 REDUCING carbon is just the start of it, write...
2015-11-26 One of the country's most prized salmon rivers...
2015-11-23 Tax and legislative changes undermine strong f...
2015-11-23 A second House of Lords committee has called f...
2015-11-14 At first glance, Scotland's economic performan...
2015-11-13 THE United States has long been viewed as the ...
2015-11-12 IT IS vital for a new governance group to rest...
2015-11-12 Former SSE chief Ian Marchant has criticised r...
2015-11-11 Telecoms firm TalkTalk said it will take a hit...
2015-11-09 Improvements to consumer rights legislation ma...
...
2015-02-25 Traders baulked at an assault on the 7,000 lev...
2015-02-24 BRITISH military personnel are to be deployed ...
2015-02-20 DAVID Cameron has announced a £859 million inv...
2015-02-16 Falling oil prices and slowing inflation have ...
2015-02-14 DEFENCE spending cuts and falling oil prices h...
2015-02-14 Brent crude rallied to a 2015 high and helped ...
2015-02-12 THE HOUSING markets in Scotland and Northern I...
2015-02-10 INVESTMENT in Scotland's commercial property m...
2015-02-09 Investors took flight after Greece's new gover...
2015-02-01 Experts say large numbers are delaying decisio...
2015-01-29 MORE than 300 jobs are at risk after Tesco sai...
2015-01-27 THE Three Bears have hit out at the Rangers bo...
2015-01-21 GEORGE Osborne has challenged the right of SNP...
2015-01-19 Employment figures this week should show Briti...
2015-01-19 Why haven't petrol pump prices fallen as fast ...
2015-01-18 Without an agreement on immediate action, the...
2015-01-17 A SECOND independence referendum could be trig...
2015-01-14 THE RETAILER, which like its rivals has come u...
2015-01-14 HOUSE prices in Scotland rose by more than 4 p...
2015-01-13 HOUSE builder Taylor Wimpey is preparing for a...
2015-01-13 Supermarket group Sainsbury's today said it wo...
2015-01-13 INFLATION has tumbled to its lowest level on r...
2015-01-12 BUSINESSES are bullish about their prospects ...
2015-01-11 FOR decades, oil has dripped through our natio...
2015-01-09 Shares in the housebuilding sector fell heavil...
2015-01-08 THE Bank of England is expected to leave inter...
2015-01-05 COMPANIES in Scotland are more optimistic abou...
2015-01-04 UK is doing OK, but uncertainty looms on mid-y...
2015-01-02 The London market began the new year in a subd...
2015-01-02 The famous election mantra of Bill Clinton's c...
Name: body, dtype: object
【Discussion】:
-
So where is the minimal reproducible example of your attempt, and what is wrong with it?
-
I don't see any attempt to loop over the newspaper names or years. Could you try that?
-
@jonrshape, I just updated the question. As you can see, with pandas I can't produce a list.
-
No, you get a DataFrame, which is exactly what you asked for. What's the problem?!
-
Then how do I turn it into a list? The real problem is integrating the other newspaper, e.g. the Scottish Daily Mail for 2014-15.
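If hard-coding a year range per newspaper is the sticking point, another option is to let `glob` find every matching file, so the Scottish Daily Mail's 2014-2015 files are picked up alongside the other paper without special-casing. This sketch assumes all files sit in one folder and match `*_*.json` (a name such as `APJ2015` without an underscore would be missed); `collect_all_bodies` is a hypothetical helper.

```python
import glob
import json
import os


def collect_all_bodies(folder='.', pattern='*_*.json'):
    """Read every file matching pattern, whatever newspaper or year it belongs to."""
    doc_set = []
    # sorted() makes the order deterministic (alphabetical by file name)
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, encoding='utf-8') as f:
            doc_set.extend(json.loads(line)['body'] for line in f if line.strip())
    return doc_set
```

For a pandas Series such as `df.body`, calling `df.body.tolist()` (or `list(df.body)`) likewise returns an ordinary Python list.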