[training@localhost ~]$ hdfs dfs -cat cats.txt
The cat on the mat
The aardvark sat on the sofa
[training@localhost ~]$

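# sc is the SparkContext that the pyspark shell creates; textFile yields one RDD element per line
mydata001 = sc.textFile('cats.txt')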

# split each line on spaces and flatten the results into a single RDD of words
mydata002 = mydata001.flatMap(lambda line: line.split(" "))

In [12]: mydata002.take(1)
Out[12]: [u'The']

In [13]: mydata002.take(2)
Out[13]: [u'The', u'cat']
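
flatMap differs from a plain map because each line produces a list of words, and those lists are then flattened into one RDD. That is why take(2) returns two words rather than two lists. The following is a minimal local sketch of the same idea in plain Python 3 with no Spark involved. It assumes the two lines of cats.txt are held in an in-memory list named `lines`, which is a hypothetical stand-in for the file.

```python
from itertools import chain

# Hypothetical in-memory stand-in for the lines of cats.txt
lines = ["The cat on the mat", "The aardvark sat on the sofa"]

# A plain map gives one list per line, so the result is nested
per_line = [line.split(" ") for line in lines]

# flatMap-style: split every line, then chain the lists into one flat sequence
words = list(chain.from_iterable(line.split(" ") for line in lines))

print(per_line[0])  # ['The', 'cat', 'on', 'the', 'mat']
print(words[:2])    # ['The', 'cat']
```

Under Python 3 the strings print without the u prefix seen in the Python 2 shell output.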

# pair each word with an initial count of 1
mydata003 = mydata002.map(lambda word: (word, 1))

In [10]: mydata003.take(1)
Out[10]: [(u'The', 1)]

In [11]: mydata003.take(2)
Out[11]: [(u'The', 1), (u'cat', 1)]
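
By contrast, map emits exactly one output element for each input element. In this step every word becomes a (word, 1) pair that is ready for counting. Below is a plain-Python sketch of the same transformation. It assumes the words produced by the flatMap step are held locally in a list. This list is an illustration only and is not read from Spark.

```python
# Hypothetical local copy of the words produced by the flatMap step on cats.txt
words = ["The", "cat", "on", "the", "mat",
         "The", "aardvark", "sat", "on", "the", "sofa"]

# map: exactly one output element per input element, here a (word, 1) pair
pairs = [(word, 1) for word in words]

print(pairs[:2])  # [('The', 1), ('cat', 1)]
```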


# combine the counts of identical words by adding them together
mydata004 = mydata003.reduceByKey(lambda x, y: x + y)

In [15]: mydata004.take(1)
Out[15]: [(u'on', 2)]

In [16]: mydata004.take(2)
Out[16]: [(u'on', 2), (u'mat', 1)]

In [17]: mydata004.take(3)
Out[17]: [(u'on', 2), (u'mat', 1), (u'sofa', 1)]
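
reduceByKey takes every value that shares a key and merges them with the supplied function, so each distinct word ends up with the sum of its 1s. The split is case-sensitive, which means The and the are counted separately. The pairs also come back in partition order, not sorted by count. The sketch below imitates these semantics on a single machine using a dictionary. `reduce_by_key` is a hypothetical helper and not a Spark API. It also ignores partitioning and the per-partition combining that Spark performs before the shuffle.

```python
# Hypothetical local (word, 1) pairs, as produced by the map step on cats.txt
pairs = [(w, 1) for w in "The cat on the mat The aardvark sat on the sofa".split(" ")]

def reduce_by_key(pairs, func):
    """Single-machine imitation of RDD.reduceByKey: merge all values sharing a key."""
    result = {}
    for key, value in pairs:
        if key in result:
            # Fold the new value into the running value for this key
            result[key] = func(result[key], value)
        else:
            # First occurrence of the key: start from its value
            result[key] = value
    return result

counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts["on"], counts["The"], counts["the"], counts["mat"])  # 2 2 2 1
```

To list the most frequent words in Spark itself, something like mydata004.takeOrdered(3, key=lambda kv: -kv[1]) should work, because RDD.takeOrdered accepts a key function.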

 
