Removing Characters from Python Output

【Question title】: Removing Characters from Python Output
【Posted】: 2015-11-30 12:19:03
【Question】:

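I have spent a lot of effort trying to remove characters such as u u' u" [()/'" from Spark Python output, because they cause problems for my further processing. Please focus on that.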

I have input like this:

(u"(u'[25145,   12345678'", 0.0)
(u"(u'[25146,   25487963'", 43.0)

When I apply my code to sum the results, it gives me output like this:

(u'(u"(u\'[54879,    5125478\'"', 0.0)
(u"(u'[25145,   25145879'", 11.0)
(u'(u"(u\'[56897,    22548793\'"', 0.0)

So I want to remove all characters like (u'(u"(u\'["'')

I want output like this:

54879,5125478,0.0

25145,25145879,11.0

The code I tried is:

from pyspark import SparkContext
import os
import sys

sc = SparkContext("local", "aggregate")

file1 = sc.textFile("hdfs://localhost:9000/data/first/part-00000")
file2 = sc.textFile("hdfs://localhost:9000/data/second/part-00000")

file3 = file1.union(file2).coalesce(1).map(lambda line: line.split(','))

result = file3.map(lambda x: ((x[0]+', '+x[1],float(x[2][:-1])))).reduceByKey(lambda a,b:a+b).coalesce(1)

result.saveAsTextFile("hdfs://localhost:9000/Test1")

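【Question comments】:

What is the result of your code?
This code aggregates the results by key and works fine, but the output contains characters such as u u' u" [()/'" that I want to remove. The output looks like (u'(u"(u\'[54879, 5125478\'"', 0.0) (u"(u'[25145, 25145879'", 11.0). So I want to remove all those characters and get output like 54879,5125478,0.0 25145,25145879,11.0

Tags: python apache-spark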

【Solution 1】:

I think your only problem is that you need to reformat the result before saving it to a file, for example:

result.map(lambda x:x[0]+','+str(x[1])).saveAsTextFile("hdfs://localhost:9000/Test1")
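The one-liner above turns each record into a plain string, but the key x[0] would still carry the leftover quotes and brackets. A minimal sketch of a formatter that also cleans the key, assuming the key only ever contains integer IDs (the name format_record is made up for illustration and is not part of the original answer):

```python
import re

def format_record(record):
    """Turn (messy_key, total) into 'id1,id2,total'."""
    key, total = record
    # Keep only the runs of digits; this drops u, quotes, brackets,
    # backslashes and spaces in one step.
    numbers = re.findall(r"\d+", key)
    return ",".join(numbers + [str(total)])

sample = ('(u"(u\'[54879,    5125478\'"', 0.0)
print(format_record(sample))  # 54879,5125478,0.0
```

In the Spark job this could be passed to map in place of the lambda, e.g. result.map(format_record), before calling saveAsTextFile.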

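【Comments】:

Thanks Mark, but it gives me an error: result.map(lambda x:x[0]+','+x[1]).saveAsTextFile("hdfs://localhost:9000/Test1") TypeError: coercing to Unicode: need string or buffer, float found
That's because x[1] is a float: you need to convert it to a string. I have updated the answer accordingly.
But are these characters also in the input?
Yes Mark, when I apply my code it gives output like (u'(u"(u\'[54879, 5125478\'"', 0.0) (u"(u'[25145, 25145879'", 11.0). I want to know where the / comes from, and I also want to remove all those characters.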
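On where the extra character comes from (it is a backslash in the output shown): a likely explanation, assuming the input files were themselves written by saveAsTextFile from (key, value) tuples, is that saveAsTextFile stores the string representation of each element. For a tuple, that representation quotes the strings inside it and escapes any quote that clashes with the outer one. The sketch below shows the effect in plain Python, without Spark:

```python
# A key that already carries leftovers from two earlier save/load rounds.
key = '(u"(u\'[54879,    5125478\'"'

print(key)        # the actual characters: no backslash anywhere
print(repr(key))  # the quoted form: the inner ' is escaped as \'

# Converting a tuple to text uses the quoted form of each string in it,
# so the quotes and the backslash end up in the saved line.
record = (key, 0.0)
line = str(record)
print(line)
```

Under Python 2, unicode strings are additionally printed with a u prefix, which would account for the u' and u" pieces. Each further round of saving tuples and re-reading them as text adds one more layer.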
import string
s = "@John, this is a great #week-end%, how are you"
for c in "!@#%&*()[]{}/?":
    s = string.replace(s, c, "")
print s
How can I use this in my code?
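One way the character-stripping loop above might be folded into the job is sketched below. It is only a sketch: strip_chars, to_line and UNWANTED are hypothetical names, and the list of unwanted characters is guessed from the sample output. It uses the str.replace method rather than string.replace, since the module-level function exists only in Python 2. Removing the letter u is safe here only on the assumption that the keys are purely numeric.

```python
# Characters to drop, guessed from the sample output: u, both quote
# kinds, backslash, brackets, parentheses and spaces.
UNWANTED = "u'\"\\[]() "

def strip_chars(text, unwanted=UNWANTED):
    """Remove every character in `unwanted` from `text`."""
    for c in unwanted:
        text = text.replace(c, "")
    return text

def to_line(record):
    """Format a (key, total) pair as 'id1,id2,total'."""
    key, total = record
    return strip_chars(key) + "," + str(total)

print(to_line(("(u'[25145,   25145879'", 11.0)))  # 25145,25145879,11.0
```

With this in place, result.map(to_line) could replace the lambda before saveAsTextFile is called.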