【Posted】:2015-08-24 07:16:16
【Question】:
I have to map a table that records the usage history of applications. The table holds tuples of this form:
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
AppId differs from row to row because many apps are recorded. date is written in a dd/mm/yyyy hh/mm layout, and cpuUsage and memoryUsage are expressed in %. So, for example:
<3ghffh3t482age20304,230720142245,0.2,3,5>
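As an illustration of the date field, here is a minimal sketch that parses the example value. It assumes the stored string is the dd/mm/yyyy hh/mm layout with the separators dropped, i.e. the pattern `ddMMyyyyHHmm`; the class name `UsageDateSketch` and that pattern are assumptions, not something the table definition confirms.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class UsageDateSketch {

    // Assumed compact layout: day, month, year, hour, minute, no separators.
    static final DateTimeFormatter COMPACT = DateTimeFormatter.ofPattern("ddMMyyyyHHmm");

    // Turns a raw date string such as "230720142245" into a LocalDateTime.
    static LocalDateTime parse(String raw) {
        return LocalDateTime.parse(raw, COMPACT);
    }

    public static void main(String[] args) {
        System.out.println(parse("230720142245"));
    }
}
```

If the real column uses a different layout, only the pattern string would need to change.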
This is how I retrieve the data from Cassandra (small snippet):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public static void main(String[] args) {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();
    session.execute("CREATE KEYSPACE IF NOT EXISTS foo WITH replication "
            + "= {'class':'SimpleStrategy', 'replication_factor':3};");
    // date is the clustering column, so it is the one named in CLUSTERING ORDER BY
    String createTableAppUsage = "CREATE TABLE IF NOT EXISTS foo.appusage "
            + "(appid text, date text, cpuusage double, memoryusage double, "
            + "PRIMARY KEY (appid, date)) WITH CLUSTERING ORDER BY (date ASC);";
    session.execute(createTableAppUsage);
    // Use select to get the appusage table's rows
    ResultSet resultForAppUsage = session.execute("SELECT appid, cpuusage FROM foo.appusage");
    for (Row row : resultForAppUsage) {
        // cpuusage is declared as double, so it is read with getDouble
        System.out.println("appid: " + row.getString("appid") + " cpuusage: " + row.getDouble("cpuusage"));
    }
    // Clean up the connection by closing it
    cluster.close();
}
So my problem now is to map the data by key and value and build the following tuple, integrating it with this code (the snippet does not work):
<AppId,cpuusage>
JavaPairRDD<String, Integer> saveTupleKeyValue = someStructureFromTakeData.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String x) {
        // y is not defined anywhere; the cpuusage value for x should go here
        return new Tuple2<>(x, y);
    }
});
How can I use an RDD to map appId to cpuusage and then reduce, e.g. on cpuusage > 50?
Any help?
Thanks in advance.
【Comments】:
-
Not sure I understand the question. Do you want to replace `session.execute("SELECT appid,cpuusage FROM foo.appusage");` with the equivalent Spark-Cassandra connector API expression?
-
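@maasg Hi, my problem is that after retrieving the data from Cassandra, as shown in the code above, I want to create an RDD from it so that I can map it
and run a reduce on it, e.g. reduce to the entries with CPU utilization > 50, and so on. How can I do that?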
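The map, filter and reduce steps described in this comment can be sketched with plain Java collections, so the logic runs without a cluster. This is an illustration under assumptions, not a tested Spark job: the class `AppUsageSketch`, the holder `Usage`, and the sample rows are hypothetical, and "reduce" is read here as keeping the highest cpuusage per appId. Only the threshold of 50 comes from the question. On a `JavaPairRDD` the three steps would roughly correspond to `mapToPair`, `filter`, and `reduceByKey`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AppUsageSketch {

    // Hypothetical holder for one row of foo.appusage (only the fields used here).
    static final class Usage {
        final String appId;
        final double cpuUsage;

        Usage(String appId, double cpuUsage) {
            this.appId = appId;
            this.cpuUsage = cpuUsage;
        }
    }

    // Maps each row to an <appId, cpuUsage> pair, keeps pairs above the threshold,
    // then merges pairs that share an appId by keeping the larger cpuUsage.
    static Map<String, Double> maxCpuAbove(List<Usage> rows, double threshold) {
        return rows.stream()
                .map(u -> new SimpleEntry<>(u.appId, u.cpuUsage)) // comparable to mapToPair
                .filter(e -> e.getValue() > threshold)            // comparable to filter
                .collect(Collectors.toMap(                        // comparable to reduceByKey
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        Double::max,
                        TreeMap::new));
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Usage> rows = Arrays.asList(
                new Usage("app-a", 72.5),
                new Usage("app-a", 91.0),
                new Usage("app-b", 12.0),
                new Usage("app-c", 55.0));
        System.out.println(maxCpuAbove(rows, 50.0));
    }
}
```

Swapping `Double::max` for another merge function (a sum, for instance) changes what the reduce step computes without touching the mapping or the filter.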
Tags: java mapreduce apache-spark rdd