Java Hadoop MapReduce Multiple Values: Answers

【Question title】: Java Hadoop MapReduce Multiple Value
【Posted】: 2025-11-24 01:00:01
【Question description】:

I am trying to build a movie recommendation system and have been following this website. LinkHere

def count_ratings_users_freq(self, user_id, values):
    """
    For each user, emit a row containing their "postings"
    (item, rating pairs).
    Also emit the user's rating sum and count for use in later steps.
    output:
    userid, number of movies rated by user, sum of ratings, (movieid, movie rating)

    17    1,3,(70,3)
    35    1,1,(21,1)
    49    3,7,(19,2 21,1 70,4)
    87    2,3,(19,1 21,2)
    98    1,2,(19,2)
    """
    item_count = 0
    item_sum = 0
    final = []
    for item_id, rating in values:
        item_count += 1
        item_sum += rating
        final.append((item_id, rating))

    yield user_id, (item_count, item_sum, final)

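Is it possible to convert the code above to Java using Hadoop Map and Reduce? The userid would be the key, and the
no. of movies rated by user, rating number count (the sum of the ratings), and (movieid, movie rating) pairs would be the value. Thanks!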

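【Question comments】:

What output do you expect?
Basically the same as in the example above: 17 1,3,(70,3), that is, userid, no. of movies rated by user, rating number count, (movieid, movie rating).
Sorry, it is still not clear what input you expect and what the output should be. If you just want the output to equal the input, why do you need MapReduce?
Sorry. The input is userid, movieid, rating, so I want to count the number of movies each user has rated.
So, for example, if the input is (userid, movie id, movie rating) = (17, 70, 3), then the output would be (userId, no. of movies rated by user, rating number count, (movie id, movie rating)) = (17, 1, 3, (70, 3)).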

Tags: java hadoop mapreduce

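【Solution 1】:

Yes, you can convert this into a MapReduce program.

Mapper logic:

Assuming the input has the format (user ID, movie ID, movie rating), e.g. 17,70,3, you can split each line at the first comma (,) and emit the user ID as the key and (movie ID, movie rating) as the value. For example, for the record 17,70,3 you emit key: 17 and value: 70,3.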
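The split described above can be tried outside Hadoop first. The following is a minimal sketch, assuming plain `String` input lines; the class name `MapperSplitSketch` and the method `toKeyValue` are hypothetical names used only for illustration and are not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: mirrors the mapper's parsing step without any Hadoop types.
final class MapperSplitSketch {

    // Splits "userId,movieId,rating" at the first comma.
    // Returns (userId, "movieId,rating"), or empty if the line is malformed.
    static Optional<Map.Entry<Integer, String>> toKeyValue(String line) {
        int index = line.indexOf(',');
        if (index == -1) {
            return Optional.empty(); // no comma: not a valid record
        }
        try {
            int userId = Integer.parseInt(line.substring(0, index).trim());
            String rest = line.substring(index + 1).trim();
            return Optional.of(new SimpleEntry<>(userId, rest));
        } catch (NumberFormatException e) {
            return Optional.empty(); // user ID is not an integer
        }
    }

    public static void main(String[] args) {
        System.out.println(toKeyValue("17,70,3")); // prints Optional[17=70,3]
    }
}
```

Returning an empty `Optional` for malformed lines corresponds to the mapper simply not emitting anything for that record.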

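Reducer logic:

You keep 3 variables: movieCount (integer), movieRatingCount (integer), and movieValues (string).
For each value, you parse the value and extract the movie rating. For example, for the value (70,3), the parsed movie rating is 3.
For each valid record, you increment movieCount, add the parsed movie rating to movieRatingCount, and append the value to the movieValues string.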
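The reducer steps above can also be sketched without Hadoop, which makes the aggregation easy to check by hand. This is only an illustration under the assumption that all values for one user are available as a `List<String>`; `ReducerAggregateSketch` and `aggregate` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: aggregates the "movieId,rating" values of a single user.
final class ReducerAggregateSketch {

    // Returns "movieCount,movieRatingCount,(v1 v2 ...)" for one user's values.
    static String aggregate(List<String> values) {
        int movieCount = 0;
        int movieRatingCount = 0; // sum of the ratings
        StringBuilder movieValues = new StringBuilder();

        for (String value : values) {
            String[] tokens = value.split(",");
            if (tokens.length != 2) {
                continue; // not a (movieId, rating) pair
            }
            int rating;
            try {
                rating = Integer.parseInt(tokens[1].trim());
            } catch (NumberFormatException e) {
                continue; // skip values whose rating is not an integer
            }
            movieCount++;
            movieRatingCount += rating;
            if (movieValues.length() > 0) {
                movieValues.append(' ');
            }
            movieValues.append(value.trim());
        }
        return movieCount + "," + movieRatingCount + ",(" + movieValues + ")";
    }

    public static void main(String[] args) {
        // Values for user 49 from the question's example.
        System.out.println(aggregate(Arrays.asList("19,2", "21,1", "70,4")));
        // prints 3,7,(19,2 21,1 70,4)
    }
}
```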

This gives you the desired output.

Here is a piece of code that implements it:

package com.myorg.hadooptests;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class MovieRatings {


    public static class MovieRatingsMapper
            extends Mapper<LongWritable, Text , IntWritable, Text>{

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String valueStr = value.toString();
            int index = valueStr.indexOf(',');

            if(index != -1) {
                try
                {
                    IntWritable keyUserID = new IntWritable(Integer.parseInt(valueStr.substring(0, index)));
                    context.write(keyUserID, new Text(valueStr.substring(index + 1)));
                }
                catch(NumberFormatException e)
                {
                    // The user ID is not an integer: skip this record
                }
            }
        }
    }

    public static class MovieRatingsReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {

        public void reduce(IntWritable key, Iterable<Text> values,
                           Context context) throws IOException, InterruptedException {

            int movieCount = 0;
            int movieRatingCount = 0;
            String movieValues = "";

            for (Text value : values) {
                String[] tokens = value.toString().split(",");
                if(tokens.length == 2)
                {
                    movieRatingCount += Integer.parseInt(tokens[1].trim()); // You could get a NumberFormatException
                    movieCount++;
                    movieValues = movieValues.concat(value.toString() + " ");
                }
            }

            context.write(key, new Text(Integer.toString(movieCount) + "," + Integer.toString(movieRatingCount) + ",(" + movieValues.trim() + ")"));
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "MovieRatings");
        job.setJarByClass(MovieRatings.class);
        job.setMapperClass(MovieRatingsMapper.class);
        job.setReducerClass(MovieRatingsReducer.class);

        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/in/in2.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/out/"));

        System.exit(job.waitForCompletion(true) ? 0:1);

    }
}

For the input:

17,70,3
35,21,1
49,19,2
49,21,1
49,70,4
87,19,1
87,21,2
98,19,2

I got the output:

17      1,3,(70,3)
35      1,1,(21,1)
49      3,7,(70,4 21,1  19,2)
87      2,3,(21,2 19,1)
98      1,2,(19,2)
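
Note that the (movie ID, rating) pairs for users 49 and 87 appear in a different order than in the question's expected output. Hadoop does not guarantee the order in which values for a key reach the reducer unless a secondary sort is configured. If a stable order matters, one option is to collect the values and sort them before joining. The following is a minimal sketch, assuming each user's values fit in memory and have already been copied to `String`; `SortedPostingsSketch` and `joinSortedByMovieId` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders "movieId,rating" values by numeric movie ID.
final class SortedPostingsSketch {

    // Assumes every value is well formed, i.e. starts with an integer movie ID.
    static String joinSortedByMovieId(List<String> values) {
        List<String> sorted = new ArrayList<>(values); // do not mutate the input
        sorted.sort(Comparator.comparingInt(
                (String v) -> Integer.parseInt(v.split(",")[0].trim())));
        return String.join(" ", sorted);
    }

    public static void main(String[] args) {
        System.out.println(joinSortedByMovieId(Arrays.asList("70,4", "21,1", "19,2")));
        // prints 19,2 21,1 70,4
    }
}
```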

【Discussion】: