Mapper and Reducer Program Using MongoDB

Mangesh Prakash Jadhav
Sep 5, 2021

1. Map-Reduce:

MapReduce is a processing technique and a programming model for distributed computing, popularized by Hadoop (which is written in Java). It is designed to process data in parallel by splitting the work across multiple machines (nodes). The MapReduce algorithm consists of two main tasks: Map and Reduce.

i) Mapper:

A mapper is a function (or task) that processes every input record from a file and generates intermediate output, which serves as the input to the Reducer. It produces this output by emitting new key-value pairs.

ii) Reducer:

The Reducer is the second stage of the MapReduce programming model. Before the intermediate key-value pairs reach the Reducer, the framework shuffles and sorts them by key, so the key is the deciding factor in how records are grouped. The output generated by the Reducer is the final output. A Reducer typically performs aggregation operations such as summing, filtering, or averaging the values for each key.
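The three phases described above can be illustrated with a minimal, in-memory sketch in plain JavaScript (a classic word count). This is a single-process simplification, not a distributed implementation:

```javascript
// Map phase: each input line is turned into (word, 1) key-value pairs.
function mapPhase(lines) {
  const pairs = [];
  lines.forEach(line => {
    line.split(/\s+/).filter(Boolean).forEach(word => {
      pairs.push([word, 1]); // mapper emits a key-value pair
    });
  });
  return pairs;
}

// Shuffle/sort phase: the framework groups all values by key
// before handing them to the reducer.
function shuffle(pairs) {
  const groups = {};
  pairs.forEach(([key, value]) => {
    (groups[key] = groups[key] || []).push(value);
  });
  return groups;
}

// Reduce phase: aggregate the grouped values for each key.
function reducePhase(groups) {
  const result = {};
  Object.keys(groups).forEach(key => {
    result[key] = groups[key].reduce((a, b) => a + b, 0);
  });
  return result;
}
```

Running `reducePhase(shuffle(mapPhase(["a b a", "b a"])))` produces `{ a: 3, b: 2 }`: the mapper emitted five pairs, the shuffle grouped them into two keys, and the reducer summed each group.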

Now I am going to import the data into my local MongoDB server.

>mongoimport persons.json -d mydata -c data --jsonArray

Now let's run a few operations on this data to see how it works.

Let's create a pipeline to determine the number of males in each state.

db.data.aggregate([
  { $match: { gender: "male" } },
  { $group: { _id: { state: "$location.state" }, total_males: { $sum: 1 } } }
])

Let's create a Map-Reduce program to calculate the average age of all males and all females.

Creating the Mapper function:

In simple terms, the mapper selects the data we care about from each document and emits it as key-value pairs.
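A mapper for this task might look like the following sketch. It emits the document's gender as the key and an object carrying the age plus a count of 1 as the value. The field names `gender` and `dob.age` are assumptions based on the persons.json sample data (the aggregation above already used `gender` and `location.state`):

```javascript
// Mapper: key = gender, value = { count, totalAge }.
// Field names (gender, dob.age) are assumed from the persons.json dataset.
var mapFunc = function () {
  emit(this.gender, { count: 1, totalAge: this.dob.age });
};
```

Emitting a `{ count, totalAge }` object rather than the bare age lets the reducer combine partial results without losing track of how many documents contributed to each sum.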

Creating the Reducer function:
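A matching reducer, sketched under the same assumed value shape, folds the per-document `{ count, totalAge }` values for one gender into a single running total. It must return the same shape it receives, because MongoDB may call reduce repeatedly on partial results; a separate finalize function then turns the totals into an average:

```javascript
// Reducer: combine all { count, totalAge } values emitted for one key.
// Returns the same shape it receives, since reduce may run multiple times.
var reduceFunc = function (key, values) {
  var acc = { count: 0, totalAge: 0 };
  values.forEach(function (v) {
    acc.count += v.count;
    acc.totalAge += v.totalAge;
  });
  return acc;
};

// Finalize: runs once per key after reducing; converts totals to an average.
var finalizeFunc = function (key, reduced) {
  return reduced.totalAge / reduced.count;
};
```

The division is deliberately deferred to `finalizeFunc`: averaging inside the reducer would produce wrong results when reduce is invoked more than once per key.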

Running the operation:
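With the mapper and reducer defined in the shell session, we can run them with `db.collection.mapReduce()`. The output collection name (`avg_age` here) is our own choice; this sketch assumes a mongosh session connected to the `mydata` database:

```
// Run map-reduce over the data collection; results are written to
// the "avg_age" collection (the name is arbitrary). Assumes mapFunc,
// reduceFunc, and finalizeFunc are defined in this session.
db.data.mapReduce(
  mapFunc,
  reduceFunc,
  {
    out: "avg_age",        // output collection
    finalize: finalizeFunc // convert totals into an average per key
  }
)
```

Note that `mapReduce` is deprecated as of MongoDB 5.0 in favor of the aggregation pipeline, but it still works for learning the model.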

To view the results of the program, we query the output collection.
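Assuming the mapReduce output collection was named `avg_age`, the results can be read with a plain find. `mapReduce` stores each key under `_id` and the finalized result under `value`:

```
// Each result document has the form { _id: <gender>, value: <average age> }
db.avg_age.find()
```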

Hence we get the result: the average age for males and for females.
