MapReduce with Gridgain, Part 1: Count Votes

Summary: Gridgain is middleware that combines compute grid and data grid capabilities to provide a framework for processing large data sets. Gridgain implements MapReduce, a well-known parallel design pattern developed by Google. Its zero-deployment feature allows it to be used for building high-performance cloud applications. In honor of the presidential election, a fictitious vote-counting problem is used to showcase the distributed computing features of Gridgain. This article is the first in a series that will explore various characteristics of Gridgain.

Prerequisites: The complete sample for this article is available from our GitHub repository. All samples are Maven-based Java projects. In addition, this article requires that you have Gridgain installed.

The following steps are required to use Maven to execute the sample Gridgain project:

1. Set the GRIDGAIN_HOME environment variable to the path of your Gridgain installation.

On Windows: set GRIDGAIN_HOME=C:\netmilleRoot\tools\gridgain-4.3.1e-computegrid

2. Manually install gridgain-4.3.1e.jar into your local Maven repository. The jar is located in the root folder of the Gridgain installation.

mvn install:install-file -Dfile=gridgain-4.3.1e.jar -DgroupId=org.gridgain  -DartifactId=gridgain -Dversion=4.3.1e -Dpackaging=jar

Let’s Get Started: The GridTask and GridJob interfaces are the two major abstractions within Gridgain. A GridTask represents a major unit of work, while a GridJob represents a subtask. A GridTask is responsible for dividing the unit of work into GridJobs, mapping the GridJobs onto available compute nodes, and aggregating the results from the GridJobs.

The following steps further describe the MapReduce algorithm in Gridgain:

  1. A task (GridTask) is split into subtasks called GridJobs.
  2. Next, the GridJobs are mapped and shipped to various nodes (compute resources) for parallel processing.
  3. Upon completion, the results of the GridJobs are returned.
  4. All results from the GridJobs are aggregated by the GridTask into a final result.
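
To make these four steps concrete, here is a minimal sketch that uses only the JDK, with a local thread pool standing in for the compute nodes. It is not Gridgain code: the class name MapReduceSketch and its methods are hypothetical, and summing integers is just a placeholder workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of split / map / execute / reduce on a local thread pool.
class MapReduceSketch {

    // Step 1: split the task input into chunks, one chunk per job.
    static List<List<Integer>> split(List<Integer> input, int chunkSize) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int from = 0; from < input.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, input.size());
            chunks.add(new ArrayList<>(input.subList(from, to)));
        }
        return chunks;
    }

    // The work a single job performs on its own chunk.
    static int sumChunk(List<Integer> chunk) {
        int partial = 0;
        for (int value : chunk) {
            partial += value;
        }
        return partial;
    }

    // Steps 2 to 4: hand each job to a worker, wait for the results, aggregate them.
    static int sum(List<Integer> input, int chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<Integer> chunk : split(input, chunkSize)) {
                Callable<Integer> job = () -> sumChunk(chunk);
                pending.add(pool.submit(job)); // step 2: map the job onto a worker
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get(); // step 3: collect a result, step 4: aggregate it
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            numbers.add(i);
        }
        System.out.println(sum(numbers, 7, 4)); // prints 5050
    }
}
```

In Gridgain, the thread pool is replaced by remote nodes and the middleware decides where each job runs, but the shape of the computation is the same.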

Counting Votes with Gridgain: Given a random population of 10,000 votes (Vote), we use Gridgain to determine the winning party. In this problem, a Vote can be cast for either Republican (Party.REPUBLICAN) or Democrat (Party.DEMOCRAT).

To implement this use case, we split our population of 10,000 votes (Vote) into a list of sub lists. Next, we assign each sub list of votes to a VoteCounterGridJob to be counted on an available compute resource. The Gridgain middleware uses its load balancing features to ship VoteCounterGridJobs to available nodes for processing. Once each VoteCounterGridJob has counted its respective votes, the results are returned to the VoteCounterGridTask to be aggregated into a final result.
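
The supporting types (Party, Vote, VoteResult) and the job itself are not listed in this article, so the following is only a guess at their shape, written in plain Java without any Gridgain classes. The getResults and setResults names mirror the calls made in Listing 1; everything else, including the VoteModelSketch wrapper and the countLocal method, is an assumption made for illustration.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the data model and of the counting a single job might perform.
class VoteModelSketch {

    enum Party { DEMOCRAT, REPUBLICAN }

    // Assumed shape: an immutable ballot cast for exactly one party.
    static final class Vote implements Serializable {
        private final Party party;

        Vote(Party party) {
            this.party = party;
        }

        Party getParty() {
            return party;
        }
    }

    // Assumed shape: per-party tallies; a party with no recorded votes reads as 0.
    static final class VoteResult implements Serializable {
        private final Map<Party, Integer> tallies = new EnumMap<Party, Integer>(Party.class);

        void setResults(Party party, int count) {
            tallies.put(party, count);
        }

        int getResults(Party party) {
            Integer count = tallies.get(party);
            return count == null ? 0 : count;
        }
    }

    // What one job might do with its sub list: tally the votes per party.
    static VoteResult countLocal(List<Vote> votes) {
        VoteResult result = new VoteResult();
        for (Vote vote : votes) {
            Party party = vote.getParty();
            result.setResults(party, result.getResults(party) + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Vote> subList = Arrays.asList(
                new Vote(Party.DEMOCRAT), new Vote(Party.REPUBLICAN), new Vote(Party.DEMOCRAT));
        VoteResult local = countLocal(subList);
        System.out.println("Local vote results:");
        for (Party party : Party.values()) {
            System.out.println(party + "=" + local.getResults(party) + " votes");
        }
    }
}
```

Both Vote and VoteResult are marked Serializable here on the assumption that objects travelling between nodes need to be serialized; the actual sample may handle this differently.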

NOTE: Gridgain relies on the Spring Framework’s IoC architecture to enable customization of nearly every aspect of its functional behavior. We will see examples of this feature in future articles.

Please refer to Listing 1: VoteCounterGridTask.java for the following explanation:

In the class declaration, our VoteCounterGridTask extends GridTaskSplitAdapter. This type of GridTask relies on the Gridgain infrastructure to map VoteCounterGridJobs onto available compute resources.

Gridgain invokes our split() method. In this routine, we split the population of Vote objects into a list of lists, with 50 votes per sub list. Each sub list of Vote objects is assigned to a VoteCounterGridJob. Once this method returns, Gridgain internally ships our VoteCounterGridJobs to available nodes.

Gridgain then invokes our reduce() method. In this routine, we aggregate the results returned from our VoteCounterGridJobs into a final result (VoteResult).

Listing 1: VoteCounterGridTask.java

package techbysample.gridgain4.sample1;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.gridgain.grid.GridException;
import org.gridgain.grid.GridJob;
import org.gridgain.grid.GridJobResult;
import org.gridgain.grid.GridTaskSplitAdapter;

/**
 *
 * @author TechBySample.com
 *
 */

public class VoteCounterGridTask extends GridTaskSplitAdapter<List<Vote>, VoteResult> {

	// Invoked by Gridgain to break the task input into jobs.
	protected Collection<? extends GridJob> split(int gridSize, List<Vote> votes) throws GridException {

		// One job per sub list of at most 50 votes.
		List<List<Vote>> dividedVotes = divide(votes, 50);

		List<GridJob> jobs = new ArrayList<GridJob>(dividedVotes.size());

		for (List<Vote> _votes : dividedVotes)
		{
			jobs.add(new VoteCounterGridJob(_votes));
		}

		return jobs;

	}

	// Invoked by Gridgain to aggregate the results returned by the jobs.
	public VoteResult reduce(List<GridJobResult> results) throws GridException {

		int democrat = 0;
		int republican = 0;

		for (GridJobResult result : results)
		{
			VoteResult voteResult = result.getData();

			democrat = democrat + voteResult.getResults(Party.DEMOCRAT);
			republican = republican + voteResult.getResults(Party.REPUBLICAN);
		}

		VoteResult _voteResult = new VoteResult();
		_voteResult.setResults(Party.DEMOCRAT, democrat);
		_voteResult.setResults(Party.REPUBLICAN, republican);
		return _voteResult;
	}

	// Splits a list into consecutive sub lists of the given size; the last one may be shorter.
	public static <T> List<List<T>> divide(List<T> list, int size)
			throws NullPointerException, IllegalArgumentException {
		if (list == null) {
			throw new NullPointerException("The list parameter is null.");
		}
		if (size <= 0) {
			throw new IllegalArgumentException(
					"The list size parameter must be more than 0.");
		}
		int num = list.size() / size;
		int mod = list.size() % size;
		List<List<T>> ret = new ArrayList<List<T>>(mod > 0 ? num + 1 : num);
		for (int i = 0; i < num; i++) {
			ret.add(list.subList(i * size, (i + 1) * size));
		}
		if (mod > 0) {
			ret.add(list.subList(num * size, list.size()));
		}
		return ret;
	}

}
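
To see how the divide() helper in Listing 1 sizes its sub lists, here is a small standalone sketch that applies the same chunking rule (full chunks first, then one shorter remainder chunk). The DivideDemo class and the range() helper are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the chunking rule used by divide() in Listing 1.
class DivideDemo {

    // Consecutive sub lists of the given size; the last one holds the remainder.
    static <T> List<List<T>> divide(List<T> list, int size) {
        if (list == null) {
            throw new NullPointerException("The list parameter is null.");
        }
        if (size <= 0) {
            throw new IllegalArgumentException("The list size parameter must be more than 0.");
        }
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int from = 0; from < list.size(); from += size) {
            chunks.add(list.subList(from, Math.min(from + size, list.size())));
        }
        return chunks;
    }

    // Builds the list 0, 1, ..., n - 1 as sample input.
    static List<Integer> range(int n) {
        List<Integer> values = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        // 10,000 votes in chunks of 50 yields 200 jobs.
        System.out.println(divide(range(10000), 50).size()); // prints 200

        // An uneven population leaves a short final chunk.
        List<List<Integer>> uneven = divide(range(103), 50);
        System.out.println(uneven.size());        // prints 3
        System.out.println(uneven.get(2).size()); // prints 3
    }
}
```

With a chunk size of 50, the 10,000 votes in this article become 200 jobs. Note that subList() returns views backed by the original list rather than copies; if a particular serialization setup rejects such views, wrapping each one in a new ArrayList is one option.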

Unit Testing: A JUnit test case (VoteCounterGridTest) is used to demonstrate Gridgain’s distributed computing behavior.

Please refer to Listing 2: VoteCounterGridTest.java for the following explanation:

The initialize() method starts the Gridgain runtime.

The testCountVotes() method generates a random population of votes.

Inside testCountVotes(), the Grid object provided by Gridgain executes our VoteCounterGridTask, using votesTobeCounted as the input parameter.

A GridTaskFuture object is then used to retrieve the final result (VoteResult).

Listing 2: VoteCounterGridTest.java

package techbysample.gridgain4.sample1;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.gridgain.grid.Grid;
import org.gridgain.grid.GridTaskFuture;
import org.gridgain.grid.typedef.G;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

/**
 *
 * @author TechBySample.com
 *
 */

public class VoteCounterGridTest {

	private Grid grid = null;

	@Before
	public  void initialize() {

		try{
			G.start();

			grid = G.grid();
		}

		catch(Exception e)
		{
			System.out.println(e);
		}

	}

	@Test
	public void testCountVotes()
	{
	    Party[] parties = {Party.DEMOCRAT,Party.REPUBLICAN};

		List<Vote> votesTobeCounted = new ArrayList<Vote>();
		Random randomGenerator = new Random();

		// Generate a random population of 10,000 votes.
		for (int i = 0; i < 10000; i++)
		{
			int randomInt = randomGenerator.nextInt(2);
			votesTobeCounted.add(new Vote(parties[randomInt]));
		}
		try{
		// Execute task.
		GridTaskFuture<VoteResult> future = grid.execute(VoteCounterGridTask.class, votesTobeCounted);

		// Wait for task completion.
		VoteResult  result = future.get();

		System.out.println("Democrat vote count=" + result.getResults(Party.DEMOCRAT));
		System.out.println("Republican vote count=" + result.getResults(Party.REPUBLICAN));

		// Three-way decision, so a tie does not also report a winner.
		if (result.getResults(Party.DEMOCRAT) == result.getResults(Party.REPUBLICAN))
		{
			System.out.println("We have a tie!");
		}
		else if (result.getResults(Party.DEMOCRAT) > result.getResults(Party.REPUBLICAN))
		{
			System.out.println("We have a Democratic president!");
		}
		else{
			System.out.println("We have a Republican president!");
		}
		}
		catch(Exception e)
		{
			System.out.println(e);
		}

	}

	@After
	public void tearDown()
	{
		grid=null;
	}
}
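
The test in Listing 2 prints its results but does not assert anything. As a possible refinement, the sketch below shows two checks that could be made without a grid: a seeded random population whose tallies must add up to the population size, and a three-way outcome decision. The ElectionOutcomeSketch class, its method names, and the seed value are all hypothetical.

```java
import java.util.Random;

// Hypothetical helpers showing checks a vote-counting test could assert.
class ElectionOutcomeSketch {

    // Three-way decision: a tie is reported on its own and never falls through to a winner.
    static String outcome(int democrat, int republican) {
        if (democrat == republican) {
            return "We have a tie!";
        }
        return democrat > republican
                ? "We have a Democratic president!"
                : "We have a Republican president!";
    }

    // Returns {democratCount, republicanCount} for a population drawn with a fixed seed,
    // so repeated runs produce the same tallies.
    static int[] randomTallies(int population, long seed) {
        Random random = new Random(seed);
        int[] tallies = new int[2];
        for (int i = 0; i < population; i++) {
            tallies[random.nextInt(2)]++;
        }
        return tallies;
    }

    public static void main(String[] args) {
        int[] tallies = randomTallies(10000, 42L);

        // Every vote must be counted exactly once.
        if (tallies[0] + tallies[1] != 10000) {
            throw new AssertionError("Tallies do not add up to the population size");
        }
        System.out.println(outcome(tallies[0], tallies[1]));

        // The tallies from the sample run shown later in this article.
        System.out.println(outcome(4952, 5048)); // prints "We have a Republican president!"
    }
}
```

In a JUnit test, the same sum check could be applied to the VoteResult returned by the grid, which would catch votes that were lost or counted twice during the split or reduce phases.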

Running  VoteCounterGridTest:

Prior to running the JUnit test, we will start two standalone compute nodes to be available for processing our GridJobs.

NOTE: It’s worth mentioning that these nodes are ‘barebone’ nodes with only the Gridgain runtime. Our classes are NOT pre-installed on each JVM node. This is NOT necessary, as Gridgain takes care of ‘magically’ shipping the required classes to remote nodes for processing.

Follow these steps:

1. Navigate to your <Gridgain installation>/bin folder and run the startup script corresponding to your OS:

     ggstart.bat or ggstart.sh

2. Repeat step 1 in a second terminal to start the second node.

3.  You should see a display similar to the following:

Node 1:

GridGain Command Line Loader, ver. 4.3.1e.10112012
2012 Copyright (C) GridGain Systems
[07:55:15]   _____     _     _______      _
[07:55:15]  / ___/____(_)___/ / ___/___ _(_)___
[07:55:15] / (_ // __/ // _  / (_ // _ `/ // _ \
[07:55:15] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/
[07:55:15]
[07:55:15]  ---==++ IN-MEMORY BIG DATA ++==---
[07:55:15]         ver. 4.3.1e-10112012
[07:55:15] 2012 Copyright (C) GridGain Systems
[07:55:15] Quiet mode.
[07:55:15]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[07:55:15] << Enterprise Edition >>
[07:55:15] Config URL: file:/C:/netmilleRoot/tools/gridgain-4.3.1e-computegrid/config/default-spring.xml
[07:55:15] Daemon mode: off
[07:55:15] Language runtime: Java Platform API Specification ver. 1.6
[07:55:15] JVM name: Java HotSpot(TM) Client VM
[07:55:15] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49123, auth: off, ssl: off)]
[07:55:15] GRIDGAIN_HOME=C:\netmilleRoot\tools\gridgain-4.3.1e-computegrid
. . . .
[07:55:22]     OS: Windows Vista 6.0 x86, netmille
[07:55:22]     VM name: 65076@netmille-PC
[07:55:22] Local ports used [TCP:47100 UDP:47200 TCP:47300]
[07:55:22] GridGain started OK

Node 2:

GridGain Command Line Loader, ver. 4.3.1e.10112012
2012 Copyright (C) GridGain Systems
[07:58:17]   _____     _     _______      _
[07:58:17]  / ___/____(_)___/ / ___/___ _(_)___
[07:58:17] / (_ // __/ // _  / (_ // _ `/ // _ \
[07:58:17] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/
[07:58:17]
[07:58:17]  ---==++ IN-MEMORY BIG DATA ++==---
[07:58:17]         ver. 4.3.1e-10112012
[07:58:17] 2012 Copyright (C) GridGain Systems
[07:58:17]
[07:58:17] Quiet mode.
[07:58:17]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|ba
[07:58:17] << Enterprise Edition >>
[07:58:17] Config URL: file:/C:/netmilleRoot/tools/gridgain-4.3.1e-computegrid/c
[07:58:17] Daemon mode: off
[07:58:17] Language runtime: Java Platform API Specification ver. 1.6
[07:58:17] JVM name: Java HotSpot(TM) Client VM
[07:58:17] Remote Management [restart: on, REST: on, JMX (remote: on, port: 4912
[07:58:17] GRIDGAIN_HOME=C:\netmilleRoot\tools\gridgain-4.3.1e-computegrid
[07:58:17] (wrn) SMTP is not configured - email notifications are off.
[07:58:17] (wrn) Cache is not configured - data grid is off.
[07:58:19] (wrn) Swap space is disabled (to enable use GridLevelDbSwapSpaceSpi).
[07:58:19] Security status [authentication=on, secure-session=on]
[07:58:20] Topology snapshot [nodes=1, CPUs=1, hash=0x517F43E8]
[07:58:20] Node JOINED [nodeId8=d489993f, addr=[192.168.2.7], order=135557971789
[07:58:23] Topology snapshot [nodes=2, CPUs=1, hash=0xE92CA1A6]
. . . .
[07:58:24] GridGain started OK

4. From the 'gridgain4-sample1' project directory, type:

mvn -Dtest=VoteCounterGridTest test

On Node 1, Node 2, and the JUnit node (where the test case was executed), you should see several intermediate results from various VoteCounterGridJobs, similar to the following:

Node 1/ Node 2/JUnit Node:

Local vote results:
DEMOCRAT=28 votes
REPUBLICAN=22 votes
Local vote results:
DEMOCRAT=18 votes
REPUBLICAN=32 votes

5. Finally, when VoteCounterGridTest completes, you should see the final result on the JUnit node:

JUnit Node:

Local vote results:
DEMOCRAT=28 votes
REPUBLICAN=22 votes
Local vote results:
DEMOCRAT=24 votes
REPUBLICAN=26 votes
Democrat vote count=4952
Republican vote count=5048
We have a Republican president!
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.821 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[18:02:40] GridGain stopped OK [uptime=00:00:29:350]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

Resources:

Official Gridgain website

Multicore Cloud Applications with Gridgain and Amazon Web Services

 

© Techbysample.com, all rights reserved.