danielnee/cardinality-mr

Description

Hadoop Map-Reduce job for running distributed cardinality estimation.

Building

Assuming you have Apache Maven installed and configured:

mvn package

The Maven assembly plugin outputs the job jar, Cardinality.MapReduce-1.0-SNAPSHOT-job.jar.

Run the Hadoop job with:

bin/hadoop jar <jar-location>/Cardinality.MapReduce-1.0-SNAPSHOT-job.jar <input-dir> <output-dir>

Input

The input is expected to consist of files containing string identifiers, one per line. The job computes the estimated cardinality (the approximate number of distinct values) of these strings.

Output

The job will output a single file containing the estimated cardinality.
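
To illustrate what an estimated cardinality is and why it suits Map-Reduce, here is a minimal Java sketch of a mergeable estimator. This README does not say which algorithm the job uses, so linear counting is an assumption chosen for brevity, and the `LinearCounter` class, its methods, and the bitmap size are hypothetical names and values rather than this project's API. Each partial counter could be built independently (for example, one per mapper) and the bitmaps OR-ed together to get one estimate.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration only: a linear-counting estimator, not necessarily
// the algorithm implemented by this repository.
public class LinearCounter {
    private final int m;        // number of bits in the bitmap
    private final BitSet bits;  // bit i is set once some identifier hashes to i

    public LinearCounter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    // 64-bit FNV-1a over the UTF-8 bytes, followed by a splitmix64-style
    // finalizer to spread the bits more evenly.
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 30; h *= 0xbf58476d1ce4e5b9L;
        h ^= h >>> 27; h *= 0x94d049bb133111ebL;
        h ^= h >>> 31;
        return h;
    }

    // Record one identifier; adding the same identifier again changes nothing.
    public void add(String id) {
        bits.set((int) Math.floorMod(hash64(id), (long) m));
    }

    // Combine two partial counters; equivalent to having seen both inputs.
    public void merge(LinearCounter other) {
        if (other.m != m) {
            throw new IllegalArgumentException("bitmap sizes differ");
        }
        bits.or(other.bits);
    }

    // Linear counting estimate: m * ln(m / zeroBits).
    public double estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) {
            return Double.POSITIVE_INFINITY; // bitmap saturated, m is too small
        }
        return m * Math.log((double) m / zeros);
    }

    public static void main(String[] args) {
        // Two partial counters stand in for two input splits.
        LinearCounter left = new LinearCounter(1 << 16);
        LinearCounter right = new LinearCounter(1 << 16);
        for (int i = 0; i < 10000; i++) {
            (i % 2 == 0 ? left : right).add("id-" + i);
        }
        left.merge(right);
        System.out.println("estimated cardinality: " + left.estimate());
    }
}
```

The estimate is approximate by design: accuracy depends on the bitmap size relative to the true number of distinct identifiers, and a bitmap that fills up completely can no longer produce a finite estimate.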
