GitHub - danielnee/cardinality-mr: Cardinality estimation algorithms using Hadoop Map-Reduce

Description

Hadoop Map-Reduce job for running distributed cardinality estimation.

Building

Assuming you have Apache Maven installed and configured:

mvn package

The Maven assembly plugin will output the jar Cardinality.MapReduce-1.0-SNAPSHOT-job.jar.

The Hadoop job can then be run using:

bin/hadoop jar <jar-location>/Cardinality.MapReduce-1.0-SNAPSHOT-job.jar <input-dir> <output-dir>

Input

The input is expected to be a directory of files containing string identifiers, one per line. The job will compute the estimated cardinality (the number of distinct values) of these strings.
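This README does not say which estimator the job uses, so the following is only a minimal sketch of one well-known technique, linear counting, to illustrate how a distinct count can be estimated from string identifiers and combined across workers. The class name `LinearCounter`, the bitmap size, and the hash function (FNV-1a followed by a mixing step) are all assumptions made for illustration and are not taken from this repository.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration of linear counting; not the repository's code.
final class LinearCounter {
    private final int m;        // number of bits in the bitmap (assumed parameter)
    private final BitSet bits;  // bit i is set once any identifier hashes to i

    LinearCounter(int m) {
        if (m <= 0) throw new IllegalArgumentException("m must be positive");
        this.m = m;
        this.bits = new BitSet(m);
    }

    // 64-bit FNV-1a over the UTF-8 bytes, then a MurmurHash3-style finalizer
    // to spread the bits. Any well-mixed hash would do here.
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Record one identifier; duplicates set the same bit and change nothing.
    void add(String id) {
        bits.set((int) Long.remainderUnsigned(hash64(id), m));
    }

    // Combine two partial bitmaps, e.g. one per mapper, with a bitwise OR.
    void merge(LinearCounter other) {
        if (other.m != m) throw new IllegalArgumentException("bitmap sizes differ");
        bits.or(other.bits);
    }

    // Linear counting estimate: n ~ m * ln(m / zeros), where zeros is the
    // number of bits still unset.
    double estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) return Double.POSITIVE_INFINITY; // bitmap saturated
        return m * Math.log((double) m / zeros);
    }

    public static void main(String[] args) {
        LinearCounter counter = new LinearCounter(1 << 16);
        for (String id : new String[] {"alice", "bob", "alice", "carol"}) {
            counter.add(id);
        }
        System.out.println("estimated cardinality: " + counter.estimate());
    }
}
```

Because merging is a bitwise OR, partial bitmaps built independently over different input files give the same result as one bitmap built over all the input, which is what makes this kind of estimator a natural fit for a map-reduce job. The estimate degrades as the bitmap fills, so `m` should be chosen well above the expected number of distinct identifiers.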

Output

The job will output a single file containing the estimated cardinality.
