Commit 24fbf52

Updated API to be similar to KMeans plus other changes requested by Xiangrui on the PR
1 parent c12dfc8 commit 24fbf52

6 files changed (+532, -187 lines)

data/mllib/pic_data.txt

Lines changed: 299 additions & 0 deletions
Large diffs are not rendered by default.

docs/mllib-clustering-pic.md

Lines changed: 0 additions & 30 deletions
This file was deleted.

docs/mllib-clustering.md

Lines changed: 19 additions & 2 deletions
@@ -34,8 +34,25 @@ a given dataset, the algorithm returns the best clustering result).
3434
* *initializationSteps* determines the number of steps in the k-means\|\| algorithm.
3535
* *epsilon* determines the distance threshold within which we consider k-means to have converged.
3636

37-
[Power Iteration Clustering](mllib-clustering-pic.md) that uses the Power Iteration method combined with KMeans clustering to
38-
cluster points based on a Gaussian measure of the input data pairwise similarity.
37+
### Power Iteration Clustering
38+
39+
Power iteration clustering is a scalable and efficient algorithm for clustering points given their pairwise affinity values. Internally, the algorithm:
40+
41+
* accepts a [Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points.
42+
* calculates the principal eigenvalue and eigenvector.
43+
* clusters each of the input points according to its component of the principal eigenvector.
44+
45+
Details of this algorithm are found in [Power Iteration Clustering, Lin and Cohen](http://www.icml2010.org/papers/387.pdf).
46+
47+
For a dataset inspired by the paper, but with five clusters instead of three, our implementation produces the following output:
48+
49+
<p style="text-align: center;">
50+
<img src="img/PIClusteringFiveCirclesInputsAndOutputs.png"
51+
title="Power iteration clustering: five circles, inputs and outputs"
52+
alt="Power iteration clustering: five circles, inputs and outputs"
53+
width="50%" />
54+
<!-- Images are downsized intentionally to improve quality on retina displays -->
55+
</p>
3956

4057
### Examples
4158
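The three steps listed in the Power Iteration Clustering section of docs/mllib-clustering.md can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the MLlib or GraphX code from this commit: the affinity is a Gaussian kernel with a hypothetical `sigma` parameter, the matrix is a list of lists rather than a `Graph`, the power iteration runs for a fixed number of steps (Lin and Cohen stop early instead of iterating to full convergence), and the final step is a simple one-dimensional k-means. All function names are made up for this sketch.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian similarity with a zero diagonal (assumed kernel)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A

def power_iterate(W, v0, iterations=10):
    """Repeatedly apply W and rescale to unit L1 norm (fixed step count)."""
    v = list(v0)
    for _ in range(iterations):
        nxt = [sum(w * x for w, x in zip(row, v)) for row in W]
        norm = sum(abs(x) for x in nxt)
        v = [x / norm for x in nxt]
    return v

def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalars, centers spread evenly over the range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(x - centers[c]))
                  for x in values]
        for c in range(k):
            members = [x for x, l in zip(values, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels

def pic(points, k, sigma=1.0, iterations=10):
    """Affinity -> row-normalized matrix -> power iteration -> 1-D k-means."""
    A = gaussian_affinity(points, sigma)
    degrees = [sum(row) for row in A]
    # Row-normalize so every row of W sums to 1.
    W = [[a / d for a in row] for row, d in zip(A, degrees)]
    # Start from the normalized degree vector, as in Lin and Cohen.
    total = sum(degrees)
    v0 = [d / total for d in degrees]
    v = power_iterate(W, v0, iterations)
    return kmeans_1d(v, k)

# Two well-separated groups of different sizes.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1), (5.05, 5.05)]
labels = pic(points, k=2)
```

The groups in the usage example have different sizes on purpose: the degree-based starting vector is what distinguishes the clusters, so two groups that are exact translates of each other would receive identical values and could not be separated by this sketch.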

mllib/pom.xml

Lines changed: 0 additions & 7 deletions
@@ -108,13 +108,6 @@
108108
<type>test-jar</type>
109109
<scope>test</scope>
110110
</dependency>
111-
<!-- <dependency>
112-
<groupId>org.apache.spark</groupId>
113-
<artifactId>spark-graphx_${scala.binary.version}</artifactId>
114-
<version>${project.version}</version>
115-
<type>test-jar</type>
116-
<scope>test</scope>
117-
</dependency> -->
118111
</dependencies>
119112
<profiles>
120113
<profile>
