dbtsai commented Dec 3, 2014

The call to `breezeSquaredDistance` in
`org.apache.spark.mllib.util.MLUtils.fastSquaredDistance`
is on the critical path, and `breezeSquaredDistance` is slow.
This PR replaces it with our own implementation.

Here is the benchmark against the mnist8m dataset.

Before
DenseVector: 70.04 secs
SparseVector: 59.05 secs

With this PR
DenseVector: 30.58 secs
SparseVector: 21.14 secs
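
For readers who want a feel for what a hand-rolled replacement can look like, here is a minimal sketch. It is not the code from this PR: `SquaredDistanceSketch`, `sqdistDense`, and `sqdistSparse` are hypothetical names, and the vectors are plain arrays rather than MLlib `Vector` instances. The sparse variant assumes both index arrays are sorted in ascending order.

```scala
object SquaredDistanceSketch {

  /** Squared Euclidean distance between two dense vectors of equal length. */
  def sqdistDense(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    // A plain while loop keeps the hot path free of closures and boxing.
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  /**
   * Squared Euclidean distance between two sparse vectors, each given as
   * parallel (indices, values) arrays with indices sorted ascending.
   */
  def sqdistSparse(
      ai: Array[Int], av: Array[Double],
      bi: Array[Int], bv: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    var j = 0
    // Merge the two sorted index lists, visiting each active index once.
    while (i < ai.length || j < bi.length) {
      val d =
        if (j >= bi.length || (i < ai.length && ai(i) < bi(j))) {
          val v = av(i); i += 1; v          // index present only in a
        } else if (i >= ai.length || bi(j) < ai(i)) {
          val v = bv(j); j += 1; v          // index present only in b
        } else {
          val v = av(i) - bv(j); i += 1; j += 1; v // index present in both
        }
      sum += d * d
    }
    sum
  }
}
```

The sparse version only touches stored entries, so its cost grows with the number of non-zeros rather than with the vector dimension.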

Review comment (Contributor):

We can use `BLAS.axpy` here.
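
The thread does not show which line this comment was attached to, so the following is only a sketch of the operation being suggested. In BLAS, `axpy` computes `y := a * x + y` in place. The `AxpySketch` object below is a hypothetical plain-array version written for illustration; it is not Spark's own `BLAS` object.

```scala
object AxpySketch {

  /** In-place update y := a * x + y, following the BLAS level-1 axpy contract. */
  def axpy(a: Double, x: Array[Double], y: Array[Double]): Unit = {
    require(x.length == y.length, "x and y must have the same length")
    var i = 0
    while (i < x.length) {
      y(i) += a * x(i)
      i += 1
    }
  }
}
```

A typical use is accumulating a running sum without allocating a new vector, for example `AxpySketch.axpy(1.0, point, sum)` to add `point` into `sum`.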

SparkQA commented Dec 3, 2014

Test build #24065 has started for PR 3565 at commit b185a77.

  • This patch merges cleanly.

SparkQA commented Dec 3, 2014

Test build #24065 has finished for PR 3565 at commit b185a77.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable
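
As background on why a vector might be paired with its norm, here is a hedged sketch. With the norms cached, the squared distance can be computed from the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 (a . b), which needs only one dot product per pair. `CachedNormSketch` and `PointWithNorm` are hypothetical stand-ins over plain arrays, not the `VectorWithNorm` class listed above, and this is not presented as what the PR does.

```scala
object CachedNormSketch {

  /** Hypothetical stand-in: a dense point together with its precomputed L2 norm. */
  final case class PointWithNorm(values: Array[Double], norm: Double)

  /** Dot product of two dense vectors of equal length. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var s = 0.0
    var i = 0
    while (i < a.length) {
      s += a(i) * b(i)
      i += 1
    }
    s
  }

  /** Builds a PointWithNorm, computing the L2 norm once up front. */
  def withNorm(values: Array[Double]): PointWithNorm =
    PointWithNorm(values, math.sqrt(dot(values, values)))

  /** Squared distance via ||a||^2 + ||b||^2 - 2 (a . b), reusing the cached norms. */
  def sqdist(a: PointWithNorm, b: PointWithNorm): Double =
    a.norm * a.norm + b.norm * b.norm - 2.0 * dot(a.values, b.values)
}
```

The expansion can lose precision when the two points are very close to each other, so a production version would likely need to fall back to a direct element-wise computation in that case.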

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24065/

dbtsai commented Dec 3, 2014

Calling BLAS adds a very small overhead. The benchmark is now:

DenseVector: 33.19 secs
SparseVector: 22.05 secs

SparkQA commented Dec 3, 2014

Test build #24067 has started for PR 3565 at commit de24662.

  • This patch merges cleanly.

SparkQA commented Dec 3, 2014

Test build #24068 has started for PR 3565 at commit 08bc068.

  • This patch merges cleanly.

SparkQA commented Dec 3, 2014

Test build #24067 has finished for PR 3565 at commit de24662.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24067/

SparkQA commented Dec 3, 2014

Test build #24068 has finished for PR 3565 at commit 08bc068.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24068/

asfgit pushed a commit that referenced this pull request Dec 3, 2014
…e/sparse sample

Author: DB Tsai <[email protected]>

Closes #3565 from dbtsai/kmean and squashes the following commits:

08bc068 [DB Tsai] restyle
de24662 [DB Tsai] address feedback
b185a77 [DB Tsai] cleanup
4554ddd [DB Tsai] first commit

(cherry picked from commit 7fc49ed)
Signed-off-by: Xiangrui Meng <[email protected]>

asfgit closed this in 7fc49ed Dec 3, 2014

mengxr commented Dec 3, 2014

LGTM. Merged into master and branch-1.2. Thanks!

dbtsai deleted the kmean branch December 3, 2014 19:21