@mengxr (Contributor) commented Jan 29, 2015

The mllib.stat package in PySpark consists of a single stat.py file. We recently added MultivariateGaussian under mllib.stat.distribution in Scala/Java, so it would be nice to refactor stat.py into a package that is easy to extend. Note that ChiSqTestResult moves from mllib.stat to mllib.stat.test, which is where it lives in Scala/Java. It is only used as the return value of Statistics.chiSqTest, so this should be an acceptable change.

@davies

@SparkQA commented Jan 29, 2015

Test build #26305 has started for PR 4266 at commit 1a5e1db.

  • This patch merges cleanly.

@SparkQA commented Jan 29, 2015

Test build #26305 has finished for PR 4266 at commit 1a5e1db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26305/

@davies (Contributor) commented Jan 29, 2015

Is it OK to break compatibility? Otherwise, this looks good to me.

@mengxr (Contributor, Author) commented Jan 29, 2015

It is unfortunate that we didn't put ChiSqTestResult under stat.test in the first version. We do not expect users to construct ChiSqTestResult directly; they get it from Statistics.chiSqTest() instead. So I'd like to make this change early to keep the API consistent with Scala/Java. I'm merging this in and will leave a note in the migration guide.

@mengxr (Contributor, Author) commented Jan 29, 2015

Merged into master.

@asfgit closed this in a3dc618 Jan 29, 2015