
Conversation

markhamstra

Includes SPARK-11009

Tom Graves and others added 17 commits October 9, 2015 14:08
https://issues.apache.org/jira/browse/SPARK-10858

The issue here is that resolveURI defaults to calling new File(path).getAbsoluteFile().toURI(). But if the path passed in already has a # in it, then File(path) treats it as part of the actual file path rather than a fragment, and escapes the # to %23. When we later parse the result in Client as a URI, it no longer recognizes the fragment.

The fix is to check whether the path has a fragment, create the File as before from the part without it, and then add the fragment back on.
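
A minimal Python analogue of the problem and the fix (illustrative only; Spark's actual change is in the Scala Utils.resolveURI):

```python
from pathlib import Path
from urllib.parse import urlparse

# Treating the whole string as a file path escapes '#' to %23, losing the fragment:
raw = "/tmp/foo.jar#renamed.jar"
bad = Path(raw).as_uri()            # 'file:///tmp/foo.jar%23renamed.jar'
assert urlparse(bad).fragment == ""

# The fix described above: split off the fragment, resolve the file part as
# before, then add the fragment back on.
path, sep, fragment = raw.partition("#")
good = Path(path).as_uri() + sep + fragment
assert urlparse(good).fragment == "renamed.jar"
```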

Author: Tom Graves <[email protected]>

Closes apache#9035 from tgravescs/SPARK-10858.

(cherry picked from commit 63c340a)
…er column in inner select

JIRA: https://issues.apache.org/jira/browse/SPARK-10960

When a select with a window function accesses a column from an inner select, an `AnalysisException` is thrown. For example, a query like this:

     select area, rank() over (partition by area order by tmp.month) + tmp.tmp1 as c1 from (select month, area, product, 1 as tmp1 from windowData) tmp

Currently, the rule `ExtractWindowExpressions` in `Analyzer` only extracts expressions appearing in `WindowFunction`, `WindowSpecDefinition` and `AggregateExpression`. We also need to extract other attributes, such as the one in the `Alias` shown in the query above.
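
A hypothetical end-to-end repro in PySpark (the query and table name come from the description above; the schema and the `HiveContext` named `sqlContext` are assumed, since window functions in 1.5 require a HiveContext):

```python
from pyspark.sql import Row

# Assumed schema for the windowData table referenced above.
windowData = sqlContext.createDataFrame([
    Row(month=1, area="a", product=4),
    Row(month=2, area="a", product=5),
    Row(month=1, area="b", product=6),
])
windowData.registerTempTable("windowData")

# Before the fix this raised AnalysisException; tmp.month and tmp.tmp1 are
# columns of the inner select.
sqlContext.sql("""
    select area, rank() over (partition by area order by tmp.month) + tmp.tmp1 as c1
    from (select month, area, product, 1 as tmp1 from windowData) tmp
""").show()
```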

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#9011 from viirya/fix-window-inner-column.

(cherry picked from commit fcb37a0)
Signed-off-by: Yin Huai <[email protected]>
The issue is that local paths on Windows, when provided with drive
letters or backslashes, are not valid URIs.

Instead of trying to figure out whether paths are URIs or not, use
Utils.resolveURI() which does that for us.
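
An illustrative Python sketch of why a raw Windows path fails as a URI (the actual fix is in Spark's Scala code):

```python
from urllib.parse import urlparse

# A drive-letter path parses as if "C:" were a URI scheme, not a drive.
parsed = urlparse(r"C:\temp\foo.jar")
assert parsed.scheme == "c"
assert parsed.path == r"\temp\foo.jar"
# Utils.resolveURI() avoids this by treating such strings as local file
# paths first and building a proper file: URI from them.
```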

Author: Marcelo Vanzin <[email protected]>

Closes apache#9049 from vanzin/SPARK-11023 and squashes the following commits:

77021f2 [Marcelo Vanzin] [SPARK-11023] [yarn] Avoid creating URIs from local paths directly.

(cherry picked from commit 149472a)
…when asked for index after the last non-zero entry

See apache#9009 for details.
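
A minimal sketch of the fixed behavior (values are illustrative):

```python
from pyspark.mllib.linalg import SparseVector

v = SparseVector(5, {1: 3.0})  # only index 1 is non-zero
assert v[1] == 3.0
# Indexing past the last non-zero entry should yield 0.0; before this fix,
# such lookups misbehaved.
assert v[4] == 0.0
```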

Author: zero323 <[email protected]>

Closes apache#9064 from zero323/SPARK-10973_1.5.
This commit improves the documentation around building Spark to
(1) recommend using SBT interactive mode to avoid the overhead of
launching SBT and (2) refer to the wiki page that documents using
SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each
compile.

cc srowen

Author: Kay Ousterhout <[email protected]>

Closes apache#9068 from kayousterhout/SPARK-11056.

(cherry picked from commit 091c2c3)
Signed-off-by: Kay Ousterhout <[email protected]>
…es not train with given regParam and convergenceTol parameters"

This reverts commit f95129c.
…park-submit --jars hdfs://user/foo.jar'

When spark.yarn.user.classpath.first=true and 'spark-submit --jars hdfs://user/foo.jar' is used, foo.jar cannot be put on the system classpath directly, so we need to put YARN's link names for the jars on the system classpath instead; a conceptual sketch follows. vanzin tgravescs
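
A conceptual sketch in Python of what goes on the classpath (the {{PWD}} placeholder follows YARN convention; the names here are illustrative, not the literal patch):

```python
# YARN localizes each --jars entry into the container working directory
# under a link name, so the system classpath must reference that link name
# rather than the original hdfs:// URI.
remote_jar = "hdfs://user/foo.jar"
link_name = remote_jar.rsplit("/", 1)[-1]   # localized as ./foo.jar
classpath_entry = "{{PWD}}/" + link_name    # expanded by YARN at container launch
```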

Author: Lianhui Wang <[email protected]>

Closes apache#9045 from lianhuiwang/spark-11026.

(cherry picked from commit 626aab7)
Signed-off-by: Tom Graves <[email protected]>
Currently, all window functions can sometimes generate wrong results in cluster mode.

The root cause is that an AttributeReference can be created on an executor, so its id may not be unique relative to ids created on the driver.
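
A toy illustration of the collision (not Spark code): expression ids are drawn from a per-JVM counter, so the driver and an executor can hand out the same id for different attributes.

```python
from itertools import count

driver_ids = count()    # id counter in the driver
executor_ids = count()  # a separate counter in an executor process
assert next(driver_ids) == next(executor_ids) == 0  # same id, different attributes
```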

Here is the script that could reproduce the problem (run in local cluster):
```
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.window import Window
from pyspark.sql.functions import rowNumber

sqlContext = HiveContext(SparkContext())
sqlContext.setConf("spark.sql.shuffle.partitions", "3")
df = sqlContext.range(1 << 20)
df2 = df.select((df.id % 1000).alias("A"), (df.id / 1000).alias("B"))
ws = Window.partitionBy(df2.A).orderBy(df2.B)
# rn is always >= 1, so the filter should keep no rows; before the fix,
# colliding expression ids could make rows slip through.
df3 = df2.select("A", "B", rowNumber().over(ws).alias("rn")).filter("rn < 0")
assert df3.count() == 0
```

Author: Davies Liu <[email protected]>
Author: Yin Huai <[email protected]>

Closes apache#9050 from davies/wrong_window.

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
….sh from scripts' old repo

Spark's release packaging scripts used to live in a separate repository. Although these scripts are now part of the Spark repo, some minor patches made against the old repo are missing from Spark's copies. This PR ports those changes.

/cc shivaram, who originally submitted these changes against https://github.com/rxin/spark-utils

Author: Josh Rosen <[email protected]>

Closes apache#8986 from JoshRosen/port-release-build-fixes-from-rxin-repo.
…rain with given regParam and StreamingLinearRegressionWithSGD intercept param is not in correct position.

regParam was being passed into the StreamingLogisticRegressionWithSGD constructor but not transferred to the call for model training; the param is now added as a named argument to that call. For StreamingLinearRegressionWithSGD, the intercept parameter was not in the correct position and was being passed in as the regularization value.
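
A toy Python illustration of both bug shapes (the names are illustrative, not the actual MLlib internals):

```python
def train(data, stepSize=0.1, regParam=0.0, intercept=False):
    return {"stepSize": stepSize, "regParam": regParam, "intercept": intercept}

# Bug 1: a regParam accepted by a constructor but never forwarded to train()
# silently trains with regParam=0.0.
# Bug 2: passing intercept positionally drops it into the regParam slot:
assert train([], 0.1, True)["regParam"] is True
# The fix: forward stored parameters as named arguments.
assert train([], stepSize=0.1, regParam=0.01, intercept=True)["intercept"] is True
```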

Author: Bryan Cutler <[email protected]>

Closes apache#9087 from BryanCutler/StreamingSGD-convergenceTol-bug-10959-branch-1.5.
…ression on Aggregate

backport apache#8548 to 1.5

Author: Wenchen Fan <[email protected]>

Closes apache#9102 from cloud-fan/branch-1.5.
I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R and test_sparkSQL.R files from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases for join() and merge() in the test_sparkSQL.R file.
I'm opening this pull request because I filed this JIRA bug report:
https://issues.apache.org/jira/browse/SPARK-10981

Author: Monica Liu <[email protected]>

Closes apache#9029 from mfliu/master.

(cherry picked from commit 8b32885)
Signed-off-by: Shivaram Venkataraman <[email protected]>
This should be picked into Spark 1.5.2 as well.

https://issues.apache.org/jira/browse/SPARK-10619

It looks like this was broken by commit apache@fb1d06f#diff-b8adb646ef90f616c34eb5c98d1ebd16: some things were changed to use UIUtils.listingTable, but the executor page wasn't converted, so when sortable was removed from UIUtils.TABLE_CLASS_NOT_STRIPED this page broke.

Simply adding the sortable class back in fixes both the active UI and the history server UI.

Author: Tom Graves <[email protected]>

Closes apache#9101 from tgravescs/SPARK-10619.

(cherry picked from commit 135a2ce)
Signed-off-by: Reynold Xin <[email protected]>
When refactoring SQL options from plain strings to the strongly typed `SQLConfEntry`, `spark.sql.hive.version` wasn't migrated, and it doesn't show up in the result of `SET -v`, as `SET -v` only shows public `SQLConfEntry` instances. This affects compatibility with the Simba ODBC driver.

This PR migrates this SQL option as a `SQLConfEntry` to fix this issue.
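
A usage sketch (assuming a HiveContext named sqlContext, as elsewhere in this thread): once registered as a public entry, the option should be listed by `SET -v`.

```python
# Look the migrated option up in the SET -v output:
sqlContext.sql("SET -v").filter("key = 'spark.sql.hive.version'").show()
```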

Author: Cheng Lian <[email protected]>

Closes apache#8925 from liancheng/spark-10845/hive-version-conf.

(cherry picked from commit 6f94d56)
Signed-off-by: Reynold Xin <[email protected]>
https://issues.apache.org/jira/browse/SPARK-10577

Author: Jian Feng <[email protected]>

Closes apache#8801 from Jianfeng-chs/master.

(cherry picked from commit 0180b84)
Signed-off-by: Reynold Xin <[email protected]>

Conflicts:
	python/pyspark/sql/tests.py
…erwrite is false

The fix is for JIRA https://issues.apache.org/jira/browse/SPARK-8386

Author: Huaxin Gao <[email protected]>

Closes apache#9042 from huaxingao/spark8386.

(cherry picked from commit 7e1308d)
Signed-off-by: Reynold Xin <[email protected]>
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala
markhamstra added a commit that referenced this pull request Oct 16, 2015
markhamstra merged commit 8906d69 into alteryx:csd-1.5 Oct 16, 2015