
Conversation

markhamstra

Includes SPARK-11009

Tom Graves and others added 17 commits October 9, 2015 14:08
https://issues.apache.org/jira/browse/SPARK-10858

The issue here is that resolveURI defaults to calling new File(path).getAbsoluteFile().toURI(). But if the path passed in already has a # in it, then File(path) treats it as part of the actual file path rather than a fragment, and escapes the # to %23. When we later parse the result in Client as a URI, it no longer recognizes the fragment.

The fix is to check whether the path has a fragment, create the File as before from the part without it, and then add the fragment back on.
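
A minimal Python analogue of the problem and the fix (illustrative only; Spark's actual change is in the Scala Utils.resolveURI):

```python
from pathlib import Path
from urllib.parse import urlparse

# Treating the whole string as a file path escapes '#' to %23, losing the fragment:
raw = "/tmp/foo.jar#renamed.jar"
bad = Path(raw).as_uri()            # 'file:///tmp/foo.jar%23renamed.jar'
assert urlparse(bad).fragment == ""

# The fix described above: split off the fragment, resolve the file part as
# before, then add the fragment back on.
path, sep, fragment = raw.partition("#")
good = Path(path).as_uri() + sep + fragment
assert urlparse(good).fragment == "renamed.jar"
```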

Author: Tom Graves <[email protected]>

Closes apache#9035 from tgravescs/SPARK-10858.

(cherry picked from commit 63c340a)
…er column in inner select

JIRA: https://issues.apache.org/jira/browse/SPARK-10960

When a select with a window function accesses a column from an inner select, an `AnalysisException` is thrown. For example, a query like this:

     select area, rank() over (partition by area order by tmp.month) + tmp.tmp1 as c1 from (select month, area, product, 1 as tmp1 from windowData) tmp

Currently, the rule `ExtractWindowExpressions` in `Analyzer` only extracts expressions appearing in `WindowFunction`, `WindowSpecDefinition` and `AggregateExpression`. We also need to extract other attributes, such as the one in the `Alias` shown in the query above.
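
A hypothetical end-to-end repro in PySpark (the query and table name come from the description above; the schema and the `HiveContext` named `sqlContext` are assumed, since window functions in 1.5 require a HiveContext):

```python
from pyspark.sql import Row

# Assumed schema for the windowData table referenced above.
windowData = sqlContext.createDataFrame([
    Row(month=1, area="a", product=4),
    Row(month=2, area="a", product=5),
    Row(month=1, area="b", product=6),
])
windowData.registerTempTable("windowData")

# Before the fix this raised AnalysisException; tmp.month and tmp.tmp1 are
# columns of the inner select.
sqlContext.sql("""
    select area, rank() over (partition by area order by tmp.month) + tmp.tmp1 as c1
    from (select month, area, product, 1 as tmp1 from windowData) tmp
""").show()
```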

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#9011 from viirya/fix-window-inner-column.

(cherry picked from commit fcb37a0)
Signed-off-by: Yin Huai <[email protected]>
The issue is that local paths on Windows, when provided with drive
letters or backslashes, are not valid URIs.

Instead of trying to figure out whether paths are URIs or not, use
Utils.resolveURI() which does that for us.
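
An illustrative Python sketch of why a raw Windows path fails as a URI (the actual fix is in Spark's Scala code):

```python
from urllib.parse import urlparse

# A drive-letter path parses as if "C:" were a URI scheme, not a drive.
parsed = urlparse(r"C:\temp\foo.jar")
assert parsed.scheme == "c"
assert parsed.path == r"\temp\foo.jar"
# Utils.resolveURI() avoids this by treating such strings as local file
# paths first and building a proper file: URI from them.
```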

Author: Marcelo Vanzin <[email protected]>

Closes apache#9049 from vanzin/SPARK-11023 and squashes the following commits:

77021f2 [Marcelo Vanzin] [SPARK-11023] [yarn] Avoid creating URIs from local paths directly.

(cherry picked from commit 149472a)
…when asked for index after the last non-zero entry

See apache#9009 for details.
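
A minimal sketch of the fixed behavior (values are illustrative):

```python
from pyspark.mllib.linalg import SparseVector

v = SparseVector(5, {1: 3.0})  # only index 1 is non-zero
assert v[1] == 3.0
# Indexing past the last non-zero entry should yield 0.0; before this fix,
# such lookups misbehaved.
assert v[4] == 0.0
```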

Author: zero323 <[email protected]>

Closes apache#9064 from zero323/SPARK-10973_1.5.
This commit improves the documentation around building Spark to
(1) recommend using SBT interactive mode to avoid the overhead of
launching SBT and (2) refer to the wiki page that documents using
SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each
compile.

cc srowen

Author: Kay Ousterhout <[email protected]>

Closes apache#9068 from kayousterhout/SPARK-11056.

(cherry picked from commit 091c2c3)
Signed-off-by: Kay Ousterhout <[email protected]>
…es not train with given regParam and convergenceTol parameters"

This reverts commit f95129c.
…park-submit --jars hdfs://user/foo.jar'

When spark.yarn.user.classpath.first=true and 'spark-submit --jars hdfs://user/foo.jar' is used, foo.jar cannot be put on the system classpath directly, so we need to put YARN's link names for the jars on the system classpath instead; a conceptual sketch follows. vanzin tgravescs
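
A conceptual sketch in Python of what goes on the classpath (the {{PWD}} placeholder follows YARN convention; the names here are illustrative, not the literal patch):

```python
# YARN localizes each --jars entry into the container working directory
# under a link name, so the system classpath must reference that link name
# rather than the original hdfs:// URI.
remote_jar = "hdfs://user/foo.jar"
link_name = remote_jar.rsplit("/", 1)[-1]   # localized as ./foo.jar
classpath_entry = "{{PWD}}/" + link_name    # expanded by YARN at container launch
```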

Author: Lianhui Wang <[email protected]>

Closes apache#9045 from lianhuiwang/spark-11026.

(cherry picked from commit 626aab7)
Signed-off-by: Tom Graves <[email protected]>
Currently, all window functions can sometimes generate wrong results in cluster mode.

The root cause is that an AttributeReference can be created on an executor, so its id may not be unique relative to ids created on the driver.
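
A toy illustration of the collision (not Spark code): expression ids are drawn from a per-JVM counter, so the driver and an executor can hand out the same id for different attributes.

```python
from itertools import count

driver_ids = count()    # id counter in the driver
executor_ids = count()  # a separate counter in an executor process
assert next(driver_ids) == next(executor_ids) == 0  # same id, different attributes
```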

Here is the script that could reproduce the problem (run in local cluster):
```
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.window import Window
from pyspark.sql.functions import rowNumber

sqlContext = HiveContext(SparkContext())
sqlContext.setConf("spark.sql.shuffle.partitions", "3")
df = sqlContext.range(1 << 20)
df2 = df.select((df.id % 1000).alias("A"), (df.id / 1000).alias("B"))
ws = Window.partitionBy(df2.A).orderBy(df2.B)
# rn is always >= 1, so the filter should keep no rows; before the fix,
# colliding expression ids could make rows slip through.
df3 = df2.select("A", "B", rowNumber().over(ws).alias("rn")).filter("rn < 0")
assert df3.count() == 0
```

Author: Davies Liu <[email protected]>
Author: Yin Huai <[email protected]>

Closes apache#9050 from davies/wrong_window.

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
….sh from scripts' old repo

Spark's release packaging scripts used to live in a separate repository. Although these scripts are now part of the Spark repo, some minor patches made against the old repo are missing from Spark's copies. This PR ports those changes.

/cc shivaram, who originally submitted these changes against https://github.com/rxin/spark-utils

Author: Josh Rosen <[email protected]>

Closes apache#8986 from JoshRosen/port-release-build-fixes-from-rxin-repo.
…rain with given regParam and StreamingLinearRegressionWithSGD intercept param is not in correct position.

regParam was being passed into the StreamingLogisticRegressionWithSGD constructor but not transferred to the call for model training; the param is now added as a named argument to that call. For StreamingLinearRegressionWithSGD, the intercept parameter was not in the correct position and was being passed in as the regularization value.
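
A toy Python illustration of both bug shapes (the names are illustrative, not the actual MLlib internals):

```python
def train(data, stepSize=0.1, regParam=0.0, intercept=False):
    return {"stepSize": stepSize, "regParam": regParam, "intercept": intercept}

# Bug 1: a regParam accepted by a constructor but never forwarded to train()
# silently trains with regParam=0.0.
# Bug 2: passing intercept positionally drops it into the regParam slot:
assert train([], 0.1, True)["regParam"] is True
# The fix: forward stored parameters as named arguments.
assert train([], stepSize=0.1, regParam=0.01, intercept=True)["intercept"] is True
```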

Author: Bryan Cutler <[email protected]>

Closes apache#9087 from BryanCutler/StreamingSGD-convergenceTol-bug-10959-branch-1.5.
…ression on Aggregate

backport apache#8548 to 1.5

Author: Wenchen Fan <[email protected]>

Closes apache#9102 from cloud-fan/branch-1.5.
I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R and test_sparkSQL.R files from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases for join() and merge() in the test_sparkSQL.R file.
I'm opening this pull request because I filed this JIRA bug report:
https://issues.apache.org/jira/browse/SPARK-10981

Author: Monica Liu <[email protected]>

Closes apache#9029 from mfliu/master.

(cherry picked from commit 8b32885)
Signed-off-by: Shivaram Venkataraman <[email protected]>
This should be picked into Spark 1.5.2 as well.

https://issues.apache.org/jira/browse/SPARK-10619

It looks like this was broken by commit apache@fb1d06f#diff-b8adb646ef90f616c34eb5c98d1ebd16: some things were changed to use UIUtils.listingTable, but the executor page wasn't converted, so when sortable was removed from UIUtils.TABLE_CLASS_NOT_STRIPED this page broke.

Simply adding the sortable class back in fixes both the active UI and the history server UI.

Author: Tom Graves <[email protected]>

Closes apache#9101 from tgravescs/SPARK-10619.

(cherry picked from commit 135a2ce)
Signed-off-by: Reynold Xin <[email protected]>
When refactoring SQL options from plain strings to the strongly typed `SQLConfEntry`, `spark.sql.hive.version` wasn't migrated, and it doesn't show up in the result of `SET -v`, as `SET -v` only shows public `SQLConfEntry` instances. This affects compatibility with the Simba ODBC driver.

This PR migrates this SQL option as a `SQLConfEntry` to fix this issue.
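
A usage sketch (assuming a HiveContext named sqlContext, as elsewhere in this thread): once registered as a public entry, the option should be listed by `SET -v`.

```python
# Look the migrated option up in the SET -v output:
sqlContext.sql("SET -v").filter("key = 'spark.sql.hive.version'").show()
```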

Author: Cheng Lian <[email protected]>

Closes apache#8925 from liancheng/spark-10845/hive-version-conf.

(cherry picked from commit 6f94d56)
Signed-off-by: Reynold Xin <[email protected]>
https://issues.apache.org/jira/browse/SPARK-10577

Author: Jian Feng <[email protected]>

Closes apache#8801 from Jianfeng-chs/master.

(cherry picked from commit 0180b84)
Signed-off-by: Reynold Xin <[email protected]>

Conflicts:
	python/pyspark/sql/tests.py
…erwrite is false

The fix is for JIRA https://issues.apache.org/jira/browse/SPARK-8386

Author: Huaxin Gao <[email protected]>

Closes apache#9042 from huaxingao/spark8386.

(cherry picked from commit 7e1308d)
Signed-off-by: Reynold Xin <[email protected]>
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala
markhamstra added a commit that referenced this pull request Oct 16, 2015
markhamstra merged commit 8906d69 into alteryx:csd-1.5 Oct 16, 2015