Merged Apache bug fixes #103
Merged
Conversation
https://issues.apache.org/jira/browse/SPARK-10858 The issue here is that resolveURI defaults to calling new File(path).getAbsoluteFile().toURI(). But if the path passed in already contains a '#', File(path) treats it as part of the actual file path rather than a fragment, and escapes the '#' to '%23'. When we later parse that string as a URI in Client, the fragment is no longer recognized. The fix is to check for a fragment first, still create the File as before, and then add the fragment back on. Author: Tom Graves <[email protected]> Closes apache#9035 from tgravescs/SPARK-10858. (cherry picked from commit 63c340a)
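A minimal sketch of the fragment-preserving logic described above (the real code lives in org.apache.spark.util.Utils.resolveURI; the actual patch may differ in detail):
```
import java.io.File
import java.net.{URI, URISyntaxException}

// Hedged sketch, not the exact patch: detect the fragment up front,
// absolutize only the path portion, then reattach the fragment so that
// File never gets the chance to escape '#' as '%23'.
def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme != null) return uri // already a full URI
    if (uri.getFragment != null) {
      val absolute = new File(uri.getPath).getAbsoluteFile.toURI
      return new URI(absolute.getScheme, absolute.getHost, absolute.getPath,
        uri.getFragment)
    }
  } catch {
    case _: URISyntaxException => // not parsable as a URI; fall through
  }
  new File(path).getAbsoluteFile.toURI
}

println(resolveURI("/tmp/foo.py#renamed.py")) // file:/tmp/foo.py#renamed.py, fragment intact
```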
…er column in inner select JIRA: https://issues.apache.org/jira/browse/SPARK-10960 When accessing a column of an inner select from a select with a window function, an `AnalysisException` is thrown. For example, a query like this: select area, rank() over (partition by area order by tmp.month) + tmp.tmp1 as c1 from (select month, area, product, 1 as tmp1 from windowData) tmp Currently, the rule `ExtractWindowExpressions` in `Analyzer` only extracts regular expressions from `WindowFunction`, `WindowSpecDefinition` and `AggregateExpression`. We also need to extract other attributes, such as the one wrapped in the `Alias` shown in the query above. Author: Liang-Chi Hsieh <[email protected]> Closes apache#9011 from viirya/fix-window-inner-column. (cherry picked from commit fcb37a0) Signed-off-by: Yin Huai <[email protected]>
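For reference, the failing query can be run from Scala as below. This is illustrative, not part of the patch; it assumes a HiveContext named sqlContext with a registered windowData table holding month, area, and product columns:
```
// Before the fix this throws AnalysisException; after it, the alias tmp1 and
// the inner column tmp.month are resolved correctly.
val result = sqlContext.sql(
  """select area,
    |       rank() over (partition by area order by tmp.month) + tmp.tmp1 as c1
    |from (select month, area, product, 1 as tmp1 from windowData) tmp""".stripMargin)
result.show()
```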
The issue is that local paths on Windows, when provided with drive letters or backslashes, are not valid URIs. Instead of trying to figure out whether paths are URIs or not, use Utils.resolveURI() which does that for us. Author: Marcelo Vanzin <[email protected]> Closes apache#9049 from vanzin/SPARK-11023 and squashes the following commits: 77021f2 [Marcelo Vanzin] [SPARK-11023] [yarn] Avoid creating URIs from local paths directly. (cherry picked from commit 149472a)
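A quick illustration of why raw URI parsing breaks for Windows-style local paths (this is standard java.net.URI behavior shown in Scala, not code from the patch):
```
import java.net.URI

// A drive letter parses as a URI scheme, so the path is misinterpreted:
println(new URI("C:/temp/foo.jar").getScheme) // prints "C"

// Backslashes are illegal URI characters entirely:
try { new URI("C:\\temp\\foo.jar") }
catch { case e: java.net.URISyntaxException => println("invalid URI: " + e.getMessage) }

// Utils.resolveURI (Spark-internal) sidesteps both cases by falling back to
// java.io.File when the string is not already a well-formed URI.
```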
…when asked for an index after the last non-zero entry. See apache#9009 for details. Author: zero323 <[email protected]> Closes apache#9064 from zero323/SPARK-10973_1.5.
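The expected semantics, shown here with the Scala SparseVector (the fix itself is on the PySpark side, but the behavior being restored is the same): an in-bounds index past the last stored entry must return 0.0, not raise an error.
```
import org.apache.spark.mllib.linalg.Vectors

// 5-element sparse vector with stored values only at indices 0 and 1.
val v = Vectors.sparse(5, Array(0, 1), Array(1.0, 2.0))
println(v(1)) // 2.0 -- a stored entry
println(v(3)) // 0.0 -- past the last non-zero entry, but still in bounds
```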
This commit improves the documentation around building Spark to (1) recommend using SBT interactive mode to avoid the overhead of launching SBT and (2) refer to the wiki page that documents using SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each compile. cc srowen Author: Kay Ousterhout <[email protected]> Closes apache#9068 from kayousterhout/SPARK-11056. (cherry picked from commit 091c2c3) Signed-off-by: Kay Ousterhout <[email protected]>
…es not train with given regParam and convergenceTol parameters" This reverts commit f95129c.
…park-submit --jars hdfs://user/foo.jar' When spark.yarn.user.classpath.first=true and 'spark-submit --jars hdfs://user/foo.jar' is used, foo.jar is not put on the system classpath, so we need to put YARN's link names for the jars on the system classpath instead. vanzin tgravescs Author: Lianhui Wang <[email protected]> Closes apache#9045 from lianhuiwang/spark-11026. (cherry picked from commit 626aab7) Signed-off-by: Tom Graves <[email protected]>
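A minimal sketch of the link-name idea, assuming the YARN distributed-cache convention that a localized file appears under the URI fragment if one is given, else under its base file name. The helper below is hypothetical and only illustrates what name ends up on the classpath, not the actual yarn.Client code:
```
import java.io.File
import java.net.URI

// Hypothetical helper: the name the jar is localized under -- and thus what
// belongs on the system classpath -- is the fragment if present, otherwise
// the base file name.
def linkName(jar: String): String = {
  val uri = new URI(jar)
  Option(uri.getFragment).getOrElse(new File(uri.getPath).getName)
}

println(linkName("hdfs://nn:8020/user/foo.jar"))         // foo.jar
println(linkName("hdfs://nn:8020/user/foo.jar#renamed")) // renamed
```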
Currently, all window functions can generate wrong results in cluster mode. The root cause is that AttributeReference is created on the executors, so its id may not be unique relative to those created on the driver. Here is a script that reproduces the problem (run in a local cluster):
```
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.window import Window
from pyspark.sql.functions import rowNumber

sqlContext = HiveContext(SparkContext())
sqlContext.setConf("spark.sql.shuffle.partitions", "3")
df = sqlContext.range(1 << 20)
df2 = df.select((df.id % 1000).alias("A"), (df.id / 1000).alias("B"))
ws = Window.partitionBy(df2.A).orderBy(df2.B)
# select the derived columns A and B so the repro actually runs
df3 = df2.select("A", "B", rowNumber().over(ws).alias("rn")).filter("rn < 0")
assert df3.count() == 0
```
Author: Davies Liu <[email protected]> Author: Yin Huai <[email protected]> Closes apache#9050 from davies/wrong_window. Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
….sh from scripts' old repo Spark's release packaging scripts used to live in a separate repository. Although these scripts are now part of the Spark repo, there are some minor patches made against the old repos that are missing in Spark's copy of the script. This PR ports those changes. /cc shivaram, who originally submitted these changes against https://github.com/rxin/spark-utils Author: Josh Rosen <[email protected]> Closes apache#8986 from JoshRosen/port-release-build-fixes-from-rxin-repo.
…rain with given regParam and StreamingLinearRegressionWithSGD intercept param is not in correct position. regParam was being passed into the StreamingLogisticRegressionWithSGD constructor, but not transferred to the call for model training; the param is now added as a named argument to that call. For StreamingLinearRegressionWithSGD, the intercept parameter was not in the correct position and was being passed in as the regularization value. Author: Bryan Cutler <[email protected]> Closes apache#9087 from BryanCutler/StreamingSGD-convergenceTol-bug-10959-branch-1.5.
…ression on Aggregate backport apache#8548 to 1.5 Author: Wenchen Fan <[email protected]> Closes apache#9102 from cloud-fan/branch-1.5.
I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R and test_sparkSQL.R files from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases to the tests for join() and merge() in test_sparkSQL.R. Pull request because I filed this JIRA bug report: https://issues.apache.org/jira/browse/SPARK-10981 Author: Monica Liu <[email protected]> Closes apache#9029 from mfliu/master. (cherry picked from commit 8b32885) Signed-off-by: Shivaram Venkataraman <[email protected]>
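For comparison, the same join types in the Scala DataFrame API, which SparkR's join() mirrors (illustrative only; assumes a SQLContext named sqlContext):
```
import sqlContext.implicits._

val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

// The join types now exposed through SparkR's join():
left.join(right, left("id") === right("id"), "fullouter").show()
left.join(right, left("id") === right("id"), "leftsemi").show()
```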
This should be picked into Spark 1.5.2 as well. https://issues.apache.org/jira/browse/SPARK-10619 It looks like this was broken by commit apache@fb1d06f#diff-b8adb646ef90f616c34eb5c98d1ebd16: some pages were changed to use UIUtils.listingTable, but the executors page wasn't converted, so when 'sortable' was removed from UIUtils.TABLE_CLASS_NOT_STRIPED it broke this page. Simply adding the sortable class back in fixes both the active UI and the history server UI. Author: Tom Graves <[email protected]> Closes apache#9101 from tgravescs/SPARK-10619. (cherry picked from commit 135a2ce) Signed-off-by: Reynold Xin <[email protected]>
When refactoring SQL options from plain strings to the strongly typed `SQLConfEntry`, `spark.sql.hive.version` wasn't migrated, and doesn't show up in the result of `SET -v`, as `SET -v` only shows public `SQLConfEntry` instances. This affects compatibility with Simba ODBC driver. This PR migrates this SQL option as a `SQLConfEntry` to fix this issue. Author: Cheng Lian <[email protected]> Closes apache#8925 from liancheng/spark-10845/hive-version-conf. (cherry picked from commit 6f94d56) Signed-off-by: Reynold Xin <[email protected]>
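After the migration the option is discoverable like any other public entry. A quick check, assuming a HiveContext named sqlContext (the SET -v output columns are an assumption here, not confirmed by the commit message):
```
// spark.sql.hive.version now shows up among the public conf entries:
sqlContext.sql("SET -v").filter("key = 'spark.sql.hive.version'").show()
```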
https://issues.apache.org/jira/browse/SPARK-10577 Author: Jian Feng <[email protected]> Closes apache#8801 from Jianfeng-chs/master. (cherry picked from commit 0180b84) Signed-off-by: Reynold Xin <[email protected]> Conflicts: python/pyspark/sql/tests.py
…erwrite is false The fix is for JIRA https://issues.apache.org/jira/browse/SPARK-8386 Author: Huaxin Gao <[email protected]> Closes apache#9042 from huaxingao/spark8386. (cherry picked from commit 7e1308d) Signed-off-by: Reynold Xin <[email protected]>
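The intended semantics, sketched against the public write API (a hedged illustration, not the patch itself; `df` and the JDBC URL are placeholders):
```
import java.util.Properties
import org.apache.spark.sql.SaveMode

// With overwrite = false, insertIntoJDBC should append to the existing table
// rather than fail or replace it; the write.mode equivalent makes that explicit.
df.write.mode(SaveMode.Append).jdbc("jdbc:h2:mem:testdb", "TEST_TABLE", new Properties())
```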
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala
Includes SPARK-11009