[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0 #18640
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -132,6 +132,8 @@ | |
| <hive.version.short>1.2.1</hive.version.short> | ||
| <derby.version>10.12.1.1</derby.version> | ||
| <parquet.version>1.8.2</parquet.version> | ||
| <orc.version>1.4.0</orc.version> | ||
| <orc.classifier>nohive</orc.classifier> | ||
| <hive.parquet.version>1.6.0</hive.parquet.version> | ||
| <jetty.version>9.3.20.v20170531</jetty.version> | ||
| <javaxservlet.version>3.1.0</javaxservlet.version> | ||
|
|
@@ -207,6 +209,7 @@ | |
| <flume.deps.scope>compile</flume.deps.scope> | ||
| <hadoop.deps.scope>compile</hadoop.deps.scope> | ||
| <hive.deps.scope>compile</hive.deps.scope> | ||
| <orc.deps.scope>compile</orc.deps.scope> | ||
| <parquet.deps.scope>compile</parquet.deps.scope> | ||
| <parquet.test.deps.scope>test</parquet.test.deps.scope> | ||
|
|
||
|
|
@@ -1677,6 +1680,44 @@ | |
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-core</artifactId> | ||
| <version>${orc.version}</version> | ||
| <classifier>${orc.classifier}</classifier> | ||
| <scope>${orc.deps.scope}</scope> | ||
| <exclusions> | ||
| <exclusion> | ||
| <groupId>org.apache.hadoop</groupId> | ||
| <artifactId>hadoop-common</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.apache.hive</groupId> | ||
| <artifactId>hive-storage-api</artifactId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-mapreduce</artifactId> | ||
| <version>${orc.version}</version> | ||
| <classifier>${orc.classifier}</classifier> | ||
|
Member
The
Member
Author
Thank you for the review, @viirya.
Member
Thanks. I think they come from https://issues.apache.org/jira/browse/ORC-174.
Member
Author
Right. The wording is a little bit different, but technically those jars come from that JIRA patch. |
||
| <scope>${orc.deps.scope}</scope> | ||
| <exclusions> | ||
| <exclusion> | ||
| <groupId>org.apache.hadoop</groupId> | ||
| <artifactId>hadoop-common</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-core</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.apache.hive</groupId> | ||
| <artifactId>hive-storage-api</artifactId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.parquet</groupId> | ||
| <artifactId>parquet-column</artifactId> | ||
|
|
@@ -2710,6 +2751,9 @@ | |
| <profile> | ||
| <id>hive-provided</id> | ||
| </profile> | ||
| <profile> | ||
| <id>orc-provided</id> | ||
| </profile> | ||
| <profile> | ||
| <id>parquet-provided</id> | ||
| </profile> | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -86,6 +86,16 @@ | |
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-core</artifactId> | ||
| <classifier>${orc.classifier}</classifier> | ||
|
Contributor
sorry a dumb question, what does
Contributor
what exactly is the storage api? confused about this too ...
Member
Author
In Maven,
Member
Author
Thank you, @rxin!
Contributor
ok good to learn the
Member
Author
Yes. sbt understands
Contributor
@rxin Storage-API is a separately released artifact from the Hive project. Basically, Storage-API is the in-memory format for Hive's vectorization. You could draw the analogy that Storage-API is for Hive what Arrow is for Drill. It allows formats to read and write directly in the format that is needed by the execution engine. With the nohive classifier, ORC shades the storage-api jar into the ORC namespace so that it is compatible with any version of Hive.
Member
Author
Thank you, @omalley! |
||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-mapreduce</artifactId> | ||
| <classifier>${orc.classifier}</classifier> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.parquet</groupId> | ||
| <artifactId>parquet-column</artifactId> | ||
|
|
||
so the orc core module still contains hive related stuff?
and to confirm, this exclusion is safe only if we don't use hive storage api of orc in sql/core, right?
Thank you so much for the review, @cloud-fan.
- `orc-core-1.4.0.jar` has a `hive-storage-api` dependency. (Maven Repo)
- `orc-core-1.4.0-nohive.jar` is a shaded jar file that includes `hive-storage-api` under the `org.apache.orc` namespace.
- `orc-core-1.4.0-nohive.jar` is designed for users and apps who don't want to depend on (or consider) `hive`. `nohive` is a classifier for this purpose.

This PR uses `orc-core-1.4.0-nohive` only. To avoid Maven confusion, this exclusion makes sure of that by explicitly removing the `hive-storage-api` dependency from the `orc-core` artifact.
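
For readers less familiar with classifiers, here is a minimal sketch of the same dependency with the property placeholders replaced by the literal values from this PR. It is illustrative only and assumes a standalone POM rather than Spark's actual build files; the point is how the classifier selects the shaded jar and how the exclusion drops the unshaded `hive-storage-api`.

```xml
<!-- Illustrative only: literal values stand in for ${orc.version} and ${orc.classifier}. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.0</version>
  <!-- Maven appends the classifier to the file name, so this resolves to orc-core-1.4.0-nohive.jar. -->
  <classifier>nohive</classifier>
  <exclusions>
    <!-- The nohive jar already carries these classes shaded under org.apache.orc,
         so the separately released artifact is excluded explicitly. -->
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-storage-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Without the `<classifier>` element, the same coordinates would resolve to the plain `orc-core-1.4.0.jar`, which is the artifact that depends on `hive-storage-api`.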