81 changes: 70 additions & 11 deletions docs/content.zh/docs/connectors/table/hive/hive_dialect.md
@@ -335,26 +335,85 @@ CREATE FUNCTION function_name AS class_name;
DROP FUNCTION [IF EXISTS] function_name;
```

## DML
## DML & DQL _`Beta`_

### INSERT
The Hive dialect supports a commonly used subset of Hive [DML](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)
and [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). The following lists some of the HiveQL syntax supported by the Hive dialect.

```sql
INSERT (INTO|OVERWRITE) [TABLE] table_name [PARTITION partition_spec] SELECT ...;
```
- [SORT/CLUSTER/DISTRIBUTE BY](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
- [Group By](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy)
- [Join](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
- [Union](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union)
- [LATERAL VIEW](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView)
- [Window Functions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics)
- [SubQueries](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries)
- [CTE](https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression)
- [INSERT INTO dest schema](https://issues.apache.org/jira/browse/HIVE-9481)
- [Implicit type conversions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-AllowedImplicitConversions)
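
As a sketch of one of the constructs above, a `LATERAL VIEW` paired with the `explode` UDTF flattens an array column into one row per element (the table `pages` and its columns here are hypothetical, for illustration only):

```sql
-- Hypothetical table: pages(id INT, tags ARRAY<STRING>)
-- LATERAL VIEW applies the UDTF explode() to each input row,
-- producing one output row per array element.
SELECT id, tag
FROM pages
LATERAL VIEW explode(tags) t AS tag;
```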

For better syntax and semantic compatibility, it is highly recommended to use the [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule)
and place it first in the module list, so that Hive built-in functions are preferred during function resolution.

If a `partition_spec` is specified, it can cover all or only some of the partition columns. If it is a partial spec, the dynamic partition column names can be omitted.
The Hive dialect no longer supports [Flink SQL queries]({{< ref "docs/dev/table/sql/queries" >}}). Please switch to the `default` dialect if you need Flink syntax.
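
For instance, the dialect can be switched back and forth within one SQL client session (a sketch; statement outputs are omitted):

```sql
set table.sql-dialect=hive;     -- HiveQL accepted from here on
-- ... run Hive-syntax DML/DQL statements ...
set table.sql-dialect=default;  -- back to Flink SQL syntax
```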

The following is an example of using the Hive dialect.

```bash
Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
[INFO] Execute statement succeed.

## DQL
Flink SQL> use catalog myhive;
[INFO] Execute statement succeed.

At the moment, the Hive dialect supports the same syntax as Flink SQL for DQL statements. Refer to [Flink SQL queries]({{< ref "docs/dev/table/sql/queries" >}}) for more details. It is also recommended to switch to the `default` dialect to execute DQL statements.
Flink SQL> load module hive;
[INFO] Execute statement succeed.

Flink SQL> use modules hive,core;
[INFO] Execute statement succeed.

Flink SQL> set table.sql-dialect=hive;
[INFO] Session property has been set.

Flink SQL> select explode(array(1,2,3)); -- call hive udtf
+-----+
| col |
+-----+
| 1 |
| 2 |
| 3 |
+-----+
3 rows in set

Flink SQL> create table tbl (key int,value string);
[INFO] Execute statement succeed.

Flink SQL> insert overwrite table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:

Flink SQL> select * from tbl cluster by key; -- run cluster by
2021-04-22 16:13:57,005 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input paths to process : 1
+-----+-------+
| key | value |
+-----+-------+
| 1 | a |
| 1 | a |
| 5 | e |
| 2 | b |
| 3 | c |
| 3 | c |
| 3 | c |
| 4 | d |
+-----+-------+
8 rows in set
```

## Notice

The following are some precautions for using the Hive dialect.

- The Hive dialect should only be used to manipulate Hive tables, not generic tables, and should be used together with a [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}).
- The Hive dialect should only be used to process Hive meta-objects, and requires the current catalog to be a [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}).
- The Hive dialect only supports 2-part (`db.table`) identifiers; identifiers that include a catalog name are not supported.
- Although all Hive versions support the same syntax, whether a specific feature is available still depends on the [Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#支持的hive版本) you use. For example, updating a database location is only supported in Hive 2.4.0 or later.
- Hive and Calcite have different sets of reserved keywords. For example, `default` is a reserved keyword in Calcite but not in Hive. Even with the Hive dialect, such keywords must be quoted with backticks ( ` ) to be used as identifiers.
- Because of incompatibilities in the expanded query text, views created in Flink cannot be queried in Hive.
- Use the [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) when executing DML and DQL.
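
To illustrate the last points, a minimal sketch (the table name `tbl` is hypothetical):

```sql
-- Place the hive module first so Hive built-in functions
-- win during function resolution.
load module hive;
use modules hive, core;
set table.sql-dialect=hive;
-- `default` is reserved in Calcite, so it must be back-quoted
-- even under the Hive dialect.
select * from `default`.tbl;
```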
24 changes: 24 additions & 0 deletions docs/content.zh/docs/connectors/table/hive/overview.md
@@ -127,6 +127,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
// Hive dependencies
hive-exec-2.3.4.jar

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 1.0.0" >}}
@@ -146,6 +149,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 1.1.0" >}}
@@ -165,6 +171,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 1.2.1" >}}
@@ -184,6 +193,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 2.0.0" >}}
@@ -197,6 +209,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
// Hive dependencies
hive-exec-2.0.0.jar

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 2.1.0" >}}
@@ -210,6 +225,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
// Hive dependencies
hive-exec-2.1.0.jar

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 2.2.0" >}}
@@ -227,6 +245,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 3.1.0" >}}
@@ -241,6 +262,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
hive-exec-3.1.0.jar
libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< /tabs >}}
93 changes: 74 additions & 19 deletions docs/content/docs/connectors/table/hive/hive_dialect.md
@@ -300,8 +300,6 @@ CREATE VIEW [IF NOT EXISTS] view_name [(column_name, ...) ]

#### Alter

**NOTE**: Altering view only works in Table API, but not supported via SQL client.

##### Rename

```sql
@@ -346,33 +344,90 @@ CREATE FUNCTION function_name AS class_name;
DROP FUNCTION [IF EXISTS] function_name;
```

## DML
## DML & DQL _`Beta`_

### INSERT
Hive dialect supports a commonly-used subset of Hive's [DML](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)
and [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). The following lists some examples of
HiveQL supported by the Hive dialect.

```sql
INSERT (INTO|OVERWRITE) [TABLE] table_name [PARTITION partition_spec] SELECT ...;
```
- [SORT/CLUSTER/DISTRIBUTE BY](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
- [Group By](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy)
- [Join](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
- [Union](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union)
- [LATERAL VIEW](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView)
- [Window Functions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics)
- [SubQueries](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries)
- [CTE](https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression)
- [INSERT INTO dest schema](https://issues.apache.org/jira/browse/HIVE-9481)
- [Implicit type conversions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-AllowedImplicitConversions)
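
As a sketch of one of the constructs above, a `LATERAL VIEW` paired with the `explode` UDTF flattens an array column into one row per element (the table `pages` and its columns here are hypothetical, for illustration only):

```sql
-- Hypothetical table: pages(id INT, tags ARRAY<STRING>)
-- LATERAL VIEW applies the UDTF explode() to each input row,
-- producing one output row per array element.
SELECT id, tag
FROM pages
LATERAL VIEW explode(tags) t AS tag;
```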
> **Reviewer (Member):** Is all the supported syntax listed above? e.g. Select, Join, Union, Subqueries, Lateral views, Over, CTE.
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-MoreSelectSyntax

> **Author (Contributor):** Good point. I have added these since they're widely used. But it's difficult to make an exhaustive list of supported features.

In order to have better syntax and semantic compatibility, it's highly recommended to use [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule)
and place it first in the module list, so that Hive built-in functions can be picked up during function resolution.

Hive dialect no longer supports [Flink SQL queries]({{< ref "docs/dev/table/sql/queries" >}}). Please switch to the `default`
dialect if you'd like to write in Flink syntax.
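
For instance, the dialect can be switched back and forth within one SQL client session (a sketch; statement outputs are omitted):

```sql
set table.sql-dialect=hive;     -- HiveQL accepted from here on
-- ... run Hive-syntax DML/DQL statements ...
set table.sql-dialect=default;  -- back to Flink SQL syntax
```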

The following is an example of using the Hive dialect to run some queries.

The `partition_spec`, if present, can be either a full spec or partial spec. If the `partition_spec` is a partial
spec, the dynamic partition column names can be omitted.
```bash
Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
[INFO] Execute statement succeed.

Flink SQL> use catalog myhive;
[INFO] Execute statement succeed.

Flink SQL> load module hive;
[INFO] Execute statement succeed.

## DQL
Flink SQL> use modules hive,core;
[INFO] Execute statement succeed.

At the moment, Hive dialect supports the same syntax as Flink SQL for DQLs. Refer to
[Flink SQL queries]({{< ref "docs/dev/table/sql/queries" >}}) for more details. And it's recommended to switch to
`default` dialect to execute DQLs.
Flink SQL> set table.sql-dialect=hive;
[INFO] Session property has been set.

Flink SQL> select explode(array(1,2,3)); -- call hive udtf
+-----+
| col |
+-----+
| 1 |
| 2 |
| 3 |
+-----+
3 rows in set

Flink SQL> create table tbl (key int,value string);
[INFO] Execute statement succeed.

Flink SQL> insert overwrite table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:

Flink SQL> select * from tbl cluster by key; -- run cluster by
2021-04-22 16:13:57,005 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input paths to process : 1
+-----+-------+
| key | value |
+-----+-------+
| 1 | a |
| 1 | a |
| 5 | e |
| 2 | b |
| 3 | c |
| 3 | c |
| 3 | c |
| 4 | d |
+-----+-------+
8 rows in set
```

## Notice

The following are some precautions for using the Hive dialect.

- Hive dialect should only be used to manipulate Hive tables, not generic tables. And Hive dialect should be used together
with a [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}).
- Hive dialect should only be used to process Hive meta objects, and requires the current catalog to be a
[HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}).
- Hive dialect only supports 2-part identifiers, so you can't specify a catalog for an identifier.
- While all Hive versions support the same syntax, whether a specific feature is available still depends on the
[Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#supported-hive-versions) you use. For example, updating database
location is only supported in Hive-2.4.0 or later.
- Hive and Calcite have different sets of reserved keywords. For example, `default` is a reserved keyword in Calcite and
a non-reserved keyword in Hive. Even with Hive dialect, you have to quote such keywords with backtick ( ` ) in order to
use them as identifiers.
- Because of incompatibilities in the expanded query text, views created in Flink cannot be queried in Hive.
- Use [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule)
to run DML and DQL.
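
To illustrate the last points, a minimal sketch (the table name `tbl` is hypothetical):

```sql
-- Place the hive module first so Hive built-in functions
-- win during function resolution.
load module hive;
use modules hive, core;
set table.sql-dialect=hive;
-- `default` is reserved in Calcite, so it must be back-quoted
-- even under the Hive dialect.
select * from `default`.tbl;
```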
24 changes: 24 additions & 0 deletions docs/content/docs/connectors/table/hive/overview.md
@@ -131,6 +131,9 @@ Please find the required dependencies for different Hive major versions below.
// Hive dependencies
hive-exec-2.3.4.jar

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 1.0.0" >}}
@@ -150,6 +153,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 1.1.0" >}}
@@ -169,6 +175,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 1.2.1" >}}
@@ -188,6 +197,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 2.0.0" >}}
@@ -201,6 +213,9 @@ Please find the required dependencies for different Hive major versions below.
// Hive dependencies
hive-exec-2.0.0.jar

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 2.1.0" >}}
@@ -214,6 +229,9 @@ Please find the required dependencies for different Hive major versions below.
// Hive dependencies
hive-exec-2.1.0.jar

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 2.2.0" >}}
@@ -231,6 +249,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3.jar
aircompressor-0.8.jar // transitive dependency of orc-core

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< tab "Hive 3.1.0" >}}
@@ -245,6 +266,9 @@ Please find the required dependencies for different Hive major versions below.
hive-exec-3.1.0.jar
libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately

// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar

```
{{< /tab >}}
{{< /tabs >}}