
@sarutak (Member) commented Apr 27, 2021

What changes were proposed in this pull request?

This PR proposes adding a new built-in function, mask, a data masking function that Hive already supports.

Hive supports six data masking functions.
I plan to implement all of them, but this PR introduces only the most basic one.

The mask function in this PR is implemented to be compatible with Hive's implementation.
Hive has a [document](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DataMaskingFunctions) for its data masking functions, but it seems outdated, so I referred to Hive's source code instead:
https://github.com/apache/hive/blob/rel/release-2.3.8/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java
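For readers unfamiliar with Hive's mask, here is a minimal Python sketch of its string behavior. It is not the Spark or Hive implementation. The defaults (uppercase to "X", lowercase to "x", decimal digits to "n", everything else unchanged) and the three optional replacement arguments follow Hive's published documentation, which this PR notes may be outdated. Using Python's unicodedata categories (Lu, Ll, Nd) to classify characters is an assumption meant to approximate per-character classification in Java, and the extra other parameter is a hypothetical addition of this sketch.

```python
import unicodedata

# Sentinel meaning "leave characters of this class unchanged" (sketch assumption).
UNMASKED = None


def mask(s, upper="X", lower="x", digit="n", other=UNMASKED):
    """Mask each character of s according to its Unicode category.

    Defaults mirror Hive's documented mask: uppercase -> 'X',
    lowercase -> 'x', decimal digits -> 'n', all other characters kept.
    The `other` parameter is hypothetical and only illustrates the idea.
    """
    if s is None:
        return None  # propagate NULL, as SQL functions conventionally do
    out = []
    for ch in s:
        category = unicodedata.category(ch)
        if category == "Lu":      # uppercase letter
            replacement = upper
        elif category == "Ll":    # lowercase letter
            replacement = lower
        elif category == "Nd":    # decimal digit
            replacement = digit
        else:                     # punctuation, spaces, symbols, ...
            replacement = other
        out.append(ch if replacement is UNMASKED else replacement)
    return "".join(out)


print(mask("abcd-EFGH-8765-4321"))                 # prints xxxx-XXXX-nnnn-nnnn
print(mask("abcd-EFGH-8765-4321", "U", "l", "#"))  # prints llll-UUUU-####-####
```

Classifying characters by class means a single call covers every letter and digit without listing them.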

Hive's data masking functions can be applied not only to strings but also to numeric and date types.
In this PR, mask supports only StringType, but I'll extend it to other types later. For that reason, this new function is categorized in a new group, data_masking_funcs, rather than string_funcs.

Why are the changes needed?

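For data masking, mask is more useful than replace or translate, because it masks characters by class (uppercase letters, lowercase letters, digits) instead of requiring every character to be listed explicitly.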
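As a rough analogy, the following Python sketch uses str.translate rather than Spark's translate to show what a character-mapping approach involves. The mapping table must enumerate every character to hide. Restricting it to ASCII letters and digits is an assumption of this sketch, so any character outside the table passes through unmasked. The function name mask_with_translate is hypothetical.

```python
import string

# A translate-style "mask" must list every character it should hide.
# This table covers only ASCII letters and digits (sketch assumption).
ASCII_MASK_TABLE = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase + string.digits,
    "X" * 26 + "x" * 26 + "n" * 10,
)


def mask_with_translate(s):
    """Mask ASCII letters and digits via an explicit character mapping."""
    return s.translate(ASCII_MASK_TABLE)


print(mask_with_translate("abcd-EFGH-8765-4321"))  # prints xxxx-XXXX-nnnn-nnnn
print(mask_with_translate("Ärger-2021"))           # prints Äxxxx-nnnn ('Ä' is not in the table)
```

A function that masks by character class avoids maintaining such tables and handles letters outside the enumerated set.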

Does this PR introduce any user-facing change?

Yes. Users can use the new built-in function.

How was this patch tested?

New tests.

@github-actions bot added the SQL label Apr 27, 2021

SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42526/


SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42526/


SparkQA commented Apr 28, 2021

Test build #138007 has finished for PR 32372 at commit f6baec9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Mask(
  • class MaskTransformer()


SparkQA commented Apr 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42533/


SparkQA commented Apr 28, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42533/

@yaooqinn (Member)

The Hive data masking functions were removed in #21786


SparkQA commented Apr 28, 2021

Test build #138014 has finished for PR 32372 at commit 09227e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak (Member, Author) commented Apr 28, 2021

> The Hive data masking functions were removed in #21786

Oh... I didn't notice.

@sarutak closed this May 1, 2021