- 
                Notifications
    You must be signed in to change notification settings 
- Fork 28.9k
[SPARK-8234][SQL] misc function: md5 #6779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @rxin @liancheng can you trigger the unit test? | 
| Jenkins, ok to test. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make sense -- since it is using child's data type to check data type... you should remove ExpectsInputTypes and explicitly define the type check
| Test build #34756 has finished for   PR 6779 at commit  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test would be simpler/clearer if you just pass literals to the expression. There are other tests that make sure that we can extract values from rows.
Additionally It would be good to test some edge cases: an empty string, null, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for edge cases.
@qiansl127 you can write code like
checkEvaluation(Md5(Literal("ABC"), "902fbdd2b1df0c4f70b4a5d23525e932")
checkEvaluation(Md5(Literal.create(null, BinaryType), "...")
checkEvaluation(Md5(Literal.create(null, StringType), "...")| Test build #34765 has finished for   PR 6779 at commit  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I set a bad example in strlen, we'd better insert an empty line after the doc and right before the @ group
| Test build #34907 has finished for   PR 6779 at commit  
 | 
| Test build #34908 has finished for   PR 6779 at commit  
 | 
| Test build #34922 has finished for   PR 6779 at commit  
 | 
| Test build #34964 has finished for   PR 6779 at commit  
 | 
| @rxin @chenghao-intel Any thing need to be improved? | 
| add a test using  LGTM otherwise. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This override is not necessary now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we can remove this function overriding, as we implemented the expectedChildTypes already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please see doc for ExpectsInputTypes. We don't need to do type check for it.
| Test build #35013 has finished for   PR 6779 at commit  
 | 
| LGTM, cc @rxin | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need the toString here ... since case class already handles it.
| Looks good - but please add a codegen version of this. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a new line here? maybe just
"abcd" +
  "efgh"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not a big problem, as we don't want to make the generated code more than 100 characters per line, right? Or @qiansl127 can update that in the follow up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have called stripMargin, so it will still result in one line.
Anyway I think the current code is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the generated code can be as long as we want.
in this case indenting this way is really weird. we should indent it this way
"org.apache.spark.unsafe.types.UTF8String.fromString" +
  s"(org.apache.commons.codec.digest.DigestUtils.md5Hex($c))"
| Test build #35210 has finished for   PR 6779 at commit  
 | 
| @rxin I think this is ready to be merged. | 
| Can you fix the indent? LGTM other than that. | 
| Test build #35228 has finished for   PR 6779 at commit  
 | 
| @rxin Done! | 
| Thanks. Merging. | 
No description provided.