Closed
@@ -17,6 +17,7 @@

 package org.apache.spark.sql.catalyst.util

+import java.lang.StringBuilder
 import java.util.regex.{Pattern, PatternSyntaxException}

 import org.apache.spark.unsafe.types.UTF8String
@@ -27,21 +28,28 @@ object StringUtils {
   // replace the % with .*, match 0 or more times with any character
   def escapeLikeRegex(v: String): String = {
Contributor

nit: while you are at it, can you also give the variables better names?

v -> input
c -> currentChar
prev -> previousChar

Contributor

I think we could further simplify this by replacing previousChar with a boolean, nextCharacterIsEscaped (or inEscape).

Contributor

Actually, I'm wrong: we can't quite do that simplification because the old code has a subtle bug related to backslash-escaping. Because the old implementation is so complex and terse, it's not obvious that the case (prev, '\\') => "" has the effect of always ignoring a backslash character when it is read, so an escaped backslash (\\) in the pattern never produces a backslash in the output. This is a problem: a user who wants to write a LIKE pattern that matches a backslash this way cannot do so with the current code.

It turns out that this is covered by #15398, which also implements performance improvements for this code, so I guess this PR and JIRA are redundant :(

I thought #15398 had been merged / fixed by now, but I guess not.
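
To make the backslash issue described in this thread concrete, here is a minimal sketch. `oldEscapeLikeRegex` reproduces the matching logic from the diff below; `escapeWithFlag` is a hypothetical variant that uses the `inEscape` boolean suggested above and assumes that any escaped character, including a backslash, should match itself literally. The object and method names are illustrative only, and this is not the implementation from #15398.

```scala
import java.util.regex.Pattern

object LikeEscapeSketch {
  // Same case analysis as the code under review, kept as-is to show the bug.
  def oldEscapeLikeRegex(v: String): String = {
    if (v.isEmpty) v
    else {
      val sb = new java.lang.StringBuilder("(?s)")
      var prev = ' '
      for (c <- v) {
        val out = (prev, c) match {
          case (_, '\\') => "" // matches first, so every backslash emits nothing
          case ('\\', '_') => "_"
          case ('\\', '%') => "%"
          case ('\\', other) => Pattern.quote("\\" + other)
          case (_, '_') => "."
          case (_, '%') => ".*"
          case (_, other) => Pattern.quote(Character.toString(other))
        }
        prev = c
        sb.append(out)
      }
      sb.toString
    }
  }

  // Hypothetical variant (an assumption, not the merged fix): track escape
  // state with a boolean and emit the escaped character literally.
  def escapeWithFlag(input: String): String = {
    if (input.isEmpty) input
    else {
      val sb = new java.lang.StringBuilder("(?s)")
      var inEscape = false
      for (currentChar <- input) {
        if (inEscape) {
          // The character after a backslash matches itself, backslash included.
          sb.append(Pattern.quote(Character.toString(currentChar)))
          inEscape = false
        } else {
          currentChar match {
            case '\\' => inEscape = true
            case '_' => sb.append(".")
            case '%' => sb.append(".*")
            case other => sb.append(Pattern.quote(Character.toString(other)))
          }
        }
      }
      // A trailing lone backslash is silently dropped in this sketch.
      sb.toString
    }
  }
}
```

With the old logic, the two-character pattern `\\` becomes just `(?s)`, which matches only the empty string. With the flag-based sketch the same pattern matches a single backslash. Note that the sketch also changes how a backslash before an ordinary character is handled (the old code emits the backslash plus the character), so it is not a drop-in replacement.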

     if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
-          c match {
-            case '_' => "."
-            case '%' => ".*"
-            case _ => Pattern.quote(Character.toString(c))
-          }
-      }.mkString
+      val sb = new StringBuilder("(?s)")
+      var prev = ' '
+      for (c <- v) {
+        val out = (prev, c) match {
+          case (prev, '\\') => ""
+          case ('\\', c) =>
+            c match {
+              case '_' => "_"
+              case '%' => "%"
+              case _ => Pattern.quote("\\" + c)
+            }
+          case (prev, c) =>
+            c match {
+              case '_' => "."
+              case '%' => ".*"
+              case _ => Pattern.quote(Character.toString(c))
+            }
+        }
+        prev = c
+        sb.append(out)
+      }
+      sb.toString()
     } else {
       v
     }