Closed
10 changes: 9 additions & 1 deletion python/pyspark/sql/utils.py
@@ -16,6 +16,10 @@
#

import py4j
import sys

if sys.version >= '3':
    unicode = str


class CapturedException(Exception):
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
        self.stackTrace = stackTrace

    def __str__(self):
        return repr(self.desc)
        desc = self.desc
        if isinstance(desc, unicode):
            return str(desc.encode('utf-8'))

@ueshin, you are right; I misread the code. We need to handle three cases:

  • unicode in Python 2 => u.encode("utf-8").
  • others in Python 2 => return str(s).
  • others in Python 3 => return str(s).
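
The three cases above can be sketched roughly as follows. This is only an illustration, not the code in this diff: `to_native_str` is a hypothetical helper name, and the version check uses `sys.version_info` rather than the string comparison in the patch.

```python
import sys

# Hypothetical helper for illustration; not part of this PR.
if sys.version_info[0] >= 3:
    unicode = str  # Python 3 has no separate unicode type


def to_native_str(desc):
    """Return desc as the native str type of the running interpreter."""
    if sys.version_info[0] < 3 and isinstance(desc, unicode):
        # Python 2 unicode: encode explicitly so that str() never attempts
        # an implicit ASCII encode, which fails on non-ASCII text.
        return desc.encode("utf-8")
    # Everything else in Python 2, and every value in Python 3.
    return str(desc)
```

With this shape, the Python 3 path never calls `encode`, so `str` is never applied to a bytes object.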

The root cause of #17267 (comment) appears to be that, in Python 3, calling encode on a string (the counterpart of unicode in Python 2) produces 8-bit bytes, b"...". In Python 2, b"..." is the same as an ordinary string, "...", because the b prefix is ignored. The str function then treats the result differently in each version, as shown below:

Python 2

>>> str(b"aa")
'aa'
>>> b"aa"
'aa'

Python 3

>>> str(b"aa")
"b'aa'"
>>> "aa"
'aa'
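
The same Python 3 behaviour can be checked in a short script. This is a sketch that assumes a Python 3 interpreter with default warning settings; the variable names are made up for the example.

```python
text = u"caf\u00e9"

# In Python 3, encode() returns bytes rather than str.
encoded = text.encode("utf-8")

# str() on bytes yields the repr-style text (with the b prefix and quotes),
# not the original characters.
as_str = str(encoded)

# decode() is what recovers the original text from the bytes.
decoded = encoded.decode("utf-8")
```

Here `as_str` starts with `b'`, which is exactly the unwanted output when an encoded description is passed through `str` in Python 3, while `decoded` equals the original text.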

Good catch! I had assumed that str works the same way as in Python 2.

cc @zero323 and @davies too. Would you have some time to take a look at this one? This is a typical, annoying problem between unicode and byte strings. There are several similar PRs (I can identify at least a few that try to handle this problem). One good example might help resolve the other PRs too.

        else:
            return str(desc)


class AnalysisException(CapturedException):