- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. #17620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ed-words. In certain malformed content-disposition headers, parameter values are quoted and split as encoded words on two lines with extra whitespaces. This fixes the issue by removing the extra whitespace between the two encoded words.
        
          
                Misc/NEWS.d/next/Library/2019-12-15-18-47-20.bpo-39040.tKa0Qs.rst
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I forgot to start a review :(
| When you're done making the requested changes, leave the comment:  | 
| I have made the requested changes; please review again. | 
| Thanks for making the requested changes! @bitdancer: please review the changes made to this pull request. | 
| [], | ||
| 'attachment; filename="File Name With Spaces.pdf"', | ||
| ('Content-Disposition: attachment; ' | ||
| 'filename="File Name With Spaces.pdf"\n'), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. We need a test with something like "File =?utf-8?q?Name?= With Spaces.pdf". That should have spaces around Name...we want to make sure we aren't removing spaces around encoded words that aren't next to other encoded words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be looking at the outdated diff, from a previous commit.
There should be 4 commits in the PR, and your requested changes are in the last 2 commits.
| When you're done making the requested changes, leave the comment:  | 
| I have made the requested changes; please review again. | 
| Thanks for making the requested changes! @bitdancer: please review the changes made to this pull request. | 
| Thanks @maxking for the PR, and @bitdancer for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8, 3.9. | 
…encoded-words. (pythongh-17620) * bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string. It is fairly common to find malformed mime headers (especially content-disposition headers) where the parameter values, instead of being encoded to RFC standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and then enclosing the whole thing in quotes. The processing of these malformed headers was incorrectly leaving the spaces between encoded words in the decoded text (whitespace between adjacent encoded words is supposed to be stripped on decoding). This changeset fixes the encoded word processing inside quoted strings (bare-quoted-string) to do correct RFC 2047 decoding by stripping that whitespace. (cherry picked from commit 21017ed) Co-authored-by: Abhilash Raj <[email protected]>
…encoded-words. (pythongh-17620) * bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string. It is fairly common to find malformed mime headers (especially content-disposition headers) where the parameter values, instead of being encoded to RFC standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and then enclosing the whole thing in quotes. The processing of these malformed headers was incorrectly leaving the spaces between encoded words in the decoded text (whitespace between adjacent encoded words is supposed to be stripped on decoding). This changeset fixes the encoded word processing inside quoted strings (bare-quoted-string) to do correct RFC 2047 decoding by stripping that whitespace. (cherry picked from commit 21017ed) Co-authored-by: Abhilash Raj <[email protected]>
| Thanks @maxking for the PR, and @bitdancer for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8. | 
…encoded-words. (gh-17620) * bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string. It is fairly common to find malformed mime headers (especially content-disposition headers) where the parameter values, instead of being encoded to RFC standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and then enclosing the whole thing in quotes. The processing of these malformed headers was incorrectly leaving the spaces between encoded words in the decoded text (whitespace between adjacent encoded words is supposed to be stripped on decoding). This changeset fixes the encoded word processing inside quoted strings (bare-quoted-string) to do correct RFC 2047 decoding by stripping that whitespace. (cherry picked from commit 21017ed) Co-authored-by: Abhilash Raj <[email protected]>
…encoded-words. (gh-17620) * bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string. It is fairly common to find malformed mime headers (especially content-disposition headers) where the parameter values, instead of being encoded to RFC standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and then enclosing the whole thing in quotes. The processing of these malformed headers was incorrectly leaving the spaces between encoded words in the decoded text (whitespace between adjacent encoded words is supposed to be stripped on decoding). This changeset fixes the encoded word processing inside quoted strings (bare-quoted-string) to do correct RFC 2047 decoding by stripping that whitespace. (cherry picked from commit 21017ed) Co-authored-by: Abhilash Raj <[email protected]>
…encoded-words. (gh-17620) * bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string. It is fairly common to find malformed mime headers (especially content-disposition headers) where the parameter values, instead of being encoded to RFC standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and then enclosing the whole thing in quotes. The processing of these malformed headers was incorrectly leaving the spaces between encoded words in the decoded text (whitespace between adjacent encoded words is supposed to be stripped on decoding). This changeset fixes the encoded word processing inside quoted strings (bare-quoted-string) to do correct RFC 2047 decoding by stripping that whitespace. (cherry picked from commit 21017ed) Co-authored-by: Abhilash Raj <[email protected]>
* 'master' of github.com:python/cpython: (497 commits) bpo-40061: Fix a possible refleak in _asynciomodule.c (pythonGH-19748) bpo-40798: Generate a different message for already removed elements (pythonGH-20483) closes bpo-29017: Update the bindings for Qt information with PySide2 (pythonGH-20149) bpo-39885: Make IDLE context menu cut and copy work again (pythonGH-18951) bpo-29882: Add an efficient popcount method for integers (python#771) Further de-linting of zoneinfo module (python#20499) bpo-40780: Fix failure of _Py_dg_dtoa to remove trailing zeros (pythonGH-20435) Indicate that abs() method accept argument that implement __abs__(), just like call() method in the docs (pythonGH-20509) bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. (pythongh-17620) bpo-40784: Fix sqlite3 deterministic test (pythonGH-20448) bpo-30064: Properly skip unstable loop.sock_connect() racing test (pythonGH-20494) Note the output ordering of combinatoric functions (pythonGH-19732) bpo-40474: Updated coverage.yml to better report coverage stats (python#19851) bpo-40806: Clarify that itertools.product immediately consumes its inpt (pythonGH-20492) bpo-1294959: Try to clarify the meaning of platlibdir (pythonGH-20332) bpo-37878: PyThreadState_DeleteCurrent() was not removed (pythonGH-20489) bpo-40777: Initialize PyDateTime_IsoCalendarDateType.tp_base at run-time (pythonGH-20493) bpo-40755: Add missing multiset operations to Counter() (pythonGH-20339) bpo-25920: Remove socket.getaddrinfo() lock on macOS (pythonGH-20177) bpo-40275: Fix test.support.threading_helper (pythonGH-20488) ...
In certain malformed content-disposition headers, parameter values are quoted
and split as encoded words on two lines with extra whitespaces. This fixes the
issue by removing the extra whitespace between the two encoded words.
https://bugs.python.org/issue39040