
@stevengj
Member

@stevengj stevengj commented Nov 25, 2020

Closes #36618, using the new utf8proc_islower and utf8proc_isupper functions from utf8proc 2.6 (which we upgraded to in #38551).

This is technically breaking, but I doubt anyone relies on the slight differences between the old behavior and the Unicode standard's definitions.
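A rough illustration of the kind of difference involved follows. It is sketched in Python rather than Julia and is not Julia's or utf8proc's implementation. A test that looks only at the general category Ll misses characters such as U+24D0 CIRCLED LATIN SMALL LETTER A (category So) and U+2170 SMALL ROMAN NUMERAL ONE (category Nl). The Unicode "Lowercase" derived property includes both through Other_Lowercase. The sketch assumes that CPython's str.islower() uses that derived property for a single character. The helper names category_islower and property_islower are hypothetical.

```python
import unicodedata

def category_islower(ch: str) -> bool:
    """Lowercase test based only on the general category (Ll)."""
    return unicodedata.category(ch) == "Ll"

def property_islower(ch: str) -> bool:
    """Lowercase test via CPython's str.islower(). Assumption: for a single
    character this reflects the Unicode 'Lowercase' derived property."""
    return ch.islower()

# 'a' is Ll; the other two are lowercase by derived property but not Ll.
for ch in ["a", "\u24d0", "\u2170"]:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"category={unicodedata.category(ch)} "
        f"Ll-test={category_islower(ch)} property-test={property_islower(ch)}"
    )
```

Characters like these are where a category-based check and a Unicode-property-based check disagree.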

@stevengj stevengj added unicode Related to unicode characters and encodings minor change Marginal behavior change acceptable for a minor release needs news A NEWS entry is required for this change labels Nov 25, 2020
@stevengj
Member Author

Along the way, I noticed that:

  1. Maybe isletter should correspond to the Unicode "Alphabetic" derived property?
  2. titlecase(::String) really needs to conform more closely to the definition of word boundaries in UAX #29. Right now, it can break a word in the middle of a grapheme when there are combining characters: titlecase("bôrked") == "BôRked" seems like a bug to me.
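Both points can be illustrated with a small Python sketch. This is not Julia's implementation, and every helper name in it is hypothetical. For the first point, a letter test based on the L* general categories rejects U+216B ROMAN NUMERAL TWELVE (category Nl). Python's str.isalpha(), which is documented as L*-based, rejects it as well. The Alphabetic derived property, by contrast, includes every Nl character. For the second point, suppose a word is defined as a run of L* characters. Then a decomposed ô (o followed by U+0302 COMBINING CIRCUMFLEX ACCENT, category Mn) ends the word, and the next letter gets capitalized. Treating combining marks (M*) as part of the current word is only a crude stand-in for the UAX #29 rules, not an implementation of them.

```python
import unicodedata

def is_letter_category(ch: str) -> bool:
    """Letter test based on general category L* (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")

def naive_titlecase(s: str) -> str:
    """Titlecase the first character of each run of L* characters.
    Any non-letter, including a combining mark, starts a new word."""
    out, at_word_start = [], True
    for ch in s:
        if is_letter_category(ch):
            out.append(ch.title() if at_word_start else ch)
            at_word_start = False
        else:
            out.append(ch)
            at_word_start = True
    return "".join(out)

def mark_aware_titlecase(s: str) -> str:
    """Like naive_titlecase, but combining marks (M*) continue the current
    word instead of ending it. A crude approximation, not full UAX #29."""
    out, at_word_start = [], True
    for ch in s:
        cat = unicodedata.category(ch)
        if cat.startswith("L"):
            out.append(ch.title() if at_word_start else ch)
            at_word_start = False
        elif cat.startswith("M"):
            out.append(ch)  # mark stays attached to its base; word continues
        else:
            out.append(ch)
            at_word_start = True
    return "".join(out)

# Roman numeral twelve: category Nl, so not a letter by category, nor isalpha().
roman_twelve = "\u216b"
print(unicodedata.category(roman_twelve), is_letter_category(roman_twelve), roman_twelve.isalpha())

# "bôrked" with ô decomposed into "o" + U+0302 COMBINING CIRCUMFLEX ACCENT.
decomposed = unicodedata.normalize("NFD", "bôrked")
print(unicodedata.normalize("NFC", naive_titlecase(decomposed)))       # BôRked
print(unicodedata.normalize("NFC", mark_aware_titlecase(decomposed)))  # Bôrked
```

With a precomposed ô (U+00F4, category Ll), even the naive version yields "Bôrked". The bug only shows up once the accent is a separate combining character.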

@stevengj
Member Author

Probably this is too late for 1.6, so I'll wait to add NEWS until the 1.7-dev cycle.

@musm
Contributor

musm commented Dec 16, 2020

Probably this is too late for 1.6, so I'll wait to add NEWS until the 1.7-dev cycle.

Since we've branched, it sounds like now is a good time to add that.

@stevengj stevengj removed the needs news A NEWS entry is required for this change label Dec 16, 2020
@stevengj
Member Author

Fixed the NEWS.

Member

@StefanKarpinski StefanKarpinski left a comment


Looks good to me. This can be squash-merged if you're done with it, @stevengj.

@stevengj stevengj merged commit 17de527 into master Dec 18, 2020
@stevengj stevengj deleted the sgj/islowerupper branch December 18, 2020 15:34
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
* Unicode-compliant islower/uppercase

* don't test isletter for non-L* letters

* include titlecase in alphas test

* add news

Labels

minor change Marginal behavior change acceptable for a minor release unicode Related to unicode characters and encodings


Development

Successfully merging this pull request may close these issues.

make isuppercase and islowercase agree with Unicode standard

4 participants