Skip to content

Conversation

@stevengj
Copy link
Member

@stevengj stevengj commented Jan 14, 2023

The previous implementation of isascii(c::Char), by @StefanKarpinski in #24999, was bswap(reinterpret(UInt32, c)) < 0x80. However, I noticed that base/io.jl is using a nicer trick: c ≤ '\x7f', which oddly enough is also by @StefanKarpinski in #24999. The latter seems to compile to one fewer instruction because it omits the bswap.

@StefanKarpinski, any comment here?

The PR also cleans up io.jl to use isascii(c) rather than c ≤ '\x7f', since they should now be equivalent.

@stevengj stevengj added performance Must go faster strings "Strings!" labels Jan 14, 2023
@matthias314
Copy link
Contributor

The new version looks cleaner, but are you sure that it's faster? On master I've tried

s = String(' ':'~')
@btime all(isascii, $s)

and the run time is the same as before.

@stevengj
Copy link
Member Author

The new version looks cleaner, but are you sure that it's faster?

Not sure; it's really hard to benchmark such a simple function. However, I'm operating on the principle that 1 cmp instruction is better than 1 cmp + 1 bswap.

Also, it's weird to have two different versions of the isascii check in our codebase — seems like we should settle on one, and we might as well settle on the one with fewer instructions.

@stevengj
Copy link
Member Author

Actually, it looks like the c ≤ '\x7f' is simpler because it is simply wrong for malformed Char

julia> c = reinterpret(Char, 0x7f000000 - 0x01)
'\x7e\xff\xff\xff': Malformed UTF-8 (category Ma: Malformed, bad data)

julia> c  '\x7f'
true

julia> isascii(c)
false

So, this actually seems like a bug in io.jl.

I'll close this PR then, change io.jl to use isascii, and add a test.

@stevengj stevengj closed this Jan 15, 2023
@giordano giordano deleted the stevengj-patch-3 branch January 15, 2023 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster strings "Strings!"

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants