-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Fix #24214: Extended isvalid() for SubString #24231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test/strings/basic.jl
Outdated
| @test isvalid(String, UInt8[0xfe, 0x80, 0x80, 0x80, 0x80, 0x80]) == false | ||
| #issue #24214 Check valid SubString | ||
| @test isvalid(lstrip(" ghjki")) == true | ||
| @test isvalid(String(UInt8[0xfe, 0x80, 0x80, 0x80, 0x80, 0x80]), 1,2) == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you forgot SubString(.
|
|
|
@Vaibhavdixit02 I would also to recommend to update the docstrings of |
| # 2: valid UTF-8 | ||
|
|
||
| isvalid(::Type{String}, s::Union{Vector{UInt8},String}) = byte_string_classify(s) != 0 | ||
| isvalid(::Type{String}, s::Union{Vector{UInt8},String,SubString{String}}) = byte_string_classify(s) != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vaibhavdixit02 I would recommend to add isvalid(<:SubString{String}) definition and update an appropriate docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that required? I am not familiar with any subtype of SubString{String}, are there any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right - there are none. I meant that single argument isvalid should be defined as:
isvalid(s::Union{String,SubString{String}}) = isvalid(String, s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that this change should be also reflected in docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that in current implementation it seems that without explicitly telling String even Vector{UInt8} returns false so is this extension something that would be added or is it not in line with isvalid()'s required features? @stevengj perhaps you could give your view too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that isvalid(::Vector{UInt8}) would now throw method error and since isvalid(::Type{String}, ::Vector{UInt8}) is allowed I would also allow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isvalid(::Vector{UInt8}) would be really confusing. There's way more things Vector{UInt8} can be other than a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isvalid(SubString{String}) should work though, right? @yuyichao
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isvalid(String, ::Vector{UInt8}) should continue to work. I agree that isvalid(::Vector{UInt8}) makes no sense.
base/strings/utf8proc.jl
Outdated
| Returns `true` if the given value is valid for that type. Types currently can | ||
| be either `Char` or `String`. Values for `Char` can be of type `Char` or [`UInt32`](@ref). | ||
| Values for `String` can be of that type, or `Vector{UInt8}`. | ||
| Values for `String` can be of that type, or `Vector{UInt8}`, or `SubString`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vaibhavdixit02 This should be SubString{String} as generic strings are not supported - right?
|
@Vaibhavdixit02 One more thing (I do not know why Github has eaten this comment twice): |
|
Better create a |
|
@nalimilan agreed. @Vaibhavdixit02 on 0.6 |
base/strings/utf8proc.jl
Outdated
| Returns `true` if the given value is valid for its type, which currently can be either | ||
| `Char` or `String`. | ||
| `Char` or `String` or `SubString` (it is considered a valid `String` type here). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or SubString{String}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what "(it is considered a valid String type here)" means. SubString is not a String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that SubString is not a String but what I meant to say here was that here we get a true on isvalid() for SubString because we are kind of treating them as String of sorts. I guess the wording is a little misleading but just saying "Char or String or SubString{String}." would be a little inappropriate too in my opinion because the entire reason of the issue and this pull request was that SubString{String} should be treated as valid in isvalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's indeed something weird with isvalid(String, ss::SubString). Maybe it should be isvalid(AbstractString, ss::SubString). Then it would be more natural for SubString to be a supported type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree or else if SubString was a subtype of String it would make more sense. I wrote the docs thinking something along those lines to explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc should be SubString{String} as I have suggested above. Additionally the code should remain String not AbstractString.
Actually the documentation it is good as:
Returns `true` if the given value is valid for its type, which currently can be either
`Char` or `String` or `SubString{String}`.
Why? We are checking if s::SubString{String} is valid for its type (this is what documentation states) and it is valid if String(s) would be a valid String as it is a substing of String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense.
|
I think everything is fixed now? |
|
Should this PR get the 1.0 Tag? It fixes a break for code that works on 0.6 |
|
Bug fixes are not release-blocking, but if this gets updated and passes tests, we can merge it. |
|
I know this isn't a blocker issue but neither are all tagged 1.0 issues (e.g. #12902). And the test errors is something not easy to understand for me. For this PR the first error says *** This error is usually fixed by running and looking at the PR's list, at least in the first page all PRs are marked as failing the tests. |
|
Not sure what's up with the tests. #12902 is on the milestone, trivial though it is, because changing a name is breaking. Milestone tags are not a priority assignment. |
|
Hey, does it seem like this PR will be merged? |
|
Yes, this is good to have it just fell off my radar. Can you rebase it on top of #24999? |
|
(Note that in general you should do a forced push to update the existing PR rather than opening a new PR.) |

Fixes #24214.