Skip to content

Conversation

@Vaibhavdixit02
Copy link
Contributor

@Vaibhavdixit02 Vaibhavdixit02 commented Oct 20, 2017

Fixes #24214.

@KristofferC KristofferC added needs tests Unit tests are required for this change strings "Strings!" labels Oct 20, 2017
@test isvalid(String, UInt8[0xfe, 0x80, 0x80, 0x80, 0x80, 0x80]) == false
#issue #24214 Check valid SubString
@test isvalid(lstrip(" ghjki")) == true
@test isvalid(String(UInt8[0xfe, 0x80, 0x80, 0x80, 0x80, 0x80]), 1,2) == false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you forgot SubString(.

@bkamins
Copy link
Member

bkamins commented Oct 21, 2017

SubString does not have to be valid in general as checking this would introduce too much performance overhead (and SubString should be a light wrapper around AbstractString).
The check that is performed only makes sure that the first and last index passed to SubString constructor pass isvalid test (and the indices between first and last can be anything).

@bkamins
Copy link
Member

bkamins commented Oct 21, 2017

@Vaibhavdixit02 I would also to recommend to update the docstrings of isvalid in utf8proc.jl.

# 2: valid UTF-8

isvalid(::Type{String}, s::Union{Vector{UInt8},String}) = byte_string_classify(s) != 0
isvalid(::Type{String}, s::Union{Vector{UInt8},String,SubString{String}}) = byte_string_classify(s) != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vaibhavdixit02 I would recommend to add isvalid(<:SubString{String}) definition and update an appropriate docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that required? I am not familiar with any subtype of SubString{String}, are there any?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - there are none. I meant that single argument isvalid should be defined as:

isvalid(s::Union{String,SubString{String}}) = isvalid(String, s)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and that this change should be also reflected in docstrings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Got it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that in current implementation it seems that without explicitly telling String even Vector{UInt8} returns false so is this extension something that would be added or is it not in line with isvalid()'s required features? @stevengj perhaps you could give your view too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that isvalid(::Vector{UInt8}) would now throw method error and since isvalid(::Type{String}, ::Vector{UInt8}) is allowed I would also allow it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isvalid(::Vector{UInt8}) would be really confusing. There's way more things Vector{UInt8} can be other than a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isvalid(SubString{String}) should work though, right? @yuyichao

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isvalid(String, ::Vector{UInt8}) should continue to work. I agree that isvalid(::Vector{UInt8}) makes no sense.

Returns `true` if the given value is valid for that type. Types currently can
be either `Char` or `String`. Values for `Char` can be of type `Char` or [`UInt32`](@ref).
Values for `String` can be of that type, or `Vector{UInt8}`.
Values for `String` can be of that type, or `Vector{UInt8}`, or `SubString`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vaibhavdixit02 This should be SubString{String} as generic strings are not supported - right?

@bkamins
Copy link
Member

bkamins commented Oct 21, 2017

@Vaibhavdixit02 One more thing (I do not know why Github has eaten this comment twice):
I think that you need also to define isvalid(<:SubString{String}) (and add an appropriate docstring update). Probably using Union, as now isvalid(lstrip(" sadfs ")) will fail. Right?

@Vaibhavdixit02
Copy link
Contributor Author

It seems that I am getting a String type from lstrip in my system for some reason.

And why should isvalid(lstrip(" sadfs ")) fail? I mean the type returned by isvalid(lstrip(" sadfs")) and isvalid(lstrip(" sadfs ")) are same, right? I am not able recreate the problem in my system.
kufdsk

@nalimilan
Copy link
Member

Better create a SubString explicitly than calling lstrip, as the implementation could change and then the test wouldn't serve its purpose anymore.

@bkamins
Copy link
Member

bkamins commented Oct 21, 2017

@nalimilan agreed.

@Vaibhavdixit02 on 0.6 lstrip returns String but this has been changed in 0.7.

Returns `true` if the given value is valid for its type, which currently can be either
`Char` or `String`.
`Char` or `String` or `SubString` (it is considered a valid `String` type here).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or SubString{String}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what "(it is considered a valid String type here)" means. SubString is not a String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that SubString is not a String but what I meant to say here was that here we get a true on isvalid() for SubString because we are kind of treating them as String of sorts. I guess the wording is a little misleading but just saying "Char or String or SubString{String}." would be a little inappropriate too in my opinion because the entire reason of the issue and this pull request was that SubString{String} should be treated as valid in isvalid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's indeed something weird with isvalid(String, ss::SubString). Maybe it should be isvalid(AbstractString, ss::SubString). Then it would be more natural for SubString to be a supported type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree or else if SubString was a subtype of String it would make more sense. I wrote the docs thinking something along those lines to explain it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc should be SubString{String} as I have suggested above. Additionally the code should remain String not AbstractString.

Actually the documentation it is good as:

 Returns `true` if the given value is valid for its type, which currently can be either
`Char` or `String` or `SubString{String}`.

Why? We are checking if s::SubString{String} is valid for its type (this is what documentation states) and it is valid if String(s) would be a valid String as it is a substing of String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

@Vaibhavdixit02
Copy link
Contributor Author

I think everything is fixed now?

@joa-quim
Copy link

Should this PR get the 1.0 Tag? It fixes a break for code that works on 0.6

@StefanKarpinski
Copy link
Member

Bug fixes are not release-blocking, but if this gets updated and passes tests, we can merge it.

@joa-quim
Copy link

I know this isn't a blocker issue but neither are all tagged 1.0 issues (e.g. #12902). And the test errors is something not easy to understand for me. For this PR the first error says

*** This error is usually fixed by running make clean. If the error persists, try make cleanall. ***

and looking at the PR's list, at least in the first page all PRs are marked as failing the tests.

@StefanKarpinski
Copy link
Member

Not sure what's up with the tests. #12902 is on the milestone, trivial though it is, because changing a name is breaking. Milestone tags are not a priority assignment.

@Vaibhavdixit02
Copy link
Contributor Author

Hey, does it seem like this PR will be merged?

@StefanKarpinski
Copy link
Member

Yes, this is good to have it just fell off my radar. Can you rebase it on top of #24999?

@Vaibhavdixit02
Copy link
Contributor Author

I have created a new PR #25285 because of the changes in #24999, I should close this PR now I think?

@stevengj stevengj closed this Dec 29, 2017
@stevengj
Copy link
Member

(Note that in general you should do a forced push to update the existing PR rather than opening a new PR.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs tests Unit tests are required for this change strings "Strings!"

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants