
Commit d1eb31c

Always resize input to zero length in String(::Vector{UInt8})
This makes the behavior more predictable than truncating `Vector{UInt8}` inputs only when they were allocated via `StringVector`, since the caller may have obtained the vector from other code without knowing how it was created. Always truncating also ensures that code cannot come to rely on the copy that is made in many common cases. The behavior is also simpler to document.
1 parent b0303ca commit d1eb31c
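The new contract can be modeled in plain C, in the style of `src/array.c`. This is a hypothetical simplification: `byte_vec`, `string_model`, `bytes_to_string`, and the `string_backed` flag are illustrative names, not Julia runtime APIs, and the struct is not the real `jl_array_t` layout. It shows the point of the commit: whether the buffer is stolen (zero-copy) or copied, the input always ends up empty, so callers cannot observe which path was taken.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a byte vector; NOT the real jl_array_t layout. */
typedef struct {
    char  *data;          /* heap buffer holding the bytes */
    size_t len;           /* number of valid bytes */
    size_t cap;           /* allocated capacity, playing the role of maxsize */
    int    string_backed; /* 1 if allocated like a StringVector (stealable) */
} byte_vec;

/* Hypothetical model of an immutable, length-delimited string. */
typedef struct {
    char  *data;
    size_t len;
} string_model;

/* Build a string from v and ALWAYS leave v empty, whichever path is taken. */
static string_model bytes_to_string(byte_vec *v)
{
    string_model s;
    assert(v != NULL);
    s.len = v->len;                      /* read the length before truncating */
    if (v->string_backed) {
        s.data = v->data;                /* zero-copy: the string takes the buffer */
    } else {
        s.data = malloc(s.len + 1);      /* fallback: copy the bytes */
        if (s.data == NULL)
            abort();
        if (s.len > 0)
            memcpy(s.data, v->data, s.len);
        s.data[s.len] = '\0';
        free(v->data);                   /* this sketch frees the old buffer */
    }
    /* Same observable result on both paths: v is now empty and owns nothing. */
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
    v->string_backed = 0;
    return s;
}
```

Calling `bytes_to_string` on a stealable vector and on an ordinary one yields equal strings and leaves both inputs with `len == 0`. Unlike this sketch, the real runtime marks a stolen array as shared (`a->flags.isshared = 1`) rather than clearing its pointer, and on the copy path it only zeroes `nrows`, `length`, and `maxsize` instead of freeing the buffer.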

4 files changed, 33 additions and 18 deletions


NEWS.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -434,6 +434,11 @@ Library improvements
   * The function `thisind(s::AbstractString, i::Integer)` returns the largest valid index
     less or equal than `i` in the string `s` or `0` if no such index exists ([#24414]).
 
+  * `String(array)` now accepts an arbitrary `AbstractVector{UInt8}` and "steals" the
+    memory buffer of mutable arrays, leaving the byte vector with an empty buffer which
+    is guaranteed not to be shared with the `String` object; if the byte vector is
+    immutable, it simply shares memory with the string and is not truncated ([#26093]).
+
   * `Irrational` is now a subtype of `AbstractIrrational` ([#24245]).
 
   * Introduced the `empty` function, the functional pair to `empty!` which returns a new,
@@ -1324,3 +1329,4 @@ Command-line option changes
 [#25745]: https://github.com/JuliaLang/julia/issues/25745
 [#25896]: https://github.com/JuliaLang/julia/issues/25896
 [#25998]: https://github.com/JuliaLang/julia/issues/25998
+[#26093]: https://github.com/JuliaLang/julia/issues/26093
```

base/strings/string.jl

Lines changed: 15 additions & 14 deletions
```diff
@@ -16,19 +16,22 @@ const ByteArray = Union{Vector{UInt8},Vector{Int8}}
 # String constructor docstring from boot.jl, workaround for #16730
 # and the unavailability of @doc in boot.jl context.
 """
-    String(v::Vector{UInt8})
-
-Create a new `String` from a vector `v` of bytes containing
-UTF-8 encoded characters. This function takes "ownership" of
-the array, which means that you should not subsequently modify
-`v` (since strings are supposed to be immutable in Julia) for
-as long as the string exists.
-
-If you need to subsequently modify `v`, use `String(copy(v))` instead.
+    String(v::AbstractVector{UInt8})
+
+Create a new `String` object from a byte vector `v` containing UTF-8 encoded
+characters. If `v` is `Vector{UInt8}` it will be truncated to zero length and
+future modification of `v` cannot affect the contents of the resulting string.
+To avoid truncation use `String(copy(v))`.
+
+When possible, the memory of `v` will be used without copying when the `String`
+object is created. This is guaranteed to be the case for byte vectors returned
+by [`take!`](@ref) on a writable [`IOBuffer`](@ref) and by calls to
+[`read(io, nb)`](@ref). This allows zero-copy conversion of I/O data to strings.
+In other cases, `Vector{UInt8}` data may be copied, but `v` is truncated anyway
+to guarantee consistent behavior.
 """
-function String(v::Array{UInt8,1})
-    ccall(:jl_array_to_string, Ref{String}, (Any,), v)
-end
+String(v::AbstractVector{UInt8}) = String(copyto!(StringVector(length(v)), v))
+String(v::Vector{UInt8}) = ccall(:jl_array_to_string, Ref{String}, (Any,), v)
 
 """
     unsafe_string(p::Ptr{UInt8}, [length::Integer])
```
```diff
@@ -64,8 +67,6 @@ unsafe_wrap(::Type{Vector{UInt8}}, s::String) = ccall(:jl_string_to_array, Ref{V
 
 (::Type{Vector{UInt8}})(s::CodeUnits{UInt8,String}) = copyto!(Vector{UInt8}(uninitialized, length(s)), s)
 
-String(a::AbstractVector{UInt8}) = String(copyto!(StringVector(length(a)), a))
-
 String(s::CodeUnits{UInt8,String}) = s.s
 
 ## low-level functions ##
```

src/array.c

Lines changed: 9 additions & 3 deletions
```diff
@@ -434,13 +434,14 @@ JL_DLLEXPORT jl_array_t *jl_pchar_to_array(const char *str, size_t len)
 
 JL_DLLEXPORT jl_value_t *jl_array_to_string(jl_array_t *a)
 {
+    size_t len = jl_array_len(a);
     if (a->flags.how == 3 && a->offset == 0 && a->elsize == 1 &&
         (jl_array_ndims(a) != 1 ||
-         ((a->maxsize + sizeof(void*) + 1 <= GC_MAX_SZCLASS) == (jl_array_len(a) + sizeof(void*) + 1 <= GC_MAX_SZCLASS)))) {
+         ((a->maxsize + sizeof(void*) + 1 <= GC_MAX_SZCLASS) == (len + sizeof(void*) + 1 <= GC_MAX_SZCLASS)))) {
         jl_value_t *o = jl_array_data_owner(a);
         if (jl_is_string(o)) {
             a->flags.isshared = 1;
-            *(size_t*)o = jl_array_len(a);
+            *(size_t*)o = len;
             a->nrows = 0;
 #ifdef STORE_ARRAY_LEN
             a->length = 0;
```
```diff
@@ -449,7 +450,12 @@ JL_DLLEXPORT jl_value_t *jl_array_to_string(jl_array_t *a)
             return o;
         }
     }
-    return jl_pchar_to_string((const char*)jl_array_data(a), jl_array_len(a));
+    a->nrows = 0;
+#ifdef STORE_ARRAY_LEN
+    a->length = 0;
+#endif
+    a->maxsize = 0;
+    return jl_pchar_to_string((const char*)jl_array_data(a), len);
 }
 
 JL_DLLEXPORT jl_value_t *jl_pchar_to_string(const char *str, size_t len)
```
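The condition guarding the zero-copy path in `jl_array_to_string` compares two size-class tests. A minimal sketch of just that predicate follows, assuming the `sizeof(void*) + 1` term accounts for the string's length word (written through `*(size_t*)o`) and its trailing NUL. `HYPOTHETICAL_MAX_SZCLASS` is a placeholder value, not the real `GC_MAX_SZCLASS`, and `fits_small_class`/`can_reuse_buffer` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder threshold; the real GC_MAX_SZCLASS lives in Julia's GC
   headers and its value is NOT assumed here. */
#define HYPOTHETICAL_MAX_SZCLASS 2024

/* True when n payload bytes, plus a pointer-sized length word and a NUL,
   fit under the threshold (mirrors each side of the == in the diff). */
static bool fits_small_class(size_t n)
{
    return n + sizeof(void *) + 1 <= HYPOTHETICAL_MAX_SZCLASS;
}

/* The buffer may be reused in place only if its capacity (maxsize) and the
   string's length land on the same side of the threshold. */
static bool can_reuse_buffer(size_t maxsize, size_t len)
{
    return fits_small_class(maxsize) == fits_small_class(len);
}
```

Reuse is allowed when both sizes are small or both are large, presumably so the GC's bookkeeping for the allocation stays valid once the buffer is reinterpreted as a string of length `len`. When the test fails, the function falls through to the copy path, which after this commit truncates the input as well.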

test/strings/basic.jl

Lines changed: 3 additions & 1 deletion
```diff
@@ -3,8 +3,10 @@
 using Random
 
 @testset "constructors" begin
-    @test String([0x61,0x62,0x63,0x21]) == "abc!"
+    v = [0x61,0x62,0x63,0x21]
+    @test String(v) == "abc!" && isempty(v)
     @test String("abc!") == "abc!"
+    @test String(0x61:0x63) == "abc"
 
     @test isempty(string())
     @test eltype(GenericString) == Char
```
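The new test `String(0x61:0x63) == "abc"` exercises the generic method `String(v::AbstractVector{UInt8}) = String(copyto!(StringVector(length(v)), v))`, which copies the input into a fresh buffer and converts that buffer instead of `v`. A rough C analogue, assuming a hypothetical `byte_range` type standing in for a `UnitRange{UInt8}` (it has no backing memory, so there is nothing to steal or truncate); `range_length` and `string_from_range` are illustrative names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a UnitRange{UInt8} such as 0x61:0x63. */
typedef struct {
    unsigned char start;
    unsigned char stop;   /* inclusive, as in Julia ranges */
} byte_range;

static size_t range_length(byte_range r)
{
    return r.stop >= r.start ? (size_t)(r.stop - r.start) + 1 : 0;
}

/* Analogue of String(copyto!(StringVector(length(v)), v)):
   allocate a fresh buffer, copy the elements in, hand the buffer over. */
static char *string_from_range(byte_range r, size_t *out_len)
{
    size_t n = range_length(r);
    char *buf = malloc(n + 1);        /* like StringVector(n), plus a NUL */
    if (buf == NULL)
        abort();
    for (size_t i = 0; i < n; i++)    /* like copyto!(buf, v) */
        buf[i] = (char)(r.start + i);
    buf[n] = '\0';
    *out_len = n;
    return buf;                       /* r itself is never modified */
}
```

Because only the freshly allocated buffer becomes the string, the range is left intact; truncation applies only to inputs that reach the `Vector{UInt8}` method.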
