64 changes: 63 additions & 1 deletion base/regex.jl
@@ -145,6 +145,46 @@ in a string using an `AbstractPattern`.
"""
abstract type AbstractMatch end

"""
A type representing a single match to a `Regex` found in a string.
Typically created from the [`match`](@ref) function.

The `match` field stores the substring matched by the whole regex.
The `captures` field stores the substrings for each capture group, indexed by number.
To index by capture group name, the entire match object should be indexed instead,
as shown in the examples.
The location of the start of the match is stored in the `offset` field.
The `offsets` field stores the locations of the start of each capture group,
with 0 denoting a group that was not captured.

This type can be used as an iterator over the capture groups of the `Regex`,
yielding the substrings captured in each group.
Because of this, the captures of a match can be destructured.
If a group was not captured, `nothing` will be yielded instead of a substring.

Methods that accept a `RegexMatch` object are defined for [`iterate`](@ref),
[`length`](@ref), [`eltype`](@ref), [`keys`](@ref keys(::RegexMatch)), [`haskey`](@ref), and
[`getindex`](@ref), where keys are the names or numbers of a capture group.
See [`keys`](@ref keys(::RegexMatch)) for more information.

# Examples
```jldoctest
julia> m = match(r"(?<hour>\\d+):(?<minute>\\d+)(am|pm)?", "11:30 in the morning")
RegexMatch("11:30", hour="11", minute="30", 3=nothing)

julia> hr, min, ampm = m
RegexMatch("11:30", hour="11", minute="30", 3=nothing)

julia> hr
"11"

julia> m["minute"]
"30"

julia> m.match
"11:30"
```
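The iterator and dictionary-like methods listed above can be combined as in the following sketch. It is illustrative rather than a doctest, assumes a Julia version that provides these methods, and uses `[0-9]` for digits:

```julia
m = match(r"(?<hour>[0-9]+):(?<minute>[0-9]+)(am|pm)?", "11:30 in the morning")

n = length(m)                 # number of capture groups (3)
caps = collect(m)             # iterates over m.captures: "11", "30", nothing
T = eltype(m)                 # Union{Nothing, SubString{String}}

has_hour = haskey(m, "hour")  # the named group exists
has_third = haskey(m, 3)      # group 3 exists even though it did not match
no_group = haskey(m, "second")  # false: no group has this name

ampm = m[3]                   # `nothing`: the am/pm group did not participate
```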
"""
struct RegexMatch <: AbstractMatch
    match::SubString{String}
    captures::Vector{Union{Nothing,SubString{String}}}
@@ -153,6 +193,28 @@ struct RegexMatch <: AbstractMatch
    regex::Regex
end

"""
keys(m::RegexMatch) -> Vector

Return a vector of keys for all capture groups of the underlying regex.
A key is included even if the capture group fails to match.
That is, `idx` will be in the return value even if `m.captures[idx] == nothing`.

Unnamed capture groups will have integer keys corresponding to their index.
Named capture groups will have string keys.

!!! compat "Julia 1.7"
    This method was added in Julia 1.7.

# Examples
```jldoctest
julia> keys(match(r"(?<hour>\\d+):(?<minute>\\d+)(am|pm)?", "11:30"))
3-element Vector{Any}:
 "hour"
 "minute"
 3
```
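Because `getindex` accepts both kinds of key, named and numbered, the result of `keys` can be paired back with the match. The following minimal sketch collects every capture into a `Dict`, with `nothing` for groups that did not match. The `groups` name and the `[0-9]` digit class are illustrative:

```julia
m = match(r"(?<hour>[0-9]+):(?<minute>[0-9]+)(am|pm)?", "11:30")

# Look up each capture by its key: a name for named groups,
# an index for unnamed ones. Unmatched groups map to `nothing`.
groups = Dict(k => m[k] for k in keys(m))

groups["hour"]  # "11"
groups[3]       # nothing
```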
"""
function keys(m::RegexMatch)
    idx_to_capture_name = PCRE.capture_names(m.regex.regex)
    return map(eachindex(m.captures)) do i
@@ -275,7 +337,7 @@ end
"""
match(r::Regex, s::AbstractString[, idx::Integer[, addopts]])

Search for the first match of the regular expression `r` in `s` and return a [`RegexMatch`](@ref)
object containing the match, or `nothing` if the match failed. The matching substring can be
retrieved by accessing `m.match` and the captured sequences can be retrieved by accessing
`m.captures`. The optional `idx` argument specifies an index at which to start the search.
2 changes: 2 additions & 0 deletions doc/src/base/strings.md
@@ -33,6 +33,8 @@ Base.isvalid(::Any, ::Any)
Base.isvalid(::AbstractString, ::Integer)
Base.match
Base.eachmatch
Base.RegexMatch
Base.keys(::RegexMatch)
Base.isless(::AbstractString, ::AbstractString)
Base.:(==)(::AbstractString, ::AbstractString)
Base.cmp(::AbstractString, ::AbstractString)
6 changes: 3 additions & 3 deletions doc/src/manual/strings.md
@@ -801,7 +801,7 @@ else
end
```

If a regular expression does match, the value returned by [`match`](@ref) is a [`RegexMatch`](@ref)
object. These objects record how the expression matches, including the substring that the pattern
matches and any captured substrings, if there are any. This example only captures the portion
of the substring that matches, but perhaps we want to capture any non-blank text after the comment
@@ -882,10 +882,10 @@ julia> m.offsets
```

It is convenient to have captures returned as an array so that one can use destructuring syntax
to bind them to local variables. For convenience, `RegexMatch` also implements the iteration methods, passing them through to the `captures` field, so the match object itself can be destructured directly:

```jldoctest acdmatch
julia> first, second, third = m; first
"a"
```
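The same destructuring works when a group is optional. In the following sketch, the pattern and variable names are illustrative and not part of the running example. An unmatched group binds `nothing`, which `something` can replace with a default before further processing:

```julia
m = match(r"(\d+):(\d+)(am|pm)?", "11:30")

# Destructure the match directly; the optional third group did not
# participate, so `suffix` is bound to `nothing`.
hour, minute, suffix = m

# Fall back to an empty string for the missing group.
suffix = something(suffix, "")

# The captures are SubStrings, so they can be parsed like any string.
total_minutes = 60 * parse(Int, hour) + parse(Int, minute)
```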
