Skip to content

Conversation

@jakobnissen
Copy link
Member

This commit intentionally omits expanding the documentation of mark etc., which although related, is a somewhat different mechanism.

I'm interested in discussion about what the right behaviour is:

  • Here, I documented seek is able to seek past the end of the IO, if the IO is seekable, in which case writing to it will fill in zero bytes. But I don't actually like this. It feels like that API is mixing up the behaviour of truncate and seek. It really feels like this should be an error. It also makes the behaviour of skip more awkward, because implementing this by reading a bunch of bytes no longer corresponds to seeking. It also creates a weird mismatch between reading and writing IOs. IMO, the congruence with libc's lseek is not enough to warrant this. IMO this should throw an exception when the seeking goes out of bounds in either direction.

  • This PR kind of chickens out and doesn't go ahead and say that position should return a zero-based Int. Having it be more generic is nice for e.g. BGZF's virtualoffset, but it also makes it harder to use. Probably, IOs will not document what their position returns, so in practice, users will just rely on it returning a zero-based Int without it being documented.

  • A comment from Jameson suggests position's return value should support arithmetic. That makes sense, since it then allows skip and relative seeking. However, the only non-integer position type I know of, BGZF's virtual offset, does not support arithmetic. So it feels like documenting this gets the worst of both worlds: It neither enables custom position types, nor do you get the convenience of knowing the return value is a zero-indexed Int

TODO

  • Discuss the points above and reach a conclusion
  • Make sure Base's types actually follow this behaviour. They currently behave inconsistently.

Closes #57618
Closes #57639

This commit intentionally omits expanding the documentation of `mark` etc.,
which although related, is a somewhat different mechanism.
@jakobnissen jakobnissen added needs decision A decision on this change is needed io Involving the I/O subsystem: libuv, read, write, etc. labels Apr 6, 2025
@jakobnissen jakobnissen changed the title RFC: Expand documentation of seek, position` and related functions RFC: Expand documentation of seek, position and related functions Apr 6, 2025
@vtjnash
Copy link
Member

vtjnash commented Apr 10, 2025

LGTM. My understanding is that BGZF implementation does not provide position (since it is annoying to implement), but if it did, it would not be based on virtualoffset but rather would follow this documentation

The `pos` argument should be a value that can be returned by `position(s)`.
Seeking before the first position should throw an error. Seeking after the
final position should throw an error if the stream is not writable.
If the stream is writable, reading will throw an error, and writing will fill
Copy link
Member

@nhz2 nhz2 Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be okay for read to return an empty vector instead of throwing an error. That is how IOStream, and GZipStream currently behave after seeking past the end.

Seek a stream relative to the current position.
This is equivalent to `seek(s, position(s) + offset)`, except where seeking would
seek beyond the end of the stream, where `skip` will throw an `EOFError`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to leave the case of skip beyond the end of the file as implementation-defined behavior.
Allowing skip to go beyond the end of the stream to match the behavior of IOStream in TranscodingStreams is possible, but I think it would require keeping track of a bunch of additional states like zlib does in https://github.com/madler/zlib/blob/5a82f71ed1dfc0bec044d9702463dbdf84ea3b71/gzguts.h#L188-L196, basically making skip lazy and then running any requested skips on every read. https://github.com/madler/zlib/blob/5a82f71ed1dfc0bec044d9702463dbdf84ea3b71/gzread.c#L276-L281

Co-authored-by: Jameson Nash <[email protected]>
@nsajko nsajko added the docs This change adds or pertains to documentation label Jul 24, 2025
nsajko added a commit to nsajko/julia that referenced this pull request Sep 21, 2025
The intention is to add just a minimal doc string. Other PRs attempt to
document the interface, such as:

* JuliaLang#41291

* JuliaLang#58024

* JuliaLang#59069
oscardssmith pushed a commit that referenced this pull request Sep 23, 2025
The intention is to add just a minimal doc string. Other PRs attempt to
document the interface, such as:

* #41291

* #58024

* #59069

---------

Co-authored-by: Jakob Nybo Nissen <[email protected]>
xal-0 pushed a commit to xal-0/julia that referenced this pull request Sep 30, 2025
The intention is to add just a minimal doc string. Other PRs attempt to
document the interface, such as:

* JuliaLang#41291

* JuliaLang#58024

* JuliaLang#59069

---------

Co-authored-by: Jakob Nybo Nissen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs This change adds or pertains to documentation io Involving the I/O subsystem: libuv, read, write, etc. needs decision A decision on this change is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Semantics of position Semantics of seek

4 participants