Commit aa626e1

alamb, etseidl, and albertlockett authored
[Parquet] Add ParquetMetadataPushDecoder (#8080)
# Which issue does this PR close?

- Part of #8000
- Closes #8164

# Rationale for this change

Metadata is needed when implementing a push decoder for Parquet:

- #7983

To truly separate IO and CPU, we also need a way to decode the metadata without explicit IO. This PR provides a way to decode metadata "push style", where the decoder tells the caller which bytes it needs. It follows the same API as the Parquet push decoder.

This PR also introduces some of the common infrastructure needed by the Parquet push decoder.

# What changes are included in this PR?

1. Add `PushBuffers` to hold byte ranges
2. Add `DecodeResult` to communicate back to the caller
3. Add `ParquetMetaDataPushDecoder` for decoding metadata

# Are these changes tested?

Yes, there are several fully working doc tests that show how to use this API.

# Are there any user-facing changes?

There is a new API.

---------

Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: albertlockett <[email protected]>
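The "push style" flow described in the rationale can be illustrated with a small, self-contained sketch. This is not the code added by the PR: `ToyPushDecoder`, `decode_with`, the shapes of `DecodeResult` and `PushBuffers` below, and the toy file layout (payload followed by a 4-byte little-endian length) are all assumptions made for illustration. Only the idea comes from the description: the decoder performs no IO and instead reports which byte ranges it still needs.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the `DecodeResult` named in the PR description.
#[derive(Debug, PartialEq)]
enum DecodeResult<T> {
    /// The decoder cannot proceed until these byte ranges are pushed.
    NeedsData(Vec<Range<u64>>),
    /// Decoding finished and produced a value.
    Data(T),
}

/// Hypothetical stand-in for `PushBuffers`: byte ranges supplied so far.
#[derive(Default)]
struct PushBuffers {
    ranges: Vec<(Range<u64>, Vec<u8>)>,
}

impl PushBuffers {
    fn push(&mut self, range: Range<u64>, bytes: Vec<u8>) {
        assert_eq!(bytes.len() as u64, range.end - range.start);
        self.ranges.push((range, bytes));
    }

    /// Returns the bytes for `wanted` if one pushed range fully covers it.
    fn get(&self, wanted: &Range<u64>) -> Option<&[u8]> {
        self.ranges.iter().find_map(|(r, b)| {
            if r.start <= wanted.start && wanted.end <= r.end {
                let lo = (wanted.start - r.start) as usize;
                let hi = (wanted.end - r.start) as usize;
                Some(&b[lo..hi])
            } else {
                None
            }
        })
    }
}

/// Toy format (an assumption of this sketch): `payload` followed by a
/// 4-byte little-endian payload length at the very end of the file.
struct ToyPushDecoder {
    file_len: u64,
    buffers: PushBuffers,
}

impl ToyPushDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, buffers: PushBuffers::default() }
    }

    fn push_range(&mut self, range: Range<u64>, bytes: Vec<u8>) {
        self.buffers.push(range, bytes);
    }

    /// Pure CPU work: either decode, or say which bytes are missing.
    fn try_decode(&self) -> Result<DecodeResult<Vec<u8>>, String> {
        if self.file_len < 4 {
            return Err("file too short for footer".to_string());
        }
        let footer = (self.file_len - 4)..self.file_len;
        let Some(tail) = self.buffers.get(&footer) else {
            return Ok(DecodeResult::NeedsData(vec![footer]));
        };
        let payload_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as u64;
        if payload_len > footer.start {
            return Err("payload length exceeds file size".to_string());
        }
        let payload = (footer.start - payload_len)..footer.start;
        match self.buffers.get(&payload) {
            Some(bytes) => Ok(DecodeResult::Data(bytes.to_vec())),
            None => Ok(DecodeResult::NeedsData(vec![payload])),
        }
    }
}

/// Driver loop: the caller owns all IO; `fetch` returns the bytes for a range.
fn decode_with<F>(file_len: u64, mut fetch: F) -> Result<Vec<u8>, String>
where
    F: FnMut(Range<u64>) -> Vec<u8>,
{
    let mut decoder = ToyPushDecoder::new(file_len);
    loop {
        match decoder.try_decode()? {
            DecodeResult::Data(value) => return Ok(value),
            DecodeResult::NeedsData(ranges) => {
                for range in ranges {
                    let bytes = fetch(range.clone());
                    decoder.push_range(range, bytes);
                }
            }
        }
    }
}
```

Because `fetch` is an ordinary closure, the same driver could sit on top of a local file, an object store, or an in-memory buffer without the decoder knowing the difference.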
1 parent 7e38bbb commit aa626e1

7 files changed: +785 -4 lines changed
parquet/src/errors.rs

Lines changed: 6 additions & 0 deletions
@@ -52,6 +52,9 @@ pub enum ParquetError {
     /// Returned when a function needs more data to complete properly. The `usize` field indicates
     /// the total number of bytes required, not the number of additional bytes.
     NeedMoreData(usize),
+    /// Returned when a function needs more data to complete properly.
+    /// The `Range<u64>` indicates the range of bytes that are needed.
+    NeedMoreDataRange(std::ops::Range<u64>),
 }
 
 impl std::fmt::Display for ParquetError {
@@ -69,6 +72,9 @@ impl std::fmt::Display for ParquetError {
             }
             ParquetError::External(e) => write!(fmt, "External: {e}"),
             ParquetError::NeedMoreData(needed) => write!(fmt, "NeedMoreData: {needed}"),
+            ParquetError::NeedMoreDataRange(range) => {
+                write!(fmt, "NeedMoreDataRange: {}..{}", range.start, range.end)
+            }
         }
     }
 }
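To see what the new variant looks like to a caller, here is a minimal sketch that mirrors only the two "need more data" variants from the diff in a local enum. `SketchError` and `missing_range` are hypothetical names for this illustration. The real `ParquetError` has more variants, and how callers actually react to it is not shown in this commit excerpt.

```rust
use std::fmt;
use std::ops::Range;

/// Local mirror of the two variants shown in the diff (illustration only).
#[derive(Debug)]
enum SketchError {
    /// Total number of bytes required, not the number of additional bytes.
    NeedMoreData(usize),
    /// The exact range of bytes that is needed.
    NeedMoreDataRange(Range<u64>),
}

impl fmt::Display for SketchError {
    fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::NeedMoreData(needed) => write!(fmt, "NeedMoreData: {needed}"),
            SketchError::NeedMoreDataRange(range) => {
                write!(fmt, "NeedMoreDataRange: {}..{}", range.start, range.end)
            }
        }
    }
}

/// Hypothetical helper: only the range variant says exactly which bytes to
/// fetch; the count-only variant carries no offsets, so it yields `None`.
fn missing_range(err: &SketchError) -> Option<Range<u64>> {
    match err {
        SketchError::NeedMoreDataRange(range) => Some(range.clone()),
        SketchError::NeedMoreData(_) => None,
    }
}
```

The range variant lets a caller issue a precise read for the missing bytes, whereas the count-only variant leaves the offsets to be inferred from context.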

parquet/src/file/metadata/mod.rs

Lines changed: 4 additions & 3 deletions
@@ -40,11 +40,10 @@
 //! metadata into parquet files. To work with metadata directly,
 //! the following APIs are available:
 //!
-//! * [`ParquetMetaDataReader`] for reading
+//! * [`ParquetMetaDataReader`] for reading from a reader for I/O
+//! * [`ParquetMetaDataPushDecoder`] for decoding from bytes without I/O
 //! * [`ParquetMetaDataWriter`] for writing.
 //!
-//! [`ParquetMetaDataReader`]: https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaDataReader.html
-//! [`ParquetMetaDataWriter`]: https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaDataWriter.html
 //!
 //! # Examples
 //!
@@ -92,6 +91,7 @@
 //! * Same name, different struct
 //! ```
 mod memory;
+mod push_decoder;
 pub(crate) mod reader;
 mod writer;
 
@@ -120,6 +120,7 @@ use crate::schema::types::{
 };
 #[cfg(feature = "encryption")]
 use crate::thrift::{TCompactSliceInputProtocol, TSerializable};
+pub use push_decoder::ParquetMetaDataPushDecoder;
 pub use reader::{FooterTail, PageIndexPolicy, ParquetMetaDataReader};
 use std::ops::Range;
 use std::sync::Arc;
