Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jun 23, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

The parquet-variant tests fail when run as part of verify-release-candidate.sh due to the parquet-testing directory being checked out in a different location

What changes are included in this PR?

Update the test to look at the PARQUET_TEST_DATA environment variable as well

How are these changes tested?

I tested this manually:

# note this is a different name than the submodule:
git clone https://github.com/apache/parquet-testing.git parquet-testing-data
export PARQUET_TEST_DATA=$PWD/parquet-testing-data/data
# checkout my fork
git clone https://github.com/alamb/arrow-rs.git
cd arrow-rs
# This fails on main
git checkout main
cargo test -p parquet-variant
# PASSES on branch with fix
git checkout alamb/fix_variant_tests
cargo test -p parquet-variant

Are there any user-facing changes?

No this is a test only change

@alamb alamb added the development-process Related to development process of arrow-rs label Jun 23, 2025
fn cases_dir() -> PathBuf {
Path::new(env!("CARGO_MANIFEST_DIR"))
// which we expect to point at "../parquet-testing/data"
let env_name = "PARQUET_TEST_DATA";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is largely copied from test_util.rs in arrow

/// Returns the parquest test data directory, which is by default
/// stored in a git submodule rooted at
/// `arrow/parquest-testing/data`.
///
/// The default can be overridden by the optional environment variable
/// `PARQUET_TEST_DATA`
///
/// panics when the directory can not be found.
///
/// Example:
/// ```
/// let testdata = arrow::util::test_util::parquet_test_data();
/// let filename = format!("{}/binary.parquet", testdata);
/// assert!(std::path::PathBuf::from(filename).exists());
/// ```
pub fn parquet_test_data() -> String {
match get_data_dir("PARQUET_TEST_DATA", "../parquet-testing/data") {
Ok(pb) => pb.display().to_string(),
Err(err) => panic!("failed to get parquet data dir: {err}"),
}
}

However I didn't want to add a dependency on arrow for this crate (even for development)

@alamb
Copy link
Contributor Author

alamb commented Jun 24, 2025

Thank you for the review @crepererum

@alamb alamb merged commit b6240b3 into apache:main Jun 24, 2025
14 of 16 checks passed
@alamb alamb deleted the alamb/fix_variant_tests branch June 24, 2025 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development-process Related to development process of arrow-rs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error verifying parquet-variant crate on 55.2.0 with verify-release-candidate.sh

2 participants