Skip to content

Conversation

@Marlin-Na
Copy link
Contributor

According to the notes in htslib vcf header file:

Use the returned number of written values for accessing valid entries of dst, as ndst is only a
watermark that can be higher than the returned value, i.e. the end of dst can contain carry-over
values from previous calls to bcf_get_format_*() on lines with more values per sample.

I had a case that when accessing a format field of string type, even there was only one sample, I got two strings returned (albeit the second one was empty). With this change, the issue was fixed.

@coveralls
Copy link

coveralls commented Jun 20, 2023

Pull Request Test Coverage Report for Build 5327586180

  • 31 of 37 (83.78%) changed or added relevant lines in 1 file are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.4%) to 79.758%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/bcf/record.rs 31 37 83.78%
Files with Coverage Reduction New Missed Lines %
src/bcf/record.rs 1 78.83%
src/bam/record.rs 4 76.31%
Totals Coverage Status
Change from base Build 5324511866: 0.4%
Covered Lines: 2439
Relevant Lines: 3058

💛 - Coveralls

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add one or two test cases which show that ret is the right thing to use and also works in case of multiple strings?

@dlaehnemann
Copy link
Member

Another thing I noticed: It seems like we then simply have ret returned once as a usize and once as an i32. And indeed, most of the calls to these internal functions seem to use only one version. And in the other cases, one could simply cast it into the other type. So maybe just return Ok(Some(ret)) as is and then cast it to usize whenever that is necessary?

@Marlin-Na
Copy link
Contributor Author

Looking at the code again. I think maybe an alternative fix is to change

self.data(htslib::BCF_HT_STR).map(|(n, _)| {

in Format reader to use ret instead of n.

The Info reader actually got it correct:

data.map(|(_, ret)| {

@Marlin-Na
Copy link
Contributor Author

But float() and integer() in Format read also got it wrong. Yeah I think we should just use ret instead of n. I will implement the fix @dlaehnemann suggested.

self.data(htslib::BCF_HT_REAL).map(|(n, _)| {

@Marlin-Na
Copy link
Contributor Author

I think this PR is ready for merging.

@johanneskoester johanneskoester merged commit f9a1981 into rust-bio:master Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants