tompecina
Contributor

When creating files in the stamper mode of PdfDocument, I found that the qpdf tool reported erroneous XREF data for many of the generated PDFs. Indeed, the XREF stream dictionary had a discrepancy between /Index and /Size. I fixed it by recreating the sections after creating and adding the XREF stream.

@mkl-public
Contributor

Can you share an example input file and some code that reproduces the issue? There may well be an issue in the original file, in your code, or in qpdf. If there actually is an issue in iText, that example file and code could also serve as the basis of a unit test for the fix.

@tompecina
Contributor Author

Please clone https://github.com/tompecina/bug1 to reproduce the bug, then follow README.txt. The original file is OK, and about 50% of all the files I've checked behave the same way. Moreover, the mismatch between the /Index and /Size entries clearly violates the PDF specification. Therefore, we can rule out a bug in my code and/or in qpdf.

The fix could be better than just rebuilding the xref table after the stream is added, but that should be done by the author of the original code.

@tompecina
Contributor Author

This has to do with font embedding in append mode. Clone https://github.com/tompecina/bug2 for details on reproducing the bug.

@mkl-public
Contributor

So essentially, bug 1 is that the cross-reference stream does not contain an entry for its own object number, although the trailer Size entry counts that object number as used.
This violates not only the Size specification ("The number one greater than the highest object number used in this section or in any section for which this shall be an update.") but also another, later clarification in the specification: "Like any stream, a cross-reference stream shall be an indirect object. Therefore, an entry for it shall exist in either a cross-reference stream (usually itself) or in a cross-reference table (in hybrid-reference files)."
Thus, this indeed is a bug and needs to be fixed.
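
To make the two rules quoted above concrete, here is a minimal Java sketch of a consistency check. It is not iText or qpdf code: the class `XrefStreamCheck`, the method `findProblems`, and the sample numbers are all hypothetical, and the check assumes the /Index array has already been parsed into a flat array of (first object number, entry count) pairs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of iText or qpdf. */
final class XrefStreamCheck {

    /**
     * @param index           flattened /Index array: pairs of (first object number, entry count)
     * @param size            value of the /Size entry
     * @param xrefStreamObjNr object number of the cross-reference stream itself
     * @return human-readable descriptions of the violations found (empty if none)
     */
    static List<String> findProblems(int[] index, int size, int xrefStreamObjNr) {
        List<String> problems = new ArrayList<>();
        if (index.length % 2 != 0) {
            problems.add("/Index must contain pairs of integers");
            return problems;
        }
        boolean coversSelf = false;
        for (int i = 0; i < index.length; i += 2) {
            int first = index[i];
            int count = index[i + 1];
            // /Size is one greater than the highest object number in use,
            // so every object number listed in a subsection must be below it.
            if (first + count > size) {
                problems.add("subsection [" + first + " " + count + "] exceeds /Size " + size);
            }
            // The stream is an indirect object and needs an entry of its own.
            if (xrefStreamObjNr >= first && xrefStreamObjNr < first + count) {
                coversSelf = true;
            }
        }
        if (!coversSelf) {
            problems.add("no entry for the cross-reference stream itself (object "
                    + xrefStreamObjNr + ")");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up numbers: objects 0 and 5..7 are listed, the stream is object 8, /Size is 9.
        System.out.println(findProblems(new int[] {0, 1, 5, 3}, 9, 8));
        // Same update with the stream's own entry included in the second subsection.
        System.out.println(findProblems(new int[] {0, 1, 5, 4}, 9, 8));
    }
}
```

In the first call the /Size entry already counts object 8 while no subsection lists it, which is the kind of mismatch described above; the second call lists it and passes both checks.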

Instead of re-creating the sections, as proposed here, one could also try to fix the existing sections list, which would avoid the runtime cost of rebuilding it. On the other hand, the proposal uses a method that is known to work, while an attempt to fix the sections could introduce bugs.
Thus, I'd merge the proposal as is.

@mkl-public
Contributor

Bug 2 is another one of those situations in which marking a changed object as updated for append mode is forgotten, because that object is quite often direct.
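
A toy Java model may help show why this kind of omission goes unnoticed while the object is direct. It is a sketch under stated assumptions, not iText code: `AppendModeSketch`, `Obj`, `markParentOnly`, `markChanged`, and `isRewritten` are hypothetical names, and the model assumes that an incremental update rewrites exactly those indirect objects that carry a modified flag, with direct objects serialized inside their nearest indirect ancestor.

```java
/** Toy model for illustration; names are hypothetical and do not mirror iText classes. */
final class AppendModeSketch {

    /** A PDF object reduced to what matters for this example. */
    static final class Obj {
        final String name;
        final boolean indirect;
        final Obj parent;   // enclosing object, or null for a top-level object
        boolean modified;

        Obj(String name, boolean indirect, Obj parent) {
            this.name = name;
            this.indirect = indirect;
            this.parent = parent;
        }
    }

    /** Nearest object written on its own: the object itself or its closest indirect ancestor. */
    static Obj container(Obj o) {
        while (o != null && !o.indirect) {
            o = o.parent;
        }
        return o;
    }

    /** Flags only the enclosing object, which is enough only if the changed object is direct. */
    static void markParentOnly(Obj changed) {
        Obj c = container(changed.parent);
        if (c != null) {
            c.modified = true;
        }
    }

    /** Flags whichever indirect object actually carries the changed data. */
    static void markChanged(Obj changed) {
        Obj c = container(changed);
        if (c != null) {
            c.modified = true;
        }
    }

    /** True if the object's data would be part of the incremental update. */
    static boolean isRewritten(Obj o) {
        Obj c = container(o);
        return c != null && c.modified;
    }

    public static void main(String[] args) {
        Obj page = new Obj("page", true, null);
        Obj fonts = new Obj("fonts", true, page); // an indirect font dictionary
        markParentOnly(fonts);
        System.out.println("parent only: " + isRewritten(fonts));
        markChanged(fonts);
        System.out.println("marked itself: " + isRewritten(fonts));
    }
}
```

With a direct dictionary, flagging the parent is sufficient because the dictionary is written as part of it; once the same dictionary is indirect, the change is lost unless the dictionary itself is flagged.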

I'd merge the proposal as is.

@introfog
Contributor

introfog commented Jun 8, 2020

Hi @tompecina, thanks for your PR!

We reviewed your pull request. One part of the changes you proposed (namely, setting the modified flag on resourceCategory in append mode) has been merged as is, while the second part, concerning the cross-reference stream, was implemented a bit differently to avoid re-creating the sections.

Anyway, thanks for the input and sorry for the delay.

@Snipx closed this on Jun 8, 2020