Commit 72555b9

Add parsing of PDF titles
This commit attempts to parse the title metadata of a PDF file when it exists; otherwise it falls back to printing only the media type and size.

We use the `pdf` crate to parse the entire response body, because a PDF keeps its trailer and cross-reference table (which locate the metadata) at the end of the file. We then ask the parsed document for the title that may be defined in the "[info dictionary]".

[info dictionary]: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf
1 parent b4175be commit 72555b9
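
The fallback described in the commit message can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from this commit: `describe_pdf` is a hypothetical helper, the output format is assumed, and the title is taken as an already-extracted `Option<&str>` rather than being read through the `pdf` crate.

```rust
/// Hypothetical helper: builds the summary line for a PDF response.
///
/// `title` stands in for whatever the info dictionary yielded (if anything);
/// `media_type` and `size_bytes` describe the response body.
/// The exact output format here is an assumption made for illustration.
fn describe_pdf(title: Option<&str>, media_type: &str, size_bytes: u64) -> String {
    // Treat a missing title and a blank title the same way.
    match title.map(str::trim).filter(|t| !t.is_empty()) {
        // A usable title was found: print it alongside the media type and size.
        Some(t) => format!("{} ({}, {} bytes)", t, media_type, size_bytes),
        // No usable title: fall back to the media type and size only.
        None => format!("{}, {} bytes", media_type, size_bytes),
    }
}
```

Treating a blank title like a missing one is a design choice of this sketch; a PDF can carry an info dictionary whose title entry is present but empty.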

5 files changed: 273 additions & 12 deletions

Cargo.lock

Lines changed: 217 additions & 9 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ stderrlog = "0.5.1"
 atty = "0.2.14"
 scraper = { version = "0.12.0", default-features = false, features = [] }
 phf = "0.7.24"
+pdf = "0.7.1"
 
 [dependencies.image]
 version = "0.22.5"
