
Conversation

@vsaase (Contributor) commented Jul 31, 2021

Hi,

this adds support for lossless decoding. As this is a special need, mostly in medical imaging, I tried to touch as little code as possible.
Performance is a bit low at the moment; help would be appreciated here.

If you think this is too special a use case, I might also publish it in its own crate.

@197g (Member) left a comment

I'm pretty happy with this overall, except that the numerical part of the prediction isn't very SIMD-friendly (which is probably the reason for the slowness). This is necessarily a breaking change due to the additional pixel formats being introduced, which makes it easy to justify the MSRV bump. Just one question: can you specify precisely where the 1.40 requirement comes from?

    &mut self,
    frame: &FrameInfo,
    scan: &ScanInfo,
) -> Result<(Option<Marker>, Option<Vec<Vec<u16>>>)> {
Member

I can't see any return path making use of Option<Vec<_>> here. In any case, when types become this complicated I've found it better to create a very simple struct, if only for the ability to properly document each portion of the return type as field documentation.
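As an illustration only (the struct and field names here are hypothetical, not from this PR), such a type could look like this, mirroring the tuple above:

    /// Result of decoding one lossless scan; mirrors the tuple return type so
    /// that each part of the return value carries its own documentation.
    /// `Marker` is the crate's existing marker type.
    struct LosslessScanResult {
        /// Marker that terminated the scan, if one was encountered.
        marker: Option<Marker>,
        /// Decoded samples, one buffer per component in the scan.
        data: Option<Vec<Vec<u16>>>,
    }

The function signature would then become -> Result<LosslessScanResult>.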

Contributor Author

It's the same return type that the decode_scan function in decoder.rs has. However, I have now removed the Option around Vec<Vec<u16>>.

Member

Gotcha, I see why it was initially chosen. That said, this increases the benefit of defining a new type for it, since the type could then be shared between both methods.

Comment on lines 85 to 126
for (i, component) in components.iter().enumerate() {
    let dc_table = self.dc_huffman_tables[scan.dc_table_indices[i]]
        .as_ref()
        .unwrap();
    let value = huffman.decode(reader, dc_table)?;
    let diff = match value {
        0 => 0,
        1..=15 => huffman.receive_extend(reader, value)? as i32,
        16 => 32768,
        _ => {
            // Section F.1.2.1.1
            // Table F.1
            return Err(Error::Format(
                "invalid DC difference magnitude category".to_owned(),
            ));
        }
    };
    if mcu_x > 0 {
        ra[i] = results[i][mcu_y * frame.image_size.width as usize + mcu_x - 1];
    }
    if mcu_y > 0 {
        rb[i] = results[i][(mcu_y - 1) * frame.image_size.width as usize + mcu_x];
        if mcu_x > 0 {
            rc[i] = results[i]
                [(mcu_y - 1) * frame.image_size.width as usize + (mcu_x - 1)];
        }
    }
    let prediction = predict(
        ra[i] as i32,
        rb[i] as i32,
        rc[i] as i32,
        scan.predictor_selection,
        scan.point_transform,
        frame.precision,
        mcu_x,
        mcu_y,
        self.restart_interval > 0
            && mcus_left_until_restart == self.restart_interval - 1,
    );
    let result = ((prediction + diff) & 0xFFFF) as u16; // modulo 2^16
    results[i].push(result << scan.point_transform);
}
Member

This is probably the cause of the slowness: there are countless bounds-checked indexing operations here, and none of these computations can be effectively transformed into SIMD loops. We can work on this later, but a comment would make it clear we're aware of it.
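For context, these are the lossless predictors from Table H.1 of the JPEG specification that the predict call above chooses between. The function below is a simplified sketch (point transform, first-row/first-column handling and restarts are omitted), not the code in this PR; it also shows that several selections use only one or two of ra/rb/rc:

    /// Simplified sketch of the lossless-JPEG predictors (ITU-T T.81, Table H.1).
    fn predict_simplified(ra: i32, rb: i32, rc: i32, selection: u8) -> i32 {
        match selection {
            1 => ra,
            2 => rb,
            3 => rc,
            4 => ra + rb - rc,
            5 => ra + ((rb - rc) >> 1),
            6 => rb + ((ra - rc) >> 1),
            7 => (ra + rb) >> 1,
            _ => 0, // selection 0: no prediction (hierarchical mode only)
        }
    }

Since the selection is fixed for the whole scan, this per-pixel match (and the loads feeding it) is a natural candidate for hoisting out of the inner loop.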

@vsaase (Contributor Author) Jul 31, 2021

Thanks, I added a comment. There is probably some redundancy in the index checking and in the predictor selection. However, I think this prediction scheme is generally not well suited to parallelization, since almost all computations depend on the immediately preceding results when iterating in the standard order. There might be some potential performance gain from iterating diagonally (computing diagonals with mcu_x + mcu_y = const), as predictions are done from top left to bottom right.
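As a rough illustration of that iteration order (a sketch only, assuming the DC differences have already been entropy-decoded into a buffer; the bitstream itself still has to be read sequentially):

    /// Visit all (mcu_x, mcu_y) positions along anti-diagonals mcu_x + mcu_y == d.
    /// Each pixel depends only on its left, top and top-left neighbours, which lie
    /// on earlier diagonals, so the positions within one diagonal are independent.
    fn for_each_antidiagonal(width: usize, height: usize, mut visit: impl FnMut(usize, usize)) {
        for d in 0..width + height - 1 {
            let y_start = d.saturating_sub(width - 1);
            let y_end = d.min(height - 1);
            for mcu_y in y_start..=y_end {
                visit(d - mcu_y, mcu_y);
            }
        }
    }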

@197g (Member) Jul 31, 2021

I'm pretty confident that we could already exploit instruction-level parallelism if the prediction loop were on its own and the selection of the algorithm happened beforehand. Note that for some of the predictor paths we effectively ignore 2-3 of the values passed in. In other words, those are useless loads, and they occupy register space.

Additionally, this interleaving of decoding and computation does not mix well with bounds checks, since it makes it harder for the compiler to notice that all accesses are in bounds. If the length used in the bounds check for the access to results[i] were a constant by that point, this would be easier. See here for a similar change that had up to 30% throughput gains.

There are some predictors that parallelize well, such as Rb and Rc, which only depend on the previous line, and that line is finished completely beforehand. Lastly, a data dependency doesn't rule out the use of SIMD entirely, although at that point it would probably need to be done by hand instead of relying on auto-vectorization. In images with wide enough lines (greater than the SIMD width) only the input ra changes. We can calculate the prediction of a chunk of pixels at once using an assumed input of 0, then iterate over the pixels and, once we know the true input, update the prediction. This still makes part of the computation parallel. For example, for predictors of the Ra + Rb - Rc form, the update is simply a delayed add of ra. This should provide additional throughput overall.
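To make the first point concrete, here is a minimal sketch (names and buffering assumed, not the PR's code) of a specialized per-row loop once the differences are decoded and the predictor is known up front. For the Rb predictor this is a plain element-wise pass over the previous row, which auto-vectorizes well:

    /// Sketch only: reconstruct one row from already-decoded differences using
    /// the Rb predictor (prediction = sample directly above). There is no data
    /// dependency within the row, so the compiler can vectorize this loop.
    fn reconstruct_row_rb(prev_row: &[u16], diffs: &[i32], out: &mut [u16]) {
        debug_assert!(prev_row.len() == diffs.len() && diffs.len() == out.len());
        for ((out, &rb), &diff) in out.iter_mut().zip(prev_row).zip(diffs) {
            // wrapping to 16 bits, matching the modulo 2^16 arithmetic above
            *out = (rb as i32 + diff) as u16;
        }
    }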

@vsaase (Contributor Author) Aug 1, 2021

Thanks for the ideas. By far the most common predictor seems to be Ra (the default output of the DCMTK library's JPEG encoder). In the first column it is always Rb. Thus we can first compute the first column and then compute all the rows in parallel. I see 45% better performance with this.
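A rough sketch of that scheme (illustrative only, with names and the initial prediction simplified; point transform and restart markers are ignored):

    /// Sketch: reconstruct an image whose scan uses predictor Ra, where the first
    /// column falls back to Rb. `diffs` holds already entropy-decoded differences
    /// in row-major order.
    fn reconstruct_ra(diffs: &[i32], width: usize, height: usize) -> Vec<u16> {
        let mut out = vec![0u16; width * height];
        // First column: each sample predicts from the one above (Rb), so it is
        // computed sequentially from top to bottom.
        let mut prev: u16 = 1 << 15; // initial prediction 2^(P-1) for P = 16, ignoring Pt
        for y in 0..height {
            prev = (prev as i32 + diffs[y * width]) as u16;
            out[y * width] = prev;
        }
        // Remaining columns: each row is an independent running sum of its
        // differences starting from its first-column value, so the rows can be
        // processed in parallel (e.g. with rayon).
        for y in 0..height {
            let row = &mut out[y * width..(y + 1) * width];
            for x in 1..width {
                row[x] = (row[x - 1] as i32 + diffs[y * width + x]) as u16;
            }
        }
        out
    }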

@vsaase (Contributor Author) commented Aug 1, 2021

Just one question, can you specify precisely where the 1.40 requirement comes from?

This is not related to my changes. When running cargo check on the master branch with the current version 1.34.2 under Windows, I get this error in crossbeam-utils, which is a dependency of rayon:

    error[E0161]: cannot move a value of type dyn std::ops::FnOnce() + std::marker::Send: the size of dyn std::ops::FnOnce() + std::marker::Send cannot be statically determined
      --> C:\Users\vsaas\.cargo\registry\src\gitproxy.zycloud.tk-1ecc6299db9ec823\crossbeam-utils-0.8.5\src\thread.rs:449:44

I thought I also saw some issues with 1.39, but that may have been related to an old Cargo.lock file.
Now the lowest working version seems to be 1.36, so I changed it to that.

@197g (Member) left a comment

Regarding version changes, we'll do a big bump to 1.48 soon. I'm marking this as approved. There may be some other changes in the next few days and, depending on the order, they may create a merge conflict with this one. But don't worry, another maintainer or I should take care of that; just leave the fork open, and we should have push access to that specific branch.

@197g merged commit 413869b into image-rs:master on Oct 5, 2021
@Shnatsel mentioned this pull request on May 13, 2022
wartmanm pushed a commit to wartmanm/jpeg-decoder that referenced this pull request on Sep 15, 2022