- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 888
Preserve Gif color palettes and deduplicate frame pixels. #2455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| An API diff would be nice (for both the PR and the release notes) to make it obvious what specific API-s are we breaking. | 
| 
 Yep agreed. I'll get on that. Update. Here's the API diff. I've removed the  /// <summary>
/// Provides Gif specific metadata information for the image.
/// </summary>
public class GifMetadata : IDeepCloneable
{
-   /// <summary>
-   /// Gets or sets the length of the global color table if present.
-   /// </summary>
-   public int GlobalColorTableLength { get; set; }
+  /// <summary>
+  /// Gets or sets the global color table, if any.
+  /// </summary>
+  public ReadOnlyMemory<Color>? GlobalColorTable { get; set; }
+  /// <summary>
+  /// Gets or sets the index at the <see cref="GlobalColorTable"/> for the background color.
+  /// The background color is the color used for those pixels on the screen that are not covered by an image.
+  /// </summary>
+    public byte BackgroundColor { get; set; }
}/// <summary>
/// Provides Gif specific metadata information for the image frame.
/// </summary>
public class GifFrameMetadata : IDeepCloneable
{
-    /// <summary>
-    /// Gets or sets the length of the color table.
-    /// If not 0, then this field indicates the maximum number of colors to use when quantizing the
-    /// image frame.
-    /// </summary>
-    public int ColorTableLength { get; set; }
+   /// <summary>
+   /// Gets or sets the local color table, if any.
+   /// </summary>
+   public ReadOnlyMemory<Color>? LocalColorTable { get; set; }
+   /// <summary>
+   /// Gets or sets a value indicating whether the frame has transparency
+   /// </summary>
+   public bool HasTransparency { get; set; }
+   /// <summary>
+   /// Gets or sets the transparency index.
+   /// When <see cref="HasTransparency"/> is set to <see langword="true"/> this value indicates the index within
+   /// the color palette at which the transparent color is located.
+   /// </summary>
+   public byte TransparencyIndex { get; set; }
}/// <summary>
/// Acts as a base class for all image encoders that allow color palette generation via quantization.
/// </summary>
public abstract class QuantizingImageEncoder : ImageEncoder
{
-   /// <summary>
-   /// Gets the quantizer used to generate the color palette.
-   /// </summary>
-   public IQuantizer Quantizer { get; init; } = KnownQuantizers.Octree;
+   /// <summary>
+   /// Gets the quantizer used to generate the color palette.
+   /// </summary>
+   public IQuantizer? Quantizer { get; init; }
} | 
| @SixLabors/core Would it be possible to get some eyes on this please? | 
| My take on vectorizing the TrimTransparentPixels is in #2500 (to be merged into this PR / branch). | 
Hm, I remembered that movemask isn't the fastest, and ptest (TestZ in .NET-terms) is faster but current benchmarks didn't prove this, also Intel's instruction table didn't show any benefit in terms of latency or throughput. Thus simplified that check.
Vectorize TrimTransparentPixels in GifEncoderCore
| @SixLabors/core Can I please get some eyes on this? I'd like to keep moving as I have a huge backlog of work to do. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of recommendations and a few general comments.
| } | ||
| } | ||
|  | ||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to the PR,  but I wonder if our heavy usage of AggressiveInlining has any benefits, especially now that we target modern runtimes only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we need to start reviewing this. Need good tooling to inspect the codegen though.
| Vector128<short> signedMask = AdvSimd.ShiftRightArithmetic(mask.AsInt16(), 7); | ||
| return AdvSimd.BitwiseSelect(signedMask, right.AsInt16(), left.AsInt16()).AsByte(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, nothing to do with the PR but it's a bit messy now that it's unclear for a caller which instruction set is supported in various methods on SimdUtils and Numerics.  Originally public methods of SimdUtils were supposed to work regardless of the instructions and SimdUtils.HwIntrinsics used to contain private implementation details for the System.Runtime.Intrinsics branch. Would be nice to figure out new general rules later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has become a bit of a dumping ground over the years. My plan for v4 is a mass simplification of intrinsics utilizing things like Vector128<T> to allow us to implement better cross chipset approaches. We can do a tidy up at the same time.
| /// The background color is the color used for those pixels on the screen that are not covered by an image. | ||
| /// </summary> | ||
| public int GlobalColorTableLength { get; set; } | ||
| public byte BackgroundColor { get; set; } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be BackgroundColorIndex a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, much better, will change.
| { | ||
| this.WriteGraphicalControlExtension(metadata, transparencyIndex, stream); | ||
|  | ||
| Buffer2DRegion<byte> region = ((IPixelSource)quantized).PixelBuffer.GetRegion(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do we need the region here? I don't see us slicing down the full buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WriteImageData requires a region as we slice down the buffer in subsequent frames. I'll rewrite it though, so it accepts a buffer and a rectangle.
| { | ||
| stream.Read(this.buffer.Span, 0, GifConstants.NetscapeLoopingSubBlockSize); | ||
| this.gifMetadata!.RepeatCount = GifNetscapeLoopingApplicationExtension.Parse(this.buffer.Span.Slice(1)).RepeatCount; | ||
| this.gifMetadata!.RepeatCount = GifNetscapeLoopingApplicationExtension.Parse(this.buffer.Span[1..]).RepeatCount; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a critical path to care about if Slice is faster. Simpler looking code is better.
| ref Rgb24 localBase = ref MemoryMarshal.GetReference(MemoryMarshal.Cast<byte, Rgb24>(this.currentLocalColorTable!.GetSpan()[..this.currentLocalColorTableSize])); | ||
| for (int i = 0; i < colorTable.Length; i++) | ||
| { | ||
| colorTable[i] = new Color(Unsafe.Add(ref localBase, (uint)i)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is not a critical path, I wouldn't go unsafe here and just use span indexers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to avoid the secondary indexer on the raw Rgb24 span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is there a measurable perf gain from avoiding the indexer here? This method is only called once per frame on a relatively small (<256) set of data. On the other hand, each usage of Unsafe reduces readability and may introduce a serious bug in case we accidentally create a buffer overflow as we refactor the code in the future. IMO each usage of Unsafe  should be always justified (=hottest hot path) instead of being the default way of indexing things.
| /// Gets or sets the length of the global color table if present. | ||
| /// Gets or sets the global color table, if any. | ||
| /// </summary> | ||
| public ReadOnlyMemory<Color>? GlobalColorTable { get; set; } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is color resolution different from 24 bits ever expected according to the gif standard? If not, wouldn't it be better to go with ReadOnlyMemory<Rgb24> and avoiding the conversions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't and I did consider using Rgb24 originally but thought that Color gave us greater portability on the API surface. For example, someone could now easily create a pallet quantizer from this color table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we think such use-cases will be common it might be ok, but then I would extend the xml doc describing that GlobalColorTable is 24bit.
For me it still feels better to expose the raw gif data as-is (communicating the limitations on the API surface), and  provide helper API-s to make work with PaletteQuantizer easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment is probably best. When I started to convert, I realized while it meant the decoder was simpler, encoding then required an interim allocation of a Color[] to supply to the palette quantizer which got messy.
| if (this.transparentIndex >= 0 && rgba == default) | ||
| { | ||
| // We have explicit instructions. No need to search. | ||
| index = this.transparentIndex; | ||
| this.cache.Add(rgba, (byte)index); | ||
|  | ||
| if (index >= 0 && index < this.Palette.Length) | ||
| { | ||
| match = Unsafe.Add(ref paletteRef, (uint)index); | ||
| } | ||
| else | ||
| { | ||
| Unsafe.SkipInit(out TPixel pixel); | ||
| pixel.FromScaledVector4(Vector4.Zero); | ||
| match = pixel; | ||
| } | ||
|  | ||
| return index; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now penalizing all quantizers and decoders while only gif is using transparentIndex. Can't we  keep the transparency detection logic in GifDecoderCore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out a way to do it. Given that the slow code is actually the loop through the palette I thought this addition check outside the loop would be ok.
I actually think Bmp will end up using it eventually when we finally expose a decoded palette there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Slow" actually means we run it every time a new value is added to ColorDistanceCache which should be still quite common for an image with a wide original palette. I also don't see an easy way to optimize this, we should probably introduce multiple variants of EuclideanPixelMap<TPixel> and PalettQuantizer<TPixel> in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this and compared to the actual iteration the >=0 check should be barely detectable. The comparison operator won't even kick in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparison is indeed cheap, the problem is not with that. We have a lot of code added to the method which may lead to more frequent L1 instruction cache misses. Branches can also hurt CPU pipelining.
Since GetQuantizedColor is being called per-pixel, the method should be as slim as possible, ideally containing no branching except the pixelmap lookup. We should be able to achieve this (and greatly reduce the memory footprint for pixel types without an alpha!) by separating different OctreeQuantizer<T> setups to separate classes.
If we could look into this together with optimization opportunities in ErrorDitherer, we could achieve significant perf improvements in the quantization space, and consequentially in lossless encoders. Not a small amount of work though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah true. I'll not touch this until you have a chance to make changes. We can look at greater optimization opportunities for v4
| index = this.transparentIndex; | ||
| this.cache.Add(rgba, (byte)index); | ||
|  | ||
| if (index >= 0 && index < this.Palette.Length) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that in this branch index == this.transparentIndex >= 0, the index >= 0 condition seems to be redundant.
Regarding the index < this.Palette.Length condition: shouldn't we make this an invariant rule of PaletteQuantizer & EuclideanPixelMap instead, and Guard it in the constructor and SetTransparentIndex?
This would simplify this code a lot, making it also faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. If you have time, could you push to the branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do it later this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, that's much better! Not sure what I was thinking when I wrote mine! 🤣
| @antonfirsov I think I've covered everything now. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the patience!
Prerequisites
Description
This PR does a few things.
Fixes #635
Fixes #2450
Fixes #2198
Touches #2404
The new deduplifier does a comparison of the quantized pixel indexes using the previous and current frame. If duplicates are detected in the current frame, then the indices are replaced with the indez of the background or transparent color. Given the increased repeats this enables better compression.
This functionality is limited to gifs encoded using global color palettes as we would require extra work to detect duplicates in local color tables and produce a map of index offsets which could be cumbersome (not to say this might come in a following PR in the future) .