JpegDecoder: post-process baseline spectral data per MCU-row #1597

Closed · antonfirsov opened this issue Apr 14, 2021 · 35 comments · Fixed by #1694
@antonfirsov (Member) commented Apr 14, 2021

Currently, Huffman decoding (done by HuffmanScanDecoder) is strictly separated from post-processing/color conversion (done by JpegImagePostProcessor) for simplicity. This means that JpegComponent.SpectralBlocks is allocated upfront for the whole image.

I did a second round of memory profiling, using SimpleGcMemoryAllocator to get rid of pooling for more auditable results. It shows that SpectralBlocks is responsible for the majority of our memory allocations:

[image: memory profiler output showing SpectralBlocks dominating allocations]

This can be eliminated with some non-trivial, but still limited refactoring:

  • JpegDecoderCore and HuffmanScanDecoder need a mode where JpegComponent.SpectralBlocks is interpreted as a sliding window of blocks instead of the full set of decoded spectral blocks
  • HuffmanScanDecoder can then push the rows of the sliding window, directly calling an instance of JpegComponentPostprocessor at the end of its MCU-row decoding loop (see the sketch below)
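A minimal sketch of the loop this implies (DecodeMcuRow and PushSpectralRow are hypothetical names, not existing APIs):

// SpectralBlocks holds only a one-MCU-row window per component.
for (int mcuRow = 0; mcuRow < frame.McuRowCount; mcuRow++)
{
    this.DecodeMcuRow(component.SpectralBlocks); // Huffman-decode one MCU row into the window
    this.postProcessor.PushSpectralRow(mcuRow);  // convert the window to pixels immediately
    // the window buffer is simply overwritten on the next iteration
}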
@JimBobSquarePants (Member)

I can definitely get behind this. As I recall from looking at other libraries, they work on a per-MCU basis, but I think per MCU row is fine.

It's a shame this optimization is limited to sequential jpegs, but it's definitely worth the effort.

@antonfirsov (Member, Author) commented Apr 16, 2021

I hope to be able to have a look at this within ~3 weeks, unless someone else wants to take it earlier.

@JimBobSquarePants (Member)

@br3aker is this something you'd be interested in? You've been doing amazing work in the encoder!

@br3aker (Contributor) commented Jun 14, 2021

@JimBobSquarePants I can definitely get behind this after the PR with that de Bruijn table I was talking about in the memory allocator PR.

@br3aker (Contributor) commented Jun 20, 2021

Did a little study to understand the decoder architecture. Long story short: code explosion due to generic TPixel type propagation. The current workflow of the jpeg decoder:

Decode
    ...
    ParseStream<TPixel>
        ...
        Parse Start of Frame // resulting image size is known - allocate Buffer2D instead of in post process step
        Parse Start of Scan(s) // that's where the problem lies - there's no way to convert spectral data to TPixel without generics
        ...
    ...
    PostProcessIntoImage<TPixel> // spectral -> YCbCr -> Rgba -> TPixel

Converting spectral data to YCbCr and then to Rgba is a piece of cake: we know for sure that a supported jpeg contains spectral data yielding YCbCr colorspace values, which are then converted to Rgba for further colorspace conversion - no generics needed. Rgba -> TPixel is done via a PixelOperations<TPixel>.Instance.FromVector4Destructive(...) call (sketched after the list below), which is:

  1. Impossible to reach without making the entire call stack generic
  2. I guess any PixelOperations<TPixel>.Instance virtual calls are devirtualized; at least in the jpeg encoder it's not a bottleneck
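For illustration, the generic leaf of that chain could be wrapped like this (the wrapper is hypothetical; FromVector4Destructive is the overload mentioned above):

using System;
using System.Numerics;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;

internal static class ConversionLeaf
{
    // Vector4 (RGBA) -> TPixel: the only step that actually needs the generic parameter.
    public static void ConvertRow<TPixel>(Configuration config, Span<Vector4> rgbaRow, Span<TPixel> pixelRow)
        where TPixel : unmanaged, IPixel<TPixel>
        => PixelOperations<TPixel>.Instance.FromVector4Destructive(config, rgbaRow, pixelRow);
}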

There's no need to convert a full mcu row, as the PostProcessor converts them piece by piece, so that's unlikely to bring any performance benefit. Moreover, 4:2:0 needs to process two rows at the same time, so 2 full rows of Block8x8 must be allocated. I think per-block decoding directly into the Image is the best approach here.

While machine code size can bloat at runtime for each pixel type decoded from jpeg, I don't think the majority of users would use more than 1-3 pixel types to decode images into.

@JimBobSquarePants @antonfirsov I might have overlooked something, but I'm almost confident this is the only way; can you elaborate on the final decision?

@antonfirsov (Member, Author) commented Jun 20, 2021

@br3aker I think we can solve this with a double-dispatch trick, by implementing IImageVisitor somewhere inside JpegImagePostProcessor. After initializing it with an Image instance, you can define a non-generic method to consume the spectral data, and call image.Accept(visitor) to delegate the implementation to a generic method. You can then call this non-generic method from HuffmanScanDecoder. In the end, the generic IImageVisitor.Visit<T> implementation will be very similar to what PostProcess<T>(imageFrame) does.
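A rough sketch of that trick (PostProcessorVisitor and ConsumeSpectralRow are invented names; only IImageVisitor and image.Accept come from the comment above):

internal class PostProcessorVisitor : IImageVisitor
{
    private readonly Image image;
    private int currentRow;

    public PostProcessorVisitor(Image image) => this.image = image;

    // Non-generic entry point, callable from HuffmanScanDecoder:
    public void ConsumeSpectralRow(int rowIndex)
    {
        this.currentRow = rowIndex;
        this.image.Accept(this); // double dispatch recovers TPixel
    }

    // Generic implementation, very similar to PostProcess<T>(imageFrame):
    public void Visit<TPixel>(Image<TPixel> image)
        where TPixel : unmanaged, IPixel<TPixel>
    {
        // spectral row -> YCbCr -> Vector4 -> TPixel, written to image row 'currentRow'
    }
}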

The recommendation to go MCU-row by MCU-row is mostly to avoid the overhead of virtual calls. Note that virtual methods on PixelOperations<TPixel> cannot be devirtualized; calling them with block granularity would be inefficient (same for the color converters).

@br3aker (Contributor) commented Jun 20, 2021

Yep, I was a bit delusional about the power of the JIT :D Thanks for pointing that out.

I thought that since TPixel is a struct, the JIT would compile the IL into a specialized implementation for the exact TPixel type. I completely forgot that PixelOperations<TPixel> is the base class, not the actual implementation.
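A standalone analogy of the pitfall (invented types, just to illustrate the JIT behavior):

using System;

abstract class Operations<T> where T : struct
{
    // Set to a pixel-specific derived instance at startup:
    public static Operations<T> Instance { get; set; }

    public abstract void Convert(Span<T> data);
}

static class Demo
{
    public static void Work<T>(Span<T> data) where T : struct
    {
        // The JIT does specialize this method body per struct T,
        // but Instance is statically typed as the abstract base class,
        // so the Convert call below remains virtual:
        Operations<T>.Instance.Convert(data);
    }
}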

Won't be able to work for a couple of days but will definitely work on this, thanks for the double dispatch advice!

@JimBobSquarePants (Member) commented Jun 21, 2021

I think that per block decoding directly to the Image is the best approach here.
@br3aker Fairly certain libjpeg-turbo etc. do it one block at a time for baseline. I don't know if there's an optimization we can do for progressive.

@br3aker (Contributor) commented Jun 21, 2021

@JimBobSquarePants I meant one block at a time, but for a bulk of mcus at once, which would eliminate the virtual call overhead:

foreach stride in image:
    foreach mcu in stride:
        rgbaMcu = mcu.FromSpectral().ToRgba();
        SomeMagicClass.ConvertToImage(rgbaMcu); // virtual conversion call per each 8x8 mcu in image

vs.

rgbaMcuStride = allocator.Alloc(mcusPerStride);
foreach stride in image:
    foreach mcu in stride:
        rgbaMcuStride[i] = mcu.FromSpectral().ToRgba();
    SomeMagicClass.ConvertToImage(rgbaMcuStride); // virtual conversion call per each mcu stride in image

The only problem here is memory allocation for the mcu stride, especially for 4:2:0 subsampling, as it processes more mcus per decoding unit (4 luma + 1 chroma) per actual decoding to spectral data. Maybe it's better to process in more granular bulks depending on some allocator size (like the 2MB from the last discussion), but it doesn't matter that much atm, so we can discuss it later when at least an mvp implementation is ready.

The encoder actually has the same problem: it calls PixelOperations<TPixel>.Instance.Convert(...) for each mcu in the image. I'll benchmark bulk conversion after the decoder stuff; maybe it's still possible to squeeze some more performance out of it.

@antonfirsov (Member, Author) commented Jun 21, 2021

rgbaMcu = mcu.FromSpectral().ToRgba();
SomeMagicClass.ConvertToImage(rgbaMcu);

This is not how it works in our decoder. We have spectral -> YCbCr (or other) -> Vector4 -> TPixel conversion chains, where YCbCr (or other) -> Vector4 is done with optimized bulk color converters:

https://github.com/SixLabors/ImageSharp/tree/master/src/ImageSharp/Formats/Jpeg/Components/Decoder/ColorConverters

#1121 has a very detailed description of the pipeline and the refactoring plans. I prefer doing things in smaller steps, and I think #1597 can be fixed without major changes to the pipeline.

I strongly recommend the per-stride approach, because it's gonna be a less intrusive change, and the virtual methods will process data in larger bulks. @JimBobSquarePants I assume libjpeg-turbo doesn't have many virtual calls in its pipeline; it's less modular, doing things in big, super-optimized, hardcoded steps. We can switch to that kind of approach in a future integer-only pipeline, but I'm quite sure that with our current architecture it would be suboptimal.

The only problem here is memory allocation for the mcu stride, especially for 4:2:0 subsampling, as it processes more mcus per decoding unit (4 luma + 1 chroma) per actual decoding to spectral data.

I don't think it's a problem. We need to look at the big picture: the memory allocated for a trio of MCU rows (or even row groups) is a small fraction of what the whole Image<T> is consuming.
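As a back-of-the-envelope illustration (my numbers, assuming a 4000×3000 4:2:0 image): one MCU row is 250 MCUs of 16×16 pixels, i.e. 1000 luma blocks plus the subsampled chroma blocks; at 128 bytes per Block8x8 that is on the order of 200 KB, while the decoded Image<Rgba32> alone is 4000 × 3000 × 4 bytes ≈ 46 MB, so the window costs well under 1% of the total.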

In short:
I recommend implementing the smallest possible change that fixes this issue. This will save you time and make it easy to review. The simplest way to achieve this is to:

  1. Add a variant of JpegImagePostProcessor.PostProcess that is (1) non-generic, (2) works with a push model, (3) operates at a smaller granularity.

  2. Introduce a switch that - when turned on - tells HuffmanScanDecoder and the PostProcessor to interpret IJpegComponent.SpectralBlocks as a small sliding window of spectral MCU rows instead of the spectral data of the entire image. This will also enable a lot of code sharing between the implementation of PostProcess<T> and the new push method.

@br3aker (Contributor) commented Jun 21, 2021

This is not how it works in our decoder. We have spectral -> YCbCr (or other) -> Vector4 -> TPixel conversion chains, where YCbCr (or other) -> Vector4 is done with optimized bulk color converters:

https://github.com/SixLabors/ImageSharp/tree/master/src/ImageSharp/Formats/Jpeg/Components/Decoder/ColorConverters

#1121 has a very detailed description of the pipeline and the refactoring plans. I prefer doing things in smaller steps, and I think #1597 can be fixed without major changes to the pipeline.

My example pseudo-code was purely to illustrate why the suggested per-block conversion is bad. I've read through the entire current decoding pipeline and actually have a question: what's the point of the PostProcessor?

Don't get me wrong, I understand that the ScanDecoder does Stream -> Spectral and the PostProcessor then does Spectral -> YCbCr -> Rgba32 -> TPixel, but what's the point of an extra post-processing step if everything you need can be done in the ScanDecoder?

Imo, it'd be clearer to do everything in HuffmanScanDecoder like this:

  • Baseline
// DecodeMcuTripletStrideFromStream: Parse stream -> Decode to spectral -> IDCT -> YCbCr
foreach mcuTripletStride in DecodeMcuTripletStrideFromStream(i):
    // converts YCbCr -> Rgba32 -> TPixel
    // 1 virtual call per stride - good enough, even if that would be a bottleneck for some reason,
    // this solution scales well for any number of strides per conversion call
    PixelConverter.ConvertTo<TPixel>(mcuTripletStride, imageRawMemory[i]);
  • Progressive
// Parse all scans -> Decode to spectral -> IDCT -> YCbCr
entireSpectralTripletData = DecodeEntireImageData();
foreach mcuTripletStride in entireSpectralTripletData:
    // converts YCbCr -> Rgba32 -> TPixel
    // 1 virtual call per stride - good enough, even if that would be a bottleneck for some reason,
    // this solution scales well for any number of strides per conversion call
    PixelConverter.ConvertTo<TPixel>(mcuTripletStride, imageRawMemory[i]);

If I understood you correctly, your approach is the same, except it exits stream parsing and fetches mcu strides during the post-process step, so in theory there's no difference. In practice, all of that spectral data is scattered among JpegFrame, JpegComponent, JpegComponentPostProcessor and the PostProcessor itself. A little too complicated for simple pixel conversion, isn't it?

And with my proposed solution everything is done within a HuffmanScanDecoder<TPixel> class, which inherits HuffmanScanDecoder and provides a virtual method to bulk-convert decoded data directly into the Image<TPixel>. Not gonna lie, it'll require some refactoring, but it'll greatly reduce the code complexity caused by the post processor, as everything will be decoded during the scan decoding logic without any extra classes/containers/etc, and I think it's worth the effort. I've already done part of it; I'm sure it won't take long to finish.

And one last thing: the jpeg encoder already works the way I'm proposing for the decoder, so this would be a mirror of the already existing internal API.

@antonfirsov (Member, Author) commented Jun 21, 2021

The purpose is separation of concerns & a bit historical. In #274 we replaced the entire Huffman decoding logic, without touching the (already SIMD-optimized) color conversion, by porting part of pdf.js's jpeg decoder. We did this by defining JpegScanDecoder as the component responsible for extracting the spectral data from the stream, and JpegImagePostprocessor as the next step, responsible for converting the spectral data to an Image<T>. We even have tests that validate JpegScanDecoder against libjpeg's spectral decoding logic.

I'm fine with an architectural change here, but with YCbCr -> Rgba32 -> TPixel (instead of YCbCr -> Vector4 -> TPixel) you are really suggesting solving #1121 and #1597 in one big step, including a complete reimplementation of the color converters. Note that Vector4 is also historical here: Vector4 and Vector<float> were the only way to get efficient SIMD on targets lower than .NET Core 3.0.
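For context, a trivial illustration (mine, not project code) of the kind of bulk operation those types enable; System.Numerics types are JIT-vectorized even on pre-.NET Core 3.0 targets, while hardware intrinsics (the Sse/Avx classes) require .NET Core 3.0+:

using System;
using System.Numerics;

static class BulkOps
{
    public static void Normalize(Span<Vector4> pixels)
    {
        var scale = new Vector4(1f / 255f);
        for (int i = 0; i < pixels.Length; i++)
        {
            pixels[i] *= scale; // a single SIMD multiply per pixel where supported
        }
    }
}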

We don't care about .NET Framework performance anymore, so if you have the motivation and the time to replace the dinosaur stuff with something better and faster, I'm not gonna stand in the way. Although it might be worth considering doing things in more granular steps, and at least keeping the float conversion logic in the first refactor PR. I leave it up to you :)

@br3aker (Contributor) commented Jun 22, 2021

Thanks for taking the time to explain how the current architecture was put together, interesting stuff!

As the entire YCbCr -> TPixel conversion would be done somewhere inside the HuffmanScanDecoder class, it should be easy to swap the underlying implementation for the one from #1121 with the reimplemented color converters.

So, the transition could be something like this:

  1. HuffmanScanDecoder is filled with parameters from the various markers (this is controlled via JpegFrame right now, which holds both raw decoded data and decoding parameters), decodes, and passes an Image<TPixel> to the post-processing step
  2. HuffmanScanDecoder and PostProcessor unification

While step 1 might seem pointless, it would prepare the internal decoder API for step 2 without changing the actual spectral decoding logic.

@JimBobSquarePants (Member)

Question: would we ever consider adding support for arithmetic encoding (rare and currently unsupported)?

@br3aker (Contributor) commented Jun 22, 2021

Not sure anyone would ever use arithmetic encoding; it's a lot slower and does not provide much storage benefit. For comparison, an image-specific Huffman tree (the optimize_coding flag in libjpeg) produces an image smaller by about 5%:

[image: file size comparison showing the ~5% saving from optimize_coding]

Note that the decoder already supports any Huffman tree; the only thing left is to implement this in the encoder, which is not that complicated and won't change anything architecture-wise (it would add a flag to the public API though). And it will only affect encoder performance, as you need to traverse the entire image before actually encoding it. The decoder should not suffer any performance penalty; I would even dare to say it should be a bit faster to decode an optimized image than a default one.
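A rough two-pass shape of what optimize_coding implies for the encoder; all names here (CountBlockSymbols, BuildFromFrequencies, EncodeBlock) are hypothetical:

// Pass 1: dry-run the entropy coder, only counting symbol frequencies.
int[] frequencies = new int[256];
foreach (Block8x8 block in spectralBlocks)
    CountBlockSymbols(block, frequencies);

// Build optimal code lengths from the histogram instead of using the spec's default tables.
HuffmanTable optimal = HuffmanTable.BuildFromFrequencies(frequencies);

// Pass 2: encode for real - this is why the whole image must be traversed first.
foreach (Block8x8 block in spectralBlocks)
    EncodeBlock(block, optimal, bitWriter);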

I can't provide such numbers for arithmetic coding, but I've heard it gains somewhere around ~10-12% in the best-case scenario compared to Huffman coding.
Even if somebody ever wanted to implement it for the sake of why-not-ness and education, it could be done without much change to the internal decoder API; it would add one extra flag and the arithmetic decoding method itself.

@JimBobSquarePants (Member)

Makes sense to me, thanks! That's a good link you've shared. I keep meaning to look up the tables used in mozjpeg, as I'm sure I heard they deviate from the standard.

@br3aker (Contributor) commented Jun 22, 2021

I heard that too; they even had some experiments on quantization tables, for example: mozilla/mozjpeg#76

And this paper: https://www.semanticscholar.org/paper/Extending-RD-OPT-with-global-thresholding-for-JPEG-Ratnakar-Livny/42beaa7703ea3a7292801b648c63dda6f651c277?p2df

Feeling like a kid in the candy store to be honest, want to work on everything at the same time 😄

@antonfirsov (Member, Author) commented Jun 23, 2021

@br3aker regarding #1597 (comment), I want to see a more detailed plan for how you are going to change the spectral => TPixel conversion pipeline in step 2. Do you want to go full integer and reimplement IDCT, color conversion, packing etc., or rather keep (& refactor) the current pieces?

With a full reimplementation we are talking about several dozen working hours IMO. For example, #1462 is a tiny step contributing to #1121, and that alone took one evening + one full day. I really want to avoid blocking #1597 on an entire reimplementation of the conversion chain, and to keep these two things separate.

(I'm not talking about the PostProcessor, HuffmanScanDecoder, PixelConverter or whatever classes put the pieces together, but rather about the theoretical pipeline and the individual pieces, in the way described in #1121's opening comment.)

@br3aker (Contributor) commented Jun 24, 2021

@antonfirsov sorry for the late response, I've had some really busy days.

I won't alter the conversion pipeline with this issue's fix. Let's define a pseudo workflow:

Now

// Scan decoder
spectralRows = new[TotalBlocks];
foreach mcuRow in Image:
    spectralRows[i] = DecodeToSpectral(mcuRow);
// Post processor
foreach row in spectralRows:
    ImageRows[i] = DecodeFromSpectral<TPixel>(row);

This is the workflow for any type of jpeg, progressive or baseline, interleaved or not. DecodeFromSpectral<TPixel>(row) works with rows; it's the 'current' pipeline described in #1121. The pseudo-code processes single rows for simplicity; in reality it works on a number of rows at the same time.

Proposal

// Scan decoder - Baseline
spectralRow = new[RowSizeInBlocks]
foreach mcuRow in Image:
    DecodeToSpectral(from: mcuRow, dest: spectralRow);
    ImageRows[i] = DecodeFromSpectral<TPixel>(spectralRow);
// Scan decoder - Progressive
spectralRows = new[RowCount]
foreach scan in Image:
    foreach mcuRow in scan:
        DecodeToSpectralProgressive(from: mcuRow, dest: spectralRows[i])

foreach row in spectralRows:
    ImageRows[i] = DecodeFromSpectral<TPixel>(row);

Basically, nothing changed. Right now there are 2 steps:

  1. Code -> Spectral for the entire image, for each scan
  2. Spectral -> TPixel (spectral to pixel means the full chain from #1121, "Speed up JPEG Decoder color conversion").

The proposed approach does it in a single step for baseline:

  1. Code -> Spectral -> TPixel for each row in the image (which solves the GC problem, as the image is decoded and converted row by row)

For progressive nothing would change:

  1. Code -> Spectral for the entire image, for each scan
  2. Spectral -> TPixel for the entire image

To be honest, I thought this would be a lot easier to do, but it's a lot of work. Not abandoning it, just saying it will take some more time :P

P.S.
For non-interleaved jpegs the algorithm would be the same as for progressive jpegs in terms of memory usage & the 2-step conversion.

P.P.S.
Losing my mind due to the really hot weather this summer, so I might have missed something important; I'll respond asap if you have any more questions.

@antonfirsov (Member, Author)

@br3aker thanks for clarifying!

Looks like I misunderstood you because of the line YCbCr -> Rgba32 -> TPixel in your previous comment, which implied integer algorithms to me. (We have YCbCr (planar) -> Vector4 (of RGBA) -> TPixel now.)

Losing my mind due to really hot weather this summer so I might not cover something important, will respond asap if you would have any more questions.

I totally understand; it was 35 ℃ here up until yesterday, I was frustrated and brainless all the time :)

To be honest, I thought it to be a lot easier to do but it's a lot of work. Not abandonning this just saying it would take some more time

Take whatever time is needed, the issue is officially in the "Future" milestone for a reason :)

Your plan now looks conceptually good to me. However, when it comes to code style, I still think it's worth implementing DecodeFromSpectral<TPixel>(spectralRow) in a separate class, because color conversion is a concern that is very well separable from Huffman decoding, and I believe it's easier to navigate multiple types (each with clear responsibilities) than a monster class with thousands of lines.

Such a trick can also prevent turning HuffmanScanDecoder generic. For baseline jpeg, it would go like this:

abstract class SpectralToImageConverter
{
     // Add all the marker parameters which are needed for decoding in this type instead of HuffmanScanDecoder

    public abstract void DecodeFromSpectral(spectralRow, rowIndex);
}

class SpectralToImageConverter<TPixel> : SpectralToImageConverter
{
    private Image<TPixel> image;
    
    public override void DecodeFromSpectral(spectralRow, rowIndex){
        // You can use TPixel now
        image[rowIndex] = DecodeImpl(spectralRow);
    }
}

class HuffmanScanDecoder
{
    // Use the non-generic base type:
    private SpectralToImageConverter spectralConverter;

    public DecodeBaseline()
    {
        spectralRow = new[RowSizeInBlocks]
        foreach mcuRow in Image:
            DecodeToSpectral(from: mcuRow, dest: spectralRow);

            spectralConverter.DecodeFromSpectral(spectralRow, i);
    }
}

class JpegDecoderCore
{
    private SpectralToImageConverter spectralConverter;

    public void Decode<TPixel>()
    {
        Image<TPixel> image = ...;
        // ...
        this.spectralConverter = new SpectralToImageConverter<TPixel>(image, ...);
        
        ParseStream();
    }
    
    private void ProcessStartOfScan()
    {
        HuffmanScanDecoder scanDecoder = new HuffmanScanDecoder(this.spectralConverter, ...);
        scanDecoder.DecodeBaseline();
    }
}

What do you think?

@br3aker (Contributor) commented Jun 24, 2021

@antonfirsov actually I had almost the same plan, but with a HuffmanScanDecoder<TPixel> implementing the existing HuffmanScanDecoder with all decoding methods made abstract - which would in fact lead to a monstrous class. I like your idea, thanks!

@br3aker (Contributor) commented Jun 29, 2021

@antonfirsov I've encountered a little dilemma:

For baseline jpeg files the proposed conversion is straightforward, as it's known when the spectral stride of each component is ready.
For progressive it's a bit tricky, as progressive jpegs can define separate scans for each component, so if (spectralEnd == 63) /* convert spectral to color here */ won't work. There are two solutions; the first:

public Image<TPixel> Decode<TPixel>(BufferedReadStream stream, CancellationToken cancellationToken)
    where TPixel : unmanaged, IPixel<TPixel>
{
    // this is still WIP but final variant would look somewhat like this
    var specificConverter = new SpectralToImageConverter<TPixel>(this.Configuration);
    this.spectralConverter = specificConverter;

    this.ParseStream(stream, cancellationToken: cancellationToken);
    this.InitExifProfile();
    this.InitIccProfile();
    this.InitIptcProfile();
    this.InitDerivedMetadataProperties();

    // this looks out of place to be honest
    if (/* This jpeg is progressive */)
    {
        specificConverter.ConvertFullScan();
    }
    
    return new Image<TPixel>(this.Configuration, this.Metadata, new[] { specificConverter.ImageFrame });
}

The other solution is to commit the spectral data to the converter before returning from the ParseStream() method.

While both solutions look a bit 'ugly', this is the most performant way of checking 'are all scans done?'. What do you think?

P.S.
Yes, the Image<TPixel> creation looks ugly; I will open a separate PR for a new ctor from a single frame soon.

@antonfirsov (Member, Author)

@br3aker I like the plan with the pseudo Decode<TPixel> method. I don't think there is a way to avoid this complexity (or "ugliness"), since it comes from the jpeg standard itself.

@br3aker (Contributor) commented Jun 29, 2021

@antonfirsov I will hide the if-check in a property getter for visual clarity then. Thanks for the response!

@br3aker (Contributor) commented Jul 7, 2021

A little update on this:

I redid a lot of code, screwed something up, and couldn't figure out why within a couple of hours + got a new idea which should be a little more understandable.

First of all, the resulting image would be constructed from a Buffer2D<TPixel> buffer. This would introduce a new Image<TPixel> constructor, but it would be less intrusive. @antonfirsov not sure what to do with #1680: implement the pixel buffer ctor there, or within this issue's PR?

Second, I've decided to implement this as an enumerable collection of spectral strides:

// There would actually be some wrapping class for strides 
// so it would store all components' spectral strides in a single object
foreach(Buffer2D<Block8x8> spectralStride in scanDecoder)
{
    // spectral -> vector4
    Buffer2D<Vector4> colorBuffer = ConvertFromSpectralToVector4(spectralStride);

    // vector4 -> TPixel
    PixelBuffer[i] = ConvertFromVector4<TPixel>(colorBuffer);
}

For every decoding mode it won't change anything (except there won't be an extra virtual call for each stride conversion), but for baseline dct it would allow deferring stream parsing stride by stride.

@antonfirsov (Member, Author)

@antonfirsov not sure what to do with #1680: implement the pixel buffer ctor there, or within this issue's PR?

If you have a working PR, that would be a good trigger to push things to a decision; otherwise it's just endless bike-shedding API discussions 😄

Second, I've decided to implement this as an enumerable collection of spectral strides.
foreach(Buffer2D<Block8x8> spectralStride in scanDecoder)

I wonder how this works with ProcessStartOfScan? Does SOS kick off the code that iterates through the enumerable, reading the stream further internally? What if there are more SOS-s?

@JimBobSquarePants (Member)

What if there are more SOS-s?

We definitely need to cater for that since that's how progressive jpegs work.

I would open a draft PR where we can discuss the actual implementation.

@br3aker (Contributor) commented Jul 8, 2021

I wonder how this works with ProcessStartOfScan? Does SOS kick off the code that iterates through the enumerable, reading the stream further internally? What if there are more SOS-s?

That's actually not that hard to determine. Multiple SOS markers can exist only in:

  1. progressive jpegs - can be checked via the existence of the SOF2 marker; the current internal API already has it
  2. non-interleaved jpegs - can be checked by testing whether the given scan's component count differs from the 'global' component count defined in the SOF segment

In other words:

if (!this.Frame.IsProgressive && this.Frame.ComponentCount == scanComponentCount) 
{
    // this SOS must be the only one, any extra is an error and can be checked after spectral decoding
    this.scanDecoder.Baseline = true;

    // we can return true to signal that we are ready for spectral conversion
    return true;
}

// decodes current partial scan to the pre-allocated spectral buffer
this.scanDecoder.DecodeScan();

// for more consistent behaviour we can actually evaluate whether a multi-SOS jpeg is done:
// via spectralEnd == 63 for progressive jpegs
// via processedScans == this.Frame.ComponentCount for non-interleaved baseline jpegs
return lastScanCondition;

There's a problem if a given jpeg has anything after SOS except EOI - and we can actually check even that by calling ParseStream() one more time.

@br3aker (Contributor) commented Jul 8, 2021

Nevermind, the current architecture & code are not in good shape for my plan; I'll try to work on it later. Priority right now:

  1. Working PR fixing this issue with the least code change possible
  2. PR for refactoring (a lot of decoupling needed for the decoder core & scan decoder) and micro-optimizations (there are some good places to cut off a couple of ms)
  3. (possibly) PR for the Enumerable idea

Sorry for this rapid change of ideas & messages; yesterday's discarding of almost-working code knocked me hard. I'll try to work out an implementation in a couple of days.

@JimBobSquarePants (Member) commented Jul 9, 2021

yesterday's discarding of almost-working code knocked me hard

No need to apologize, I feel you mate. At some point I need to replay months of optimization code I wrote for a zlib stream implementation, because somewhere along the way I broke it and have no idea when. 😞

Really looking forward to seeing what you come up with!

@br3aker (Contributor) commented Jul 11, 2021

@antonfirsov @JimBobSquarePants sorry to bother you, but I have a little problem.

The PR is actually ready, with almost all tests passing. The only problem is these tests for baseline jpegs:

[image: test runner output listing the failing baseline spectral tests]

These tests check whether the decoded spectral data equals libjpeg's spectral data for the given image. This approach is now impossible, as the spectral data is discarded deep inside the scan decoding process. Progressive and multi-scan baseline jpegs can still be tested, as they use the same technique as before this PR.

Question: do we even need to test spectral data? We can compare final colors, which would be invalid if the spectral data were invalid. Yes, it's a couple of layers 'higher', but right now spectral data is impossible to test. Only if the Enumerable approach is implemented - spectral strides would then be handed out by a virtual method, making it possible to inject testing code.

P.S.
jpeg400jfif.jpg must fail - it's a known bug, don't mind it.

@antonfirsov (Member, Author)

We can compare final colors, which would be invalid if the spectral data were invalid.

The good thing about verifying spectral data is that the spectral intermediate result is exact, while for color conversion small deviations are allowed. It's important to be able to catch cases where a small difference is the result of a Huffman-decoding bug and not floating point inaccuracy. We faced and fixed such issues while finalizing #274, and while I don't expect a refactor of that volume anytime soon, the tests can still be handy for Huffman decoding optimizations, so I'd prefer to keep them long term.
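To illustrate the contrast (illustrative helpers, not the actual test code): spectral blocks are integer coefficients and must match exactly, while pixel comparison has to allow a small per-channel tolerance:

using System;
using SixLabors.ImageSharp.PixelFormats;

static class Comparisons
{
    // Spectral verification: integer DCT coefficients, exact equality required.
    public static bool SpectralEqual(short[] a, short[] b)
    {
        for (int i = 0; i < 64; i++)
        {
            if (a[i] != b[i]) return false; // any difference points to a Huffman bug
        }

        return true;
    }

    // Color verification: float pipeline, small rounding differences are acceptable.
    public static bool ColorsClose(Rgba32 x, Rgba32 y, int tolerance = 1) =>
        Math.Abs(x.R - y.R) <= tolerance &&
        Math.Abs(x.G - y.G) <= tolerance &&
        Math.Abs(x.B - y.B) <= tolerance;
}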

On the other hand, this should not block progress, so my recommendation is to temporarily disable them and re-enable them once the Enumerable refactor is done. @JimBobSquarePants agreed?

@br3aker (Contributor) commented Jul 11, 2021

@antonfirsov skipping tests now seems like an easy plan but you know...

I will slightly alter the current PR architecture to enable testing without major changes, so it won't rely on a 'somewhat possible enumerable implementation in the future'.

@JimBobSquarePants (Member)

@br3aker

I will slightly alter the current PR architecture to enable testing without major changes, so it won't rely on a 'somewhat possible enumerable implementation in the future'

Not sure I follow what you mean here. Do you mean this is no longer an issue? I'm happy to temporarily disable the baseline tests for now simply to see the difference.

@br3aker (Contributor) commented Jul 11, 2021

Not sure I follow what you mean here. Do you mean this is no longer an issue? I'm happy to temporarily disable the baseline tests for now simply to see the difference.

It is a problem, because I wanted to change as little code as possible in this PR so you guys won't spend too much time reviewing. Fixing these tests would result in a bigger change than necessary for the PR to work. I will disable them and push a draft PR then.
