Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose liblzma's 'preset_dict' feature #6

Merged
merged 1 commit into from
Dec 18, 2020

Conversation

anall
Copy link
Contributor

@anall anall commented Dec 15, 2020

DO NOT MERGE

This is a mostly hacked together [not even minimally viable, just enough to get it to work] implementation to see if this is something you are willing to consider adding to this library in this current manner. I will happily accept opinions if you believe you have a better way to accomplish this.

My current implementation has a couple of known shortcomings:

  • On earlier versions of Perl that do not have SV/string copy-on-write semantics, the dictionary (which may be several megabytes) will be copied -- possibly pass it around as a scalar ref instead of the string itself?
  • Obviously, I'm just using the string from the SV without claiming it in any way. I didn't want to spend time on a proper solution if this feature is not something wanted in this module.

Thank you.

@pmqs
Copy link
Owner

pmqs commented Dec 16, 2020

Hey @anall thanks for the update. Yes, I'm interested in integrating the change. Will look at the your code but I'm really short on time available to work on it. If you do have time to make it more bullet-proof, that would really help.

@pmqs pmqs self-assigned this Dec 16, 2020
@anall
Copy link
Contributor Author

anall commented Dec 16, 2020

This additional commit should be safer and more bullet-proof but did require a more extensive change to the codebase.

I am not sure if I am happy that Perl will copy the preset dictionary, but I am at the limit of my XS knowledge and am unsure of the best way to prevent that copy without exposing other problems.

My only thought, especially for preventing copies even on earlier versions of Perl, is still to somehow "steal" the buffer out of the provided SV.

Support for "copy on write"/CoW scalars would also be useful, but that is beyond my XS knowledge.

But that is more of a performance concern and might be a case of premature optimization.

One other outstanding question: given the amount of duplicated code and similarity, should I get rid of Lzma::Filter::Lzma::PresetDict and instead add the PresetDict as an option to Lzma::Filter::Lzma?

@pmqs
Copy link
Owner

pmqs commented Dec 17, 2020

Found a spare moment to a preliminary scan the changes. All is looking good so far. Even better, you've added tests! Only thing I see missing are a few paragraphs for the documentation.

Wouldn't worry about duplicated code or COW at this point.

Now I just need to find some time the check the code in mode detail.

@pmqs
Copy link
Owner

pmqs commented Dec 18, 2020

Regarding some of your questions/comments

This additional commit should be safer and more bullet-proof but did require a more extensive change to the codebase.

I am not sure if I am happy that Perl will copy the preset dictionary, but I am at the limit of my XS knowledge and am unsure of the best way to prevent that copy without exposing other problems.

My only thought, especially for preventing copies even on earlier versions of Perl, is still to somehow "steal" the buffer out of the provided SV.

Support for "copy on write"/CoW scalars would also be useful, but that is beyond my XS knowledge.

I've never had to use COW in any of the XS code I've written.

But that is more of a performance concern and might be a case of premature optimization.

I agree that taking a copy of the preset dictionary isn't optimal, but will do for now.

One possibility is to pass a reference to the preset dict into the XS code (or create the reference with newRV_inc in the XS code) and store that in the XS object. That will prevent the dictionary being destroyed until the XS DESTROY method explicitly deletes it. The main issue with that approach is you can't prevent the contents of the preset dict being modified by code that runs after the creation of the filter.

FYI, the zlib equivalent of preset dictionaries is much more straightforward. You just call deflateSetDictionary (see zlib) with the dictionary once. Under the hood zlib does whatever processing that is needed, so there is no need to store the preset dict in an XS object.

One other outstanding question: given the amount of duplicated code and similarity, should I get rid of Lzma::Filter::Lzma::PresetDict and instead add the PresetDict as an option to Lzma::Filter::Lzma?

Yes, thought about this some more and I agree. Having a duplicate of Lzma::Filter::Lzma that just adds the PresetDict option just adds more complexity to an already complex interface.

@pmqs pmqs added the enhancement New feature or request label Dec 18, 2020
This will allow initializing the LZ77 history window using a preset dictionary.

This feature can be useful if somebody needs to compress many, mostly similar, chunks of data independently from each other.
@anall anall force-pushed the wip/preset_dictionary branch from dd55729 to 68fe695 Compare December 18, 2020 21:46
@anall
Copy link
Contributor Author

anall commented Dec 18, 2020

The main issue with that approach is you can't prevent the contents of the preset dict being modified by code that runs after the creation of the filter.

Yeah, that was my main concern, especially if the modifications require Perl to reallocate the buffer.

The copy does seem to be the safest option.

I've gone and rebased my changes into a single commit, and I believe this is now ready.

@anall anall marked this pull request as ready for review December 18, 2020 21:48
@anall anall changed the title [WIP] Expose liblzma's 'preset_dict' feature Expose liblzma's 'preset_dict' feature Dec 18, 2020
@pmqs pmqs merged commit 3432a76 into pmqs:master Dec 18, 2020
@pmqs
Copy link
Owner

pmqs commented Dec 18, 2020

Thanks @anall, change merged.

clrpackages pushed a commit to clearlinux-pkgs/perl-Compress-Raw-Lzma that referenced this pull request Jan 12, 2021
…96 to version 2.100

  2.100 7 January 2021

      * Expose liblzma's 'preset_dict' feature
        pmqs/Compress-Raw-Lzma#6
        4d9d4e596c4f567c87626a827e39c4435e62472d fix typo
        dc394d53b0575edf8f72e28829a2ff9faea7e729 Add t/10preset_dict.t
        c5afb68e2a3a4b2fc4e548ffa61d2a3a383b5c96 Add cast to deRef
        32f9085aba510c4d99d4a374406e734b13b82eef fix minor typos
        55b8d6a6f65a1d6426c55f5b51aefdba6dabfbb3 fix merge issue
        9eb88de7abaaefe736d475260c73de525e7ae39f Merge branch 'master' of https://github.com/pmqs/Compress-Raw-Lzma
        3432a769b283ac5dc9fd757e973cc8aefc1e2345 Merge pull request #6 from anall/wip/preset_dictionary
        68fe695c16b14a983f39c8c8567557148bbc43ca Expose the preset_dict feature from liblzma when creating a Lzma filter

      * trim whitespace
        4ca252f1e1d740489dbd3736ab1f74e1492dff6d
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this pull request Nov 2, 2022
https://metacpan.org/dist/Compress-Raw-Lzma/changes

License-Update: Update copyright years.

* 2.201 25 June 2022

  * 2.201
    Sat Jun 25 09:50:04 2022 +0100
    367f6414d5eb54e5f0d1d07ceb1c909ce5ac84d8

  * 2.103 3 April 2022

    * No changes

  * 2.101 20 February 2021

    * fix version numbers in meta files

  * 2.100 7 January 2021

    * Expose liblzma's 'preset_dict' feature
      pmqs/Compress-Raw-Lzma#6
      4d9d4e596c4f567c87626a827e39c4435e62472d fix typo
      dc394d53b0575edf8f72e28829a2ff9faea7e729 Add t/10preset_dict.t
      c5afb68e2a3a4b2fc4e548ffa61d2a3a383b5c96 Add cast to deRef
      32f9085aba510c4d99d4a374406e734b13b82eef fix minor typos
      55b8d6a6f65a1d6426c55f5b51aefdba6dabfbb3 fix merge issue
      9eb88de7abaaefe736d475260c73de525e7ae39f Merge branch 'master'
      of https://github.com/pmqs/Compress-Raw-Lzma
      3432a769b283ac5dc9fd757e973cc8aefc1e2345 Merge pull request openembedded#6
      from anall/wip/preset_dictionary
      68fe695c16b14a983f39c8c8567557148bbc43ca Expose the preset_dict
      feature from liblzma when creating a Lzma filter

    * trim whitespace
      4ca252f1e1d740489dbd3736ab1f74e1492dff6d

Signed-off-by: Tim Orling <[email protected]>
Signed-off-by: Khem Raj <[email protected]>
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this pull request Nov 2, 2022
https://metacpan.org/dist/Compress-Raw-Lzma/changes

License-Update: Update copyright years.

* 2.201 25 June 2022

  * 2.201
    Sat Jun 25 09:50:04 2022 +0100
    367f6414d5eb54e5f0d1d07ceb1c909ce5ac84d8

  * 2.103 3 April 2022

    * No changes

  * 2.101 20 February 2021

    * fix version numbers in meta files

  * 2.100 7 January 2021

    * Expose liblzma's 'preset_dict' feature
      pmqs/Compress-Raw-Lzma#6
      4d9d4e596c4f567c87626a827e39c4435e62472d fix typo
      dc394d53b0575edf8f72e28829a2ff9faea7e729 Add t/10preset_dict.t
      c5afb68e2a3a4b2fc4e548ffa61d2a3a383b5c96 Add cast to deRef
      32f9085aba510c4d99d4a374406e734b13b82eef fix minor typos
      55b8d6a6f65a1d6426c55f5b51aefdba6dabfbb3 fix merge issue
      9eb88de7abaaefe736d475260c73de525e7ae39f Merge branch 'master'
      of https://github.com/pmqs/Compress-Raw-Lzma
      3432a769b283ac5dc9fd757e973cc8aefc1e2345 Merge pull request openembedded#6
      from anall/wip/preset_dictionary
      68fe695c16b14a983f39c8c8567557148bbc43ca Expose the preset_dict
      feature from liblzma when creating a Lzma filter

    * trim whitespace
      4ca252f1e1d740489dbd3736ab1f74e1492dff6d

Signed-off-by: Tim Orling <[email protected]>
Signed-off-by: Khem Raj <[email protected]>
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this pull request Nov 7, 2022
https://metacpan.org/dist/Compress-Raw-Lzma/changes

License-Update: Update copyright years.

* 2.201 25 June 2022

  * 2.201
    Sat Jun 25 09:50:04 2022 +0100
    367f6414d5eb54e5f0d1d07ceb1c909ce5ac84d8

  * 2.103 3 April 2022

    * No changes

  * 2.101 20 February 2021

    * fix version numbers in meta files

  * 2.100 7 January 2021

    * Expose liblzma's 'preset_dict' feature
      pmqs/Compress-Raw-Lzma#6
      4d9d4e596c4f567c87626a827e39c4435e62472d fix typo
      dc394d53b0575edf8f72e28829a2ff9faea7e729 Add t/10preset_dict.t
      c5afb68e2a3a4b2fc4e548ffa61d2a3a383b5c96 Add cast to deRef
      32f9085aba510c4d99d4a374406e734b13b82eef fix minor typos
      55b8d6a6f65a1d6426c55f5b51aefdba6dabfbb3 fix merge issue
      9eb88de7abaaefe736d475260c73de525e7ae39f Merge branch 'master'
      of https://github.com/pmqs/Compress-Raw-Lzma
      3432a769b283ac5dc9fd757e973cc8aefc1e2345 Merge pull request openembedded#6
      from anall/wip/preset_dictionary
      68fe695c16b14a983f39c8c8567557148bbc43ca Expose the preset_dict
      feature from liblzma when creating a Lzma filter

    * trim whitespace
      4ca252f1e1d740489dbd3736ab1f74e1492dff6d

Signed-off-by: Tim Orling <[email protected]>
Signed-off-by: Khem Raj <[email protected]>
daregit pushed a commit to daregit/yocto-combined that referenced this pull request May 22, 2024
https://metacpan.org/dist/Compress-Raw-Lzma/changes

License-Update: Update copyright years.

* 2.201 25 June 2022

  * 2.201
    Sat Jun 25 09:50:04 2022 +0100
    367f6414d5eb54e5f0d1d07ceb1c909ce5ac84d8

  * 2.103 3 April 2022

    * No changes

  * 2.101 20 February 2021

    * fix version numbers in meta files

  * 2.100 7 January 2021

    * Expose liblzma's 'preset_dict' feature
      pmqs/Compress-Raw-Lzma#6
      4d9d4e596c4f567c87626a827e39c4435e62472d fix typo
      dc394d53b0575edf8f72e28829a2ff9faea7e729 Add t/10preset_dict.t
      c5afb68e2a3a4b2fc4e548ffa61d2a3a383b5c96 Add cast to deRef
      32f9085aba510c4d99d4a374406e734b13b82eef fix minor typos
      55b8d6a6f65a1d6426c55f5b51aefdba6dabfbb3 fix merge issue
      9eb88de7abaaefe736d475260c73de525e7ae39f Merge branch 'master'
      of https://github.com/pmqs/Compress-Raw-Lzma
      3432a769b283ac5dc9fd757e973cc8aefc1e2345 Merge pull request #6
      from anall/wip/preset_dictionary
      68fe695c16b14a983f39c8c8567557148bbc43ca Expose the preset_dict
      feature from liblzma when creating a Lzma filter

    * trim whitespace
      4ca252f1e1d740489dbd3736ab1f74e1492dff6d

Signed-off-by: Tim Orling <[email protected]>
Signed-off-by: Khem Raj <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants