
Optimal encoding of Code 128 with option "suppressc" #278

Merged
merged 5 commits into from
Nov 7, 2024

Conversation

lyngklip
Contributor

This should encode Code 128 optimally - at least with the option "allowxc". Without that option the encoder will switch to the ASCII range before using character set C. This prevents problems with decoders that have "alternative ways" of dealing with the combination of extended ASCII and character set C. I believe that there are other ways to achieve optimal encoding, but I also think that this is quite an elegant solution. I can't really take credit for it because, as with most things, it's been done before in some way or another - although perhaps not in the context of Code 128. It is most likely more demanding in terms of memory usage and processing time than the current encoder. It's a brute-force encoder, if you will. I hope it is as easy to read and understand as the current encoder. I fully understand if it is "too much". It's mainly a way for me to procrastinate on a different project.
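(For concreteness, a minimal invocation sketch of the behaviour described above, assuming BWIPP's resources are already loaded as in the benchmark script below; the data string is made up for illustration, and the PR's encoder appears as /lode128 in that script but replaces /code128 once merged.)

% extended ASCII followed by a run of digits: without allowxc the encoder
% returns to the plain ASCII range before latching to code set C
(^200123456789012) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec pop
(^200123456789012) (dontdraw parse allowxc) /code128 /uk.co.terryburton.bwipp findresource exec pop
% the trailing pop discards the returned result, as in the timing script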

@gitlost
Contributor

gitlost commented Oct 25, 2024

Nice work! But as I said on PR #276, I don't think disallowing allowxc is necessary.

On my measurements (with allowxc) it's a fair bit slower than PR #276, e.g. using this script:

% Prefixed with `loadctx`, `unloadctx`, `raiseerror`, `processoptions`, `parseinput`, `renlinear`
% and current master `code128`, with this PR #278 as `lode128` and PR #276 as `mode128`

-2 vmreclaim

/havel true def /havem true def
/totc 0 def /totl 0 def /totm 0 def
/data [
    (123456ABCD)
    (^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`)
    (123456ABCD123456ABCD123456ABCD123456ABCD123456ABCD123456ABCD)
] def

1 1 3 {
    ==

havel {
    2 vmreclaim

    data {
        /datum exch def
        /difl 0 def
        /startt usertime def
        1 1 100 {
            pop
            datum (dontdraw parse allowxc) /lode128 /uk.co.terryburton.bwipp findresource exec
            pop
        } for
        /endt usertime def
        /difl difl endt startt sub add def
        (    lode128 difl ) print difl ==
        /totl totl difl add def
    } forall
} if

havem {
    2 vmreclaim

    data {
        /datum exch def
        /difm 0 def
        /startt usertime def
        1 1 100 {
            pop
            datum (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
            pop
        } for
        /endt usertime def
        /difm difm endt startt sub add def
        (    mode128 difm ) print difm ==
        /totm totm difm add def
    } forall
} if

    2 vmreclaim

    data {
        /datum exch def
        /difc 0 def
        /startt usertime def
        1 1 100 {
            pop
            datum (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
            pop
        } for
        /endt usertime def
        /difc difc endt startt sub add def
        (    code128 difc ) print difc ==
        /totc totc difc add def
    } forall

} for

havel { (lode128 totl ) print totl == } if
havem { (mode128 totm ) print totm == } if
(code128 totc ) print totc ==

I get the following results (gs 10.02.1, Intel Core i9-9900K)

1
    lode128 difl 80
    lode128 difl 150
    lode128 difl 320
    mode128 difm 30
    mode128 difm 50
    mode128 difm 90
    code128 difc 20
    code128 difc 30
    code128 difc 50
2
    lode128 difl 70
    lode128 difl 140
    lode128 difl 310
    mode128 difm 30
    mode128 difm 50
    mode128 difm 100
    code128 difc 20
    code128 difc 30
    code128 difc 50
3
    lode128 difl 60
    lode128 difl 150
    lode128 difl 320
    mode128 difm 30
    mode128 difm 40
    mode128 difm 100
    code128 difc 20
    code128 difc 30
    code128 difc 50
lode128 totl 1600
mode128 totm 520
code128 totc 300

@lyngklip
Contributor Author

I've done some inner loop optimizations on the same branch that cut the time in half.

@lyngklip
Contributor Author

Time consumption is now on par with mode128, if my calculations are correct. However, the final optimization takes it too far in my opinion: it bakes in priority and unrolls the inner loop.

@lyngklip
Contributor Author

lyngklip commented Oct 26, 2024

I added options to avoid character set C and to avoid extended ASCII latch codes (shift codes are still allowed). Can you think of more options that might be interesting?

  • avoidc
  • avoidxlatch

I can invert the allowxc option and rename other options as you like. Or how about "avoidxshift", "avoidalatch", "avoidblatch", "avoidashift", "avoidbshift" - is this of any use to anyone? I think they would be fairly simple to add.
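(A hypothetical usage sketch, in case it helps the discussion - the option names here are the provisional ones from this comment and were later renamed, e.g. to "suppressc"; the data strings are made up, and BWIPP's resources are assumed to be loaded.)

% provisional option names, passed through the usual BWIPP options string
(1234567890AB) (dontdraw avoidc) /code128 /uk.co.terryburton.bwipp findresource exec pop
(^200^20112345678) (dontdraw parse avoidxlatch) /code128 /uk.co.terryburton.bwipp findresource exec pop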

@lyngklip
Contributor Author

This branch now addresses issue #23 and it dips into #58. The way the encoder works lends itself to this sort of issue.

@lyngklip
Contributor Author

This encoder only uses the charmaps array to map function codes and GS1-128 linkage (which I still don't know the purpose of). I'm inclined to remove it and define dictionaries seta, setb, and setc directly. Is there an argument for keeping charmaps and charvals around?
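(A hypothetical sketch of the shape such a direct lookup could take - this is not the PR's actual code, just the general idea for code set B, where the symbol value is simply the character code minus 32:)

% build a direct character-code -> symbol-value dictionary for code set B
/setb 100 dict def
32 1 127 {                        % printable ASCII plus DEL
    dup 32 sub setb 3 1 roll put  % key: character code, value: code set B symbol value
} for
% a lookup is then a single dict access:
% setb 65 get   % pushes 33, the code set B value for "A"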

@lyngklip
Contributor Author

I went ahead and did that. This improved performance. Time consumption should now be below mode128, although not as low as the original, probably because the dictionaries are smaller and lookup is cheaper.

@gitlost
Contributor

gitlost commented Oct 27, 2024

Fab stuff!

I don't think the option "avoidxlatch" is particularly useful. Also I'd avoid the word "avoid" - it has a suggestion of "trying one's best" - I'd use "suppress" instead. So "suppressc" and "suppressxc"...

For "avoidxc"/"suppressxc", which I concede could be useful for compatibility, "xc" is pretty cryptic, though it's hard to come up with something that isn't very long - perhaps "unlatchextbeforec"??

Note the PR is missing the latest commit to master ([fd7966c]).

Nit-picks: there's trailing white space on a number of lines; "character set A/B/C" -> "code set A/B/C".

I'll be closing PR #276 in preference to this, though probably not till tomorrow (need to do some more testing).

@terryburton
Member

terryburton commented Oct 27, 2024

I've been absent from this discussion, but dropping in to say I'm happy with how things are going in terms of development and review - and thanks!

Lack of time rather than lack of interest, on my part. Please don't let that demotivate...

@lyngklip
Contributor Author

My editor was not configured to remove trailing whitespace in PostScript documents. Let me take this opportunity to complain about GitHub treating PostScript files as binary. I noticed the issue you raised with linguist.

Do my comments in the postscript file make sense - should I cut down on them or add more?

@gitlost
Contributor

gitlost commented Oct 27, 2024

Your comments are great and fine by me! (There are a few I'd cut down a little, but only a very few.)

@lyngklip
Contributor Author

Do it! :-)

@gitlost
Contributor

gitlost commented Oct 27, 2024

Ha! I'll leave it to Terry to decide... (Also, they're so minor it's not really worth mentioning.)

@lyngklip
Contributor Author

Can I add a few more changes?

@gitlost
Contributor

gitlost commented Oct 27, 2024

You can add as many as you like! Nothing's finalized, and any testing I do is automated, so it's easily re-run.

@lyngklip
Contributor Author

Okay at this point I'm really just noodling.

@lyngklip
Contributor Author

lyngklip commented Oct 28, 2024

Can this principle be applied in other places? I picked Code 128 at random.

@gitlost
Contributor

gitlost commented Oct 28, 2024

One that comes to mind immediately is MaxiCode, where the 144 module size limits input to be short enough (138 characters) that it might be feasible...

(Note you still haven't rebased this PR to the latest master.)

@gitlost
Contributor

gitlost commented Oct 28, 2024

That's cool, thanks!

@lyngklip
Contributor Author

Oh man I didn't know you could comment here via e-mail. My e-mail address has been harvested long ago, but I don't want to volunteer more information than I have to ;-)

@gitlost
Contributor

gitlost commented Oct 28, 2024

Yes, that was unfortunate. I'm sure though that Micro$oft can be trusted to use your data responsibly (guffaw).

@lyngklip
Contributor Author

There may be some things I would like to do differently to keep Code 128 consistent with MaxiCode if I ever get that to work.

@lyngklip
Contributor Author

lyngklip commented Oct 30, 2024

I would like to discuss MaxiCode with the aim of potentially making a pull request. How should I go about it - start a discussion, or simply make the pull request and take the discussion there?

My goal is optimal encoding and I'm trying to assess how far from that goal the current encoder is. Yet another update: I now suspect sub-optimal encoding and will make a test case for review.

In short: I am soliciting test cases that expose sub-optimal encoding.

@lyngklip
Contributor Author

Observations about PostScript: it seems to have quite fast access to the current dictionary - at least for recently used values. So much so that I haven't been able to gain much by using the stack a lot. That's fortunate, because when you're juggling more than one or two variables that shit gets old real quick. Names are a lot more readable. Some effort must have gone into making dictionary access pretty quick.
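(A minimal sketch of the kind of micro-benchmark behind that observation, in the style of the timing script above - the loop bodies are deliberately trivial and only compare name definition/lookup against keeping the value on the operand stack:)

/loops 100000 def
usertime
1 1 loops { /v exch def v pop } for           % store the loop index in a name, read it back
usertime exch sub (named variable ms ) print ==
usertime
1 1 loops { dup pop pop } for                 % keep the loop index on the operand stack instead
usertime exch sub (operand stack ms ) print ==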

@gitlost
Contributor

gitlost commented Oct 30, 2024

Wow, those improvements seem to make it even slightly faster than the current code!

That's good to know about variables being OK compared to the stack, which as you say gets very, very old very quickly.

Re MaxiCode, from my point of view just opening a PR would be the way to go...

@gitlost
Contributor

gitlost commented Oct 30, 2024

Just noticed the edit re sub-optimal - will check it out.

@lyngklip
Contributor Author

lyngklip commented Oct 30, 2024

I'm preparing a pull request with a test case that exposes it - at least I think it does, it's up for discussion. Here: #279

@lyngklip lyngklip marked this pull request as draft October 30, 2024 15:38
@terryburton
Member

Ensure that you preserve any modifications that you want to keep (that are not checked in anywhere) before issuing git reset --hard.

@terryburton
Member

(Check the edits if you are following along by email. GitHub stripped out the links.)

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

Holy smoke, that's confusing - are we there? GitHub apparently closes the pull request when I force-push. I love Git - Linus is a genius - but my Git skills are limited.

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

I don't have the Aztec spec, but I took a look at Aztec in BWIPP. The latch tables and other tables look a lot like what I'm suggesting for Code 128. I've looked in vain for history tables corresponding to Code 128, so my first guess is that the algorithms are not identical. Perhaps Aztec encoding doesn't require much history.

I will look more at Aztec. Bar codes are ridiculously, wonderfully complex for what seem like simple things.

@terryburton
Member

Holy smoke, that's confusing - are we there? GitHub apparently closes the pull request when I force-push. I love Git - Linus is a genius - but my Git skills are limited.

It looks like you've aligned your branch with the current HEAD of BWIPP, but you haven't staged your own changes on top of this. I.e. there is nothing extra in your branch to review. Because of this, GitHub decided to close the PR.

If you stage the files that I linked to (based on your earlier work), commit them, then make and commit your recent changes and push the result, what appears here will be consistent and in a form that can be reviewed online. (And I will be able to re-open the PR.)

@terryburton terryburton reopened this Nov 7, 2024
@terryburton
Member

Ignore that. GitHub wasn't showing your recent commits until I reopened the PR.

No action on you for now.

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

Ignore that. GitHub wasn't showing your recent commits until I reopened the PR.

That confused me too. Then again being confused is my MO.

@terryburton
Member

@lyngklip You missed a file, so the tests aren't currently passing.

Please do something like this from the project root:

wget https://raw.githubusercontent.com/bwipp/postscriptbarcode/aa30a21fcc39cbc71997f1891ce58bbe6189dd3a/tests/ps_tests/code128.ps
mv code128.ps tests/ps_tests/code128.ps
make && make test
git add tests/ps_tests/code128.ps
git commit -m 'tests: Fix up for new Code 128 encoder'
git push

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

I should name the states and use those names instead of numbers. Like the Aztec encoder. Although for inner-loop code that hurts performance...

I suspect one could make a cookie-cutter encoder. Someone could turn it into a C++ template...

I would like to remove STP from the dictionaries. The encoder only uses the dictionaries to look up FNC1-3 and GS1-128 linkage codes. I understand you might want to keep the char maps for consistency. Consistency is good. The maps impact performance though.
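(For example, a hypothetical way of naming the states - plain numeric defines done once, not taken from the PR; the variable cur is made up for illustration:)

% bind the state numbers to names once, so that transitions read as words
/stateA 0 def  /stateB 1 def  /stateC 2 def
% then e.g.:
% cur stateC eq { ... } { ... } ifelse   % instead of: cur 2 eq { ... } { ... } ifelse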

Member

@terryburton terryburton left a comment

I like the style and approach. Well done, and thanks.

If you could please make the minor fixups that I've indicated then I'll merge it (behind the disabled feature flag newencoder).

The plan is that in due course (quite soon) I'll enable it by default by inverting the feature flag (adding a legacyencoder option) to give users a fallback in case of problems.

Then once the dust has settled I'll remove the old encoder and clean up.

8 inline review comments on src/code128.ps.src (resolved)
@terryburton
Member

terryburton commented Nov 7, 2024

I should name the states and use those names instead of numbers. Like the Aztec encoder. Although for inner-loop code that hurts performance...

For Code 128 I'm not too bothered about performance because the messages are short. The choice is yours...

I suspect one could make a cookie-cutter encoder. Someone could turn it into a C++ template...

That's a laudable goal. I'm in two minds about keeping a "specification-based" optimiser as a fallback for each symbology. Decoders can be frustratingly fragile!

I would like to remove STP from the dictionaries. The encoder only uses the dictionaries to look up FNC1-3 and GS1-128 linkage codes. I understand you might want to keep the char maps for consistency. Consistency is good. The maps impact performance though.

I'm not too concerned about micro-optimising. Removing things that are barely used (providing we have comments describing any uses of magic numbers) is fine though.

@terryburton terryburton merged commit d3c0de6 into bwipp:master Nov 7, 2024
5 checks passed
@terryburton
Member

terryburton commented Nov 7, 2024

@lyngklip By what name / handle would you like your participation to be credited:

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

@lyngklip By what name / handle would you like your participation to be credited:

I made the hall of fame? That is pretty cool. Bue Jensen please :-)

@terryburton
Member

terryburton commented Nov 7, 2024

@lyngklip By what name / handle would you like your participation to be credited:

I made the hall of fame? That is pretty cool. Bue Jensen please :-)

Occasional and sustained interest by others has made the project what it is, since my investment has necessarily waxed and waned multiple times.

@lyngklip
Contributor Author

lyngklip commented Nov 8, 2024

By the way, there's an ISO/IEC 24778:2024 now.

@terryburton
Member

By the way, there's an ISO/IEC 24778:2024 now.

Each of the barcode symbology specifications is being worked on and released with changes to align them with "decimal grading", which relates to scoring verification parameters with greater fidelity.

The current tranche of work doesn't present much concern for encoders.

@lyngklip lyngklip deleted the optimal-code128 branch November 11, 2024 14:02
@terryburton
Member

@gitlost I'll probably invert the feature flag and cut a new BWIPP release at the point that you tell me that your port to Zint is adequately tested. (FYI only. I'm not actively pushing for it.)

@terryburton terryburton changed the title from Optimal encoding of Code 128 with option "allowxc" to Optimal encoding of Code 128 with option "suppressc" Nov 12, 2024
@gitlost
Contributor

gitlost commented Nov 25, 2024

I ended up deciding to keep the existing Divide-and-Conquer algorithm, adapting it to use many of the techniques here - the only reason being that Zint has a manual Code Set selection which I couldn't get to work properly with the backtracking algorithm despite numerous attempts at hacking it.

I did have a fully functioning direct port though, which passed all my tests (except those involving manual Code Set Selection), and was around 20% faster than the Divide-and-Conquer algorithm.
