
Optimal encoding of Code 128 with option "suppressc" #278

Merged
merged 5 commits into from
Nov 7, 2024

Conversation

lyngklip
Contributor

This should encode Code 128 optimally - at least with the option "allowxc". Without that option the encoder will switch to the ASCII range before using character set C. This prevents problems with decoders that have "alternative ways" of dealing with the combination of extended ASCII and character set C. I believe that there are other ways to achieve optimal encoding, but I also think that this is quite an elegant solution. I can't really take credit for it because, as with most things, it's been done before in some way or another - although perhaps not in the context of Code 128. It is most likely more demanding in terms of memory usage and processing time than the current encoder. It's a brute-force encoder, if you will. I hope it is as easy to read and understand as the current encoder. I fully understand if it is "too much". It's mainly a way for me to procrastinate on a different project.
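(For concreteness, a minimal invocation sketch of the behaviour described above, assuming BWIPP's resources are already loaded as in the benchmark script below; the data string is made up for illustration, and the PR's encoder appears as /lode128 in that script but replaces /code128 once merged.)

% extended ASCII followed by a run of digits: without allowxc the encoder
% returns to the plain ASCII range before latching to code set C
(^200123456789012) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec pop
(^200123456789012) (dontdraw parse allowxc) /code128 /uk.co.terryburton.bwipp findresource exec pop
% the trailing pop discards the returned result, as in the timing script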

@gitlost
Contributor

gitlost commented Oct 25, 2024

Nice work! But as I said on PR #276, I don't think disallowing allowxc is necessary.

On my measurements (with allowxc) it's a fair bit slower than PR #276, e.g. using this script:

% Prefixed with `loadctx`, `unloadctx`, `raiseerror`, `processoptions`, `parseinput`, `renlinear`
% and current master `code128`, with this PR #278 as `lode128` and PR #276 as `mode128`

-2 vmreclaim

/havel true def /havem true def
/totc 0 def /totl 0 def /totm 0 def
/data [
    (123456ABCD)
    (^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`)
    (123456ABCD123456ABCD123456ABCD123456ABCD123456ABCD123456ABCD)
] def

1 1 3 {
    ==

havel {
    2 vmreclaim

    data {
        /datum exch def
        /difl 0 def
        /startt usertime def
        1 1 100 {
            pop
            datum (dontdraw parse allowxc) /lode128 /uk.co.terryburton.bwipp findresource exec
            pop
        } for
        /endt usertime def
        /difl difl endt startt sub add def
        (    lode128 difl ) print difl ==
        /totl totl difl add def
    } forall
} if

havem {
    2 vmreclaim

    data {
        /datum exch def
        /difm 0 def
        /startt usertime def
        1 1 100 {
            pop
            datum (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
            pop
        } for
        /endt usertime def
        /difm difm endt startt sub add def
        (    mode128 difm ) print difm ==
        /totm totm difm add def
    } forall
} if

    2 vmreclaim

    data {
        /datum exch def
        /difc 0 def
        /startt usertime def
        1 1 100 {
            pop
            datum (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
            pop
        } for
        /endt usertime def
        /difc difc endt startt sub add def
        (    code128 difc ) print difc ==
        /totc totc difc add def
    } forall

} for

havel { (lode128 totl ) print totl == } if
havem { (mode128 totm ) print totm == } if
(code128 totc ) print totc ==

I get the following results (gs 10.02.1, Intel Core i9-9900K)

1
    lode128 difl 80
    lode128 difl 150
    lode128 difl 320
    mode128 difm 30
    mode128 difm 50
    mode128 difm 90
    code128 difc 20
    code128 difc 30
    code128 difc 50
2
    lode128 difl 70
    lode128 difl 140
    lode128 difl 310
    mode128 difm 30
    mode128 difm 50
    mode128 difm 100
    code128 difc 20
    code128 difc 30
    code128 difc 50
3
    lode128 difl 60
    lode128 difl 150
    lode128 difl 320
    mode128 difm 30
    mode128 difm 40
    mode128 difm 100
    code128 difc 20
    code128 difc 30
    code128 difc 50
lode128 totl 1600
mode128 totm 520
code128 totc 300

@lyngklip
Contributor Author

I've done some inner loop optimizations on the same branch that cut the time in half.

@lyngklip
Contributor Author

Time consumption is now on par with mode128, if my calculations are correct. However, the final optimization takes it too far in my opinion: it bakes in priority and unrolls the inner loop.

@lyngklip
Contributor Author

lyngklip commented Oct 26, 2024

I added options to avoid character set C and to avoid extended ASCII latch codes (shift codes are still allowed). Can you think of more options that might be interesting?

  • avoidc
  • avoidxlatch

I can invert the allowxc option and rename other options as you like. Or how about "avoidxshift", "avoidalatch", "avoidblatch", "avoidashift", "avoidbshift" - is this of any use to anyone? I think they would be fairly simple to add.
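(A hypothetical usage sketch, in case it helps the discussion - the option names here are the provisional ones from this comment and were later renamed, e.g. to "suppressc"; the data strings are made up, and BWIPP's resources are assumed to be loaded.)

% provisional option names, passed through the usual BWIPP options string
(1234567890AB) (dontdraw avoidc) /code128 /uk.co.terryburton.bwipp findresource exec pop
(^200^20112345678) (dontdraw parse avoidxlatch) /code128 /uk.co.terryburton.bwipp findresource exec pop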

@lyngklip
Contributor Author

This branch now addresses issue #23 and it dips into #58. The way the encoder works lends itself to this sort of issue.

@lyngklip
Contributor Author

This encoder only uses the charmaps array to map function codes and GS1-128 linkage (which I still don't know the purpose of). I'm inclined to remove it and define dictionaries seta, setb, and setc directly. Is there an argument for keeping charmaps and charvals around?
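(A hypothetical sketch of the shape such a direct lookup could take - this is not the PR's actual code, just the general idea for code set B, where the symbol value is simply the character code minus 32:)

% build a direct character-code -> symbol-value dictionary for code set B
/setb 100 dict def
32 1 127 {                        % printable ASCII plus DEL
    dup 32 sub setb 3 1 roll put  % key: character code, value: code set B symbol value
} for
% a lookup is then a single dict access:
% setb 65 get   % pushes 33, the code set B value for "A"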

@lyngklip
Contributor Author

I went ahead and did that. This improved performance. Time consumption should now be below mode128, although not as low as the original, probably because the dictionaries are smaller and lookup is cheaper.

@gitlost
Contributor

gitlost commented Oct 27, 2024

Fab stuff!

I don't think the option "avoidxlatch" is particularly useful. Also I'd avoid the word "avoid" - it has a suggestion of "trying one's best" - I'd use "suppress" instead. So "suppressc" and "suppressxc"...

For "avoidxc"/"suppressxc", which I concede could be useful for compatibility, "xc" is pretty cryptic, though it's hard to come up with something that isn't very long - perhaps "unlatchextbeforec"??

Note the PR is missing the latest commit to master ([fd7966c]).

Nit-picks: there's trailing white space on a number of lines; "character set A/B/C" -> "code set A/B/C".

I'll be closing PR #276 in preference to this, though probably not till tomorrow (need to do some more testing).

@terryburton
Member

terryburton commented Oct 27, 2024

I've been absent from this discussion, but dropping in to say I'm happy with how things are going in terms of development and review - and thanks!

Lack of time rather than lack of interest, on my part. Please don't let that demotivate...

@lyngklip
Contributor Author

My editor was not configured to remove trailing whitespace in PostScript documents. Let me take this opportunity to complain about GitHub treating PostScript files as binary. I noticed the issue you raised with linguist.

Do my comments in the postscript file make sense - should I cut down on them or add more?

@gitlost
Contributor

gitlost commented Oct 27, 2024

Your comments are great and fine by me! (There are a few I'd cut down a little, but only a very few.)

@lyngklip
Contributor Author

Do it! :-)

@gitlost
Contributor

gitlost commented Oct 27, 2024

Ha! I'll leave it to Terry to decide... (Also, they're so minor it's not really worth mentioning.)

@lyngklip
Contributor Author

Can I add a few more changes?

@gitlost
Contributor

gitlost commented Oct 27, 2024

You can add as many as you like! Nothing's finalized, and any testing I do is automated, so it's easily re-run.

@lyngklip
Contributor Author

Okay at this point I'm really just noodling.

@lyngklip
Contributor Author

lyngklip commented Oct 28, 2024

Can this principle be applied in other places? I picked Code 128 at random.

@gitlost
Contributor

gitlost commented Oct 28, 2024

One that comes to mind immediately is MaxiCode, where the 144 module size limits input to be short enough (138 characters) that it might be feasible...

(Note you still haven't rebased this PR to the latest master.)

@gitlost
Contributor

gitlost commented Oct 28, 2024

That's cool, thanks!

@lyngklip
Contributor Author

Oh man I didn't know you could comment here via e-mail. My e-mail address has been harvested long ago, but I don't want to volunteer more information than I have to ;-)

@gitlost
Contributor

gitlost commented Oct 28, 2024

Yes, that was unfortunate. I'm sure though that Micro$oft can be trusted to use your data responsibly (guffaw).

@lyngklip
Contributor Author

There may be some things I would like to do differently to keep Code 128 consistent with MaxiCode if I ever get that to work.

@lyngklip
Contributor Author

lyngklip commented Oct 30, 2024

I would like to discuss MaxiCode with the aim of potentially making a pull request. How should I go about it - start a discussion, or simply make the pull request and take the discussion there?

My goal is optimal encoding and I'm trying to assess how far from that goal the current encoder is. Yet another update: I now suspect sub-optimal encoding and will make a test case for review.

In short: I am soliciting test cases that expose sub-optimal encoding.

@lyngklip
Contributor Author

Observations about PostScript: it seems to have quite fast access to the current dictionary - at least for recently used values. So much so that I haven't been able to gain much by using the stack a lot. That's fortunate, because when you're juggling more than one or two variables that shit gets old real quick. Names are a lot more readable. Some effort must have gone into making dictionary access pretty quick.
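(A minimal sketch of the kind of micro-benchmark behind that observation, in the style of the timing script above - the loop bodies are deliberately trivial and only compare name definition/lookup against keeping the value on the operand stack:)

/loops 100000 def
usertime
1 1 loops { /v exch def v pop } for           % store the loop index in a name, read it back
usertime exch sub (named variable ms ) print ==
usertime
1 1 loops { dup pop pop } for                 % keep the loop index on the operand stack instead
usertime exch sub (operand stack ms ) print ==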

@gitlost
Contributor

gitlost commented Oct 30, 2024

Wow, those improvements seem to make it even slightly faster than the current code!

That's good to know about variables being OK compared to the stack, which as you say gets very, very old very quickly.

Re MaxiCode, from my point of view just opening a PR would be the way to go...

@gitlost
Contributor

gitlost commented Oct 30, 2024

Just noticed the edit re sub-optimal - will check it out.

@lyngklip
Contributor Author

lyngklip commented Oct 30, 2024

I'm preparing a pull request with a test case that exposes it - at least I think it does, it's up for discussion. Here: #279

@lyngklip lyngklip marked this pull request as draft October 30, 2024 15:38
@terryburton
Member

Ensure that you preserve any modifications that you want to keep (that are not checked in anywhere) before issuing git reset --hard.

@terryburton
Member

(Check the edits if you are following along by email. GitHub stripped out the links.)

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

Holy smoke, that's confusing - are we there? GitHub apparently closes the pull request when I force-push. I love Git - Linus is a genius - but my Git skills are limited.

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

I don't have the Aztec spec, but I took a look at Aztec in BWIPP. The latch tables and other tables look a lot like what I'm suggesting for Code 128. I've looked in vain for history tables corresponding to Code 128, so my first guess is that the algorithms are not identical. Perhaps Aztec encoding doesn't require much history.

I will look more at Aztec. Bar codes are ridiculously, wonderfully complex for what seem like simple things.

@terryburton
Member

Holy smoke, that's confusing - are we there? GitHub apparently closes the pull request when I force-push. I love Git - Linus is a genius - but my Git skills are limited.

It looks like you've aligned your branch with the current HEAD of BWIPP, but you haven't staged your own changes on top of this. I.e. there is nothing extra in your branch to review. Because of this, GitHub decided to close the PR.

If you stage the files that I linked to (based on your earlier work), commit them, then make and commit your recent changes and push the result, what appears here will be consistent and in a form that can be reviewed online. (And I will be able to re-open the PR.)

@terryburton terryburton reopened this Nov 7, 2024
@terryburton
Member

Ignore that. GitHub wasn't showing your recent commits until I reopened the PR.

No action on you for now.

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

Ignore that. GitHub wasn't showing your recent commits until I reopened the PR.

That confused me too. Then again being confused is my MO.

@terryburton
Member

@lyngklip You missed a file, so the tests aren't currently passing.

Please do something like this from the project root:

wget https://raw.githubusercontent.com/bwipp/postscriptbarcode/aa30a21fcc39cbc71997f1891ce58bbe6189dd3a/tests/ps_tests/code128.ps
mv code128.ps tests/ps_tests/code128.ps
make && make test
git add tests/ps_tests/code128.ps
git commit -m 'tests: Fix up for new Code 128 encoder'
git push

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

I should name the states and use those names instead of numbers. Like the Aztec encoder. Although for inner-loop code that hurts performance...

I suspect one could make a cookie-cutter encoder. Someone could turn it into a C++ template...

I would like to remove STP from the dictionaries. The encoder only uses the dictionaries to look up FNC1-3 and GS1-128 linkage codes. I understand you might want to keep the char maps for consistency. Consistency is good. The maps impact performance though.
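(For example, a hypothetical way of naming the states - plain numeric defines done once, not taken from the PR; the variable cur is made up for illustration:)

% bind the state numbers to names once, so that transitions read as words
/stateA 0 def  /stateB 1 def  /stateC 2 def
% then e.g.:
% cur stateC eq { ... } { ... } ifelse   % instead of: cur 2 eq { ... } { ... } ifelse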

Member

@terryburton terryburton left a comment

I like the style and approach. Well done, and thanks.

If you could please make the minor fixups that I've indicated then I'll merge it (behind the disabled feature flag newencoder).

The plan is that in due course (quite soon) I'll enable it by default by inverting the feature flag (adding a legacyencoder option) to give users a fallback in case of problems.

Then once the dust has settled I'll remove the old encoder and clean up.

8 inline review comments on src/code128.ps.src (resolved)
@terryburton
Member

terryburton commented Nov 7, 2024

I should name the states and use those names instead of numbers. Like the Aztec encoder. Although for inner-loop code that hurts performance...

For Code 128 I'm not too bothered about performance because the messages are short. The choice is yours...

I suspect one could make a cookie-cutter encoder. Someone could turn it into a C++ template...

That's a laudable goal. I'm in two minds about keeping a "specification-based" optimiser as a fallback for each symbology. Decoders can be frustratingly fragile!

I would like to remove STP from the dictionaries. The encoder only uses the dictionaries to look up FNC1-3 and GS1-128 linkage codes. I understand you might want to keep the char maps for consistency. Consistency is good. The maps impact performance though.

I'm not too concerned about micro-optimising. Removing things that are barely used (providing we have comments describing any uses of magic numbers) is fine though.

@terryburton terryburton merged commit d3c0de6 into bwipp:master Nov 7, 2024
5 checks passed
@terryburton
Member

terryburton commented Nov 7, 2024

@lyngklip By what name / handle would you like your participation to be credited:

@lyngklip
Contributor Author

lyngklip commented Nov 7, 2024

@lyngklip By what name / handle would you like your participation to be credited:

I made the hall of fame? That is pretty cool. Bue Jensen please :-)

@terryburton
Member

terryburton commented Nov 7, 2024

@lyngklip By what name / handle would you like your participation to be credited:

I made the hall of fame? That is pretty cool. Bue Jensen please :-)

Occasional and sustained interest by others has made the project what it is, since my investment has necessarily waxed and waned multiple times.

@lyngklip
Contributor Author

lyngklip commented Nov 8, 2024

By the way, there's an ISO/IEC 24778:2024 now.

@terryburton
Member

By the way, there's an ISO/IEC 24778:2024 now.

Each of the barcode symbology specifications is being worked on and released with changes to align them with "decimal grading", which relates to scoring verification parameters with greater fidelity.

The current tranche of work doesn't present much concern for encoders.

@lyngklip lyngklip deleted the optimal-code128 branch November 11, 2024 14:02
@terryburton
Member

@gitlost I'll probably invert the feature flag and cut a new BWIPP release at the point that you tell me that your port to Zint is adequately tested. (FYI only. I'm not actively pushing for it.)

@terryburton terryburton changed the title from Optimal encoding of Code 128 with option "allowxc" to Optimal encoding of Code 128 with option "suppressc" Nov 12, 2024
@gitlost
Contributor

gitlost commented Nov 25, 2024

I ended up deciding to keep the existing Divide-and-Conquer algorithm, adapting it to use many of the techniques here - the only reason being that Zint has a manual Code Set selection which I couldn't get to work properly with the backtracking algorithm despite numerous attempts at hacking it.

I did have a fully functioning direct port though, which passed all my tests (except those involving manual Code Set Selection), and was around 20% faster than the Divide-and-Conquer algorithm.
