Optimal encoding of Code 128 with option "suppressc" #278
Conversation
Nice work! But as I said on PR #276, I don't think disallowing … On my measurements (with …) I get the following results (gs 10.02.1, Intel Core i9-9900K):
[results table not preserved]
I've done some inner-loop optimizations on the same branch that cut the time in half.
Time consumption is now on par with mode128, if my calculations are correct. However, the final optimization takes it too far in my opinion: it bakes in priority and unrolls the inner loop.
I added options to avoid character set C and to avoid extended ASCII latch codes (shift codes are still allowed). Can you think of more options that might be interesting?
I can invert the allowxc option and rename other options as you like. Or how about "avoidxshift", "avoidalatch", "avoidblatch", "avoidashift", "avoidbshift" - is this of any use to anyone? I think they would be fairly simple to add.
This encoder only uses the charmaps array to map function codes and GS1-128 linkage (which I still don't know the meaning of). I'm inclined to remove it and define dictionaries seta, setb, and setc directly. Is there an argument for keeping charmaps and charvals around?
I went ahead and did that, and it improved performance, probably because the dictionaries are smaller and lookup is cheaper. Time consumption should now be below mode128's, although not as low as the original's.
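For illustration, a sketch of the idea (not the actual code in the branch): code set A as a direct ASCII-code-to-symbol-value dictionary, assuming the standard set A layout where ASCII 32-95 map to values 0-63 and the control characters 0-31 map to 64-95.

    % Sketch only: build code set A as a direct lookup dictionary,
    % so each character costs one get instead of a scan of charmaps.
    /seta 96 dict def
    32 1 95 { dup 32 sub seta 3 1 roll put } for   % SP..'_' -> 0..63
    0  1 31 { dup 64 add seta 3 1 roll put } for   % NUL..US -> 64..95
    % A character's symbol value is then one lookup away:
    65 seta exch get   % 'A' (ASCII 65) leaves 33 on the stack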
Fab stuff! I don't think the option "avoidxlatch" is particularly useful. Also I'd avoid the word "avoid" - it has a suggestion of "trying one's best" - I'd use "suppress" instead. So "suppressc" and "suppressxc"... For "avoidxc"/"suppressxc", which I concede could be useful for compatibility, "xc" is pretty cryptic, though it's hard to come up with something that isn't very long - perhaps "unlatchextbeforec"?? Note the PR is missing the latest commit to master (fd7966c). Nit-picks: there's trailing whitespace on a number of lines; "character set A/B/C" -> "code set A/B/C". I'll be closing PR #276 in preference to this, though probably not till tomorrow (need to do some more testing).
I've been absent from this discussion, but I'm dropping in to say I'm happy with how things are going in terms of development and review - and thanks! Lack of time rather than lack of interest, on my part. Please don't let that demotivate you...
My editor was not configured to remove trailing whitespace in PostScript documents. Let me take this opportunity to complain about GitHub treating PostScript files as binary. I noticed the issue you raised with linguist. Do my comments in the PostScript file make sense - should I cut down on them or add more?
Your comments are great and fine by me! (There are a few I'd cut down a little, but only a very few.)
Do it! :-) |
Ha! I'll leave it to Terry to decide... (Also they're so minor it's not worth mentioning really.)
Can I add a few more changes? |
You can add as many as you like! Nothing's finalized, any testing I do is automated so is easily re-run. |
Okay at this point I'm really just noodling. |
Can this principle be applied in other places? I picked Code 128 at random. |
One that comes to mind immediately is MaxiCode, where the fixed 144-codeword symbol size limits input to be short enough (138 characters) that it might be feasible... (Note you still haven't rebased this PR to the latest master.)
That's cool, thanks! |
Oh man, I didn't know you could comment here via e-mail. My e-mail address was harvested long ago, but I don't want to volunteer more information than I have to ;-)
Yes, that was unfortunate. I'm sure though that Micro$oft can be trusted to use your data responsibly (guffaw).
There may be some things I would like to do differently to keep Code 128 consistent with MaxiCode if I ever get that to work. |
I would like to discuss MaxiCode with the aim of potentially making a pull-request. How should I go about it - start a discussion, or simply make the pull-request and take the discussion there? My goal is optimal encoding, and I'm trying to assess how far from that goal the current encoder is.
Yet another update: suspicion of sub-optimal encoding. I will make a test case for review. In short: I am soliciting test cases that expose sub-optimal encoding.
Observations about PostScript: it seems to have quite fast access to the current dictionary - at least for recently used values. So much so that I haven't been able to gain much by using the stack a lot. That's fortunate, because when you're juggling more than one or two variables that shit gets old real quick. Names are a lot more readable. Some effort must have gone into making dictionary access pretty quick.
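A toy illustration of the trade-off (my own sketch, nothing from the branch): summing 1..100 both ways.

    % Named variable: readable, and dictionary access turns out to be
    % cheap enough that this is competitive.
    /total 0 def
    1 1 100 { total add /total exch def } for   % total = 5050
    % Pure stack: no name lookups, but the juggling scales badly
    % once more than a couple of values are live at once.
    0 1 1 100 { add } for                       % leaves 5050 on the stack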
Wow those improvements seem to make it even slightly faster than the current code! That's good to know about variables being ok compared to stack, which as you say gets very very old very quickly. Re MaxiCode, from my point of view just opening a PR would be the way to go... |
Just noticed the edit re sub-optimal - will check it out. |
I'm preparing a pull request with a test case that exposes it - at least I think it does, it's up for discussion. Here: #279 |
Ensure that you preserve any modifications that you want to keep (that are not checked in anywhere) before issuing …
(Check the edits if you are following along by email. GitHub stripped out the links.) |
(Branch force-pushed from 9db3244 to fe2c218.)
Holy smoke, that's confusing - are we there? GitHub must close the pull-request when I force-push. I love Git - Linus is a genius - but my Git skills are limited. |
I don't have the Aztec spec, but I took a look at Aztec in BWIPP. The latch tables and other tables look a lot like what I'm suggesting for Code 128. I've looked in vain for history tables corresponding to Code 128 so my first guess is that the algorithms are not identical. Perhaps Aztec encoding doesn't require much history. I will look more at Aztec. Bar codes are ridiculously, wonderfully complex for what seem like simple things. |
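To make the comparison concrete, a Code 128 analogue of an Aztec-style latch table might look like the sketch below (purely illustrative, not from either encoder; in Code 128 each latch is a single codeword, so every entry is 1).

    % Illustrative sketch: per-latch codeword costs between code sets,
    % in the shape of the Aztec latch tables. CODE A/B/C are single
    % codewords, hence the uniform cost of 1.
    /latchcost <<
      /AtoB 1  /AtoC 1
      /BtoA 1  /BtoC 1
      /CtoA 1  /CtoB 1
    >> def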
It looks like you've aligned your branch with the current HEAD of BWIPP, but you haven't staged your own changes on top of it, i.e. there is nothing extra in your branch to review. Because of this, GitHub decided to close the PR. If you stage the files that I linked to (based on your earlier work), commit them, then make and commit your recent changes, and then push the result, what appears here will be consistent and in a form that can be reviewed online. (And I will be able to re-open the PR.)
Ignore that. GitHub wasn't showing your recent commits until I reopened the PR. No action on you for now. |
That confused me too. Then again being confused is my MO. |
@lyngklip You missed a file, so the tests aren't currently passing. Please do something like this from the project root:
…
I should name the states and use those names instead of numbers, like the Aztec encoder does - although for inner-loop code that hurts performance... I suspect one could make a cookie-cutter encoder. Someone could turn it into a C++ template... I would like to remove STP from the dictionaries. The encoder only uses the dictionaries to look up FNC1-3 and GS1-128 linkage codes. I understand you might want to keep the char maps for consistency. Consistency is good. The maps impact performance though.
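Something along these lines (a sketch; curstate and the state names are made up):

    % Sketch with made-up names: symbolic states instead of bare integers.
    /stateA 0 def  /stateB 1 def  /stateC 2 def
    % e.g.  curstate stateC eq { ... } if    % reads better than: curstate 2 eq
    % Caveat: bind won't inline these (their values aren't operators),
    % so each use still costs a runtime name lookup in the inner loop.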
I like the style and approach. Well done, and thanks.
If you could please make the minor fixups that I've indicated then I'll merge it (behind the disabled feature flag newencoder).
The plan is that in due course (quite soon) I'll enable it by default by inverting the feature flag (adding a legacyencoder option) to give users a fallback in case of problems.
Then once the dust has settled I'll remove the old encoder and clean up.
For Code 128 I'm not too bothered about performance because the messages are short. The choice is yours...
That's a laudable goal. I'm in two minds about keeping a "specification-based" optimiser as a fallback for each symbology. Decoders can be frustratingly fragile!
I'm not too concerned about micro-optimising. Removing things that are barely used (provided we have comments describing any uses of magic numbers) is fine though.
@lyngklip By what name / handle would you like your participation to be credited?
I made the hall of fame? That is pretty cool. Bue Jensen please :-) |
Occasional and sustained interest by others has made the project what it is, since my investment has necessarily waxed and waned multiple times. |
By the way, there's an ISO/IEC 24778:2024 now. |
Each of the barcode symbology specifications is being worked on and released with changes to align them with "decimal grading", which relates to scoring verification parameters with greater fidelity. The current tranche of work doesn't present much concern for encoders. |
@gitlost I'll probably invert the feature flag and cut a new BWIPP release at the point that you tell me that your port to Zint is adequately tested. (FYI only. I'm not actively pushing for it.) |
I ended up deciding to keep the existing divide-and-conquer algorithm, adapting it to use many of the techniques here - the only reason being that Zint has a manual Code Set selection which I couldn't get to work properly with the backtracking algorithm despite numerous attempts at hacking it. I did have a fully functioning direct port though, which passed all my tests (except those involving manual Code Set selection), and was around 20% faster than the divide-and-conquer algorithm.
This should encode Code 128 optimally - at least with the option "allowxc". Without that option the encoder will switch to the ASCII range before using code set C. This prevents problems with decoders that have "alternative ways" of dealing with the combination of extended ASCII and code set C.
I believe there are other ways to achieve optimal encoding, but I also think this is quite an elegant solution. I can't really take credit for it because, as with most things, it's been done before in some way or another - although perhaps not in the context of Code 128. It is most likely more demanding in terms of memory usage and processing time than the current encoder. It's a brute-force encoder, if you will. I hope it is as easy to read and understand as the current encoder. I fully understand if it is "too much". It's mainly a way for me to procrastinate on a different project.
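For reference, a minimal invocation sketch using the standard BWIPP calling convention, assuming barcode.ps has already been run (shown with "suppressc", the option name settled on above; any of the other options would be passed the same way):

    %!PS
    % Sketch: render Code 128 with the new option via the usual
    % BWIPP resource-based invocation.
    20 20 moveto
    (0123456789) (includetext suppressc)
    /code128 /uk.co.terryburton.bwipp findresource exec
    showpage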