Hexdump stops prematurely at LF #428

misutoneko · 2024-09-24T21:23:52Z

New user here, hi :D

I've only tested ugrep for a short while, but I think I've found a bug.
To cut to the chase here's an example:

echo -e "Hello World\nThis part left unseen" |ugrep -U --hexdump=1A1 "\x57\x6f\x72"

Which will give you this:

00000000  48 65 6c 6c 6f 20 57 6f  |Hello Wo|
00000008  72 6c 64 0a -- -- -- --  |rldJ----|

So the problem is that the output doesn't continue all the way to the context boundary, it stops at the first sight of LF (0x0a).
OK sure, greps are usually line-oriented, but imho the byte values in context shouldn't matter when dealing with the binary stuff?
Tested with v3.7.2 and v6.5.0.

Btw hexdumping is THE main attraction of ugrep for me :D
I have high hopes that ugrep will help me to replace my silly old 600+ line bash/awk script that I'm now using for dumping...

The text was updated successfully, but these errors were encountered:

misutoneko · 2024-09-26T10:13:34Z

Here's a workaround:

echo -e "Hello World\nThis part left unseen" \
|ugrep -y -U --color=always --hexdump=1A1 "\x57\x6f\x72" \
|grep -A1 --color=never 31m

The -y in the first ugrep is passthru.
The second (u)grep finds the lines based on the color code and prints them with the desired context added.
Note that the search pattern 31m is ok for demo purposes, but it should be refined for RL usage.

EDIT:
Actually that -y might eat all performance so it's better to restrict the first ugrep:

echo -e "Hello World\nThis part left unseen" \
|ugrep -U --color=always -A1 --hexdump=1A1 "\x57\x6f\x72" \
|ugrep --color=never -A1 31m

genivia-inc · 2024-10-01T18:01:54Z

As you point out, the hexdump feature is line-oriented, like grep in general is a line-oriented tool. The reason is the efficient buffering with a window that shifts out at line boundaries so it is possible to quickly search several GB of input without impacting memory resources.

A way to avoid this is to match newline \n byte as part of the pattern, but there are also reasons not to do that.

Perhaps this should be moved to the ugrep discussion so we can close this eventually.

misutoneko · 2024-10-01T22:18:01Z

Yeah I had a hunch it might be something like that.
And I agree, if this is ever fixed it should be done in a way that doesn't compromise speed.
The workaround, well, works for me well enough (it could break at the next version update though).

I guess only the repo owner can move to Discussions? Didn't spot any way to do that...

genivia-inc · 2024-10-02T14:13:40Z

I agree that there are opportunities to improve the hexdump feature with better context control. I haven't done that, because I didn't want to mess with the search engine.

What I would like to do is improve the hexdump context to always show the given number of hex lines before and after a match, regardless of the presence of newlines. That needs a change to the search engine so that the before hex lines (i.e. bytes) are not shifted out of the window when searching large files.

If will put that on the TODO list.

genivia-inc · 2024-10-20T19:31:57Z

The upcoming v7 release will produce hex before and after context lines regardless of LF boundaries. The before hex context is guaranteed to display, as long as the before hex size specified is not crazy large like hundreds of hex lines (in that case, the hex before context may get truncated by chance if the file is several MB).

misutoneko · 2024-10-26T16:27:26Z

Thank you! Works like a charm from what I can see :D

genivia-inc · 2024-10-27T19:45:26Z

The v7.0.2 update improves hex context lines when these run into each other. I want to avoid overlap, which can be confusing. So it looks a lot cleaner in the update, at least it does to me :)

fix #434 option -v issue with the ^ anchor caused by the v7 updated --hexdump hex context support #428

genivia-inc added the enhancement New feature or request label Oct 19, 2024

genivia-inc closed this as completed in fc26ea0 Oct 25, 2024

genivia-inc added a commit that referenced this issue Oct 27, 2024

released 7.0.2

bddae22

fix #434 option -v issue with the ^ anchor caused by the v7 updated --hexdump hex context support #428

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hexdump stops prematurely at LF #428

Hexdump stops prematurely at LF #428

misutoneko commented Sep 24, 2024

misutoneko commented Sep 26, 2024 •

edited

Loading

genivia-inc commented Oct 1, 2024

misutoneko commented Oct 1, 2024

genivia-inc commented Oct 2, 2024

genivia-inc commented Oct 20, 2024

misutoneko commented Oct 26, 2024

genivia-inc commented Oct 27, 2024

Hexdump stops prematurely at LF #428

Hexdump stops prematurely at LF #428

Comments

misutoneko commented Sep 24, 2024

misutoneko commented Sep 26, 2024 • edited Loading

genivia-inc commented Oct 1, 2024

misutoneko commented Oct 1, 2024

genivia-inc commented Oct 2, 2024

genivia-inc commented Oct 20, 2024

misutoneko commented Oct 26, 2024

genivia-inc commented Oct 27, 2024

misutoneko commented Sep 26, 2024 •

edited

Loading