Problem with options -v -An for large files #319

Closed
genivia-inc opened this issue Nov 20, 2023 · 0 comments
Labels
bug (Something isn't working), patch (A patch to fix an issue)

Comments

@genivia-inc
Member

Observed when searching the 100,000,000-byte enwik8 benchmark file for the word the, outputting inverted matches (-v) with one line of "after context" (-A1):

ugrep -vA1 -n the enwik8 | wc
 1114216 12665271 100352570

The correct output should be:

ugrep -vA1 -n the enwik8 | wc
 1114310 12671469 100396462
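
For reference, the output format expected from -v -A1 -n can be sketched in a few lines of Python. This is only an illustration of the GNU grep style semantics on a small in-memory input, not ugrep's code: the function name invert_with_after_context is made up here, and a plain substring test stands in for real pattern matching.

```python
def invert_with_after_context(lines, pattern, after=1):
    """Return output lines in the style of `grep -v -A<after> -n pattern`.

    Assumption: `pattern` is treated as a plain substring, not a regex.
    """
    out = []
    last_printed = 0   # number of the last line emitted (0 = nothing yet)
    remaining = 0      # after-context lines still owed
    for lineno, line in enumerate(lines, start=1):
        selected = pattern not in line   # -v selects lines WITHOUT a match
        if selected:
            # A gap since the last emitted line starts a new group.
            if last_printed and lineno > last_printed + 1:
                out.append("--")
            out.append(f"{lineno}:{line}")   # selected lines use ':'
            last_printed = lineno
            remaining = after
        elif remaining > 0:
            out.append(f"{lineno}-{line}")   # context lines use '-'
            last_printed = lineno
            remaining -= 1
    return out


sample = ["the cat", "a dog", "the bird", "the fish", "an owl", "a fox"]
print("\n".join(invert_with_after_context(sample, "the")))
```

On the six-line sample this prints 2:a dog, 3-the bird, the -- group separator, 5:an owl and 6:a fox. Every non-matching line must appear exactly once, so a single dropped line changes the wc totals as in the two runs above.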

The problem can occur with very large files when the specified patterns produce a high match count, such as the word the in the large enwik8 Wikipedia file. An internal buffer shift adds 1 to a line number counter in function begin_before(), which is called by the InvertContextGrepHandler() functor that is triggered at the buffer shift. When InvertContextGrepHandler() is also being used to output context at the same time, lineno is incremented one time too many, which causes a line to be missing from the output.
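
The effect of one extra increment at a buffer boundary can be shown with a deliberately simplified toy model. It is not ugrep's implementation: the real logic is C++ in begin_before() and InvertContextGrepHandler(), while scan_inverted, chunk_size and the buggy flag below are hypothetical names, and a "chunk" of whole lines stands in for the byte buffer. The sketch assumes the counter also determines which line is examined next, so counting once too many steps over a line.

```python
def scan_inverted(lines, pattern, chunk_size, buggy=False):
    """Toy model only: return the numbers of lines reported as non-matching.

    `lineno` doubles as the read position, so any extra increment makes the
    scan step over a line without ever examining it.
    """
    reported = []
    lineno = 1
    while lineno <= len(lines):
        if pattern not in lines[lineno - 1]:
            reported.append(lineno)
        at_shift = lineno % chunk_size == 0   # last line of the current chunk
        lineno += 1                           # normal per-line increment
        if buggy and at_shift:
            lineno += 1                       # hypothetical extra increment at the shift
    return reported


lines = ["a", "the", "b", "c", "the", "d"]
print(scan_inverted(lines, "the", chunk_size=3))              # [1, 3, 4, 6]
print(scan_inverted(lines, "the", chunk_size=3, buggy=True))  # [1, 3, 6]
```

In the buggy run, line 4 is never reported because the counter has already moved past it at the first chunk boundary. In this model the skip happens at every boundary, whereas the issue describes it only for shifts that coincide with context output, which is why so few lines are lost relative to the file size.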

Note: Fixed in the latest commit of v4.3.3-1. The output is now exactly the same byte-for-byte as GNU grep 3.11.

genivia-inc added the bug and patch labels Nov 20, 2023
genivia-inc added a commit that referenced this issue Nov 24, 2023
- ug no longer quits with an error message when no default .ugrep config file was found
- allow config file importing in config files using config=FILE (does not permit recursive imports) see also #320
- fix the output of + separators by no longer using them #317
- fix -v with -ABC context #319
- fix configuration file option arguments that may get lost and cause option argument errors in some cases after parsing a config file, such as colors=
stdedos pushed a commit to stdedos/ugrep that referenced this issue Jan 18, 2024
# By Robert van Engelen (55) and others
# Via GitHub (16) and Robert van Engelen (2)
* tag 'v4.5.2':
  released 4.5.2
  tests: Fix tests with 7zip disabled
  7zip: Do not build when configured with disable-7zip
  released 4.5.1 fix bzip3/7zip configure interference
  add Genivia#341 format %Z enhancement
  fix Genivia#10 --disable-7zip
  fix bzip3/7zip detection interference
  released 4.5.0
  remove shebang from bash completion script
  released 4.4.1
  Fix installation target to use DESTDIR when setting up completions
  add `installers-regex` to Winget Releaser workflow
  released 4.4.0
  released 4.4.0
  Update README.md
  improved zsh completions with option args
  Update README.md
  Update README.md
  Update README.md
  add bash fish zsh completions
  Bump github/codeql-action from 2 to 3
  updated fish completions
  update completions
  add fish completions
  add bash completions
  docs: openSUSE install method added
  released 4.3.6
  Update README.md
  released 4.3.5
  released 4.3.5
  Add Macports moar +pager variant (moar-pager)
  fix linker warning -L/lib directory not found
  fix Genivia#323 configure check
  released 4.3.4
  Refactor Dockerfile for optimized build speed and image size
  Update Arch Linux package URL in README.md
  Update README.md
  update to fix Genivia#316 Genivia#317 Genivia#319
  ugrep.cpp: Fix typo preceeded
  include bzip3 library only when --with-bzip3 is specified
  released 4.3.3
  add bzip3 decompression Genivia#311
  add brotli decompression Genivia#312
  add brotli decompression Genivia#312
  nested zip error recovery Genivia#313 redux
  nested zip error recovery Genivia#313
  quicker TUI blanking when search restarts
  update README
  updated README
  Add Zig support
  released 4.3.2
  released 4.3.2
  Update README.md
  Update README.md
  Update README.md
  Update README.md
  add ugrep.com
  updated README
  update Genivia#305 to support DragonFly and NetBSD
  add thread affinity and priority
  fix Genivia#306 option --bool space in regex bracket list
  fix Genivia#306 option --bool space in regex bracket list
  updated README
  Add Kakoune
  updated README
  Bump actions/checkout from 3 to 4
  released 4.3.1
  updated README
  updated README
  add winget installation reference in the readme
  add Winget Releaser workflow
  updated README

Signed-off-by: Stavros Ntentos <[email protected]>

# Conflicts:
#	src/ugrep.cpp