Counters issue in CALL SUGGEST #2764

Closed
5 tasks
donhardman opened this issue Nov 15, 2024 · 3 comments

donhardman commented Nov 15, 2024

Bug Description:

Document counters differ between runs even when all inserted data is identical; the only difference is that the data is processed in parallel (with concurrency) and therefore inserted in a different order.

To reproduce this issue, execute the script from this repository:

php -d memory_limit=2G ./test/clt-tests/scripts/load_names_attr.php --batch-size=100000 --concurrency=4 --docs=2000000 --start-id=1 --drop-table --min-infix-len=2

After completion, execute this SQL query:

mysql> call suggest('SMITH', 'name', 10 as limit, 2 as max_edits);
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| smyth   | 1        | 1036 |
| keith   | 2        | 2748 |
| minh    | 2        | 1766 |
| south   | 2        | 1001 |
| seitz   | 2        | 961  |
| nesmith | 2        | 945  |
| mitch   | 2        | 910  |
| faith   | 2        | 897  |
| seth    | 2        | 896  |
| edith   | 2        | 887  |
+---------+----------+------+

When you run the query multiple times (2-3 runs are enough to verify), the document counts for the same words, such as 'edith', differ. This inconsistency also occurs when index_exact_words is enabled and with SELECT COUNT(*) queries.
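One way to pinpoint which words change between runs is to parse the table printed by the mysql client and compare the docs column. The following is a minimal sketch, not part of the repository: the helper names `parse_suggest_table` and `diff_counts` are hypothetical, and it assumes the three-column ASCII table layout shown above.

```python
def parse_suggest_table(text):
    """Parse the mysql client's ASCII table for CALL SUGGEST into {word: docs}."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells == ["suggest", "distance", "docs"]:
            continue  # skip the header row and anything unexpected
        word, _distance, docs = cells
        counts[word] = int(docs)
    return counts


def diff_counts(first, second):
    """Return {word: (docs_first, docs_second)} for words whose counts differ.

    A word missing from one run is reported with None on that side.
    """
    words = set(first) | set(second)
    return {
        w: (first.get(w), second.get(w))
        for w in sorted(words)
        if first.get(w) != second.get(w)
    }
```

Feeding the saved output of two runs through `parse_suggest_table` and then `diff_counts` yields an empty dict when the counters are stable, and otherwise lists only the affected words.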

Manticore Search Version:

Latest dev version

Operating System Version:

Ubuntu Jammy

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated

tomatolog commented Nov 18, 2024

Unable to reproduce the issue.
I used the daemon from the master head (bedf3c9) and the daemon from version e2c80bb, where the issue was reported.
I ran the following on the dev box in a loop, 5 times:

stas@dev2:~/manticore$ php -d memory_limit=2G ~/manticore/test/clt-tests/scripts/load_names_attr.php --batch-size=100000 --concurrency=4 --docs=2000000 --start-id=1 --drop-table --min-infix-len=2
Total time: 64.395491123199
31058 docs per sec
stas@dev2:~/manticore$ mysql -h0 -P 27315 -e "call suggest('SMITH', 'name', 10 as limit, 2 as max_edits)" | md5sum
2911c06c9b260ae136b60fe5411bc55e  -
stas@dev2:~/manticore$ mysql -h0 -P 27315 -e "call suggest('SMITH', 'name', 10 as limit, 2 as max_edits)"
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| smyth   | 1        | 408  |
| keith   | 2        | 449  |
| nesmith | 2        | 392  |
| seth    | 2        | 369  |
| mitch   | 2        | 312  |
| edith   | 2        | 291  |
| minh    | 2        | 271  |
| seitz   | 2        | 167  |
| faith   | 2        | 157  |
| south   | 2        | 142  |
+---------+----------+------+

Every time I got the same result and the same md5sum, 2911c06c9b260ae136b60fe5411bc55e.
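The same stability check can be sketched in Python by hashing the captured output of each run and confirming that only one distinct digest appears. This is an illustrative sketch: `fingerprint` and `is_stable` are hypothetical helper names, and the digest only matches the md5sum value above if the hashed bytes are exactly what the mysql client wrote to the pipe.

```python
import hashlib


def fingerprint(output: str) -> str:
    """Return the MD5 hex digest of one run's captured output."""
    return hashlib.md5(output.encode("utf-8")).hexdigest()


def is_stable(outputs) -> bool:
    """True when every captured output hashes to the same digest."""
    return len({fingerprint(o) for o in outputs}) <= 1
```

Collecting the output of each loop iteration into a list and passing it to `is_stable` gives a single pass/fail answer instead of comparing digests by eye.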

I also tried the script ~/manticore/test/clt-tests/scripts/load_names_attr.php from the branch test/update-blocking-by-combining-blocks and saw the same result every time.

tomatolog assigned donhardman and unassigned tomatolog Nov 18, 2024
sanikolaev commented

I can no longer reproduce it either. @PavelShilin89 Can you confirm there's no instability issue with it anymore?

PavelShilin89 commented

The test has been added to PR #2756 and runs correctly. The test itself has not changed, but the bug is no longer reproducible. The problem is fixed.
