Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_entry_matching_value does not handle ignore-case properly. #648

Open
haiqi96 opened this issue Jan 2, 2025 · 9 comments
Open

get_entry_matching_value does not handle ignore-case properly. #648

haiqi96 opened this issue Jan 2, 2025 · 9 comments
Labels
bug Something isn't working

Comments

@haiqi96
Copy link
Contributor

haiqi96 commented Jan 2, 2025

Bughttps://github.com/y-scope/clp/issues/new/choose

Our current implementation of get_entry_matching_value() assumes only 1 value in the dictionary will match the input string.

This assumption is correct when match is case_sensitive (since the the input value won't contain wildcard).
However, when ignore_case is switched on, it's possible more than one dictioanry variables will match the input value. (for example, all of "VAR1", "var1" and "vAr1") matches input "var1", but only one of them will be returned.
As a result, CLP won't return matching messages with the other variables.

I believe the proper fix to this issue is to ether:

  1. update the return type of the method, so it returns a vector of entry. For case-sensitive case, the vector will always contain only one entry (or 0 if no matches). For ignore-case, the vector could have more than one elements.
  2. Create two different methods, one for case-senstive match, one for non-case sensitive match so they can have their dedicated interface.

CLP version

329edf6

Environment

unrelated to OS version.

Reproduction steps

  1. Compress the attached log.
  2. run clg search with ./cmake-build-release/clg -i output/ "My custom log two var1 num"

test.log

@haiqi96 haiqi96 added the bug Something isn't working label Jan 2, 2025
@aestriplex
Copy link

aestriplex commented Jan 15, 2025

@haiqi96 I saw this issue with no one assigned, and thought I would try to develop a fix (this one). I chose to keep the boolean parameter ignore_case. Currently, all tests pass, but I haven't implemented the ones specific to my changes yet. Should I put them in the file test-EncodedVariableInterpreter.cpp, or is it better if I create a dedicated file?

@haiqi96
Copy link
Contributor Author

haiqi96 commented Jan 16, 2025

@aestriplex Thank you very much for developing the fix. The vector apporach makes sense to me.
However, there's one bug in the implementation at line. The code now adds multiple variables to the logtype and subqueries but it is only supposed to add one variable. (since it only tries to match one)

One potential way to fix this is to use an imprecise var instead of a precise var. Unfortunately I am very busy this week so won't have time to dive down into details. Can you investigate first and let me know if you have any question or concern?

@aestriplex
Copy link

Thanks @haiqi96. Yes, I will investigate and try to fix it on my own. I'll let you know

aestriplex added a commit to aestriplex/clp that referenced this issue Jan 18, 2025
@aestriplex
Copy link

@haiqi96 I changed the implementation of the encode_and_search_dictionary method. Since I noticed that the SubQuery's add_imprecise_dict_var expects two unordered sets, I changed the return type of my previous fix to get_entry_matching_value from vector to unordered set. This way we can avoid verbose casting in encode_and_search_dictionary.
Both

LogTypeDictionaryEntry::add_dict_var(logtype);

and

sub_query.add_imprecise_dict_var(encoded_vars, entries);

are called just once now.
All the unit tests (that did not test this specific feature) still work, and I executed your "Reproduction steps" as well, here's the result

teo@Mac build % ./clg -i /out_log "My custom log two var1 num"
/log/test.log:2024-12-03 12:06:37.096 My custom log two var1 num 12(StaticText)
/log/test.log:2024-12-03 12:06:37.096 My custom log two VAR1 num 12(StaticText)
/log//test.log:2024-12-03 12:06:37.096 My custom log two VaR1 num 12(StaticText)

Tomorrow I'll implement a new section in test-EncodedVariableInterpreter.cpp containing some test cases for my changes

aestriplex added a commit to aestriplex/clp that referenced this issue Jan 21, 2025
@aestriplex
Copy link

@haiqi96
I should probably have put two more test cases like

REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", true).empty());
REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", false).empty());

If it is okay with you, I will open a pull request with this issue linked

@haiqi96
Copy link
Contributor Author

haiqi96 commented Jan 21, 2025

@haiqi96 I should probably have put two more test cases like

REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", true).empty());
REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", false).empty());
If it is okay with you, I will open a pull request with this issue linked

Sorry, was dragged into some work today, I will closely look at your change tomorrow, as long as no unexpected task comes up.

@haiqi96
Copy link
Contributor Author

haiqi96 commented Jan 21, 2025

@haiqi96 I should probably have put two more test cases like

REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", true).empty());
REQUIRE(var_dict_reader.get_entry_matching_value("string_not_present_in_var_strs", false).empty());
If it is okay with you, I will open a pull request with this issue linked

Sorry for getting back late.
I persoanlly feel that we don't need these two extra case, since you are not chaning the underlying logic of the dictionary matching, so I don't see what those two cases intend to catch.
Btw I assume that without your change, the newly test should fail?

Your change in general looks good, but there are a few comments/concerns:

  1. What's the motivation to switch from vector to set? in my opinion, we should minimize the change to introduce for the bug fix if possible. So I would be leaning towards keeping it as vector instead of replacing with set.
  2. The imprecise var should only apply if there are actually more than one potential dictionary variables matching the query. So in the change, we should check the number of matching values. If there's only 1 value matching, then it should be a "precise_var". If there are more than one values matching, then it should be precise var.
  3. glt is a stale workspace, and I don't remember if there's test covering it. Can you double check? if there's not test actively running for it, I would say let's not make the change for glt. We are planning to rework glt in the future, so it's ok to keep it as it is for now.

aestriplex added a commit to aestriplex/clp that referenced this issue Jan 23, 2025
@aestriplex
Copy link

@haiqi96
Without my changes, the newly test (section "Test multiple metching values") does not even compile, because in the original code the method get_entry_matching_value returned a pointer to the first matching entry, no matter if you were ignoring case or not.

  1. I put the std::vector back to the set post. The only reason I changed from std::vector (my first commit) to std::unordered_set (my second commit) in the first place was to avoid instantiating a set from the vector in encode_and_search_dictionary - because add_imprecise_dict_var expects an unordered set of entries. Anyway, I changed back to std::vector.
  2. My bad, I fixed it
  3. Neither did I find the tests for glt. I brought the code back as it was before my modification

@haiqi96
Copy link
Contributor Author

haiqi96 commented Jan 23, 2025

@aestriplex Thank you very much for making the changes again! Can you please open a pull request? There are few places that can be further polished but we can address them in the PR.

aestriplex added a commit to aestriplex/clp that referenced this issue Jan 23, 2025
aestriplex added a commit to aestriplex/clp that referenced this issue Jan 23, 2025
aestriplex added a commit to aestriplex/clp that referenced this issue Jan 23, 2025
aestriplex added a commit to aestriplex/clp that referenced this issue Jan 25, 2025
aestriplex added a commit to aestriplex/clp that referenced this issue Jan 25, 2025
aestriplex added a commit to aestriplex/clp that referenced this issue Jan 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants