Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core-clp): get_entry_matching_value should handle ignore-case properly. #690

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

aestriplex
Copy link

@aestriplex aestriplex commented Jan 23, 2025

This pull request implements a fix for the issue #648

Here's a short recap of the main changes:

  1. get_entry_matching_value now returns a vector
  2. encode_and_search_dictionary adds imprecise dict var when there are more matching entries

There are four commits because I worked on it taking advantage of @haiqi96's advices (see the discussion on #648). Should I squash them?

Summary by CodeRabbit

  • New Features

    • Enhanced dictionary search functionality to support multiple matching entries.
    • Improved variable matching with more flexible search capabilities.
  • Bug Fixes

    • Updated error handling for empty variable strings.
    • Refined dictionary entry retrieval process.
  • Tests

    • Added comprehensive test cases for multiple matching dictionary entries.
    • Verified case-sensitive and case-insensitive search behaviours.

The changes improve the system's ability to handle complex dictionary searches, allowing for more nuanced and precise variable matching across different components.

Copy link
Contributor

coderabbitai bot commented Jan 23, 2025

Walkthrough

The pull request introduces significant modifications to the get_entry_matching_value method across multiple files in the CLP project. The primary change involves updating the method's return type from a single pointer to a vector of pointers, enabling the retrieval of multiple dictionary entries. This modification allows for more flexible matching, particularly in case-insensitive searches where multiple entries might match the input string. The changes are consistent across different components of the project, including dictionary readers, variable interpreters, and search output processing.

Changes

File Change Summary
components/core/src/clp/DictionaryReader.hpp Updated get_entry_matching_value to return std::vector<EntryType const*> instead of EntryType const*
components/core/src/clp/EncodedVariableInterpreter.cpp Modified logic to handle multiple dictionary entries during encoding and searching
components/core/src/clp_s/DictionaryReader.hpp Similar update to return type for get_entry_matching_value method
components/core/src/clp_s/search/Output.cpp Updated to process multiple variable dictionary entries
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp Enhanced handling of multiple dictionary entries
components/core/tests/test-EncodedVariableInterpreter.cpp Added test section for multiple matching values

Sequence Diagram

sequenceDiagram
    participant DR as DictionaryReader
    participant EVI as EncodedVariableInterpreter
    participant Output as Search Output

    DR->>DR: Search for entries
    DR-->>EVI: Return vector of matching entries
    EVI->>EVI: Process multiple entries
    EVI-->>Output: Pass encoded variables
    Output->>Output: Populate string queries
Loading

Possibly related issues

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (3)

39-39: Consider adding more context or logs upon returning false
When the dictionary returns no entries, returning false implies no match. For traceability and debugging, consider logging a brief message indicating the unmatched variable string.


46-51: Avoid shadowing and preserve clarity
The variable encoded_var declared inside this block may overshadow the encoded_var declared above. To improve clarity and maintain scoping hygiene, consider renaming or reusing the existing variable.


53-61: Confirm efficiency for large sets of matches
Using std::unordered_set for both the IDs and the entries is a good approach for preventing duplicates. If the dictionary can grow large, ensure that insertion and iteration remain performant.

components/core/src/clp_s/DictionaryReader.hpp (1)

159-181: Evaluate case-sensitive early exit logic
Early exit in the case-sensitive branch returns after the first match. This design is acceptable if the business logic requires only one match; otherwise, you may want to continue searching for potential duplicates. Also, consider that repeatedly converting each dictionary entry’s value to uppercase in the else-branch may affect performance on large dictionaries.

components/core/src/clp/DictionaryReader.hpp (1)

236-258: Ensure consistency in case handling
The code pattern for case-sensitive versus case-insensitive matches is consistent. Similar to the other DictionaryReader, consider potential performance costs of repeatedly calling boost::algorithm::to_upper_copy() on each iteration.

components/core/tests/test-EncodedVariableInterpreter.cpp (2)

379-379: Correct typographical error in test name
The word “metching” should be “matching.”

-    SECTION("Test multiple metching values") {
+    SECTION("Test multiple matching values") {

382-405: Comprehensive case-insensitive verification
By storing multiple string variants and confirming that only the case-insensitive retrieval returns all of them, you have validated the updated dictionary logic thoroughly. Consider verifying each returned entry’s content for added rigour.

components/core/src/clp_s/search/Output.cpp (2)

935-935: Consider checking if the returned vector is empty before use.
While it's not strictly necessary, confirming that entries is non-empty (or otherwise handling the empty case) can improve clarity and prevent potential unexpected behaviour in future modifications.


940-940: Reserve space in the set before inserting to avoid repeated rehashing.
Since you already know the approximate number of elements from the size of entries, reserving that capacity in matching_vars (e.g., matching_vars.reserve(entries.size())) can marginally improve performance and reduce memory overhead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66067d6 and ea41a9c.

📒 Files selected for processing (6)
  • components/core/src/clp/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/src/clp_s/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp_s/search/Output.cpp (1 hunks)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/tests/test-EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/search/Output.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (9)
components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (2)

37-38: Ensure consistent handling of multiple matched entries
Changing the function call to return a vector works well for retrieving multiple entries, thereby supporting the new requirement for imprecise matching. This approach is clear and logically sound.


45-45: Verify classification before calling add_non_double_var
While this is logically consistent with the dictionary path, verify that treating all dictionary-based entries as non_double is appropriate in all cases. If some entries are numeric or require special handling, additional checks may be needed.

components/core/src/clp_s/DictionaryReader.hpp (1)

67-69: Accurate docstring update
The revised docstring correctly states that the method returns a vector of matching entries, which may be empty.

components/core/src/clp/DictionaryReader.hpp (1)

85-87: Docstring alignment with new return type
The documentation now accurately states that multiple matching entries can be returned.

components/core/src/clp/EncodedVariableInterpreter.cpp (4)

389-390: Vector-based retrieval
Switching to retrieving all matching entries in a vector offers more flexibility for handling multiple matches. This aligns well with your updated dictionary logic.


391-394: Early return on empty matches
Returning false when no dictionary entries are found is a clear approach. Ensure callers handle this outcome properly.


398-403: Single match flow
Dispatching a single dictionary entry and returning immediately keeps the logic straightforward.


404-413: Handling multiple matches
Encoding and storing all dictionary entries in an imprecise variable set is a practical approach for ambiguous matches. No issues here.

components/core/tests/test-EncodedVariableInterpreter.cpp (1)

409-414: Prudent test clean-up
Deleting the dictionary files at the end of the test prevents workspace pollution.

if (false == ignore_case) {
for (auto const& entry : m_entries) {
if (entry.get_value() == search_string) {
return &entry;
entries.push_back(&entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723 Should we use emplace_back instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to use push_back for pointer types. Emplace is only required if the underlying object requires a constructor.

components/core/src/clp/EncodedVariableInterpreter.cpp Outdated Show resolved Hide resolved
components/core/src/clp/EncodedVariableInterpreter.cpp Outdated Show resolved Hide resolved

if (entries.size() == 1) {
clp::VariableDictionaryEntry const* entry = entries[0];
auto encoded_var = encode_var_dict_id(entry->get_id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to shadow the encoded_var declared out side of this scope, do we need a new variable here?

Or maybe, we can refactor the code to not have a "globally declared" encoded_var

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was an oversight. It seems to me that remove the "globally declared" encoded_var here may make the first two cases a little more verbose. At the end of the day, this seems clear enough:

encoded_variable_t encoded_var;
if (convert_string_to_representable_integer_var(var_str, encoded_var)) {
    LogTypeDictionaryEntry::add_int_var(logtype);
    sub_query.add_non_dict_var(encoded_var);
} else if (convert_string_to_representable_float_var(var_str, encoded_var)) {
    LogTypeDictionaryEntry::add_float_var(logtype);
    sub_query.add_non_dict_var(encoded_var);
} else {
    auto const entries = var_dict.get_entry_matching_value(var_str, ignore_case);
    ...
}

@@ -82,9 +82,9 @@ class DictionaryReader {
* Gets the entry exactly matching the given search string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should update the docstring.

Suggested change
* Gets the entry exactly matching the given search string
* Gets the entry matching the given search string

@kirkrodrigues any suggestions on how to account for multiple possible matching? "entry(entries)"?

vector<clp::variable_dictionary_id_t> var_ids;
clp::LogTypeDictionaryEntry logtype_dict_entry;
for (auto var_str : var_strs) {
EncodedVariableInterpreter::encode_and_add_to_dictionary(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of directly using the var_strs as msg argument for encode_and_add_to_dictionary, can we construct a message with a dummy log type?

I guess what you can do can be similar to

string const msg_template  = "here is a string with a dictionary var: ".

for var_str in var_strs:
    auto const msg = msg_template + var_id
    ...

components/core/tests/test-EncodedVariableInterpreter.cpp Outdated Show resolved Hide resolved
components/core/tests/test-EncodedVariableInterpreter.cpp Outdated Show resolved Hide resolved
@haiqi96
Copy link
Contributor

haiqi96 commented Jan 24, 2025

Thanks for opening the PR.
When the PR is approved, you will need to squash the changes so it looks like a single commit. Note that you don't need to squash it manaully, there is a github option to allow your merge the PR with squashing. We can discuss after we get there.

I have left a batch of comments.
Note that you will also need to change the title of the PR. you can refer to the past merged PRs to get an idea of how it should look like.

@aestriplex aestriplex changed the title get_entry_matching_value should handle ignore-case properly. fix(core-clp): get_entry_matching_value should handle ignore-case properly. Jan 25, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
components/core/src/clp_s/DictionaryReader.hpp (2)

69-70: Consider using std::vector<EntryType const*> on a single line.

The return type declaration spans two lines unnecessarily.

-    std::vector<EntryType const*>
-    get_entry_matching_value(std::string const& search_string, bool ignore_case) const;
+    std::vector<EntryType const*> get_entry_matching_value(std::string const& search_string, bool ignore_case) const;

159-182: Consider using emplace_back instead of push_back.

The implementation looks good, but consider using emplace_back for potentially better performance when adding entries to the vector.

-                entries.push_back(&entry);
+                entries.emplace_back(&entry);

Also at:

-                entries.push_back(&entry);
+                entries.emplace_back(&entry);
components/core/src/clp/EncodedVariableInterpreter.cpp (3)

389-390: Consider using auto with explicit type for better readability.

The type of entries could be made more explicit while still using auto.

-        auto const entries = var_dict.get_entry_matching_value(var_str, ignore_case);
+        auto const entries = std::vector<VariableDictionaryEntry const*>{var_dict.get_entry_matching_value(var_str, ignore_case)};

398-403: Consider using std::size instead of size().

The single entry case is handled correctly, but consider using std::size for consistency with modern C++.

-        if (entries.size() == 1) {
+        if (std::size(entries) == 1) {

405-413: Consider reserving space in the sets for better performance.

When dealing with multiple entries, consider reserving space in both sets to avoid reallocations.

         std::unordered_set<encoded_variable_t> encoded_vars;
+        encoded_vars.reserve(entries.size());
         std::unordered_set<clp::VariableDictionaryEntry const*> const entries_set(
                 entries.begin(),
                 entries.end()
         );
         for (auto const& entry : entries) {
             encoded_vars.insert(encode_var_dict_id(entry->get_id()));
         }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea41a9c and 61dea57.

📒 Files selected for processing (6)
  • components/core/src/clp/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/src/clp_s/DictionaryReader.hpp (2 hunks)
  • components/core/src/clp_s/search/Output.cpp (1 hunks)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp (1 hunks)
  • components/core/tests/test-EncodedVariableInterpreter.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/src/clp_s/search/clp_search/EncodedVariableInterpreter.cpp
  • components/core/tests/test-EncodedVariableInterpreter.cpp
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/EncodedVariableInterpreter.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/DictionaryReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/search/Output.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (4)
components/core/src/clp_s/DictionaryReader.hpp (1)

67-67: LGTM! Documentation updated to reflect the new return type.

The docstring accurately describes that the function now returns a vector of entries.

components/core/src/clp/DictionaryReader.hpp (1)

236-259: Same suggestions as in the other DictionaryReader.hpp file apply here.

Consider:

  1. Using std::vector<EntryType const*> on a single line
  2. Using emplace_back instead of push_back
components/core/src/clp/EncodedVariableInterpreter.cpp (1)

391-397: LGTM! Early return for empty entries.

The empty check and early return is well-placed and maintains the function's contract.

components/core/src/clp_s/search/Output.cpp (1)

935-940: LGTM! Correctly handles multiple entries from the dictionary.

The code properly retrieves and processes multiple entries from the variable dictionary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants