Search: Returning CachedIrodsPath for data objects #283

Merged on Nov 15, 2024 (9 commits)

chStaiger
Collaborator

To speed up searching for data objects, the search function now returns a CachedIrodsPath whose checksum and size are set to the values returned by the search itself.

This gives a large improvement: in the benchmark below, reading the size of every result drops from about 67 seconds to well under a millisecond.

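import time

from ibridges import search_data

# Assumes `session` is an existing, authenticated iBridges Session.
search_start = time.time()
data = search_data(session, path_pattern="%.txt")
search_end = time.time()
print(f"Search time {search_end-search_start}")
# Access the size of every result to measure the cost of the metadata lookups.
for d in data:
    d.size
size_end = time.time()
print(f"Size time {size_end-search_end}")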
  • Returning an IrodsPath for data objects and fetching the size

    Search time 2.733164072036743
    Size time 67.01198482513428
    
  • Returning a CachedIrodsPath with values from search

    Search time 2.3439738750457764
    Size time 0.0004801750183105469
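Why the second timing is so small can be illustrated with a minimal sketch. It does not use iBridges: `LazyPath`, `CachedPath` and `FakeServer` are hypothetical stand-ins, and the checksum strings are placeholders. The only point is that a path object carrying the values from the search needs no extra lookup per object.

```python
class FakeServer:
    """Hypothetical catalogue that counts metadata lookups."""

    def __init__(self, sizes):
        self._sizes = sizes
        self.lookups = 0

    def lookup_size(self, path):
        # Simulates one round trip to the server.
        self.lookups += 1
        return self._sizes[path]

    def search(self):
        # One query returns path, size and a placeholder checksum together.
        return [(p, s, f"placeholder:{p}") for p, s in self._sizes.items()]


class LazyPath:
    """Hypothetical path that asks the server on every size access."""

    def __init__(self, path, server):
        self.path = path
        self._server = server

    @property
    def size(self):
        return self._server.lookup_size(self.path)


class CachedPath(LazyPath):
    """Hypothetical path that keeps the values delivered by the search."""

    def __init__(self, path, server, size, checksum):
        super().__init__(path, server)
        self._size = size
        self.checksum = checksum

    @property
    def size(self):
        # No round trip: return the value stored at construction time.
        return self._size


server = FakeServer({"/zone/a.txt": 10, "/zone/b.txt": 20})
rows = server.search()
lazy = [LazyPath(p, server) for p, _, _ in rows]
cached = [CachedPath(p, server, s, c) for p, s, c in rows]

lazy_total = sum(p.size for p in lazy)
lookups_lazy = server.lookups
cached_total = sum(p.size for p in cached)
lookups_cached = server.lookups - lookups_lazy
print(lazy_total, lookups_lazy, cached_total, lookups_cached)  # 30 2 30 0
```

The trade-off in such a design is that cached values reflect the state at search time and are not refreshed if the object changes afterwards.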
    

@chStaiger
Collaborator Author

@jjkoehorst Does this solve your performance issue?

@chStaiger chStaiger requested a review from qubixes November 12, 2024 22:24
@chStaiger chStaiger changed the title Returning CachedIrodsPath for data objects Search: Returning CachedIrodsPath for data objects Nov 12, 2024
Collaborator

@qubixes qubixes left a comment

Given that returning an extra column seems to have almost no performance impact in general, this looks like a good idea. Just one small comment; otherwise it seems like a sensible implementation!

Comment on lines 195 to 196
key_map = [(k.icat_key, k) for k in item.keys()]
for n_key, o_key in key_map:
Collaborator

What about simply looping over item.keys()? Or am I missing something?

Collaborator Author

Also possible. I can then transform the key object to a string in the loop.

Collaborator

It just saves one line of code, right?

Collaborator Author

Yes indeed.
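For readers following the thread, the two variants can be compared in a small sketch. The loop body is not shown in the quoted diff, so renaming each key to its string form is an assumption made here for illustration, and `Column` is a hypothetical stand-in for the key objects that expose `icat_key`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical key object exposing its string name as icat_key."""

    icat_key: str


def rename_with_key_map(item):
    # Variant from the diff: build (new_key, old_key) pairs first.
    key_map = [(k.icat_key, k) for k in item.keys()]
    for n_key, o_key in key_map:
        item[n_key] = item.pop(o_key)  # assumed loop body
    return item


def rename_direct(item):
    # Suggested variant: loop over the keys and convert inside the loop.
    # list(...) snapshots the keys so the dict can be modified while looping.
    for key in list(item.keys()):
        item[key.icat_key] = item.pop(key)  # assumed loop body
    return item


row = {Column("DATA_NAME"): "a.txt", Column("DATA_SIZE"): 10}
via_map = rename_with_key_map(dict(row))
direct = rename_direct(dict(row))
print(via_map == direct)  # True
```

If the loop body changes the dictionary, as assumed above, the snapshot matters: iterating over a live `item.keys()` view while adding and removing keys can raise a `RuntimeError` in Python. If the loop only reads from `item`, iterating over `item.keys()` directly is enough.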

ibridges/search.py (outdated, resolved)
@chStaiger chStaiger merged commit 02e982a into develop Nov 15, 2024
12 checks passed
@chStaiger chStaiger deleted the perfsearch branch November 15, 2024 09:32
2 participants