-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix exporting projects with honeypots #8597
Fix exporting projects with honeypots #8597
Conversation
/check |
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes in this pull request involve enhancements to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CommonData
participant ProjectData
participant CvatExtractor
User->>CommonData: Request included frames
CommonData->>CommonData: Check _initialized_included_frames
alt Not initialized
CommonData->>CommonData: Calculate included frames
CommonData->>CommonData: Set _initialized_included_frames
end
CommonData->>User: Return included frames
User->>CvatExtractor: Export media files
CvatExtractor->>CommonData: Dump media files
CommonData->>CvatExtractor: Return media files
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
❌ Some checks failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
cvat/apps/dataset_manager/formats/cvat.py (1)
Line range hint
1373-1389
: Consider using a public API for excluded frames.The function now correctly handles honeypots by excluding them from export, which is good for security. However, it uses the protected member
_excluded_frames
. Consider:
- Making this attribute public if it's part of the stable API
- Adding a public method to check if a frame should be excluded
Also, the comment could be more descriptive:
- # exclude deleted frames and honeypots + # Skip deleted frames and honeypot frames (frames used for quality control)cvat/apps/dataset_manager/bindings.py (1)
1118-1121
: Improve readability by avoiding backslashes for line continuationConsider using parentheses to wrap the expression instead of using a backslash for line continuation. This enhances readability and adheres to PEP 8 guidelines.
Apply this diff to improve the code:
- "path": mangle_image_name(db_image.path, defaulted_subset, original_names) \ - if not db_image.is_placeholder else db_image.path, + "path": ( + mangle_image_name(db_image.path, defaulted_subset, original_names) + if not db_image.is_placeholder else db_image.path + ),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- cvat/apps/dataset_manager/bindings.py (5 hunks)
- cvat/apps/dataset_manager/formats/cvat.py (3 hunks)
🔇 Additional comments (4)
cvat/apps/dataset_manager/formats/cvat.py (2)
1416-1416
: LGTM!The call to
dump_media_files
has been correctly updated to match the new function signature, maintaining proper handling of project data export with subsets.
Line range hint
1373-1416
: Verify all call sites are updated.Let's verify that all calls to
dump_media_files
have been updated and that the frame filtering logic is consistent.✅ Verification successful
Based on the search results, I can now generate the final response:
Frame filtering logic is consistent across the codebase
The verification shows that:
- All calls to
dump_media_files
are within the same file and use the correct instance_data parameter- The
_excluded_frames
handling is consistently implemented:
- Initialized and managed in
bindings.py
- Used correctly for frame filtering in
cvat.py
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining calls to dump_media_files with project_data # and verify frame filtering consistency # Test 1: Search for calls to dump_media_files echo "Checking dump_media_files calls..." rg -p "dump_media_files\(" --type py # Test 2: Search for other uses of _excluded_frames to ensure consistent frame filtering echo "Checking frame filtering consistency..." rg "_excluded_frames" --type pyLength of output: 1092
cvat/apps/dataset_manager/bindings.py (2)
288-288
: Efficient caching with_initialized_included_frames
The introduction of
_initialized_included_frames
to cache the included frames improves performance by avoiding redundant computations inget_included_frames
.
540-547
: Ensure cache validity inget_included_frames
While caching included frames enhances performance, ensure that
_initialized_included_frames
remains valid if the frame data changes after initialization. If frames can be modified after the cache is set, consider implementing a mechanism to invalidate or update the cache accordingly.
for task_id, frame in sorted(self._frame_info): | ||
if not self._tasks_data.get(task_id): | ||
self.init_task_data(task_id) | ||
|
||
task_included_frames = self._tasks_data[task_id].get_included_frames() | ||
if (task_id, frame) not in self._deleted_frames and frame in task_included_frames: | ||
get_frame(task_id, frame) | ||
|
||
for t_data in self.task_data: | ||
task: Task = t_data.db_instance | ||
|
||
for task in self._db_tasks.values(): | ||
anno_manager = AnnotationManager( | ||
self._annotation_irs[task.id], dimension=self._annotation_irs[task.id].dimension | ||
) | ||
task_included_frames = t_data.get_included_frames() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor to avoid redundant task data initialization
The loop starting at line 1281 initializes task_data
for each task_id
if not already done. However, the subsequent loop starting at line 1289 iterates over self.task_data
, which also initializes task_data
when necessary. Consider refactoring to eliminate the redundancy and improve efficiency.
|
||
def init_task_data(self, task_id: int) -> TaskData: | ||
try: | ||
task = self._db_tasks[task_id] | ||
except KeyError as ex: | ||
raise Exception("There is no such task in the project") from ex | ||
|
||
task_data = TaskData( | ||
annotation_ir=self._annotation_irs[task_id], | ||
db_task=task, | ||
host=self._host, | ||
create_callback=self._task_annotations[task_id].create \ | ||
if self._task_annotations is not None else None, | ||
) | ||
task_data._MAX_ANNO_SIZE //= len(self._db_tasks) | ||
task_data.soft_attribute_import = self.soft_attribute_import | ||
self._tasks_data[task_id] = task_data | ||
|
||
return task_data | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raise a more specific exception in init_task_data
Instead of raising a generic Exception
when a task is not found, consider raising a more specific exception like KeyError
or creating a custom exception. This improves error handling and clarity.
Handle potential division by zero in _MAX_ANNO_SIZE
calculation
In line 1402, ensure that len(self._db_tasks)
is not zero before performing integer division to prevent a possible ZeroDivisionError
.
Apply this diff to handle the potential error:
+ if len(self._db_tasks) == 0:
+ raise ValueError("No tasks available to initialize task data.")
task_data._MAX_ANNO_SIZE //= len(self._db_tasks)
Committable suggestion was skipped due to low confidence.
c236751
to
602e6cd
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #8597 +/- ##
===========================================
- Coverage 74.29% 74.25% -0.04%
===========================================
Files 401 401
Lines 43379 43395 +16
Branches 3945 3945
===========================================
- Hits 32227 32223 -4
- Misses 11152 11172 +20
|
@property | ||
def task_data(self): | ||
for task_id, task in self._db_tasks.items(): | ||
for task_id, _ in self._db_tasks.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for task_id, _ in self._db_tasks.items(): | |
for task_id in self._db_tasks: |
Or maybe with .keys()
Quality Gate passedIssues Measures |
Motivation and context
PR fixes an issue with exporting project datasets: task honeypots are no longer exported
How has this been tested?
Checklist
develop
branch- [ ] I have updated the documentation accordingly- [ ] I have increased versions of npm packages if it is necessary(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation