Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Oddly long load times for no discernible reason. #567

Open
2 tasks done
RR-N opened this issue Apr 30, 2024 · 5 comments
Open
2 tasks done

[Bug]: Oddly long load times for no discernible reason. #567

RR-N opened this issue Apr 30, 2024 · 5 comments
Labels
category: enhancement improvements of code or code behavior status: todo something needs to be done

Comments

@RR-N
Copy link

RR-N commented Apr 30, 2024

What happened?

I have a number of different NWB files that were created with the same code. Of these, a particular dataset takes an oddly long time to load in, but only in matnwb -- pynwb loads these just fine. For comparison here and here are two datasets that were generated with the same code. I don't think filesize is a factor here, but FWIW, this dataset takes three to four times longer to load than this one or this one, which are both much bigger. May be worth noting that the conversion code was written in python, and that I've confirmed with other people on other systems that this only happens in matnwb.

I ran the MATLAB profiler on the nwbRead function to try and figure out what was going on -- figure that might prove helpful, so attached it here.

Profiler.pdf

Steps to Reproduce

tic
nwbRead(test_file)
toc

Error Message

No response

Operating System

Windows

Matlab Version


MATLAB Version: 23.2.0.2515942 (R2023b) Update 7
MATLAB License Number: 40702059
Operating System: Microsoft Windows 10 Pro Version 10.0 (Build 19045)
Java Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode

MATLAB Version 23.2 (R2023b) License 40702059
Simulink Version 23.2 (R2023b) License 40702059
Bioinformatics Toolbox Version 23.2 (R2023b) License 40702059
Computer Vision Toolbox Version 23.2 (R2023b) License 40702059
Data Acquisition Toolbox Version 23.2 (R2023b) License 40702059
Deep Learning HDL Toolbox Version 23.2 (R2023b) License 40702059
Deep Learning Toolbox Version 23.2 (R2023b) License 40702059
GPU Coder Version 23.2 (R2023b) License 40702059
Image Acquisition Toolbox Version 23.2 (R2023b) License 40702059
Image Processing Toolbox Version 23.2 (R2023b) License 40702059
MATLAB Coder Version 23.2 (R2023b) License 40702059
MATLAB Compiler Version 23.2 (R2023b) License 40702059
MATLAB Compiler SDK Version 23.2 (R2023b) License 40702059
MATLAB Test Version 23.2 (R2023b) License 40702059
Mapping Toolbox Version 23.2 (R2023b) License 40702059
Medical Imaging Toolbox Version 23.2 (R2023b) License 40702059
Optimization Toolbox Version 23.2 (R2023b) License 40702059
Parallel Computing Toolbox Version 23.2 (R2023b) License 40702059
Partial Differential Equation Toolbox Version 23.2 (R2023b) License 40702059
Reinforcement Learning Toolbox Version 23.2 (R2023b) License 40702059
Signal Processing Toolbox Version 23.2 (R2023b) License 40702059
SimBiology Version 23.2 (R2023b) License 40702059
Simulink Coder Version 23.2 (R2023b) License 40702059
Simulink Compiler Version 23.2 (R2023b) License 40702059
Statistics and Machine Learning Toolbox Version 23.2 (R2023b) License 40702059
Symbolic Math Toolbox Version 23.2 (R2023b) License 40702059
System Identification Toolbox Version 23.2 (R2023b) License 40702059

Code of Conduct

@rly
Copy link
Contributor

rly commented Apr 30, 2024

This is related to NeurodataWithoutBorders/pynwb#1889 but specific to matnwb. I think the slowness has to do with the number of groups in the file. @lawrence-mbf do you have any insights?

@lawrence-mbf
Copy link
Collaborator

The profile.pdf is nice but unfortunately it seems like the slowness doesn't just come from the MatNWB implementation itself, but the internal hdf5 lib provided by MATLAB. I will have to benchmark this locally and find the places that these hdf5 libs are used.

@lawrence-mbf
Copy link
Collaborator

Okay it's like I suspected: The issue is the slowness of H5R.get_name which is painfully slow (it takes 0.11 seconds per call and it is called 1925 times in the example file). Unfortunately, afaik this is the only way to retrieve the full HDF5 path name from the raw reference data which is important for the ObjectView and RegionView objects to search through the in-memory NwbFile references. Does h5py do something different with references that avoids this problem?

@rly
Copy link
Contributor

rly commented Apr 30, 2024

Are all references resolved on initial read? Can they be deferred?

@lawrence-mbf
Copy link
Collaborator

You could cache both the ref data as well as a path to the file and do what types.untyped.DataStub does. That would require a bit of breakage in the classes themselves but may not be that big of a change.

@ehennestad ehennestad added category: enhancement improvements of code or code behavior status: todo something needs to be done labels Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior status: todo something needs to be done
Projects
None yet
Development

No branches or pull requests

4 participants