HDF5 file layout and performance #3

Open
bnlawrence opened this issue Mar 25, 2024 · 1 comment

@bnlawrence
Collaborator

bnlawrence commented Mar 25, 2024

Here are two comparisons (one per file) of the time taken to open a file on a POSIX file system using h5py and pyfive:

python opening_speed.py 
File Opening Time Comparison
h5py:    0.015273
pyfive:  0.005531
Additional times:  0.000124,  0.003239

File Opening Time Comparison
h5py:    0.054081
pyfive:  0.387869
Additional times:  0.000317,  0.000853

This almost certainly has something to do with how the file is laid out, in terms of where the indexes etc. go, but the performance difference is heavily exacerbated when the file is on S3. It would be good to be able to diagnose this sort of thing. Could we modify pyfive to provide a "layout diagnostic view"?
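
As a rough sketch of what such a view might report: given (offset, size) pairs for metadata blocks or chunks, count how many contiguous runs they form and how far they spread through the file. `Extent` and `summarise_layout` are hypothetical names, not part of pyfive or h5py, and how the pairs would be pulled out of either library is left open here.

```python
from typing import Iterable, NamedTuple


class Extent(NamedTuple):
    offset: int  # byte offset of the block within the file
    size: int    # length of the block in bytes


def summarise_layout(extents: Iterable[Extent]) -> dict:
    """Summarise how a set of byte ranges is spread through a file."""
    # Accept plain (offset, size) tuples as well as Extent instances.
    ordered = sorted(Extent(*e) for e in extents)
    if not ordered:
        return {"blocks": 0, "bytes": 0, "span": 0, "runs": 0}
    runs = 1
    end = ordered[0].offset + ordered[0].size
    for ext in ordered[1:]:
        if ext.offset > end:  # a gap: something else sits between the blocks
            runs += 1
        end = max(end, ext.offset + ext.size)
    return {
        "blocks": len(ordered),
        "bytes": sum(e.size for e in ordered),
        "span": end - ordered[0].offset,  # first byte to last byte covered
        "runs": runs,                     # number of contiguous groups
    }
```

A large number of runs, or a span much bigger than the byte total, would suggest the reader has to make many separate reads, which is plausibly where an object store like S3 hurts most.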

The additional times are h4-h3 and h5-h4 from the following, where f2 is the open pyfive file instance:

h3 = time.time()
v = f2['var']
d = v._dataobjects
h4 = time.time()
d._get_chunk_addresses()
h5 = time.time()

This suggests the b-tree read itself is very fast.
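
For repeating this kind of measurement, the start/stop pattern above could be wrapped in a small helper. This is only a sketch: `timed` is a hypothetical name, and it uses `time.perf_counter()` rather than `time.time()` because the intervals here are well under a millisecond.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

With the open file from above, something like `v, t = timed(lambda: f2['var'])` would give the variable and the time taken to look it up.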

@bnlawrence
Collaborator Author

bnlawrence commented Mar 25, 2024

For the record, these files differ significantly. The ncdump outputs are file1.txt and file2.txt.

The former is smaller and only has one variable. The header and b-tree layouts will be significantly different.

@bnlawrence bnlawrence added this to the h5netcdf ready milestone Nov 11, 2024