Reading time differs for compressed and uncompressed documents #73
Comments
Hi @jurekkow, thank you for the detailed report on the ROI reading speed issue. I analyzed the situation based on the numbers you provided and have some insights to share.

Decompressing a single tile (2040x2040 pixels, 16-bit grayscale, approximately 8MB) takes around 45ms, so the decompression throughput is about 200MB/s. In contrast, with the uncompressed CZI the data rate is around 2000MB/s, assuming SSD storage (and potentially even higher if the file resides in cache). Given these observations, the results you are seeing appear to be in line with "expected system behavior", and there does not seem to be a malfunction or bug in the process.

I have little hope that the performance of the JPGXR-decoding (or zstd-decoding, for that matter) can be improved here. I compared the decoding speed of libCZI's bundled jxr-codec and the Windows-WIC-JPGXR-codec, and they came out about the same. Note that (on Windows) libCZI can work with both codecs (e.g. with CZICmd, the option

However, there are surely options to improve performance in this case.
So, the first two bullets seem feasible and reasonably straightforward to me. The third bullet should have an enormous effect; I'd think the speed-up could be orders of magnitude, provided that your access pattern is favorable. Maybe you can give some insight into whether this would be desirable/applicable for your application. Other than those ideas, I have to confess that the best option for the time being is to decompress the document before operation. At least I came to the conclusion that there is no obvious flaw here and that the decompression performance is as fast as we can expect. |
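The throughput figures quoted in the comment above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the tile size, the 45ms decode time and the 2000MB/s read rate are the numbers from the comment, and the variable names are made up for illustration.

```python
# Figures taken from the comment above; nothing here is measured by this script.
TILE_WIDTH = 2040                  # pixels
TILE_HEIGHT = 2040                 # pixels
BYTES_PER_PIXEL = 2                # 16-bit grayscale
DECODE_TIME_S = 0.045              # ~45 ms to decompress one tile
UNCOMPRESSED_RATE_MB_S = 2000.0    # ~2000 MB/s, assuming SSD storage

# Size of one decoded tile in bytes and in MB (1 MB = 10^6 bytes).
tile_bytes = TILE_WIDTH * TILE_HEIGHT * BYTES_PER_PIXEL
tile_mb = tile_bytes / 1e6

# Decompression throughput for a single decoder, and the resulting gap.
decode_rate_mb_s = tile_mb / DECODE_TIME_S
slowdown = UNCOMPRESSED_RATE_MB_S / decode_rate_mb_s

print(f"tile size:         {tile_mb:.2f} MB")
print(f"decode throughput: {decode_rate_mb_s:.0f} MB/s")
print(f"expected slowdown: {slowdown:.1f}x vs. uncompressed")
```

With these inputs the decode throughput comes out a little under 200MB/s, so a roughly tenfold gap between compressed and uncompressed reads is what the numbers predict when tiles are decoded one at a time.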
Hi @ptahmose. Thanks for the thorough investigation. I do read many ROIs from the same reader object, and the ROIs are requested in a particular order, starting from the top left, moving to the right, and then on to the following line, exactly as you described. My only worry is that it may cause OOM issues when reading large documents. To avoid that, I'd need to add some cache invalidation logic. |
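One way to picture the cache invalidation mentioned above is a least-recently-used cache that is capped by total bytes rather than by entry count. The following is a minimal sketch at the application level, not libCZI or pylibCZIrw code: `BoundedTileCache`, the `decode` callable and the `size_of` parameter are hypothetical names introduced here for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class BoundedTileCache:
    """LRU cache for decoded tiles, limited by total size in bytes."""

    def __init__(
        self,
        decode: Callable[[Hashable], Any],   # hypothetical: decodes the tile for a key
        max_bytes: int,
        size_of: Callable[[Any], int] = len, # for NumPy arrays, pass lambda a: a.nbytes
    ) -> None:
        self._decode = decode
        self._max_bytes = max_bytes
        self._size_of = size_of
        self._tiles: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.used_bytes = 0

    def get(self, key: Hashable) -> Any:
        if key in self._tiles:
            # Cache hit: mark the tile as most recently used.
            self._tiles.move_to_end(key)
            return self._tiles[key]

        # Cache miss: decode once and remember the result.
        tile = self._decode(key)
        self._tiles[key] = tile
        self.used_bytes += self._size_of(tile)

        # Evict least recently used tiles until we are under budget again,
        # but always keep the tile that was just requested.
        while self.used_bytes > self._max_bytes and len(self._tiles) > 1:
            _, evicted = self._tiles.popitem(last=False)
            self.used_bytes -= self._size_of(evicted)
        return tile
```

For a row-by-row scan like the one described above, a budget of roughly two rows of tiles would presumably be enough to avoid decoding any tile twice, while keeping memory use independent of the document size.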
At this point, the answer is a clear "no".
Other than that, with the current state of the pylibczi-API, I'd think the following would be possible:
Performance-wise, this idea should be on the same level as "introduce caching to libCZI", and it would benefit greatly from running the decoding concurrently. However, the latency (i.e. the time until the first small ROI is available) would obviously be rather bad. Of course, it will introduce some complexity at the application/Python level, but it would not require any changes to pylibczi/libczi, I reckon. Next steps from my side:
|
Another idea that crossed my mind: instead of using "ROI-based access" (i.e. where the application requests an arbitrary ROI), reading the data "tile by tile" could be worth considering. That is, there would be no multi-tile composition; the application would just read the existing tiles, one after the other. If the application has no need for tile composition, this should be the easiest approach. An additional benefit is that parallelization could then take place on the Python layer. |
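To illustrate what parallelization on the Python layer could look like for tile-by-tile reading, here is a rough sketch using a thread pool. It does not use any real pylibCZIrw call: `read_tile` is a hypothetical callable that returns the decoded data for one tile rectangle, and the sketch assumes the underlying decoder releases the GIL, since otherwise threads would bring little benefit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Rect = Tuple[int, int, int, int]  # (x, y, width, height) of one tile
T = TypeVar("T")


def read_tiles_parallel(
    read_tile: Callable[[Rect], T],   # hypothetical per-tile reader
    tile_rects: Iterable[Rect],
    max_workers: int = 4,
) -> Iterator[Tuple[Rect, T]]:
    """Decode tiles concurrently and yield them in the order requested."""
    rects = list(tile_rects)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() returns results in input order, so the caller still
        # sees tiles left to right, top to bottom, if that is how rects is sorted.
        for rect, tile in zip(rects, pool.map(read_tile, rects)):
            yield rect, tile
```

Note that `pool.map` submits all tiles up front, so for very large documents the rectangles would need to be fed in batches to keep the number of decoded tiles held in memory bounded.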
I just would like to clarify this part:
I understood the original idea as caching of subblocks, not ROIs.
This would require a rather big change in the application logic on our end. Also, if by "tile" you mean subblock here, I don't think that pylibCZIrw currently supports that. |
I created new tickets for the three ideas we came up with in the course of this discussion. |
With regard to #76: I'd give this idea a lower priority at the moment, so for the time being I plan to conclude activities in this context with the two optimizations that have been done so far. |
Describe the bug
There is a significant difference in ROI reading speed, depending on the method of compression.
To Reproduce
The following Python script:
Outputs:
Expected behavior
Reading times of compressed and uncompressed documents don't differ, or differ by less than 2x.
Desktop (please complete the following information):
Additional context
The original JPEGXR-compressed document was decompressed and then ZSTD-compressed again using the czicompress tool.
Test document will be delivered separately.