-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
heif: geoheif properties cleanup #11396
Conversation
gcore/geoheif.cpp
Outdated
@@ -124,12 +124,12 @@ void GeoHEIF::extractSRS(const uint8_t *payload, size_t length) const | |||
} | |||
else if (crsEncoding == "curi") | |||
{ | |||
if ((crs.at(0) != '[') || (crs.at(crs.length() - 1) != ']')) | |||
if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change looks wrong to me. The last character is at index length - 1, not -2. Or isn't the closing bracket the last character?
if it is, more idiomatic would be:
if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']')) | |
if ((crs.at(0) != '[') || (crs.back() != ']')) |
or maybe there's a trailing null character... ? In which case, a code comment would be welcome
in any case, a check on crs.length() should also be added. Otherwise crs.at() might throw an exception. In the GDAL code base we tend to avoid relying on exceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a null character to terminate the string. I've updated the logic to check that is present, which hopefully makes the code clearer. I also added a comment on the format.
@bradh @rouault It seems to be something to do with that CURIE check. The following condition evaluates to true: if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']') ||
(crs.at(crs.length() - 1) != '\0')) and it fails to set up the CRS properly. The CURIE string is: Also, because the context here is unambiguously a CURIE (because it's a So I think there was something to @rouault 's comment above which is not handled by the clarification. The following seems to work: if ((crs.at(0) != '[') || (crs.at(crs.length() - 1) != ']') /*||
(crs.at(crs.length()) != '\0')*/) and I had to comment out the check for It's still flipped 90 degrees though, so trying to understand why that is:
|
when constructing a std::string from a pointer and length, a potential nul character in the range will be imported in the string, contrary to constructing only from the pointer, where that stops at the first nul character. For safety reasons constructing with pointer and length is the best thing to do, but is the spec unambiguous about if there is a nul character or not in the file ? (some formats like TIFF have a length, but requires the last byte to be a nul character). Perhaps there are test data files that use both interpretations, causing this mess? |
Hmm just noticed this error:
probably this line is also wrong: std::string curie = crs.substr(1, crs.length() - 3); Unless this is something that we are not outputting correctly in our own files? The following diff is necessary for things to work with the GeoHEIF files returned by our GNOSIS Map Server right now: diff --git a/gcore/geoheif.cpp b/gcore/geoheif.cpp
index 723fa9e1af..3c6c5fb5f2 100644
--- a/gcore/geoheif.cpp
+++ b/gcore/geoheif.cpp
@@ -143,13 +143,12 @@ void GeoHEIF::extractSRS(const uint8_t *payload, size_t length) const
else if (crsEncoding == "curi")
{
// null terminated string in the form "[EPSG:4326]"
- if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']') ||
- (crs.at(crs.length() - 1) != '\0'))
+ if ((crs.at(0) != '[') || (crs.at(crs.length() - 1) != ']'))
{
CPLDebug("GeoHEIF", "CRS CURIE is not a safe CURIE");
return;
}
- std::string curie = crs.substr(1, crs.length() - 3);
+ std::string curie = crs.substr(1, crs.length() - 2);
size_t separatorPos = curie.find(':');
if (separatorPos == std::string::npos)
{ @bradh We do not currently include a NUL character at the end of the CURIE string in the output. Even if we did include one, wouldn't char a[] = { 'H', 'e', 'l', 'l', 'o', 0 };
std::string s(a, 6);
printf("length() = %d", (int)s.length()); prints out 6. |
no that's fine, and the saner approach IMHO of not having an explicit nul terminator in the format (because from a security point of view, even if it seems to be a good intent to include it for C-like consumers, on practice you need to check if it is present or not in case the file would be corrupt or hostile. so it is better not to include it at all, and let implemenations add it themselves if they need to). If you do std::string("foo", 3), std::string will automatically add the nul terminator character
yes that's fully "expected" from the way the std::string(pointer, size) constructor works. |
It seems like that's the case and our implementation not adding the NUL character is causing the mess.
I guess that Hopefully @bradh will tell us how it should be :) |
Yes. The draft spec says its a This is a common pattern in the ISO box structures. In this specific case, we need the terminator to tell where the CRS ends and the epoch starts. (Likely you were working around my original implementation bug, sorry about that). And yes, it has to be a safe Compact URI. While we could make it optional, there are enough options (OK, too many options) already. So just do it one safe way, and let OGC API users copy it out unchanged. |
Figured this was the case, so I guess I need to fix our implementation :) Thanks! Though from the C++ code perspective, it seems like the -1 would be easier to understand if it were done when reading the data from the box, rather than in the later checks... @bradh It might be good to clarify this in D010, because it's not totally obvious to those who can't afford the 216 Swiss francs that it implies a NUL terminating character (you could technically subtract the 4 bytes for the epoch from the overall box size to determine the length). |
The null terminator is for the string, not for the property. The implementation concept is that the GeoHEIF property parsing logic is centralised, rather than each implementation needing to do it (or part of it).
I think this is valid. Will create a ticket for that on the GIMI gitlab, since its not about the GDAL part. |
I should probably shut up, but I find it a bit sad that in 2024 we still use such TIFF era designs where you need to look for a nul terminator character to locate when an optional element is located and that we are using binary encodings to save a few bytes of metadata, which this whole discussion demonstrates to cause implementation troubles. Why not a darn JSON document embedded in a box with all geo stuff in a readable and easily extensible way... ? |
Yes but I meant that this could be done for all HEIF |
I must say that dealing with the HEIF boxes and big-endianness is not something today's developers will find easy to deal with, and the optional elements changing their size makes them all the more complicated. For compact binary encodings I've been advocating for UBJSON lately. |
I recognise the frustration, but there are a couple of considerations. The main reason for this breakdown is that it is intended to also be used for image sequences and video, not just a single image. Being able to share properties saves quite a lot of space, and efficiency does matter when there are a lot of properties (like every corner point in every frame). The other is that you are seeing it through the lens of three GeoHEIF properties, rather than everything that is already in HEIF including all of ISO Base Media File Format. JSON isn't that common there, and it would be an additional generator side dependency for something like a camera. By contrast, HEIF writers already have the logic to write out a string. So JSON is an option, and you can just associate it to the image(s) using |
@bradh I've updated our server to include the NUL terminating character in the I've also rebased #11384 on this PR, and everything seems to be working fine now. |
What does this PR do?
Modifies the HEIF part of the GeoHEIF parsing to extract properties in Init(). It also reduces the amount of code by collecting all the properties at once (and only once), which should offset the cost of getting properties where we have no geospatial context.
It also removes a bit of debug output from the avif_heif.py test that is not needed.
It also removes some unused code (ExtractUserDescription) from the HEIF driver that should have been removed before merging.
What are related issues/pull requests?
This is related to #11384 in that it addresses the problem where the order of operations makes a difference to the outcome. It doesn't address the part of #11384 that fixes the axis order though.
@jerstlouis Can you check that part?
Tasklist
Environment
N/A