heif: geoheif properties cleanup #11396

bradh · 2024-11-30T02:07:16Z

What does this PR do?

Modifies the HEIF part of the GeoHEIF parsing to extract properties in Init(). It also reduces the amount of code by collecting all the properties at once (and only once), which should offset the cost of getting properties where we have no geospatial context.

It also removes a bit of debug output from the avif_heif.py test that is not needed.

It also removes some unused code (ExtractUserDescription) from the HEIF driver that should have been removed before merging.

What are related issues/pull requests?

This is related to #11384 in that it addresses the problem where the order of operations makes a difference to the outcome. It doesn't address the part of #11384 that fixes the axis order though.

@jerstlouis Can you check that part?

Tasklist

Make sure code is correctly formatted (cf pre-commit configuration)
Add test case(s)
[N/A] Add documentation
[N/A] Updated Python API documentation (swig/include/python/docs/)
Review
Adjust for comments
All CI builds and checks have passed

Environment

N/A

rouault · 2024-11-30T13:28:32Z

gcore/geoheif.cpp

@@ -124,12 +124,12 @@ void GeoHEIF::extractSRS(const uint8_t *payload, size_t length) const
    }
    else if (crsEncoding == "curi")
    {
-        if ((crs.at(0) != '[') || (crs.at(crs.length() - 1) != ']'))
+        if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']'))


this change looks wrong to me. The last character is at index length - 1, not -2. Or isn't the closing bracket the last character?
if it is, more idiomatic would be:

Suggested change

if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']'))

if ((crs.at(0) != '[') || (crs.back() != ']'))

or maybe there's a trailing null character... ? In which case, a code comment would be welcome

in any case, a check on crs.length() should also be added. Otherwise crs.at() might throw an exception. In the GDAL code base we tend to avoid relying on exceptions

There is a null character to terminate the string. I've updated the logic to check that is present, which hopefully makes the code clearer. I also added a comment on the format.

jerstlouis · 2024-11-30T23:20:12Z

@bradh @rouault extractSRS() seems to be called first with this, but unfortunately even with my fix in #11384 rebased on this, in a twist of fate which sees Murphy's law joining forces with the Curse of the Out-of-order Axes, the image is shown rotated 90 degree.

It seems to be something to do with that CURIE check. The following condition evaluates to true:

         if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']') ||
            (crs.at(crs.length() - 1) != '\0'))

and it fails to set up the CRS properly.

The CURIE string is: [EPSG:4326].

Also, because the context here is unambiguously a CURIE (because it's a curi box), a safe CURIE is not essential, so probably either EPSG:4326 or [EPSG:4326] could be acceptable. The whole point of safe CURIEs is to recognize a string as a CURIE where something other than a CURIE is accepted (e.g., a URN or http(s) URI).

So I think there was something to @rouault 's comment above which is not handled by the clarification.

The following seems to work:

        if ((crs.at(0) != '[') || (crs.at(crs.length() - 1) != ']') /*||
            (crs.at(crs.length()) != '\0')*/)

and I had to comment out the check for '\0' because it otherwise crashes -- so I guess there really is no null character in crs itself after it becomes a C++ string.

It's still flipped 90 degrees though, so trying to understand why that is:

--> GeoHEIF::extractSRS()
Found a curi CRS
curi: [EPSG:4326]
importing srs...
--> GeoHEIF::extractSRS()
Found a curi CRS
curi: [EPSG:4326]
importing srs...
--> GeoHEIF::GetGeoTransform()
axes is not set in call to GetGeoTransform()
--> GeoHEIF::GetGeoTransform()
axes is not set in call to GetGeoTransform()
--> GeoHEIF::GetSpatialRef()
--> GeoHEIF::GetSpatialRef()
--> GeoHEIF::GetGeoTransform()
axes is not set in call to GetGeoTransform()
--> GeoHEIF::GetSpatialRef()
--> GeoHEIF::extractSRS()
Found a curi CRS
curi: [EPSG:4326]
importing srs...
ERROR 1: PROJ: proj_create: crs not found: EPSG:432
--> GeoHEIF::GetGeoTransform()
axes is not set in call to GetGeoTransform()
--> GeoHEIF::GetGeoTransform()
axes is not set in call to GetGeoTransform()
--> GeoHEIF::GetSpatialRef()
--> GeoHEIF::GetSpatialRef()
--> GeoHEIF::GetGeoTransform()
axes is not set in call to GetGeoTransform()
--> GeoHEIF::GetSpatialRef()

rouault · 2024-11-30T23:31:29Z

and I had to comment out the check for '\0' because it otherwise crashes

when constructing a std::string from a pointer and length, a potential nul character in the range will be imported in the string, contrary to constructing only from the pointer, where that stops at the first nul character. For safety reasons constructing with pointer and length is the best thing to do, but is the spec unambiguous about if there is a nul character or not in the file ? (some formats like TIFF have a length, but requires the last byte to be a nul character). Perhaps there are test data files that use both interpretations, causing this mess?

jerstlouis · 2024-11-30T23:31:45Z

Hmm just noticed this error:

ERROR 1: PROJ: proj_create: crs not found: EPSG:432

probably this line is also wrong:

std::string curie = crs.substr(1, crs.length() - 3);

Unless this is something that we are not outputting correctly in our own files?

The following diff is necessary for things to work with the GeoHEIF files returned by our GNOSIS Map Server right now:

diff --git a/gcore/geoheif.cpp b/gcore/geoheif.cpp
index 723fa9e1af..3c6c5fb5f2 100644
--- a/gcore/geoheif.cpp
+++ b/gcore/geoheif.cpp
@@ -143,13 +143,12 @@ void GeoHEIF::extractSRS(const uint8_t *payload, size_t length) const
     else if (crsEncoding == "curi")
     {
         // null terminated string in the form "[EPSG:4326]"
-        if ((crs.at(0) != '[') || (crs.at(crs.length() - 2) != ']') ||
-            (crs.at(crs.length() - 1) != '\0'))
+        if ((crs.at(0) != '[') || (crs.at(crs.length() - 1) != ']'))
         {
             CPLDebug("GeoHEIF", "CRS CURIE is not a safe CURIE");
             return;
         }
-        std::string curie = crs.substr(1, crs.length() - 3);
+        std::string curie = crs.substr(1, crs.length() - 2);
         size_t separatorPos = curie.find(':');
         if (separatorPos == std::string::npos)
         {

@bradh We do not currently include a NUL character at the end of the CURIE string in the output.
Is this a problem on our side?

Even if we did include one, wouldn't std::string::length() still stop counting the length when it reaches the NUL character, like strlen() does? EDIT: It does not...

   char a[] = { 'H', 'e', 'l', 'l', 'o', 0 };
   std::string s(a, 6);
   printf("length() = %d", (int)s.length());

prints out 6.

rouault · 2024-11-30T23:56:25Z

Is this a problem on our side?

no that's fine, and the saner approach IMHO of not having an explicit nul terminator in the format (because from a security point of view, even if it seems to be a good intent to include it for C-like consumers, on practice you need to check if it is present or not in case the file would be corrupt or hostile. so it is better not to include it at all, and let implemenations add it themselves if they need to). If you do std::string("foo", 3), std::string will automatically add the nul terminator character

Even if we did include one, wouldn't std::string::length() still stop counting the length when it reaches the NUL character, like strlen() does? EDIT: It does not...

yes that's fully "expected" from the way the std::string(pointer, size) constructor works.

jerstlouis · 2024-11-30T23:59:19Z

@rouault

Perhaps there are test data files that use both interpretations, causing this mess?

It seems like that's the case and our implementation not adding the NUL character is causing the mess.
In the GeoHEIF box definition (see Requirement 4) it's defined like this:

aligned(8) class CoordinateReferenceSystemBox
extends FullBox('mcrs', version=0, flags) {
    unsigned int(32) crsEncoding;
    utf8string crs;
    if (flags & 1) {
        float(32) epoch;
    }
}

I guess that utf8string here implies a NUL-terminating character?

Hopefully @bradh will tell us how it should be :)

bradh · 2024-12-01T00:01:52Z

@bradh We do not currently include a NUL character at the end of the CURIE string in the output.
Is this a problem on our side?

Yes. The draft spec says its a utf8string, which is null terminated (see ISO/IEC 14496-12:2022 Section 4.2.1).

This is a common pattern in the ISO box structures. In this specific case, we need the terminator to tell where the CRS ends and the epoch starts.

(Likely you were working around my original implementation bug, sorry about that).

And yes, it has to be a safe Compact URI. While we could make it optional, there are enough options (OK, too many options) already. So just do it one safe way, and let OGC API users copy it out unchanged.

jerstlouis · 2024-12-01T00:04:16Z

The draft spec says its a utf8string, which is null terminated (see ISO/IEC 14496-12:2022 Section 4.2.1).
This is a common pattern in the ISO box structures.

Figured this was the case, so I guess I need to fix our implementation :) Thanks!

Though from the C++ code perspective, it seems like the -1 would be easier to understand if it were done when reading the data from the box, rather than in the later checks...

@bradh It might be good to clarify this in D010, because it's not totally obvious to those who can't afford the 216 Swiss francs that it implies a NUL terminating character (you could technically subtract the 4 bytes for the epoch from the overall box size to determine the length).

bradh · 2024-12-01T00:28:29Z

Though from the C++ code perspective, it seems like the -1 would be easier to understand if it were done when reading the data from the box, rather than in the later checks...

The null terminator is for the string, not for the property. The implementation concept is that the GeoHEIF property parsing logic is centralised, rather than each implementation needing to do it (or part of it).

@bradh It might be good to clarify this in D010, because it's not totally obvious to those who can't afford the 216 Swiss francs that it implies a NUL terminating character (you could technically subtract the 4 bytes for the epoch from the overall box size to determine the length).

I think this is valid. Will create a ticket for that on the GIMI gitlab, since its not about the GDAL part.

rouault · 2024-12-01T00:31:23Z

I should probably shut up, but I find it a bit sad that in 2024 we still use such TIFF era designs where you need to look for a nul terminator character to locate when an optional element is located and that we are using binary encodings to save a few bytes of metadata, which this whole discussion demonstrates to cause implementation troubles. Why not a darn JSON document embedded in a box with all geo stuff in a readable and easily extensible way... ?

jerstlouis · 2024-12-01T00:31:47Z

The null terminator is for the string, not for the property. The implementation concept is that the GeoHEIF property parsing logic is centralised, rather than each implementation needing to do it (or part of it).

Yes but I meant that this could be done for all HEIF utf8string being read into an std::string (though maybe that's too big of a change?).

jerstlouis · 2024-12-01T00:41:14Z

where you need to look for a nul terminator character to locate when an optional element

I must say that dealing with the HEIF boxes and big-endianness is not something today's developers will find easy to deal with, and the optional elements changing their size makes them all the more complicated.

For compact binary encodings I've been advocating for UBJSON lately.

bradh · 2024-12-01T02:20:53Z

I should probably shut up, but I find it a bit sad that in 2024 we still use such TIFF era designs where you need to look for a nul terminator character to locate when an optional element is located and that we are using binary encodings to save a few bytes of metadata, which this whole discussion demonstrates to cause implementation troubles. Why not a darn JSON document embedded in a box with all geo stuff in a readable and easily extensible way... ?

I recognise the frustration, but there are a couple of considerations. The main reason for this breakdown is that it is intended to also be used for image sequences and video, not just a single image. Being able to share properties saves quite a lot of space, and efficiency does matter when there are a lot of properties (like every corner point in every frame). The other is that you are seeing it through the lens of three GeoHEIF properties, rather than everything that is already in HEIF including all of ISO Base Media File Format. JSON isn't that common there, and it would be an additional generator side dependency for something like a camera. By contrast, HEIF writers already have the logic to write out a string.

So JSON is an option, and you can just associate it to the image(s) using cdsc item reference(s). The extensibility cuts both ways though, and the risk is it becomes a random dumping ground.

jerstlouis · 2024-12-01T17:36:35Z

@bradh I've updated our server to include the NUL terminating character in the mcrs output (see links for different CRSs in #11384 (comment)). I realized there were also still some bugs in our mtxf that I've fixed.

I've also rebased #11384 on this PR, and everything seems to be working fine now.

heif: geoheif properties cleanup

4cc16e2

rouault reviewed Nov 30, 2024

View reviewed changes

geoheif: clarify safe curie parsing

a29fb99

rouault approved these changes Nov 30, 2024

View reviewed changes

rouault merged commit a29fb99 into OSGeo:master Dec 2, 2024
37 checks passed

bradh deleted the heif_props_2024-11-30 branch December 2, 2024 19:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

heif: geoheif properties cleanup #11396

heif: geoheif properties cleanup #11396

bradh commented Nov 30, 2024

rouault Nov 30, 2024

bradh Nov 30, 2024

jerstlouis commented Nov 30, 2024 •

edited

Loading

rouault commented Nov 30, 2024

jerstlouis commented Nov 30, 2024 •

edited

Loading

rouault commented Nov 30, 2024

jerstlouis commented Nov 30, 2024 •

edited

Loading

bradh commented Dec 1, 2024 •

edited

Loading

jerstlouis commented Dec 1, 2024 •

edited

Loading

bradh commented Dec 1, 2024

rouault commented Dec 1, 2024

jerstlouis commented Dec 1, 2024

jerstlouis commented Dec 1, 2024

bradh commented Dec 1, 2024

jerstlouis commented Dec 1, 2024

	if ((crs.at(0) != '[') \|\| (crs.at(crs.length() - 2) != ']'))
	if ((crs.at(0) != '[') \|\| (crs.back() != ']'))

heif: geoheif properties cleanup #11396

heif: geoheif properties cleanup #11396

Conversation

bradh commented Nov 30, 2024

What does this PR do?

What are related issues/pull requests?

Tasklist

Environment

rouault Nov 30, 2024

Choose a reason for hiding this comment

bradh Nov 30, 2024

Choose a reason for hiding this comment

jerstlouis commented Nov 30, 2024 • edited Loading

rouault commented Nov 30, 2024

jerstlouis commented Nov 30, 2024 • edited Loading

rouault commented Nov 30, 2024

jerstlouis commented Nov 30, 2024 • edited Loading

bradh commented Dec 1, 2024 • edited Loading

jerstlouis commented Dec 1, 2024 • edited Loading

bradh commented Dec 1, 2024

rouault commented Dec 1, 2024

jerstlouis commented Dec 1, 2024

jerstlouis commented Dec 1, 2024

bradh commented Dec 1, 2024

jerstlouis commented Dec 1, 2024

jerstlouis commented Nov 30, 2024 •

edited

Loading

jerstlouis commented Nov 30, 2024 •

edited

Loading

jerstlouis commented Nov 30, 2024 •

edited

Loading

bradh commented Dec 1, 2024 •

edited

Loading

jerstlouis commented Dec 1, 2024 •

edited

Loading