Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove check that the meta box is zero when searching for a tmap. #2431

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

maryla-uc
Copy link
Collaborator

Unless I'm misunderstanding something, whether the meta box has size zero or not, it's read in one go. So if the meta box has been seen, all of it has been seen. The only case where it could be read partially is if the provided sizeHint value is too small, but then the parsing code still assumes it has read the whole box.
If the whole box is not available yet, the reading function is supposed to return AVIF_RESULT_WAITING_ON_IO.

Whether the meta box has size zero or not, it's read in one go.
So if the meta box has been seen, all of it has been seen.
The only case where it could be read partially is if the provided
sizeHint value is too small, but then the parsing code still assumes
it has read the whole box.
If the whole box is not available yet, the reading function is
supposed to return AVIF_RESULT_WAITING_ON_IO.
@maryla-uc maryla-uc requested a review from y-guyon September 13, 2024 11:48
Copy link
Collaborator

@y-guyon y-guyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the meta box has been seen, all of it has been seen.

Not sure I follow. One cannot be sure when all of meta was parsed and in all cases.

Consider the following image with alpha:

  • ftyp
  • meta size 0
    • hdlr
    • pitm
    • iloc
    • iinf
      • infe
      • infe
    • iprp
      • ipco
        • colr
        • auxC
        • etc
      • ipma
    • idat
    • iref
      • auxl

(I have not checked all specifications to ensure this is compliant so I assume it is.)

If one received everything till the iref box exclusive, it could be considered a valid opaque image. In my opinion this is dangerous because the displayed pixels depend on the file being partially received. It should return TRUNCATED_DATA instead in my opinion.

Could the same pattern happen for tmap, for example a missing altr group at the end or something?

Another inconvenient situation would be some non-compliant duplicated box such as two pitm or something, where the file is valid as long as truncated just before that invalid box at the end. But this goes into a bigger question of whether custom metadata can be arbitrarily appended at the end of any file of whatever format.

In practice I do not think there is any writer out there using meta with size 0 and idat, so it matters little.

If the whole box is not available yet, the reading function is supposed to return AVIF_RESULT_WAITING_ON_IO.

Which "reading function"? How can it know that?

@@ -4292,10 +4287,8 @@ static avifResult avifParse(avifDecoder * decoder)
}
#if defined(AVIF_ENABLE_EXPERIMENTAL_GAIN_MAP)
if (needsTmap && !tmapSeen) {
return metaIsSizeZero ? AVIF_RESULT_TRUNCATED_DATA : AVIF_RESULT_BMFF_PARSE_FAILED;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not why that check was only present for tmap in the first place.

@maryla-uc
Copy link
Collaborator Author

If the box has size zero, we read until the end of the file in one go, or until sizeHint if provided, which is supposed to be larger than the file.

libavif/src/read.c

Lines 4166 to 4173 in 8aab77e

if (header.isSizeZeroBox) {
// The box body goes till the end of the file.
if (decoder->io->sizeHint != 0 && decoder->io->sizeHint - parseOffset < SIZE_MAX) {
sizeToRead = decoder->io->sizeHint - parseOffset;
} else {
sizeToRead = SIZE_MAX; // This will get truncated. See the documentation of avifIOReadFunc.
}
} else {

So the reading function is supposed to return all data until the end of the file, or AVIF_RESULT_TRUNCATED_DATA if not available (in which case we abort directly).
This does mean that you can't do any sort of incremental or progressive decoding if you're using idat right now.
Maybe this should be changed instead, but then again I'm not sure why ayone would use idat instead of mdat.

@maryla-uc
Copy link
Collaborator Author

maryla-uc commented Sep 13, 2024

Which "reading function"? How can it know that?

According to the documentation of avifIOReadFunc:

libavif/include/avif/avif.h

Lines 1057 to 1067 in 8aab77e

// This function should return a block of memory that *must* remain valid until another read call to
// this avifIO struct is made (reusing a read buffer is acceptable/expected).
//
// * If offset exceeds the size of the content (past EOF), return AVIF_RESULT_IO_ERROR.
// * If offset is *exactly* at EOF, provide a 0-byte buffer and return AVIF_RESULT_OK.
// * If (offset+size) exceeds the contents' size, it must truncate the range to provide all
// bytes from the offset to EOF.
// * If the range is unavailable yet (due to network conditions or any other reason),
// return AVIF_RESULT_WAITING_ON_IO.
// * Otherwise, provide the range and return AVIF_RESULT_OK.
typedef avifResult (*avifIOReadFunc)(struct avifIO * io, uint32_t readFlags, uint64_t offset, size_t size, avifROData * out);

Copy link
Collaborator

@y-guyon y-guyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reading function is supposed to return all data until the end of the file, or AVIF_RESULT_TRUNCATED_DATA if not available (in which case we abort directly).

or AVIF_RESULT_WAITING_ON_IO but this is also returned immediately.

This does mean that you can't do any sort of incremental or progressive decoding if you're using idat right now.

I am not so sure, because avifParse() may stop before trying to read the whole meta box.


Consider:

  • A file that is compliant when truncated at N bytes
  • avifParse() given these N bytes
  • With sawEverythingNeeded = !needsTmap || tmapSeen, it will try to read more, leaving the choice to return TRUNCATED_DATA or WAITING_ON_IO to avifIOReadFunc.
  • Without sawEverythingNeeded = !needsTmap || tmapSeen, it will maybe return OK -> this may lead to different pixels depending on parsing the whole file or not.

So for this part of the PR I am not so sure.

For the metaIsSizeZero part:
metaIsSizeZero is looked up only when avifIOReadFunc returned OK, meaning the full size was read. And metaIsSizeZero is necessary to differentiate between a non-compliant file that may become valid with more bytes (TRUNCATED), or a non-compliant file that is faulty no matter what bytes are appended (PARSE_FAILED).

@maryla-uc
Copy link
Collaborator Author

So the reading function is supposed to return all data until the end of the file, or AVIF_RESULT_TRUNCATED_DATA if not available (in which case we abort directly).

or AVIF_RESULT_WAITING_ON_IO but this is also returned immediately.

Sorry yes I meant it will return AVIF_RESULT_WAITING_ON_IO, not AVIF_RESULT_TRUNCATED_DATA. I don't think avifIOReadFunc is supposed to return AVIF_RESULT_TRUNCATED_DATA given that description.

We can discuss this more in person on Monday.

@maryla-uc maryla-uc changed the title Remote check that the meta box is zero when searching for a tmap. Remove check that the meta box is zero when searching for a tmap. Sep 16, 2024
Copy link
Collaborator

@y-guyon y-guyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, looks good to me.

@maryla-uc
Copy link
Collaborator Author

A few comments about what we discussed, more comments for future reference:

  • currently, read.c always tries to read the whole 'meta' box at once, even when the size is 0, in which case it asks for all bytes until EOF, which the reading function is supposed to return (if available) or return AVIF_RESULT_TRUNCATED_DATA if not available
  • if the data is not available, read.c returns immediately anyway
  • if the reading function does not respect the contract explained in avif.h, that's not our problem
  • if the file was was created by truncating another larger file, it's not our problem to detect that the file might be valid if it had more data

The last point might be the most controversial. Previously it seems we were trying to return AVIF_RESULT_TRUNCATED_DATA in this case. But currently the AVIF_RESULT_TRUNCATED_DATA return status is not documented so there are no guarantees on when it is returned, so it should be ok to change.

@vrabaud
Copy link
Collaborator

vrabaud commented Sep 25, 2024

Knowing the data is truncated can be useful when a file is partially downloaded and a browser tries to figure out some information about the image (and first, make sure it is valid).

@maryla-uc
Copy link
Collaborator Author

I guess we can keep the check but I'm not sure we do this very consistently currently. We would need to add a test.
Note that this code only applies in the rare (?) case where the size of the meta box is '0' (= "until end of file")

@maryla-uc
Copy link
Collaborator Author

(Actually this code applies only in the super rare case where the meta box is size '0' AND the file happens to be truncated just at the boundary between two subboxes of the meta box)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants