Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential problem with OKHTTP integration library #419

Closed
Tolriq opened this issue Apr 17, 2015 · 13 comments
Closed

Potential problem with OKHTTP integration library #419

Tolriq opened this issue Apr 17, 2015 · 13 comments

Comments

@Tolriq
Copy link
Contributor

Tolriq commented Apr 17, 2015

To follow on a previous issue (#409).

How are loadData / cleanup and cancel ordered in a DataFetcher ?

From previous discussion you said you do not cancel jobs, but since cancel will return instantly when do you call the cleanup when the request is cancelled ?

The problem rely in the fact that cleanup do close the inputstream so if skia is in the process of decoding something it can return an half filled image without error or exception leading to corruption of the cache.

@sjudd
Copy link
Collaborator

sjudd commented Apr 18, 2015

Are you still seeing half filled images on the latest snapshot?

Cancel does two things:

  1. It calls cancel() on the DataFetcher.
  2. It sets a flag that causes the load to abort after data is retrieved, but before it is decoded.

In OkHttp, cancel() in the DataFetcher is ignored. The only other thing cancel should do is set a flag, which is ignored once the decode starts.

It's possible there's something I'm not seeing, but I don't think cancel should cause partially loaded images.

@Tolriq
Copy link
Contributor Author

Tolriq commented Apr 18, 2015

I have not tried again, but the problem is not about cancel it's about cleanup.

When is cleanup called ?

Cleanup close the inputstream in okhttp, so if it happens during decoding you'll have an half filled image due to skia not throwing error.

Maybe it's fixed but just wanted to be sure about how cleanup is called to prevent future problem for others.

@TWiStErRob
Copy link
Collaborator

cleanup is called only once, here: DecodeJob.decodeSource, so it should be only called after the BitmapFactory has fully read the stream or an exception is thrown while reading (skia returned null is not an exception).
The input stream may be acquired from loadData, but may not be read if cancel came first (if (isCancelled) return null), so again cleanup is called after cancel() when the JVM is exiting the decodeSource method. Also in that case cache pollution can't happen since control didn't get that far.

cancel can be called any time, becuase it is delegated from Request.clear() through EngineJob.cancel() to fetcher.cancel() (some hops omitted). Since okhttp doesn't do anything in cancel() it is safe.

@Tolriq
Copy link
Contributor Author

Tolriq commented Apr 18, 2015

Well the problem is a little more complex :)

If loadData is a long running operation that does things before returning, cleanup can be called while loadData is still running.

Since we do not have control over threads it's impossible from cancel to properly stop some loadData operations. (Well without complex synchronization things that should not be needed).

So from my previous tests (maybe corrected in very lasts commits, if cleanup can be called when loadData is running this can cause problem to application if they are not aware of this possibility.
And from a coder point a view if not clearly stated somewhere a cleanup should not happens when something else is running.

@TWiStErRob
Copy link
Collaborator

I just don't see how cleanup can be called if it's in the finally block of the same try loadData is called from. cancel can be called in the meantime, yes, as I pointed out in #424 too.

You kind of have control over threads, but your "long running operation" should have an inbuilt disconnect or abort:

loadData() {
    this.thread = Thread.currentThread();
    ... start long running blocking load
}
cancel() {
    this.thread?.interrupt();
}

@Tolriq
Copy link
Contributor Author

Tolriq commented Apr 18, 2015

I do not know how but this is the case :)

public class ImageRequestCacheStreamFetcher implements DataFetcher<InputStream> {

private AtomicBoolean mIsCancelled = new AtomicBoolean(false);

    @Override
    public InputStream loadData(Priority priority) throws Exception {
       do long operation;
       if (mIsCancelled.get()) it was cancelled while running

return a stream but never reached.
}

    @Override
    public void cancel() {
        mIsCancelled.set(true);
    }

As you can see in this simple code : I do not stop anything in cancel just set a boolean.
And this boolean will be set inside loadData when reaching the test.

@TWiStErRob
Copy link
Collaborator

Are you confusing cleanup and cancel? Because I'm really confused now :) (I said cleanup is not called while in loadData and you replied with cancel's code)

If you're saying isCancelled is always true, then just put a breakpoint in cancel and see the stack, I've had some cases when Glide was clearing a request all the time, but it was my fault when I traced it back.

@Tolriq
Copy link
Contributor Author

Tolriq commented Apr 19, 2015

Well this is a followup from a discussion with @sjudd ;)

I do appreciate that you try to help but maybe you try a little too hard without looking at the whole discussion.

@sjudd told me to report back the previous remarked problem it's what I do, I do understand how my code works, and I do know that what happens does not match what @sjudd told me about how cancellation works.

So yes there's a problem and no it's not on my fault.

Take the okhttp integration library.

In the DataFetched onload before returning back the inputstream do something with the stream like a BitmapFactory.decodeStream

And you'll see in logs : (Tested with last master branch)

04-19 11:51:41.567    1614-2043/xxx D/skia? --- decoder->decode returned false
04-19 11:51:41.572    1614-2013/xxx W/System.err? java.io.InterruptedIOException
04-19 11:51:41.573    1614-2013/xxx W/System.err? at okio.Timeout.throwIfReached(Timeout.java:146)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at okio.Okio$2.read(Okio.java:134)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at okio.AsyncTimeout$2.read(AsyncTimeout.java:211)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at okio.RealBufferedSource.read(RealBufferedSource.java:50)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at com.squareup.okhttp.internal.http.HttpConnection$FixedLengthSource.read(HttpConnection.java:390)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at okio.RealBufferedSource$1.read(RealBufferedSource.java:338)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at java.io.FilterInputStream.read(FilterInputStream.java:118)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at com.bumptech.glide.util.ContentLengthInputStream.read(ContentLengthInputStream.java:61)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at com.bumptech.glide.load.resource.bitmap.RecyclableBufferedInputStream.fillbuf(RecyclableBufferedInputStream.java:166)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at com.bumptech.glide.load.resource.bitmap.RecyclableBufferedInputStream.read(RecyclableBufferedInputStream.java:309)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at android.graphics.BitmapFactory.nativeDecodeStream(Native Method)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at android.graphics.BitmapFactory.decodeStreamInternal(BitmapFactory.java:635)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at android.graphics.BitmapFactory.decodeStream(BitmapFactory.java:611)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at xxx.glide.GlideUtils.downloadImage(GlideUtils.java:140)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at xxx.glide.ImageRequestCacheStreamFetcher.loadData(ImageRequestCacheStreamFetcher.java:92)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at xxx.glide.ImageRequestCacheStreamFetcher.loadData(ImageRequestCacheStreamFetcher.java:26)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at com.bumptech.glide.load.model.ImageVideoModelLoader$ImageVideoFetcher.loadData(ImageVideoModelLoader.java:70)
04-19 11:51:41.573    1614-2013/xxx W/System.err? at com.bumptech.glide.load.model.ImageVideoModelLoader$ImageVideoFetcher.loadData(ImageVideoModelLoader.java:53)
04-19 11:51:41.583    1614-2013/xxx W/System.err? at com.bumptech.glide.load.engine.DecodeJob.decodeSource(DecodeJob.java:170)
04-19 11:51:41.583    1614-2013/xxx W/System.err? at com.bumptech.glide.load.engine.DecodeJob.decodeFromSource(DecodeJob.java:128)
04-19 11:51:41.584    1614-2013/xxx W/System.err? at com.bumptech.glide.load.engine.EngineRunnable.decodeFromSource(EngineRunnable.java:122)
04-19 11:51:41.584    1614-2013/xxx W/System.err? at com.bumptech.glide.load.engine.EngineRunnable.decode(EngineRunnable.java:101)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at com.bumptech.glide.load.engine.EngineRunnable.run(EngineRunnable.java:58)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:422)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at java.util.concurrent.FutureTask.run(FutureTask.java:237)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at java.lang.Thread.run(Thread.java:818)
04-19 11:51:41.589    1614-2013/xxx W/System.err? at com.bumptech.glide.load.engine.executor.FifoPriorityThreadPoolExecutor$DefaultThreadFactory$1.run(FifoPriorityThreadPoolExecutor.java:52)

Meaning since Cancel do nothing and @sjudd said that Glide does not stop the stream that cleanup have to be called to close the stream.

And in the skia case closing the stream during decodeStream will not trigger an exception but return a bad image without anyway to validate that it's ok or not, meaning the only way to know is to check if the request was cancelled (this is what I do as reported earlier) and this works.

But either the ordering should be changed or the doc should be more clear about that, because from an external point of view and with the naming of the function this should not happens and should not be needed.

In the end this is not a big deal once you known that, but since it may also be related to the big thread sync problem it's worth reporting.

@sjudd
Copy link
Collaborator

sjudd commented Apr 19, 2015

Actually I may have erroneously closed #426. @Tolriq can you provide the relevant content of GlideUtils (around line 140)?

Glide's decoder wraps InputStreams to detect when exceptions are thrown during the decode process and discard the results. From the stack trace you've provided it looks like you're not using that class and are instead calling BitmapFactory yourself, which won't wrap the stream, and will therefore return partial results because that's the way the framework behaves.

I don't think the cancel or cleanup implementation in Glide is an issue. That said, If your ImageRequestCacheStreamFetcher cancels the OkHttp request, or uses a timeout, or otherwise causes OkHttp's thread to be interrupted, OkHttp's InputStream may throw, which is expected behavior. Glide catches this case using its wrapping InputStream, you can do so by doing the same, using Glide's Downsampler, or by ignoring results after cancel() has been called on your fetcher (which it sounds like you're doing).

@Tolriq
Copy link
Contributor Author

Tolriq commented Apr 19, 2015

Yes as said earlier I do use the BitmapFactory.decodeStream but I do not cancel anything on the inputstream. Only close of inputstream in on the cleanup code of the dataFetcher.

OkHttp timeout or anything else is not called or reached.

So either Glide interrupt the loadData or cleanup is called while loadData is running.

And the doc and what you said earlier is not clear about that being possible. When you know that there's workaround (I'm checking the cancel status) but it's better to know the problems before as the problem might not be easy to spot or understand at first.

I do use manual decodeStream due to the re entrance / sync bug when calling Glide.load inside the loadData, and my specific dual cache / resizing need :)

I'm finishing integration so can't do complete debugging as I have way too much work, and the library internals is quite complex to approach :(

But if you need I can try to debug this later.

@sjudd
Copy link
Collaborator

sjudd commented Apr 19, 2015

I don't think further debugging is necessary, we know both the cause and the solution.

Cancel probably interrupts the thread, which OkHttp probably uses as a signal to cancel the download.

This doesn't affect normal users, because they use Glide's decoder which explicitly checks for exceptions when reading from InputStreams in BitmapFactory and ignores the potentially invalid results.

Since you're doing your own decoding, it's up to you to handle partial streams, which you can do trivially either by checking for cancellation explicitly, or by wrapping your stream in ExceptionCatchingInputStream.

@sjudd sjudd closed this as completed Apr 19, 2015
@TWiStErRob
Copy link
Collaborator

Sorry for being too eager, I'll try to stand back a little. Here is one thing that came to my mind though (before I saw @sjudd's interruption comment, man, that (true) is hard to miss :) ): is it possible that your fetcher is reused? That would explain having cleanup called in middle of loadData. Worth to double check, but interruption is most likely the cause.

Also since you're hand-decoding, you may be able to utilize (in addition to ExceptionCatchingInputStream) BitmapFactory.options.requestCancelDecode in your cancel which should stop decoding before the interrupt even reaches the stream and return null from decodeStream.

@Tolriq
Copy link
Contributor Author

Tolriq commented Apr 19, 2015

No fetcher is not reused :) And my hope is to be able to use normal Glide.load when it works :) (#402)

And just to be clear I opened this because of this comment #409 (comment)

Now if finally there's cancel everything is normal and logical was just not clear at all from previous description and the source code of OkHTTP integration library that maybe worth an update on the cancel part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants