Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make retry() more flexible #19331

Merged
merged 1 commit into from
Jan 12, 2017
Merged

make retry() more flexible #19331

merged 1 commit into from
Jan 12, 2017

Conversation

bjarthur
Copy link
Contributor

retry as it is is GREAT! however, it's not flexible enough for my needs. this PR simply hoists all of the arbitrary hard-coded constants up to keyword arguments. it also add the ability to display a custom message at each retry. lastly it interpolates the global default values into the docstring so the two don't get out of sync.

@amitmurthy @samoconnor

event of an exception. If `retry_on` is a `Type` then retry only
for exceptions of that type. If `retry_on` is a function
`test_error(::Exception) -> Bool` then retry only if it is true.
Returns a lambda that retries function `f` in the event of an exception. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use "anonymous function" in place of "lambda"? I did a quick search of the Julia docs and this looks like one of two uses of "lambda" to mean anonymous function. I think anonymous function is already used more often, and is easier to guess the meaning of.

@bjarthur bjarthur force-pushed the bja/retry branch 2 times, most recently from 77a97fc to 7f4b292 Compare November 16, 2016 13:07
@bjarthur
Copy link
Contributor Author

"lambda" changed to "anonymous function". anything else?

@bjarthur
Copy link
Contributor Author

@samoconnor @amitmurthy this modifies your code. any comments?

@amitmurthy
Copy link
Contributor

I am not a fan of this change. If we really need such detailed customization it is probably better to implement a retry(f::Function; check::Function=default_retry_check) where check(prev_state) returns a tuple (retry_again::Bool, new_state). check would be called everytime before retrying. The first time prev_state would be nothing. Retries are performed as long as retry_again is true. The check function should also implement the wait between retries.

The current signature should be supported via a default_retry_check function that wraps the current logic.

@samoconnor
Copy link
Contributor

not flexible enough for my needs
Can you explain the use case(s?) where you need to adjust fist delay, growth factor and jitter?

Rather than exposing all the constants, I would prefer to have, say, 3 selectable configurable strategies, each with a fixed set of constants. The current values are intended to work well for retrying due to network congestion/failure. I'm interested in what the other use cases are.

It looks like the jitter used to be +/- 0% to 20%, but is now + 0% to 10%. Why reduce the default range?

To my eye, the pattern of defining DEFAULT_UPPER_CASE_CONSTANTS and then using them to set kw arg defaults is clumsy and redundant. If I'm looking at the signature of retry and I want to be able to see what the default values of the kw args are, I don't want to have to look them up somewhere else.
Maybe there needs to be some docstring magic to extract the default value of a kw arg.

See also https://github.com/samoconnor/Retry.jl

@bjarthur bjarthur changed the title add more keyword args to retry make retry() more flexible Dec 16, 2016
@bjarthur
Copy link
Contributor Author

@amitmurthy i've refactored retry to input a default_retry_check function as you described. let me know what you think.

@samoconnor my use case is to make my code fault tolerant to network hard drive failures, so my retries need to happen on the time scale of hours not seconds. in general i just don't think it's a good idea to have so many parameters hard coded.

note that tests are currently failing due to a bootstrapping error during the build process. below is the error i'm getting. any ideas? i've never encountered a bootstrapping error before:

<snip>
promotion.jl
tuple.jl
range.jl
expr.jl
error.jl
error during bootstrap:
UndefVarError(var=:Dict)
rec_backtrace at /Users/arthurb/src/julia/src/stackwalk.c:84
record_backtrace at /Users/arthurb/src/julia/src/task.c:239 [inlined]
jl_throw at /Users/arthurb/src/julia/src/task.c:561
jl_undefined_var_error at /Users/arthurb/src/julia/src/builtins.c:130
eval at /Users/arthurb/src/julia/src/interpreter.c:201
do_call at /Users/arthurb/src/julia/src/interpreter.c:70
eval at /Users/arthurb/src/julia/src/interpreter.c:210
do_call at /Users/arthurb/src/julia/src/interpreter.c:70
eval at /Users/arthurb/src/julia/src/interpreter.c:210
eval at /Users/arthurb/src/julia/src/interpreter.c:271
eval_body at /Users/arthurb/src/julia/src/interpreter.c:576
jl_toplevel_eval_flex at /Users/arthurb/src/julia/src/toplevel.c:622
jl_parse_eval_all at /Users/arthurb/src/julia/src/ast.c:753
jl_load at /Users/arthurb/src/julia/src/toplevel.c:666 [inlined]
jl_load_ at /Users/arthurb/src/julia/src/toplevel.c:675
unknown function (ip: 0x308e221af)
jl_call_method_internal at /Users/arthurb/src/julia/src/./julia_internal.h:240 [inlined]
jl_apply_generic at /Users/arthurb/src/julia/src/gf.c:1861
do_call at /Users/arthurb/src/julia/src/interpreter.c:71
eval at /Users/arthurb/src/julia/src/interpreter.c:210
jl_toplevel_eval_flex at /Users/arthurb/src/julia/src/toplevel.c:628
jl_eval_module_expr at /Users/arthurb/src/julia/src/toplevel.c:196
jl_toplevel_eval_flex at /Users/arthurb/src/julia/src/toplevel.c:535
jl_toplevel_eval_in at /Users/arthurb/src/julia/src/builtins.c:577
unknown function (ip: 0x308e21a83)
jl_call_method_internal at /Users/arthurb/src/julia/src/./julia_internal.h:240 [inlined]
jl_apply_generic at /Users/arthurb/src/julia/src/gf.c:1861
do_call at /Users/arthurb/src/julia/src/interpreter.c:71
eval at /Users/arthurb/src/julia/src/interpreter.c:210
jl_toplevel_eval_flex at /Users/arthurb/src/julia/src/toplevel.c:628
jl_parse_eval_all at /Users/arthurb/src/julia/src/ast.c:753
jl_load at /Users/arthurb/src/julia/src/toplevel.c:666
exec_program at /Users/arthurb/src/julia/usr/bin/julia (unknown line)
true_main at /Users/arthurb/src/julia/usr/bin/julia (unknown line)
main at /Users/arthurb/src/julia/usr/bin/julia (unknown line)

make[1]: *** [/Users/arthurb/src/julia/usr/lib/julia/inference.ji] Error 1
make: *** [julia-inference] Error 2

@bjarthur bjarthur force-pushed the bja/retry branch 2 times, most recently from d674d3c to 8bf9edd Compare December 21, 2016 22:10
@bjarthur bjarthur closed this Dec 30, 2016
@bjarthur bjarthur reopened this Dec 30, 2016
@@ -993,6 +993,7 @@ export
error,
rethrow,
retry,
default_retry_check,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a really messy name to be exporting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amitmurthy i'd be happy not exporting default_retry_check. what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely don't want to export this. would a user need to call it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users would need to call default_retry_check only if they want to change it's keyword arguments. see the tests for examples.

if your objection is the length of the symbol, there is ample precedent:

julia> idx=sortperm(map(x->length(string(x)),names(Base)));
julia> names(Base)[idx[end-20:end]]
21-element Array{Symbol,1}:
 :specialized_binary        
 :task_local_storage        
 :AbstractSparseArray       
 :broadcast!_function       
 :broadcast_setindex!       
 :default_worker_pool       
 :get_zero_subnormals       
 :pointer_from_objref       
 :set_zero_subnormals       
 :AbstractSparseMatrix      
 :AbstractSparseVector      
 :RoundNearestTiesAway      
 :blas_set_num_threads      
 :InvalidStateException     
 :ProcessExitedException    
 :get_bigfloat_precision    
 :set_bigfloat_precision    
 :with_bigfloat_precision   
 :unsafe_pointer_to_objref  
 :specialized_bitwise_unary 
 :specialized_bitwise_binary

note that in the parallel realm there is already default_worker_pool. what would you think of default_check and default_pool, respectively, instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we should be exporting default_worker_pool either, at least not under that name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of those are long type names and are ok. Most of the long names with underscores we have been working to eliminate. A long underscored name is a strong indication that you are exposing insufficiently well factored concepts. This is worse for verbs than for nouns.

@bjarthur
Copy link
Contributor Author

bootstrapping error now fixed by removing the use of Dict in default_retry_check.

@bjarthur
Copy link
Contributor Author

bjarthur commented Jan 6, 2017

@amitmurthy i've removed the export of default_retry_check per @tkelman and @StefanKarpinski 's request. you might want to look over this PR. i've implemented exactly what you suggested i think.

@StefanKarpinski
Copy link
Member

This is squarely in @amitmurthy's court – up to you if you want to merge or not.


# Examples
```
map(retry(x->x^2, (s,e)->default_retry_check(s,e,n=47)), 1:5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be formatted as an doctest with the output visible?

@amitmurthy
Copy link
Contributor

amitmurthy commented Jan 8, 2017

@bjarthur thanks for your patience. Thinking some more, there may be a better way to do this. We could use a standard iterator interface for specifying delays which adds to the overall flexibility. So we have,

retry(f::Function; delays=ExponentialBackOff(), check=nothing)

ExponentialBackOff is an exported iterator which implements the current exponential backoff strategy via start, next and done methods. It is constructed with
ExponentialBackOff(n,max_delay) with default values of 1 and 10.0 respectively, i.e., ExponentialBackOff() = ExponentialBackOff(1, 10.0). This can be mentioned in the docstring for ExponentialBackOff

if check is defined it must be a 2-arg function of the iterator state and caught exception. If unspecified we retry for all exceptions.

next on the iterable returns the number of seconds to sleep before retrying.

You can remove the uppercase constant definitions.

In addition to standard ExponentialBackOff this will allow for simpler strategies like
retry(f, delays=fill(5.0,3)) for retrying 3 times at 5.0 second intervals or retry(f, delays=rand(5:10, 2)) for retrying twice with random intervals between 5 and 10 seconds.

More complex strategies including retrying for different intervals depending on the type of error encountered can be specified via custom check and delay iterator implementations.

@bjarthur
Copy link
Contributor Author

@amitmurthy i've implemented ExponentialBackOff. i like this API better too.

one of the travis builds stalled, otherwise the tests passed.

Copy link
Contributor

@amitmurthy amitmurthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should export ExponentialBackOff.

if i > n || try retry_on(e) end !== true
rethrow(e)
done(delays, state) && rethrow(e)
if check != nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!==

@@ -158,8 +158,6 @@ function wrap_on_error(f, on_error; capture_data=false)
end
end

wrap_retry(f, retry_on, n, max_delay) = retry(f, retry_on; n=n, max_delay=max_delay)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this was a StackOverflow error being thrown when done inline. I assume you are not seeing that now.

retry_on=retry_on,
retry_n=retry_n,
retry_max_delay=retry_max_delay)
on_error = on_error, retry_delays = retry_delays[2:end], retry_check = retry_check)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may not work if retry_delays is of type ExponentialBackOff? I think the retry logic in pmap will need a different approach. We can tackle it as a separate issue.


"""
retry(f, [retry_on]; n=1, max_delay=10.0) -> Function
ExponentialBackOff(; n=1, first_delay=0.05, max_delay=10.0, factor=5.0, jitter=0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this does get exported, then be sure to add it to the stdlib doc index

@bjarthur
Copy link
Contributor Author

requested changes made. all tests pass.

@amitmurthy amitmurthy merged commit 31b52b4 into JuliaLang:master Jan 12, 2017
@amitmurthy
Copy link
Contributor

Thanks. Can you add a line to NEWS.md with a short description of this change?

@bjarthur
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants