Explicit delegation to target class for ArrayBag #5189
@robsyme does this PR solve the issue for you?
Yup, the changes solve both the very minimal example in the new test included in this PR and the slightly more complicated example in #5187.
Signed-off-by: Rob Syme <[email protected]>
If the point of the ArrayBag class is to provide a container for objects that is order-invariant, it makes sense to ensure that the hashCode and equality methods also reflect this property. The commented-out equals method only tests for equal length and runs a containsAll check, which would wrongly report equality when comparing ArrayBags that hold the same items with differing frequencies, e.g.
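To make the frequency problem concrete, here is a minimal sketch in plain Java. It does not use the real ArrayBag class; `naiveEquals` is a hypothetical stand-in for the "same length plus containsAll" check described above, and the sample items are invented for illustration.

```java
import java.util.List;

class NaiveBagEquals {
    // Hypothetical stand-in for the commented-out equals:
    // compares sizes, then checks that every item of b occurs in a.
    static boolean naiveEquals(List<?> a, List<?> b) {
        return a.size() == b.size() && a.containsAll(b);
    }

    public static void main(String[] args) {
        // Same distinct items, same length, different frequencies.
        List<String> x = List.of("a", "a", "b");
        List<String> y = List.of("a", "b", "b");

        // Both directions pass, although the two bags are not the same multiset.
        System.out.println(naiveEquals(x, y)); // prints true
        System.out.println(naiveEquals(y, x)); // prints true
    }
}
```

A check of this shape cannot tell the two collections apart, because containsAll only asks whether each item is present, not how often.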
This opens an interesting point: should |
It also results in an inconsistency that
|
ArrayBag itself seems over-complicated and contradictory. It's basically trying to be a Here are all the places where ArrayBag is used:
The two operators should have return type Glob patterns in general are unordered, so it doesn't make sense to give the illusion of ordering in this case. What does If we do all of that, we can remove ArrayBag and avoid all of this mess |
Example of sorting groups explicitly after a groupTuple (source):

    // ...
    | groupTuple
    | map { group, items ->
        def sorted = items
            .sort { item -> item[0].id }
            .collect { meta, csv -> csv }
        return [group, sorted]
    }
Thanks for doing that analysis, Ben. I agree that replacing the ArrayBag seems very sensible.
Signed-off-by: Paolo Di Tommaso <[email protected]> Co-authored-by: Jordi Deu-Pons <[email protected]> Co-authored-by: Adam Talbot <[email protected]>
This commit disables the AWS Batch spot auto-retries. The main reasons to disable this capability are:

* The same task can be retried multiple times, incurring a significant spending increase without the user being aware of it.
* The AWS automatic retry re-executes a task in the same working directory because it's not directly managed by Nextflow. This can introduce nasty side effects from partial/corrupted data left by a previous execution.
* There's no log/visual feedback during the pipeline execution, because the retry is managed directly by AWS Batch.

Users can still enable this capability by setting the following option:

```
aws.batch.maxSpotAttempts = n
```

where n is an integer > 0.

Signed-off-by: Paolo Di Tommaso <[email protected]> Signed-off-by: Ben Sherman <[email protected]> Co-authored-by: Ben Sherman <[email protected]>
This commit disables the automatic retry made by Google Batch when a spot instance is reclaimed. The main reasons to disable this capability are:

* The same task can be retried multiple times, incurring a significant spending increase without the user being aware of it.
* The Google automatic retry re-executes a task in the same working directory because it's not directly managed by Nextflow. This can introduce nasty side effects from partial/corrupted data left by a previous execution.
* There's no log/visual feedback during the pipeline execution, because the retry is managed directly by Google Batch.

Users can still enable this capability by setting the following option:

```
google.batch.maxSpotAttempts = n
```

where n is an integer > 0.

Signed-off-by: Paolo Di Tommaso <[email protected]>
Docs pages under the developer/ directory (workflow diagram, packages, core plugins etc) had a broken Seqera logo in the sidebar. Fixes this by using a Sphinx function to always use a path for that file that's relative to the build root. Signed-off-by: Phil Ewels <[email protected]>
The default boot image is batch-cos, but it may not be desired in all situations. In particular, there is an ongoing issue in which the batch-cos image does not retry pulling a Docker image if there was a network issue. The user may also have some other configuration already set up in their custom boot image.

This commit adds the ability to specify a custom boot disk image by using the configuration option

```
google.batch.bootDiskImage = '<NAME>'
```

Signed-off-by: Siddhartha Bagaria <[email protected]> Signed-off-by: Paolo Di Tommaso <[email protected]> Co-authored-by: Paolo Di Tommaso <[email protected]>
…ast] Signed-off-by: GlobefishNG <[email protected]> Co-authored-by: GlobefishNG <[email protected]>
The main reason for ArrayBag is to avoid invalidating the cache when files are emitted in a different order. What is tricky is that Given that adding I would, however, avoid the check for differing frequencies. Do you agree?
My point is that this is already supported for Set, so ArrayBag should just be replaced with Set. If the order of a collection doesn't matter for the purposes of caching, it shouldn't matter for the user either.
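As a quick illustration of the Set behaviour referred to here, the following sketch uses only the standard java.util collections. The helper `setOf` and the file names are made up for the example. It shows that Set equality and hashCode ignore insertion order, and also that a Set keeps only one copy of each item, which matters if repeated items need to be preserved.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class SetOrderDemo {
    // Hypothetical helper: builds a set that remembers insertion order,
    // to show that equals/hashCode still ignore that order.
    static Set<String> setOf(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Set<String> a = setOf("x.txt", "y.txt");
        Set<String> b = setOf("y.txt", "x.txt");

        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true

        // Duplicates collapse: three items in, two items kept.
        System.out.println(setOf("x.txt", "x.txt", "y.txt").size()); // prints 2
    }
}
```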
But |
If the order doesn't matter... then they shouldn't be relying on it. Doing something like That being said, I would be willing to defer this change to the introduction of static type checking. We'll have to clean up APIs like this anyway and then we can have a feature flag like |
Sounds like a reasonable plan.
I forgot that The The problem is that ArrayBag implements List, which gives a false sense that it is ordered. If a user starts doing indexed access on an ArrayBag, they will be relying on non-deterministic behavior that could cause cryptic resume issues -- even though the ArrayBag itself is deterministic, their usage of the ArrayBag could lead to non-deterministic hashes downstream. You mentioned frequencies as a way to test equality and that it would be expensive, but... what if we just implement Bag as a frequency map instead of an array list? Call it a It looks like Guava even has this, it's called a Multiset:
|
In any case, I think the migration plan is the same: since replacing ArrayBag with Multiset would drop indexing, it is better to make the change as part of the static type API.
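For reference, the frequency-map idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `FrequencyBag` and its methods are hypothetical names, it is not Nextflow's ArrayBag nor Guava's Multiset, and it deliberately offers no indexed access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bag backed by an item -> count map (not the real ArrayBag).
final class FrequencyBag<E> {
    private final Map<E, Integer> counts = new HashMap<>();
    private int size;

    @SafeVarargs
    static <T> FrequencyBag<T> of(T... items) {
        FrequencyBag<T> bag = new FrequencyBag<>();
        for (T item : items) {
            bag.add(item);
        }
        return bag;
    }

    void add(E item) {
        counts.merge(item, 1, Integer::sum); // bump the item's frequency
        size++;
    }

    int count(Object item) {
        return counts.getOrDefault(item, 0);
    }

    int size() {
        return size;
    }

    // Equality and hashing delegate to the count map, so they depend on
    // which items are present and how often, never on insertion order.
    @Override
    public boolean equals(Object other) {
        return other instanceof FrequencyBag
                && counts.equals(((FrequencyBag<?>) other).counts);
    }

    @Override
    public int hashCode() {
        return counts.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(of("a", "b", "a").equals(of("a", "a", "b"))); // prints true
        System.out.println(of("a", "a", "b").equals(of("a", "b", "b"))); // prints false
    }
}
```

With this layout the frequency comparison costs one map comparison rather than a per-item scan, and the lack of a get(index) method removes the temptation to rely on element order.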
Solves #5187