Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename skip, replace and fail #53

Closed
nalimilan opened this issue Nov 18, 2017 · 5 comments · Fixed by #55
Closed

Rename skip, replace and fail #53

nalimilan opened this issue Nov 18, 2017 · 5 comments · Fixed by #55

Comments

@nalimilan
Copy link
Member

These functions are supposed to be called as Missings.skip since their names are too broad for them to be exported. Yet, if we move these features to Base, there will not be any module to replace Missings. A few possible solutions have been discussed:

  1. Include these functions in a standard library module, which could be called Missings. Cons: many people do not like this name, it's already relatively long to type.

  2. Make skip, replace and fail methods of more general functions under Base.Iterators, as they are not so commonly used. But since skip is needed all the time, also export a convenience function with a short name. Possible choices include:

  • nonmissing. Pros: it's the common term used to speak about values which are not missing (it's even in the Wiktionary; we already have nonzeros, which is somewhat similar; auto-completion works with just nonm[tab]. Cons: it could be weird to have a negation in a function name.
  • skipmissing. Pros: maybe more explicit than "non", and avoids the negation; can also be auto-completed by typing skipm[tab]. Cons: less standard than nonmissing.
  1. Drop the three functions (merging them with Base.Iterators functions), and add keyword arguments where missing values need a special treatment. For example, sum would take a missing=:pass argument (or missing=pass using an enum) which could also take values skip and fail. Pros: it's flexible, so that it could be extended to e.g. missing=:pairwise_delete for cor; it's kind of systematic in that skip doesn't get a special treatment. Cons: it requires adapting all reductions to support missing values, and doesn't help when you need to deal with a function which has not been adapted (so that we will probably need functions from 1 or 2 anyway, though less frequently). Option 2 could also be combined with this one by allowing a second argument to nonmissing/skipmissing which would allow replacing missing values with a value (replace) or throwing an error (fail).

Overall I'm leaning towards 2, but I'm not completely sure which name is the best one (we could find other ideas too).

@nalimilan
Copy link
Member Author

Something I should have noted is that some functions will need to take convenience keyword arguments anyway because they need a special behavior:

  • groupby needs a skipmissing=false arguments to allow dropping rows with missing values in one of the grouping columns (this avoids repeating the names of grouping columns, cf. Decide what to do with null values in grouping DataFrames.jl#1249)
  • freqtable similarly needs a skipmissing=false argument to avoid repeating the names of the columns on which the table is computed
  • cor, cov and pairwise functions need to allow dropping either all rows with at least one missing value on one of the variables (listwise deletion), or only those with a missing value in the considered pair of variables (pairwise deletion)

@quinnj
Copy link
Member

quinnj commented Nov 18, 2017

My vote is skipmissing, replacemissing, and failmissing. I still don't know when anyone would actually use failmissing, so I'm not really concerned w/ that. I also think that replacemissing could just be replaced by coalesce.(A, repl), which is shorter/more explicit anyway.

@ararslan
Copy link
Member

My vote is nonmissing, coalesce for replacement, @assert !any(ismissing, x) for failing. I don't think we really need a dedicated function for that.

@nalimilan
Copy link
Member Author

So basically you both support option 2, and the question is whether it use skipmissing or nonmissing?

I agree we probably don't need replacemissing and failmissing. However, note that coalesce.(x, r) is eager, so that's not a perfect replacement, and that the advantage of failmissing is that you don't need to make two passes over the data. But as I noted something under Base.Iterators should be enough.

@cjprybol
Copy link
Contributor

As someone who's been biased and influenced by the current Missings.skip returning an iterator, I think as long as the idea is to have nonmissing/skipmissing return an iterator then skipmissing is a more intuitive name (skip any missing elements you come across when you reach them). nonmissing is technically correct but if I were using that function for the first time I think I'd expect it to act similarly to dropmissing/dropmissing! from DataFrames before I'd expect it to return an iterator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants