Rename skip, replace and fail #53

nalimilan · 2017-11-18T11:35:31Z

These functions are supposed to be called as Missings.skip since their names are too broad for them to be exported. Yet, if we move these features to Base, there will not be any module to replace Missings. A few possible solutions have been discussed:

Include these functions in a standard library module, which could be called Missings. Cons: many people do not like this name, it's already relatively long to type.
Make skip, replace and fail methods of more general functions under Base.Iterators, as they are not so commonly used. But since skip is needed all the time, also export a convenience function with a short name. Possible choices include:

nonmissing. Pros: it's the common term used to speak about values which are not missing (it's even in the Wiktionary; we already have nonzeros, which is somewhat similar; auto-completion works with just nonm[tab]. Cons: it could be weird to have a negation in a function name.
skipmissing. Pros: maybe more explicit than "non", and avoids the negation; can also be auto-completed by typing skipm[tab]. Cons: less standard than nonmissing.

Drop the three functions (merging them with Base.Iterators functions), and add keyword arguments where missing values need a special treatment. For example, sum would take a missing=:pass argument (or missing=pass using an enum) which could also take values skip and fail. Pros: it's flexible, so that it could be extended to e.g. missing=:pairwise_delete for cor; it's kind of systematic in that skip doesn't get a special treatment. Cons: it requires adapting all reductions to support missing values, and doesn't help when you need to deal with a function which has not been adapted (so that we will probably need functions from 1 or 2 anyway, though less frequently). Option 2 could also be combined with this one by allowing a second argument to nonmissing/skipmissing which would allow replacing missing values with a value (replace) or throwing an error (fail).

Overall I'm leaning towards 2, but I'm not completely sure which name is the best one (we could find other ideas too).

The text was updated successfully, but these errors were encountered:

nalimilan · 2017-11-18T12:29:59Z

Something I should have noted is that some functions will need to take convenience keyword arguments anyway because they need a special behavior:

groupby needs a skipmissing=false arguments to allow dropping rows with missing values in one of the grouping columns (this avoids repeating the names of grouping columns, cf. Decide what to do with null values in grouping DataFrames.jl#1249)
freqtable similarly needs a skipmissing=false argument to avoid repeating the names of the columns on which the table is computed
cor, cov and pairwise functions need to allow dropping either all rows with at least one missing value on one of the variables (listwise deletion), or only those with a missing value in the considered pair of variables (pairwise deletion)

quinnj · 2017-11-18T12:51:23Z

My vote is skipmissing, replacemissing, and failmissing. I still don't know when anyone would actually use failmissing, so I'm not really concerned w/ that. I also think that replacemissing could just be replaced by coalesce.(A, repl), which is shorter/more explicit anyway.

ararslan · 2017-11-18T18:56:51Z

My vote is nonmissing, coalesce for replacement, @assert !any(ismissing, x) for failing. I don't think we really need a dedicated function for that.

nalimilan · 2017-11-19T12:10:17Z

So basically you both support option 2, and the question is whether it use skipmissing or nonmissing?

I agree we probably don't need replacemissing and failmissing. However, note that coalesce.(x, r) is eager, so that's not a perfect replacement, and that the advantage of failmissing is that you don't need to make two passes over the data. But as I noted something under Base.Iterators should be enough.

cjprybol · 2017-11-20T22:41:45Z

As someone who's been biased and influenced by the current Missings.skip returning an iterator, I think as long as the idea is to have nonmissing/skipmissing return an iterator then skipmissing is a more intuitive name (skip any missing elements you come across when you reach them). nonmissing is technically correct but if I were using that function for the first time I think I'd expect it to act similarly to dropmissing/dropmissing! from DataFrames before I'd expect it to return an iterator.

nalimilan mentioned this issue Nov 19, 2017

Add missing and Missing JuliaLang/julia#24653

Merged

nalimilan mentioned this issue Nov 22, 2017

Deprecate Missings.skip in favor of skipmissing #55

Merged

quinnj closed this as completed in #55 Nov 23, 2017

nalimilan mentioned this issue Dec 8, 2017

Add skipmissing() to skip missing values JuliaLang/julia#24982

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rename skip, replace and fail #53

Rename skip, replace and fail #53

nalimilan commented Nov 18, 2017

nalimilan commented Nov 18, 2017

quinnj commented Nov 18, 2017

ararslan commented Nov 18, 2017

nalimilan commented Nov 19, 2017

cjprybol commented Nov 20, 2017

Rename skip, replace and fail #53

Rename skip, replace and fail #53

Comments

nalimilan commented Nov 18, 2017

nalimilan commented Nov 18, 2017

quinnj commented Nov 18, 2017

ararslan commented Nov 18, 2017

nalimilan commented Nov 19, 2017

cjprybol commented Nov 20, 2017