Make date format guess algorithm more robust #69

Open
kescobo opened this issue Aug 2, 2018 · 2 comments
kescobo commented Aug 2, 2018

I've been running into a weird date parsing issue, and I can't sort out what the pattern is, though I've managed to nail down an MWE.

The linked csv has 4 rows of dates.

julia> csvread("parse_test.csv")
ERROR: ArgumentError: Month: 27 out of range (1:12)
Stacktrace:
 [1] Date(::Int64, ::Int64, ::Int64) at ./dates/types.jl:204
 [2] tryparsenext(::TextParse.DateTimeToken{Date,DateFormat{Symbol("yyyy/mm/dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/kev/.julia/v0.6/TextParse/src/field.jl:431
 [3] macro expansion at /Users/kev/.julia/v0.6/TextParse/src/util.jl:23 [inlined]
 [4] tryparsenext(::TextParse.Field{Date,TextParse.DateTimeToken{Date,DateFormat{Symbol("yyyy/mm/dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}}}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/kev/.julia/v0.6/TextParse/src/field.jl:569
#...

(The stack trace is super long; let me know if it would be useful to post the whole thing.)

There are 3 27s, two in the second row, and one in the last row. If I remove just the last row, it works.

But if I leave the 4th row in and just change the 27 in the last row to a 2, I get the same ERROR: ArgumentError: Month: 27 out of range (1:12).

If I change all the 27s to 2s, I now get ERROR: ArgumentError: Month: 21 out of range (1:12), and again this error goes away if I delete the last row, even though there are no 21s in the last row.

It's not just something weird with that one row: this is part of a much larger csv file, and removing only row 4 from the full file does not stop the error.

Note: originally posted as an issue to CSVFiles.jl, but this error seems to be caused by this package.

davidanthoff (Member) commented

The problem is that the column type detection algorithm goes wrong here. It classifies the third column as yyyy/mm/dd, which is clearly wrong.
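To illustrate how a wrongly guessed format produces this exact kind of message, here is a minimal sketch using only the Dates standard library. The cell value is hypothetical (the contents of parse_test.csv are not shown in this thread), and the mm/dd/yyyy format is only an assumption about what the column might really contain.

```julia
using Dates

# Hypothetical cell value; the real contents of parse_test.csv are not shown here.
value = "03/27/2018"

# Parse with the (wrongly) guessed format: the second field, 27, is read as the month.
err = try
    Date(value, dateformat"yyyy/mm/dd")
    nothing
catch e
    e
end

# `err` now holds the exception raised by the Date constructor.
println(err)

# Parsing the same value with a format that matches it succeeds.
parsed = Date(value, dateformat"mm/dd/yyyy")
println(parsed)
```

The point is only that the number in the error message comes from whichever field the guessed format treats as the month, not from an actual month in the data.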

I think the whole classification logic for date time columns is not very good: as far as I can tell, it essentially classifies purely based on the last of the rows used for type detection. A better algorithm would choose the date format for a given column based on all of the type detection rows for that column.
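As a rough sketch of what "based on all rows" could look like, the snippet below keeps a candidate format only if every sampled value parses with it. This is not TextParse's implementation: `default_candidates` and `guess_dateformat` are hypothetical names, and the candidate list is an assumption chosen for illustration.

```julia
using Dates

# Hypothetical candidate list; the formats the package actually tries may differ.
default_candidates() = (dateformat"yyyy/mm/dd", dateformat"dd/mm/yyyy", dateformat"mm/dd/yyyy")

# Return the first candidate format that parses *every* sampled value,
# or `nothing` if no candidate fits all of them.
function guess_dateformat(samples, candidates = default_candidates())
    for df in candidates
        # tryparse returns `nothing` both for malformed text and for
        # out-of-range fields such as a month of 27.
        if all(s -> tryparse(Date, s, df) !== nothing, samples)
            return df
        end
    end
    return nothing
end

# The last value alone is ambiguous (it fits dd/mm/yyyy and mm/dd/yyyy),
# but the first value rules out dd/mm/yyyy.
samples = ["03/27/2018", "04/05/2018"]
println(guess_dateformat(samples))
println(guess_dateformat(samples[end:end]))
```

Checking all sampled rows does not remove genuine ambiguity (a column where every day is 12 or less still fits several formats), so returning `nothing` or falling back to a string column in that case may be safer than guessing.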

I'm changing the title to reflect the todo here: make the type detection algorithm more robust for date time columns.

The workaround for now is to manually specify the date format for the columns.

@davidanthoff davidanthoff changed the title Date parsing issue "Month out of range" Make date format guess algorithm more robust Mar 17, 2019

CNOT commented Jun 7, 2024

Can I bump this issue? I recently tried to read a .shp file with the GeoDataFrames.read function and got the error:
ERROR: ArgumentError: Month: 0 out of range (1:12)
However, upon further investigation, the months in the data range from 3 to 12, and no row has a month of 0.
