Roundtrip test fails on TK2023 EML files #3
Regarding the first point, this is caused by the way xsdata handles datetime formatting. I've tried monkey patching this formatting, and that fixes the issue, but it feels a bit hacky. The other option is to make the change in the xsdata package itself, but xsdata formats dates and times according to ISO 8601, which (as far as I can see) does not require the milliseconds. Regarding the second point, that EML file also fails to validate against the original XSD, so I think it's fair that these bindings throw the same error. Perhaps this could be handled in a separate test case?
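The mismatch can be reproduced with Python's standard library alone. `datetime.isoformat()` follows the same convention described here for xsdata: it omits the fractional seconds when the microsecond value is zero. This is an analogy, not xsdata's own code. Both strings are valid ISO 8601 and denote the same instant, yet a string-level roundtrip comparison fails:

```python
from datetime import datetime

original = "2023-11-23T19:38:38.000"
parsed = datetime.fromisoformat(original)

# Default formatting drops the fraction because parsed.microsecond == 0.
default = parsed.isoformat()
# Fixing the precision explicitly keeps the ".000".
fixed = parsed.isoformat(timespec="milliseconds")

print(default)  # 2023-11-23T19:38:38
print(fixed)    # 2023-11-23T19:38:38.000
```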
Thanks for the clarification on the first point. It looks like xsData omits the milliseconds from the timestamp when the millisecond part is zero, which is a shortsighted way of dealing with it (a bit like saying 1.000 = 1, losing significant figures). Monkeypatching definitely feels hacky. Maybe we can raise an issue with xsData upstream to come up with a better solution? As for the second point, I'd say we have to figure out why this EML file does not follow the XSD (i.e. how can the OSV2020 software generate an invalid EML file?). This library has to be able to parse all EML files that the Kiesraad publishes, so skipping files that we deem invalid is not the right way to go.
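One way to avoid "losing significant figures" is to remember how many fractional digits the source had and re-emit exactly that many. The sketch below is a hypothetical `PreciseDateTime` wrapper using only the standard library; it is not xsdata's API and not the converter from #4. It assumes timestamps that `datetime.fromisoformat` accepts (before Python 3.11 that means no `Z` suffix and only 3 or 6 fractional digits):

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Matches the fractional-seconds part, e.g. ".000" in "2023-11-23T19:38:38.000".
_FRACTION = re.compile(r"\.(\d+)")


@dataclass(frozen=True)
class PreciseDateTime:
    """Hypothetical wrapper: a datetime plus the number of fractional-second
    digits that appeared in the source text (0 = no fractional part)."""

    value: datetime
    digits: int

    @classmethod
    def parse(cls, text: str) -> "PreciseDateTime":
        match = _FRACTION.search(text)
        digits = len(match.group(1)) if match else 0
        return cls(datetime.fromisoformat(text), digits)

    def format(self) -> str:
        # "YYYY-MM-DDTHH:MM:SS" plus any UTC offset, without a fractional part.
        text = self.value.isoformat(timespec="seconds")
        if self.digits == 0:
            return text
        # Re-emit exactly as many digits as were parsed (zero-padded past microseconds).
        fraction = f"{self.value.microsecond:06d}".ljust(self.digits, "0")[: self.digits]
        # The first 19 characters are the date and time; any offset follows them.
        return f"{text[:19]}.{fraction}{text[19:]}"


print(PreciseDateTime.parse("2023-11-23T19:38:38.000").format())
```

A trade-off of this design: two values for the same instant but with different precision compare unequal, which is what a byte-for-byte roundtrip test wants but may surprise other callers.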
I agree that fixing the datetime serialization issue upstream is a better solution, but we'll have to see whether they agree with our view on the formatting. I'll raise an issue! I don't fully agree with your second point, however. OSV2020 should always generate valid EML files, but for this election that unfortunately was not the case for the NBSB file; this should be fixed for the following elections. One of the goals of this package is to give developers guarantees about the underlying data (fields being present, being of the correct type, etc.) by strictly following the EML_NL specification, so that if the data loads, you can be sure it is complete and correct. If we relax these validation rules in this package, we lose some of these guarantees, which in my opinion goes against one of its goals. I'd be more in favor of fixing the issue at the data level (i.e. fixing the EML file so that it complies with the standard). What do you think?
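To illustrate the "if it loads, it is complete" guarantee, here is a minimal sketch of a strict loader. It uses the standard library's `xml.etree.ElementTree` rather than the generated xsdata bindings, and `ReportingUnitVotes` here is a hypothetical, heavily simplified model with only `Cast` and `TotalCounted`. The `{*}` wildcard (Python 3.8+) matches a tag in any namespace:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportingUnitVotes:
    # Hypothetical, simplified model; the real bindings have many more fields.
    cast: int
    total_counted: int


def parse_reporting_unit_votes(elem: ET.Element) -> ReportingUnitVotes:
    """Build the model, refusing to load if a required child element is missing."""

    def required_int(tag: str) -> int:
        child = elem.find(f"{{*}}{tag}")
        if child is None or child.text is None:
            raise ValueError(f"<ReportingUnitVotes> is missing required <{tag}> element")
        return int(child.text)

    return ReportingUnitVotes(
        cast=required_int("Cast"),
        total_counted=required_int("TotalCounted"),
    )
```

Consumers of the returned object never need to check for a missing `cast`; a file like the NBSB one fails at load time instead.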
Sounds good!
That depends on whether the field should be required from a functional perspective. If keeping the field required in the XSD means we'd sometimes have to fill it with a zero or null placeholder, I think it would be better to make the field optional instead. To clarify, I meant making it optional in the EML_NL standard, not just in this library. If the field can always be filled with useful information, then the software that created the EML file should have filled it, and that software should be fixed instead.
Ah, thanks for the clarification! I thought you were suggesting that this package deviate from the standard, but I fully agree that we should look into whether this field should be made optional, since the EML_NL standard interprets it differently. We (EML_NL) use it to indicate the number of eligible voters, whereas the original EML standard intended it to be the total number of votes cast (valid and invalid), see 8.24.1. It seems to me that there can always be a total number of votes cast (0 is also a total), while we do not necessarily always know the number of eligible voters for a given reporting unit.
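Under the EML_NL reading, an optional field lets "unknown" stay distinct from zero, which a required field filled with a placeholder `0` cannot do. A minimal sketch with a hypothetical `RelaxedReportingUnitVotes` model (not this library's bindings); the ratio method is only an illustration of a computation that should refuse to guess when the number of eligible voters is unknown:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelaxedReportingUnitVotes:
    # Hypothetical model: `cast` holds the number of eligible voters (EML_NL
    # interpretation), which may be unknown, so None is allowed.
    total_counted: int
    cast: Optional[int] = None

    def counted_per_eligible(self) -> Optional[float]:
        """Ratio of counted votes to eligible voters, or None if undefined."""
        if self.cast is None or self.cast == 0:
            return None  # unknown (or zero) eligible voters: no meaningful ratio
        return self.total_counted / self.cast
```

With a required field, producers lacking the number would be pushed to write `0`, and consumers could no longer tell that value apart from a genuine zero.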
When running the roundtrip test on the TK2023 EML files, two files fail the test:

- `Telling_TK2023_gemeente_Oisterwijk.eml.xml` (fixed by Custom datetime converter #4) fails because the original EML file contains a timestamp with milliseconds (`2023-11-23T19:38:38.000`), whereas the milliseconds part (`.000`) is missing from the re-serialized file.
- `Telling_TK2023_NBSB.eml.xml` fails due to a parsing error: this EML file is missing the required `<Cast>` element under the `<ReportingUnitVotes>` element.

Complete `pytest` log
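To fix such a file at the data level, it helps to know exactly which reporting units are affected. A minimal, namespace-agnostic sketch using `xml.etree.ElementTree` (Python 3.8+ for the `{*}` wildcard); `find_units_missing_cast` is a hypothetical helper, and it assumes each `<ReportingUnitVotes>` carries a `<ReportingUnitIdentifier>` child with an `Id` attribute:

```python
import xml.etree.ElementTree as ET
from typing import List


def find_units_missing_cast(root: ET.Element) -> List[str]:
    """Return the Id of every <ReportingUnitVotes> that has no <Cast> child."""
    missing = []
    for unit in root.iterfind(".//{*}ReportingUnitVotes"):
        if unit.find("{*}Cast") is None:
            ident = unit.find("{*}ReportingUnitIdentifier")
            # Fall back to "?" when the unit cannot be identified.
            missing.append(ident.get("Id", "?") if ident is not None else "?")
    return missing
```

Usage would be along the lines of `find_units_missing_cast(ET.parse("Telling_TK2023_NBSB.eml.xml").getroot())`, which lists the units to correct before rerunning the roundtrip test.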