-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Zayo parser to handle additional examples #109
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -77,7 +77,8 @@ def init_from_emailmessage(cls: Type["NotificationData"], email_message) -> Opti | |||
# Adding extra headers that are interesting to be parsed | |||
data_parts.add(DataPart(EMAIL_HEADER_SUBJECT, email_message["Subject"].encode())) | |||
data_parts.add(DataPart(EMAIL_HEADER_DATE, email_message["Date"].encode())) | |||
return cls(data_parts=list(data_parts)) | |||
# Ensure the data parts are processed in a consistent order | |||
return cls(data_parts=sorted(data_parts, key=lambda part: part.type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why order is important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where we have information of different accuracy/precision available from different data parts. For example with Zayo we can always pull an (approximate) stamp
from the email date, but for some notifications we may have a more precise stamp
explicitly stated in the email body. Without this change it was unpredictable as to which one of these would be processed last and therefore win out - see the CI failures on 7970ecc above for an example of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, good catch.
@@ -24,6 +43,14 @@ def parse_html(self, soup): | |||
self.parse_bs(soup.find_all("b"), data) | |||
self.parse_tables(soup.find_all("table"), data) | |||
|
|||
if data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wondering why do we need this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the case where we might decide that there's no relevant content in the email (and hence return an empty data
dict) I wouldn't want to just fill in these additional fields in an otherwise empty dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, otherwise you could just rely on Pydantic validation once the Maintenance
output is instantiated.
Handle some additional Zayo email notifications that were failing the current parser.
I'm marking this as a draft for now as I have some questions about how to handle information that may be omitted from certain Zayo notification emails (see TODO comments in zayo.py).