Commit

Merge remote-tracking branch 'origin/develop' into issue-56-stamp-parser
chadell committed Sep 6, 2021
2 parents 404a56b + 4790848 commit 5cd56c1
Showing 22 changed files with 684 additions and 40 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@
 emails.
 - Tests refactor to make them more specific to each type of data, mocking interfaces between different classes.
 - #59 - Added a new parser `EmailDateParser` that uses the email `Date` to get the `Stamp` and use in most of the `Providers` via the `CombinedProcessor`. Also, `Maintenance.stamp` attribute is mandatory.
+- #60 - Added new provider `Seaborn` using `Html` and a new parser for Email Subject: `EmailSubjectParser`

 ### Fixed

12 changes: 6 additions & 6 deletions README.md
@@ -47,6 +47,7 @@ By default, there is a `GenericProvider` that support a `SimpleProcessor` using
 - GTT
 - Lumen
 - Megaport
+- Seaborn
 - Telstra
 - Turkcell
 - Verizon
@@ -216,12 +217,11 @@ The project is following Network to Code software development guidelines and is

 ### How to add a new Circuit Maintenance provider?

-1. If your Provider requires a custom parser, within `circuit_maintenance_parser/parsers`, **add your new parser**, inheriting from generic
-   `Parser` class or custom ones such as `ICal` or `Html` and add a **unit test for the new provider parser**, with at least one test case under
-   `tests/unit/data`.
-2. Add new class in `providers.py` with the custom info, defining in `_parser_classes` the list of parsers that you will use, using the generic `ICal` and/or your custom parsers.
-3. **Expose the new parser class** updating the map `SUPPORTED_PROVIDERS` in
-   `circuit_maintenance_parser/__init__.py` to officially expose the parser.
+1. Define the `Parsers` (inheriting from some of the generic `Parsers` or a new one) that will extract the data from the notification, that could contain itself multiple `DataParts`. The `data_type` of the `Parser` and the `DataPart` have to match. The custom `Parsers` will be placed in the `parsers` folder.
+2. Update the `unit/test_parsers.py` with the new parsers, providing some data to test and validate the extracted data.
+3. Define a new `Provider` inheriting from the `GenericProvider`, defining the `Processors` and the respective `Parsers` to be used. Maybe you can reuse some of the generic `Processors` or maybe you will need to create a custom one. If this is the case, place it in the `processors` folder.
+4. Update the `unit/test_e2e.py` with the new provider, providing some data to test and validate the final `Maintenances` created.
+5. **Expose the new `Provider` class** updating the map `SUPPORTED_PROVIDERS` in `circuit_maintenance_parser/__init__.py` to officially expose the `Provider`.

 ## Questions

2 changes: 2 additions & 0 deletions circuit_maintenance_parser/__init__.py
@@ -13,6 +13,7 @@
     Megaport,
     NTT,
     PacketFabric,
+    Seaborn,
     Telia,
     Telstra,
     Turkcell,
@@ -29,6 +30,7 @@
     Megaport,
     NTT,
     PacketFabric,
+    Seaborn,
     Telia,
     Telstra,
     Turkcell,
32 changes: 23 additions & 9 deletions circuit_maintenance_parser/data.py
@@ -35,17 +35,31 @@ def init_from_email_bytes(cls, raw_email_bytes: bytes):
         return cls.init_from_emailmessage(email_message)

     @classmethod
-    def init_from_emailmessage(cls, email_message):
-        """Initialize the data_parts from an email.message.Email object."""
-        data_parts = []
+    def walk_email(cls, email_message, data_parts):
+        """Recursive walk_email using Set to not duplicate data entries."""
         for part in email_message.walk():
-            if "multipart" in part.get_content_type():
+            if "image" in part.get_content_type():
+                # Not interested in parsing images/QRs yet
                 continue
-            data_parts.append(DataPart(part.get_content_type(), part.get_payload().encode()))
+
+            if "multipart" in part.get_content_type():
+                for inner_part in part.get_payload():
+                    if isinstance(inner_part, email.message.Message):
+                        cls.walk_email(inner_part, data_parts)
+            elif "message/rfc822" in part.get_content_type():
+                if isinstance(part.get_payload(), email.message.Message):
+                    cls.walk_email(part.get_payload(), data_parts)
+            else:
+                data_parts.add(DataPart(part.get_content_type(), part.get_payload(decode=True)))
+
+    @classmethod
+    def init_from_emailmessage(cls, email_message):
+        """Initialize the data_parts from an email.message.Email object."""
+        data_parts = set()
+        cls.walk_email(email_message, data_parts)

         # Adding extra headers that are interesting to be parsed
-        data_parts.append(DataPart("email-header-subject", email_message["Subject"].encode()))
+        data_parts.add(DataPart("email-header-subject", email_message["Subject"].encode()))
         # TODO: Date could be used to extend the "Stamp" time of a notification when not available, but we need a parser
-        data_parts.append(DataPart("email-header-date", email_message["Date"].encode()))
-
-        return cls(data_parts=data_parts)
+        data_parts.add(DataPart("email-header-date", email_message["Date"].encode()))
+        return cls(data_parts=list(data_parts))
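
Note on the `data.py` change: the new `walk_email` collects parts into a set, which its docstring says is there to avoid duplicate entries. The following is a minimal standalone sketch of that idea using only the standard library. `DataPart` here is a hypothetical stand-in (a `NamedTuple`, so it is hashable), `collect_parts` and `RAW` are illustrative names, and the sketch relies on `Message.walk()` visiting nested parts by itself instead of reproducing the commit's explicit recursion.

```python
import email
from email.message import Message
from typing import NamedTuple, Set


class DataPart(NamedTuple):
    """Hypothetical stand-in for the library's DataPart; hashable, so it fits in a set."""

    type: str
    content: bytes


def collect_parts(message: Message, parts: Set[DataPart]) -> None:
    """Add every non-image leaf part to the set; identical parts collapse into one entry."""
    for part in message.walk():
        content_type = part.get_content_type()
        # Skip images and container parts (multipart/*, message/rfc822);
        # walk() yields the children of containers on its own.
        if content_type.startswith("image/") or part.is_multipart():
            continue
        parts.add(DataPart(content_type, part.get_payload(decode=True)))


# Illustrative two-part message (not taken from the test data).
RAW = (
    b"Subject: test\r\n"
    b'Content-Type: multipart/alternative; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--b\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>hello</p>\r\n"
    b"--b--\r\n"
)

parts: Set[DataPart] = set()
message = email.message_from_bytes(RAW)
collect_parts(message, parts)
collect_parts(message, parts)  # a second pass adds nothing new
content_types = sorted(part.type for part in parts)
```

One consequence of converting the set back with `list(data_parts)` is that the order of the resulting parts is not guaranteed to follow the order in the email.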
34 changes: 34 additions & 0 deletions circuit_maintenance_parser/parser.py
@@ -129,6 +129,11 @@ class Html(Parser):

     _data_types = ["text/html", "html"]

+    @staticmethod
+    def remove_hex_characters(string):
+        """Convert any hex characters to standard ascii."""
+        return string.encode("ascii", errors="ignore").decode("utf-8")
+
     def parse(self, raw: bytes) -> List[Dict]:
         """Execute parsing."""
         result = []
@@ -180,3 +185,32 @@ def parse(self, raw: bytes) -> List[Dict]:
             raise ParserError("Not parsed_date available.")
         except Exception as exc:
             raise ParserError from exc
+
+
+class EmailSubjectParser(Parser):
+    """Parse data from subject or email."""
+
+    _data_types = ["email-header-subject"]
+
+    def parse(self, raw: bytes) -> List[Dict]:
+        """Execute parsing."""
+        result = []
+
+        try:
+            for data in self.parse_subject(self.bytes_to_string(raw)):
+                result.append(data)
+            logger.debug("Successful parsing for %s", self.__class__.__name__)
+
+            return result
+
+        except Exception as exc:
+            raise ParserError from exc
+
+    def parse_subject(self, subject: str) -> List[Dict]:
+        """Custom subject parsing."""
+        raise NotImplementedError
+
+    @staticmethod
+    def bytes_to_string(string):
+        """Convert bytes variable to a string."""
+        return string.decode("utf-8")
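
Note on `EmailSubjectParser`: the base class decodes the raw bytes and delegates to a `parse_subject` hook that each provider overrides. A minimal standalone sketch of that pattern follows. `SubjectParser` and `TicketSubjectParser` are hypothetical, simplified stand-ins that do not import the library, and the `TT#` pattern is only an illustration, not something the commit's parsers extract.

```python
import re
from typing import Dict, List


class SubjectParser:
    """Hypothetical, simplified stand-in for EmailSubjectParser."""

    def parse(self, raw: bytes) -> List[Dict]:
        # Decode once here, so subclasses only ever deal with str.
        return list(self.parse_subject(raw.decode("utf-8")))

    def parse_subject(self, subject: str) -> List[Dict]:
        raise NotImplementedError


class TicketSubjectParser(SubjectParser):
    """Hypothetical subclass that pulls a ticket number such as 'TT#7777'."""

    def parse_subject(self, subject: str) -> List[Dict]:
        match = re.search(r"TT#(\d+)", subject)
        return [{"maintenance_id": match.group(1)}] if match else [{}]


result = TicketSubjectParser().parse(b"Emergency Maintenance Notification TT#7777")
```

Keeping the decoding and error wrapping in the base class means a provider-specific parser only has to supply the regular expression logic.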
140 changes: 140 additions & 0 deletions circuit_maintenance_parser/parsers/seaborn.py
@@ -0,0 +1,140 @@
+"""Seaborn parser."""
+import logging
+import re
+
+from dateutil import parser
+
+from circuit_maintenance_parser.errors import ParserError
+from circuit_maintenance_parser.parser import CircuitImpact, Html, Impact, Status, EmailSubjectParser
+
+# pylint: disable=too-many-branches
+
+
+logger = logging.getLogger(__name__)
+
+
+class SubjectParserSeaborn1(EmailSubjectParser):
+    """Parser for Seaborn subject string, email type 1.
+
+    Subject: [{ACCOUNT NAME}] {MAINTENANCE ID} {DATE}
+    [Customer Direct] 1111 08/14
+    """
+
+    def parse_subject(self, subject):
+        """Parse subject of email file."""
+        data = {}
+        try:
+            search = re.search(r".+\[(.+)\].([0-9]+).+", subject)
+            if search:
+                data["account"] = search.group(1)
+                data["maintenance_id"] = search.group(2)
+            return [data]
+
+        except Exception as exc:
+            raise ParserError from exc
+
+
+class SubjectParserSeaborn2(EmailSubjectParser):
+    """Parser for Seaborn subject string, email type 2.
+
+    Subject: [## {ACCOUNT NUMBER} ##] Emergency Maintenance Notification CID: {CIRCUIT} TT#{MAINTENANCE ID}
+    [## 11111 ##] Emergency Maintenance Notification CID: AAA-AAAAA-AAAAA-AAA1-1111-11 TT#1111
+    """
+
+    def parse_subject(self, subject):
+        """Parse subject of email file."""
+        data = {}
+        try:
+            search = re.search(r".+\[## ([0-9]+) ##\].+", subject)
+            if search:
+                data["account"] = search.group(1)
+            return [data]
+
+        except Exception as exc:
+            raise ParserError from exc
+
+
+class HtmlParserSeaborn1(Html):
+    """Notifications HTML Parser 1 for Seaborn notifications.
+
+    <div>
+      <p>DESCRIPTION: This is a maintenance notification.</p>
+      <p>SERVICE IMPACT: 05 MINUTE OUTAGE</p>
+      <p>LOCATION: London</p>
+      ...
+    </div>
+    """
+
+    def parse_html(self, soup, data_base):
+        """Execute parsing."""
+        data = data_base.copy()
+        try:
+            self.parse_body(soup, data)
+            return [data]
+
+        except Exception as exc:
+            raise ParserError from exc
+
+    def parse_body(self, body, data):
+        """Parse HTML body."""
+        data["circuits"] = []
+        p_elements = body.find_all("p")
+
+        for index, element in enumerate(p_elements):
+            if "DESCRIPTION" in element.text:
+                data["summary"] = element.text.split(":")[1].strip()
+            elif "SCHEDULE" in element.text:
+                schedule = p_elements[index + 1].text
+                start, end = schedule.split(" - ")
+                data["start"] = self.dt2ts(parser.parse(start))
+                data["end"] = self.dt2ts(parser.parse(end))
+                data["status"] = Status("CONFIRMED")
+            elif "AFFECTED CIRCUIT" in element.text:
+                circuit_id = element.text.split(": ")[1]
+                data["circuits"].append(CircuitImpact(impact=Impact("OUTAGE"), circuit_id=circuit_id))
+
+
+class HtmlParserSeaborn2(Html):
+    """Notifications HTML Parser 2 for Seaborn notifications.
+
+    <div>
+      <div>DESCRIPTION: This is a maintenance notification.</div>
+      <div>SERVICE IMPACT: 05 MINUTE OUTAGE</div>
+      <div>LOCATION: London</div>
+      ...
+    </div>
+    """
+
+    def parse_html(self, soup, data_base):
+        """Execute parsing."""
+        data = data_base.copy()
+        try:
+            self.parse_body(soup, data)
+            return [data]
+
+        except Exception as exc:
+            raise ParserError from exc
+
+    def parse_body(self, body, data):
+        """Parse HTML body."""
+        data["circuits"] = []
+        div_elements = body.find_all("div")
+        for element in div_elements:
+            if "Be advised" in element.text:
+                if "been rescheduled" in element.text:
+                    data["status"] = Status["RE_SCHEDULED"]
+                elif "been scheduled" in element.text:
+                    data["status"] = Status["CONFIRMED"]
+            elif "Description" in element.text:
+                data["summary"] = element.text.split(":")[1].strip()
+            elif "Seaborn Ticket" in element.text:
+                data["maintenance_id"] = element.text.split(":")[1]
+            elif "Start date" in element.text:
+                start = element.text.split(": ")[1]
+                data["start"] = self.dt2ts(parser.parse(start))
+            elif "Finish date" in element.text:
+                end = element.text.split(": ")[1]
+                data["end"] = self.dt2ts(parser.parse(end))
+            elif "Circuit impacted" in element.text:
+                circuit_id = self.remove_hex_characters(element.text).split(":")[1]
+                data["circuits"].append(CircuitImpact(impact=Impact("OUTAGE"), circuit_id=circuit_id))
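
Note on the subject patterns: both regular expressions in `seaborn.py` start with `.+`, so they need at least one character before the opening bracket. The sketch below copies the two patterns and runs them on illustrative subjects (the `Fwd: ` prefix and the sample strings are assumptions modelled on the docstrings and the test email). It suggests that a forwarded subject matches while a subject that starts directly with `[` does not, which may be worth checking against real notifications.

```python
import re

# Patterns copied from SubjectParserSeaborn1 and SubjectParserSeaborn2.
TYPE1 = re.compile(r".+\[(.+)\].([0-9]+).+")
TYPE2 = re.compile(r".+\[## ([0-9]+) ##\].+")

# Illustrative subjects, not real notifications.
forwarded = "Fwd: [Customer Direct] 1111 08/14"
bare = "[Customer Direct] 1111 08/14"
type2 = "Fwd: [rd-notices] Re:[## 99999 ##] Emergency Maintenance Notification TT#7777"

match = TYPE1.search(forwarded)
type1_fields = {"account": match.group(1), "maintenance_id": match.group(2)}

# The leading ".+" cannot match an empty prefix, so this search finds nothing.
bare_match = TYPE1.search(bare)

type2_account = TYPE2.search(type2).group(1)
```

If bare subjects need to be supported, relaxing the leading `.+` to `.*` would be one option; that is a suggestion, not part of this commit.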
16 changes: 16 additions & 0 deletions circuit_maintenance_parser/provider.py
@@ -18,6 +18,12 @@
 from circuit_maintenance_parser.parsers.gtt import HtmlParserGTT1
 from circuit_maintenance_parser.parsers.lumen import HtmlParserLumen1
 from circuit_maintenance_parser.parsers.megaport import HtmlParserMegaport1
+from circuit_maintenance_parser.parsers.seaborn import (
+    HtmlParserSeaborn1,
+    HtmlParserSeaborn2,
+    SubjectParserSeaborn1,
+    SubjectParserSeaborn2,
+)
 from circuit_maintenance_parser.parsers.telstra import HtmlParserTelstra1
 from circuit_maintenance_parser.parsers.turkcell import HtmlParserTurkcell1
 from circuit_maintenance_parser.parsers.verizon import HtmlParserVerizon1
@@ -150,6 +156,16 @@ class PacketFabric(GenericProvider):
     _default_organizer = "[email protected]"


+class Seaborn(GenericProvider):
+    """Seaborn provider custom class."""
+
+    _processors: List[GenericProcessor] = [
+        CombinedProcessor(data_parsers=[EmailDateParser, HtmlParserSeaborn1, SubjectParserSeaborn1]),
+        CombinedProcessor(data_parsers=[EmailDateParser, HtmlParserSeaborn2, SubjectParserSeaborn2]),
+    ]
+    _default_organizer = "[email protected]"
+
+
 class Telia(GenericProvider):
     """Telia provider custom class."""

89 changes: 89 additions & 0 deletions tests/unit/data/seaborn/seaborn1.eml
@@ -0,0 +1,89 @@
Date: Mon, 16 Aug 2021 17:29:56 +0100
Message-ID: <CACtiu=[email protected]>
Subject: Fwd: [rd-notices] Re:[## 99999 ##] Emergency Maintenance Notification
CID: AAA-AAAAA-AAAAA-AAA1-00000-00 TT#7777
Content-Type: multipart/related; boundary="000000000000dfb73e05c9afb661"

--000000000000dfb73e05c9afb661
Content-Type: multipart/alternative; boundary="000000000000dfb73c05c9afb660"
--000000000000dfb73c05c9afb660
Content-Type: text/plain; charset="UTF-8"
---------- Forwarded message ---------
From: NOC Seaborn <[email protected]>
Date: Wed, 11 Aug 2021 at 23:09
Subject: [rd-notices] Re:[## 99999 ##] Emergency Maintenance Notification
CID: AAA-AAAAA-AAAAA-AAA1-00000-00 TT#7777
To: <[email protected]>

--000000000000dfb73c05c9afb660
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br clear=3D"all"><div><div dir=3D"ltr" class=3D"gmail_sig=
nature" data-smartmail=3D"gmail_signature"><div dir=3D"ltr"><div dir=3D"ltr=
"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div =
dir=3D"ltr"><p><span style=3D"background-color:rgb(255,255,255)" lang=3D"EN=
-US"><font size=3D"2" face=3D"tahoma, sans-serif" color=3D"#000000">Be brig=
ht</font></span></p>

<table style=3D"border:none;border-collapse:collapse"><colgroup><col width=
=3D"65"><col width=3D"241"></colgroup><tbody><tr style=3D"height:46pt"><td =
style=3D"vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden"><p dir=
=3D"ltr" style=3D"line-height:1.2;margin-top:0pt;margin-bottom:0pt"><span s=
tyle=3D"font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:=
transparent;vertical-align:baseline;white-space:pre-wrap"><span style=3D"bo=
rder:none;display:inline-block;overflow:hidden;width:47px;height:48px"><img=
src=3D"https://lh5.googleusercontent.com/U9uUZC2e3L55zxG_yWsPLz4ffrHQwFRbD=
LJynW4VqWiW8f4SMROxnkrO0KZBZoV8Y3MZPEqbBHwShT5SyQ0VnQ7FuAEZfYvWTM6Ha5WVmXSq=
qN8WOUFW1J726dGUynkZm7f4LsfH" width=3D"47" height=3D"48.01172447484123" sty=
le=3D"margin-left:0px"></span></span></p></td><td style=3D"vertical-align:t=
op;padding:5pt 5pt 5pt 5pt;overflow:hidden"><p dir=3D"ltr" style=3D"line-he=
ight:1.2;margin-top:0pt;margin-bottom:0pt"><span style=3D"font-size:9pt;fon=
t-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:70=
0;vertical-align:baseline;white-space:pre-wrap">Engineer</span></p><p =
dir=3D"ltr" style=3D"line-height:1.2;margin-top:0pt;margin-bottom:0pt"><spa=
n style=3D"font-size:9pt;font-family:Arial;color:rgb(0,0,0);background-colo=
r:transparent;vertical-align:baseline;white-space:pre-wrap">Network Enginee=
r III=C2=A0 |=C2=A0 Customer</span></p><p dir=3D"ltr" style=3D"line-heigh=
t:1.2;margin-top:0pt;margin-bottom:0pt"><span style=3D"font-size:9pt;font-f=
amily:Arial;color:rgb(0,0,0);background-color:transparent;vertical-align:ba=
seline;white-space:pre-wrap">Summoner: Customer Eng</span></p></td></tr></t=
body></table></div></div></div></div></div></div></div></div></div><br><br>=
<div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">----------=
Forwarded message ---------<br>From: <strong class=3D"gmail_sendername" di=
r=3D"auto">NOC Seaborn</strong> <span dir=3D"auto">&lt;<a href=3D"mailto:no=
[email protected]">[email protected]</a>&gt;</span><br>Date: Wed,=
11 Aug 2021 at 23:09<br>Subject: [rd-notices] Re:[## 51346 ##] Emergency =
Maintenance Notification CID: AAA-AAAAA-AAAA-AAA1-00000-00 TT#7777<br>To: =
&lt;<a href=3D"mailto:[email protected]">[email protected]</a=
>&gt;<br></div><br><br><u></u><div><div style=3D"font-size:13px;font-family=
:Arial,Helvetica,Verdana,sans-serif"><div><div>Dear Customer,<br></div><div><br=
></div><div>=C2=A0</div><div><br></div><div>Be advised that this maintenanc=
e has been rescheduled and the details are below:</div><div><br></div><div>=
=C2=A0</div><div><br></div><div>Notification Details: Emergency=C2=A0 maint=
enance.</div><div><br></div><div>Description: An emergency work will be car=
ried out to relocate fiber cable due to civil works in the zone.</div><div>=
<br></div><div>Seaborn Ticket number:7777<br></div><div>Start date/time: 8/=
12/2021 2:00:00 am GMT</div><div><br></div><div>Finish date/time: 8/12/2021=
11:00:00 am GMT</div><div><br></div><div>Circuit impacted:=AAA-AAAAA-AAAAA=
-AAA1-00000-00<br><br></div><div>Service Impact: Switch hits UP to 5 mi=
nutes</div><div>=C2=A0</div><div><br></div><div>We regret any inconvenience=
this may cause you.</div><div><br></div><div>=C2=A0</div><div><br></div><d=
iv>Regards,=C2=A0</div></div><div><br></div><div title=3D"sign_holder::star=
t"></div><div><div style=3D"font-size:13px;font-family:Arial,Helvetica,Verd=
ana,sans-serif"><div><div><img style=3D"padding:0px;max-width:100%;box-sizi=
ng:border-box" src=3D"cid:17b4fcc3b448217e76e1"><br></div><div><br></div><d=
iv>Engineer Name<br></div><div>NOC Engineer<br></=
div><div>Seaborn Networks<br></div><div>1-201-351-5806 (US)<br></div><div>0=
800-SEABRAS (0800 732-2727)(Brazil)<br></div><div><a rel=3D"noreferrer" hre=
f=3D"mailto:[email protected]" target=3D"_blank">noc@seabornnetworks.=
com</a><br></div><div><a rel=3D"noreferrer" href=3D"http://www.seabornneetw=
orks.com/" target=3D"_blank">www.seabornnetworks.com</a><br></div><div><br =
style=3D"font-family:Arial,Helvetica,Verdana,sans-serif"></div></div></div>=
</div><div title=3D"sign_holder::end"></div><div><br></div><div title=3D"be=
forequote:::"></div><div><blockquote style=3D"border-left:1px dotted rgb(22=
9,229,229);margin-left:5px;padding-left:5px"><div style=3D"padding-top:10px=
"> <br></div></blockquote></div> <div><br></div></div><div id=3D"m_-8814736=
200534887510ZDeskInteg"></div><br></div>