Provider implements an include filter to define relevant notifications #91

Merged
merged 14 commits into from
Oct 1, 2021
Changes from 8 commits
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
# Changelog

## v2.0.3

### Added

- #91 - `Provider` now adds `_include_filter` and `_exclude_filter` attributes to filter in the notifications that are relevant to parse and filter out those that are not, avoiding false positives.

## v2.0.2 - 2021-09-28

### Fixed
3 changes: 2 additions & 1 deletion README.md
@@ -184,7 +184,7 @@ Circuit Maintenance Notification #0
circuit-maintenance-parser --data-file "/tmp/___ZAYO TTN-00000000 Planned MAINTENANCE NOTIFICATION___.eml" --data-type email --provider-type zayo
Circuit Maintenance Notification #0
{
"account": "Linode",
"account": "some account",
"circuits": [
{
"circuit_id": "/OGYX/000000/ /ZYO /",
@@ -226,6 +226,7 @@ The project is following Network to Code software development guidelines and is
1. Define the `Parsers` (inheriting from one of the generic `Parsers` or a new one) that will extract the data from the notification, which may itself contain multiple `DataParts`. The `data_type` of the `Parser` and the `DataPart` have to match. The custom `Parsers` will be placed in the `parsers` folder.
2. Update the `unit/test_parsers.py` with the new parsers, providing some data to test and validate the extracted data.
3. Define a new `Provider` inheriting from the `GenericProvider`, defining the `Processors` and the respective `Parsers` to be used. Maybe you can reuse some of the generic `Processors` or maybe you will need to create a custom one. If this is the case, place it in the `processors` folder.
- The `Provider` also supports the definition of an `_include_filter` and an `_exclude_filter` to limit the notifications that are actually processed, avoiding false positive errors for notifications that are not relevant.
4. Update the `unit/test_e2e.py` with the new provider, providing some data to test and validate the final `Maintenances` created.
5. **Expose the new `Provider` class** by updating the `SUPPORTED_PROVIDERS` map in `circuit_maintenance_parser/__init__.py` to officially expose the `Provider`.
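
To make the filter bullet in step 3 more concrete, here is a minimal, self-contained sketch of how the two dictionaries are shaped and how they might be combined. It does not import the library: `Part`, `matches` and `should_process` are hypothetical stand-ins that mirror the behaviour described in this PR, and the subject strings and the `"Cancelled"` pattern are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Part:
    """Hypothetical stand-in for the library's DataPart (a type plus raw bytes)."""

    type: str
    content: bytes


def matches(filter_dict: Dict[str, List[str]], parts: List[Part]) -> bool:
    """Return True if any part's content matches a regex listed for its type."""
    return any(
        re.search(pattern, part.content.decode())
        for part in parts
        for pattern in filter_dict.get(part.type, [])
    )


def should_process(include: Dict[str, List[str]], exclude: Dict[str, List[str]], parts: List[Part]) -> bool:
    """Exclude wins over include; an empty include filter accepts everything."""
    if exclude and matches(exclude, parts):
        return False
    return not include or matches(include, parts)


# Filters map a data type to a list of regular expressions (values are illustrative).
include_filter = {"email-header-subject": ["Scheduled Maintenance"]}
exclude_filter = {"email-header-subject": ["Cancelled"]}

planned = [Part("email-header-subject", b"Scheduled Maintenance for circuit X")]
cancelled = [Part("email-header-subject", b"Cancelled: Scheduled Maintenance for circuit X")]
invoice = [Part("email-header-subject", b"Your monthly invoice")]

print(should_process(include_filter, exclude_filter, planned))    # True
print(should_process(include_filter, exclude_filter, cancelled))  # False
print(should_process(include_filter, exclude_filter, invoice))    # False
```

In a real provider the same dictionaries would be assigned to the `_include_filter` and `_exclude_filter` class attributes, keyed by a data type such as `EMAIL_HEADER_SUBJECT`.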

4 changes: 4 additions & 0 deletions circuit_maintenance_parser/constants.py
@@ -0,0 +1,4 @@
"""Constants used in the library."""

EMAIL_HEADER_SUBJECT = "email-header-subject"
EMAIL_HEADER_DATE = "email-header-date"
7 changes: 4 additions & 3 deletions circuit_maintenance_parser/data.py
@@ -4,6 +4,8 @@

import email
from pydantic import BaseModel, Extra
from circuit_maintenance_parser.constants import EMAIL_HEADER_SUBJECT, EMAIL_HEADER_DATE


logger = logging.getLogger(__name__)

@@ -73,9 +75,8 @@ def init_from_emailmessage(cls: Type["NotificationData"], email_message) -> Opti
cls.walk_email(email_message, data_parts)

# Adding extra headers that are interesting to be parsed
data_parts.add(DataPart("email-header-subject", email_message["Subject"].encode()))
# TODO: Date could be used to extend the "Stamp" time of a notification when not available, but we need a parser
data_parts.add(DataPart("email-header-date", email_message["Date"].encode()))
data_parts.add(DataPart(EMAIL_HEADER_SUBJECT, email_message["Subject"].encode()))
data_parts.add(DataPart(EMAIL_HEADER_DATE, email_message["Date"].encode()))
return cls(data_parts=list(data_parts))
except Exception: # pylint: disable=broad-except
logger.exception("Error found initializing data from email message: %s", email_message)
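
As a rough illustration of what the two header data parts contain, the sketch below parses a made-up message with the standard library `email` module and encodes the `Subject` and `Date` headers to bytes, keyed by the same strings that `constants.py` defines. The message text and the `header_parts` dictionary are assumptions for the example, not the library's own data structures.

```python
import email

# Invented message used only for this example.
RAW_EMAIL = (
    "Subject: Scheduled Maintenance Window\n"
    "Date: Fri, 01 Oct 2021 10:00:00 +0000\n"
    "\n"
    "Body text\n"
)

message = email.message_from_string(RAW_EMAIL)

# Header values come back as str (or None when the header is absent),
# so they are encoded to bytes before being stored as data parts.
header_parts = {
    "email-header-subject": message["Subject"].encode(),
    "email-header-date": message["Date"].encode(),
}

print(header_parts["email-header-subject"])  # b'Scheduled Maintenance Window'
print(message["X-Missing"])  # None
```

Because a missing header yields `None`, calling `.encode()` on it would raise `AttributeError`; in the hunk above that case would fall into the broad `except` that logs the error.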
5 changes: 3 additions & 2 deletions circuit_maintenance_parser/parser.py
@@ -15,6 +15,7 @@

from circuit_maintenance_parser.errors import ParserError
from circuit_maintenance_parser.output import Status, Impact, CircuitImpact
from circuit_maintenance_parser.constants import EMAIL_HEADER_SUBJECT, EMAIL_HEADER_DATE

# pylint: disable=no-member

@@ -177,7 +178,7 @@ def clean_line(line):
class EmailDateParser(Parser):
"""Parser for Email Date."""

_data_types = ["email-header-date"]
_data_types = [EMAIL_HEADER_DATE]

def parser_hook(self, raw: bytes):
"""Execute parsing."""
@@ -190,7 +191,7 @@ def parser_hook(self, raw: bytes):
class EmailSubjectParser(Parser):
"""Parse data from subject or email."""

_data_types = ["email-header-subject"]
_data_types = [EMAIL_HEADER_SUBJECT]

def parser_hook(self, raw: bytes):
"""Execute parsing."""
46 changes: 45 additions & 1 deletion circuit_maintenance_parser/provider.py
@@ -1,8 +1,9 @@
"""Definition of Provider class as the entry point to the library."""
import logging
import re
import traceback

from typing import Iterable, List
from typing import Iterable, List, Dict

from pydantic import BaseModel

@@ -13,6 +14,7 @@
from circuit_maintenance_parser.parser import ICal, EmailDateParser
from circuit_maintenance_parser.errors import ProcessorError, ProviderError
from circuit_maintenance_parser.processor import CombinedProcessor, SimpleProcessor, GenericProcessor
from circuit_maintenance_parser.constants import EMAIL_HEADER_SUBJECT

from circuit_maintenance_parser.parsers.aquacomms import HtmlParserAquaComms1, SubjectParserAquaComms1
from circuit_maintenance_parser.parsers.aws import SubjectParserAWS1, TextParserAWS1
@@ -50,6 +52,14 @@ class GenericProvider(BaseModel):
that will be used. Default: `[SimpleProcessor(data_parsers=[ICal])]`.
_default_organizer (optional): Defines a default `organizer`, an email address, to be used to create a
`Maintenance` in absence of the information in the original notification.
_include_filter (optional): Dictionary that maps a data type to a list of regular expressions; a notification is
taken into account only if at least one of them matches.

Review comment (Contributor): Would be good to flesh this out with an example or two - it's not obvious from just reading this how a valid `_include_filter` would be structured.

_exclude_filter (optional): Dictionary that maps a data type to a list of regular expressions; a notification is
NOT taken into account if at least one of them matches.

Notes:
- If a notification matches both the `_include_filter` and the `_exclude_filter`, the latter takes precedence and
the notification will be filtered out.

Examples:
>>> GenericProvider()
@@ -59,12 +69,44 @@ class GenericProvider(BaseModel):
    _processors: List[GenericProcessor] = [SimpleProcessor(data_parsers=[ICal])]
    _default_organizer: str = "unknown"

    _include_filter: Dict[str, List[str]] = {}
    _exclude_filter: Dict[str, List[str]] = {}

    def include_filter_check(self, data: NotificationData) -> bool:
        """If `_include_filter` is defined, it verifies that the matching criteria is met."""
        if self._include_filter:
            return self.filter_check(self._include_filter, data)
        return True

    def exclude_filter_check(self, data: NotificationData) -> bool:
        """If `_exclude_filter` is defined, it verifies that the matching criteria is met."""
        if self._exclude_filter:
            return self.filter_check(self._exclude_filter, data)
        return False

    @staticmethod
    def filter_check(filter_dict: Dict, data: NotificationData) -> bool:
        """Generic filter check."""
        for data_part in data.data_parts:
            filter_data_type = data_part.type
            if filter_data_type not in filter_dict:
                continue

            data_part_content = data_part.content.decode()
            if any(re.search(filter_re, data_part_content) for filter_re in filter_dict[filter_data_type]):
                return True

        return False

    def get_maintenances(self, data: NotificationData) -> Iterable[Maintenance]:
        """Main entry method that will use the defined `_processors` in order to extract the `Maintenances` from data."""
        provider_name = self.__class__.__name__
        error_message = ""
        related_exceptions = []

        if self.exclude_filter_check(data) or not self.include_filter_check(data):
            return []

        for processor in self._processors:
            try:
                return processor.process(data, self.get_extended_data())
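
Since `filter_check` passes each filter entry to `re.search`, the entries behave as regular expressions rather than plain substrings. The sketch below uses only the standard library to show a few consequences that may matter when writing filters; the subject lines and the `[PLANNED]` tag are invented for illustration.

```python
import re

subject_ok = "[PLANNED] Fiber work in Dallas"
subject_other = "Emergency outage in Dallas"

# Unescaped, "[PLANNED]" is a character class: it matches any single one of those letters.
naive = "[PLANNED]"
print(bool(re.search(naive, subject_other)))  # True (matches the "E")

# re.escape turns the tag into a literal pattern.
literal = re.escape("[PLANNED]")
print(bool(re.search(literal, subject_ok)))     # True
print(bool(re.search(literal, subject_other)))  # False

# re.search is unanchored and case-sensitive unless the pattern says otherwise.
print(bool(re.search("Fiber work", subject_ok)))      # True
print(bool(re.search("^Fiber work", subject_ok)))     # False
print(bool(re.search("fiber work", subject_ok)))      # False
print(bool(re.search("(?i)fiber work", subject_ok)))  # True
```

A filter entry that is meant to be matched literally and contains characters such as `[`, `(`, `.` or `+` is therefore probably best wrapped in `re.escape`.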
@@ -172,6 +214,8 @@ class HGC(GenericProvider):
class Lumen(GenericProvider):
    """Lumen provider custom class."""

    _include_filter = {EMAIL_HEADER_SUBJECT: ["Scheduled Maintenance Window"]}

Review comment (Contributor): Looking at more examples that I have locally, I think just "Scheduled Maintenance" would be better here. I see some subject lines like "Lumen Scheduled Maintenance #: 22194642, Scheduled" that appear to me to be valid maintenance notifications.

    _processors: List[GenericProcessor] = [
        CombinedProcessor(data_parsers=[EmailDateParser, HtmlParserLumen1]),
    ]
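
The difference between the two candidate patterns can be checked directly against the subject line quoted in the review comment. This is a small standalone check with `re.search`, the same call that `filter_check` uses; the variable names are only for the example.

```python
import re

# Subject line quoted in the review comment above.
subject = "Lumen Scheduled Maintenance #: 22194642, Scheduled"

current_pattern = "Scheduled Maintenance Window"
suggested_pattern = "Scheduled Maintenance"

current_matches = re.search(current_pattern, subject) is not None
suggested_matches = re.search(suggested_pattern, subject) is not None

print(current_matches)    # False: this notification would be filtered out
print(suggested_matches)  # True: this notification would be processed
```

The shorter pattern also still matches a subject of `Scheduled Maintenance Window`, so it accepts a superset of what the current one accepts.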
1 change: 1 addition & 0 deletions tests/unit/data/lumen/subject_work_planned
@@ -0,0 +1 @@
Scheduled Maintenance Window
36 changes: 20 additions & 16 deletions tests/unit/test_e2e.py
@@ -7,7 +7,7 @@

from circuit_maintenance_parser.data import NotificationData
from circuit_maintenance_parser.errors import ProviderError

from circuit_maintenance_parser.constants import EMAIL_HEADER_DATE, EMAIL_HEADER_SUBJECT

# pylint: disable=duplicate-code
from circuit_maintenance_parser.provider import (
@@ -65,7 +65,7 @@
Cogent,
[
("html", Path(dir_path, "data", "cogent", "cogent1.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "cogent", "cogent1_result.json"),
@@ -76,7 +76,7 @@
Cogent,
[
("html", Path(dir_path, "data", "cogent", "cogent2.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "cogent", "cogent2_result.json"),
@@ -105,7 +105,8 @@
Lumen,
[
("html", Path(dir_path, "data", "lumen", "lumen1.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_SUBJECT, Path(dir_path, "data", "lumen", "subject_work_planned")),
],
[
Path(dir_path, "data", "lumen", "lumen1_result.json"),
@@ -116,7 +117,8 @@
Lumen,
[
("html", Path(dir_path, "data", "lumen", "lumen2.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_SUBJECT, Path(dir_path, "data", "lumen", "subject_work_planned")),
],
[
Path(dir_path, "data", "lumen", "lumen2_result.json"),
@@ -127,7 +129,8 @@
Lumen,
[
("html", Path(dir_path, "data", "lumen", "lumen3.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_SUBJECT, Path(dir_path, "data", "lumen", "subject_work_planned")),
],
[
Path(dir_path, "data", "lumen", "lumen3_result.json"),
@@ -138,7 +141,8 @@
Lumen,
[
("html", Path(dir_path, "data", "lumen", "lumen4.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_SUBJECT, Path(dir_path, "data", "lumen", "subject_work_planned")),
],
[
Path(dir_path, "data", "lumen", "lumen4_result.json"),
@@ -150,7 +154,7 @@
Megaport,
[
("html", Path(dir_path, "data", "megaport", "megaport1.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "megaport", "megaport1_result.json"),
@@ -161,7 +165,7 @@
Megaport,
[
("html", Path(dir_path, "data", "megaport", "megaport2.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "megaport", "megaport2_result.json"),
@@ -221,7 +225,7 @@
Telstra,
[
("html", Path(dir_path, "data", "telstra", "telstra1.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "telstra", "telstra1_result.json"),
@@ -232,7 +236,7 @@
Telstra,
[
("html", Path(dir_path, "data", "telstra", "telstra2.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "telstra", "telstra2_result.json"),
@@ -245,7 +249,7 @@
Turkcell,
[
("html", Path(dir_path, "data", "turkcell", "turkcell1.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "turkcell", "turkcell1_result.json"),
@@ -256,7 +260,7 @@
Turkcell,
[
("html", Path(dir_path, "data", "turkcell", "turkcell2.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "turkcell", "turkcell2_result.json"),
@@ -268,7 +272,7 @@
Verizon,
[
("html", Path(dir_path, "data", "verizon", "verizon1.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "verizon", "verizon1_result.json"),
@@ -279,7 +283,7 @@
Verizon,
[
("html", Path(dir_path, "data", "verizon", "verizon2.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "verizon", "verizon2_result.json"),
@@ -290,7 +294,7 @@
Verizon,
[
("html", Path(dir_path, "data", "verizon", "verizon3.html")),
("email-header-date", Path(dir_path, "data", "date", "email_date_1")),
(EMAIL_HEADER_DATE, Path(dir_path, "data", "date", "email_date_1")),
],
[
Path(dir_path, "data", "verizon", "verizon3_result.json"),
50 changes: 50 additions & 0 deletions tests/unit/test_providers.py
@@ -55,3 +55,53 @@ def test_provide_get_maintenances_one_exception(provider_class):
else:
provider.get_maintenances(fake_data)
assert mock_processor.call_count == 2


def test_provider_with_include_filter():
    """Tests usage of _include_filter."""

    class ProviderWithIncludeFilter(GenericProvider):
        """Fake Provider."""

        _include_filter = {fake_data.data_parts[0].type: [fake_data.data_parts[0].content.decode()]}

    # Because the include filter matches the data, we expect to reach `process`
    with pytest.raises(ProviderError):
        ProviderWithIncludeFilter().get_maintenances(fake_data)

    # With non-matching data, the notification is skipped and an empty list is returned
    other_fake_data = NotificationData.init_from_raw("other type", b"other data")
    assert ProviderWithIncludeFilter().get_maintenances(other_fake_data) == []


def test_provider_with_exclude_filter():
    """Tests usage of _exclude_filter."""

    class ProviderWithIncludeFilter(GenericProvider):
        """Fake Provider."""

        _exclude_filter = {fake_data.data_parts[0].type: [fake_data.data_parts[0].content.decode()]}

    # Because the exclude filter matches the data, we expect the processing to be skipped
    assert ProviderWithIncludeFilter().get_maintenances(fake_data) == []

    # With non-matching data, the notification is not skipped and gets processed
    other_fake_data = NotificationData.init_from_raw("other type", b"other data")
    with pytest.raises(ProviderError):
        ProviderWithIncludeFilter().get_maintenances(other_fake_data)


def test_provider_with_include_and_exclude_filters():
    """Tests matching of include and exclude filter, where the exclude takes precedence."""
    data = NotificationData.init_from_raw("fake_type", b"fake data")
    data.add_data_part("other_type", b"other data")

    class ProviderWithIncludeFilter(GenericProvider):
        """Fake Provider."""

        _include_filter = {data.data_parts[0].type: [data.data_parts[0].content.decode()]}
        _exclude_filter = {data.data_parts[1].type: [data.data_parts[1].content.decode()]}

    # Because both the exclude filter and the include filter match, we expect the exclude to take
    # precedence
    assert ProviderWithIncludeFilter().get_maintenances(data) == []