Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combined schedulers #3839

Open
wants to merge 40 commits into
base: main
Choose a base branch
from
Open

Combined schedulers #3839

wants to merge 40 commits into from

Conversation

jpbruinsslot
Copy link
Contributor

@jpbruinsslot jpbruinsslot commented Nov 13, 2024

Warning

Ready for review, not for merging

Changes

  • app.py: Removal of creation of organisational schedulers. We don't gather all available organisations from the katalogus and create individual boefje, report, and normalizer schedulers.
  • app.py: Removal of continuous checking of added or removed organisations. We removed the monitor_organisations method that was running in a thread to check if we added or removed organisation and create schedulers for it.
  • models/ : Task and Schedule get a specific organisation field. Since we don't differentiate tasks that are pushed on an organisation queue anymore, we need to be able to determine for what organisation an item on a queue is from.1
  • schedulers/: Since we've consolidated the organisational schedulers to 3 types of schedulers, and we assume that we get task creation from consolidated message queues, we need to update the schedulers to handle that.
    • BoefjeScheduler
      • retrieve scan profile mutations from one message queue
      • iterate over organisation to check for enables boefjes2
      • refactor of method names and exception handling
  • server/
    • Merger of queues/ and schedulers/ endpoint. The notion of a queue and scheduler are used interchangeably. Therefore it makes more sense at the moment to merge those endpoints to avoid confusion and simplify the code.
    • Support to pop off more than one Task from a scheduler. Task runner will now be able to batch pop tasks from a scheduler using filters.

Next steps and impact

Issue link

#3838

QA notes

Please add some information for QA on how to test the newly created code.


Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.
  • Tickets have been created for newly discovered issues.
  • For any non-trivial functionality, I have added integration and/or end-to-end tests.
  • I have informed others of any required .env changes files if required and changed the .env-dist accordingly.
  • I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

Copy-paste the checklist from the docs/source/templates folder into your comment.


Checklist for QA:

Copy-paste the checklist from the docs/source/templates folder into your comment.

Footnotes

  1. The attentive reader will know that the organisation information is present in the data specification in a Task , and can be filtered on by using a FilterRequest. However in practice the scheduler is mainly used in the context of multiple organisations and filtering on this id is deemed developer friendly when interfacing with the scheduler. Hence the choice explicitly defining the organisation field on a top level.

  2. as mentioned in the code this iterating over all organisations in order to check if there are enabled plugins is very inefficient, please refer to issue Katalogus caching in the scheduler #3357

@jpbruinsslot jpbruinsslot added the mula Issues related to the scheduler label Nov 13, 2024
@jpbruinsslot jpbruinsslot self-assigned this Nov 13, 2024
@jpbruinsslot jpbruinsslot force-pushed the poc/mula/combined-schedulers branch from 57d040f to da3f337 Compare November 18, 2024 13:29
@jpbruinsslot jpbruinsslot linked an issue Nov 21, 2024 that may be closed by this pull request
@jpbruinsslot jpbruinsslot changed the title POC: combined schedulers Combined schedulers Dec 2, 2024
* main: (64 commits)
  Bug fix: KAT-alogus parameter is now organization member instead of organization code (#3895)
  Remove sigrid workflows (#3920)
  Updated packages (#3898)
  Fix mula migrations Debian package (#3919)
  Adds loggers to report flow (#3872)
  Add additional check if task already run for report scheduler (#3900)
  Create separate finding for Microsoft RDP port (#3882)
  fix: 🐛 allow boefje completion with 404 (#3893)
  Feature/improve rename bulk modal (#3885)
  Update scheduler folder structure (#3883)
  Translations update from Hosted Weblate (#3870)
  Increase max number of PostgreSQL connections (#3889)
  Fix for task id as valid UUID (#3744)
  Add `auto_calculate_deadline` attribute to Scheduler (#3869)
  Ignore specific url parameters when following location headers (#3856)
  Let mailserver inherit l1 (#3704)
  Change plugins enabling in report flow to checkboxes (#3747)
  Fix rocky katalogus tests and delete unused fixtures (#3884)
  Enable/disable scheduled reports (#3871)
  optimize locking in katalogus.py, reuse available data (#3752)
  ...
Copy link
Contributor

@ammar92 ammar92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. Just a few remarks while skimming through it, not a detailed look yet

auto_calculate_deadline=False,
)
ID: str = "report"
ITEM_TYPE: Any = models.ReportTask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any should be a type?

create_schedule=True,
auto_calculate_deadline=False,
)
ID: str = "report"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ID: str = "report"
ID: Literal["report"] = "report"

]
),
)
return not count > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return not count > 0
return not count

@jpbruinsslot jpbruinsslot mentioned this pull request Dec 10, 2024
9 tasks
* main: (21 commits)
  Bump django from 5.0.9 to 5.0.10 in /rocky (#3940)
  Do not let enabling plugins affect the global plugin cache (#3944)
  Fix typing in more places and configure mypy to follow imports (#3932)
  Updates CWE archive to 4.16 (#3943)
  Report flaws (#3880)
  Translations update from Hosted Weblate (#3939)
  Fix report recipe API (#3942)
  Boefje runonce functionality in scheduler (#3906)
  fix: 🔨 do not store CDN findings (#3931)
  Dont check for Locations on local Ip's. (#3894)
  add unpkg.com to disallowed hostnames in CSP (#3927)
  Update website_discovery.py (#3921)
  Add export http boefje (#3901)
  Bump python-multipart from 0.0.9 to 0.0.18 in /bytes (#3925)
  Fix layout issues on scheduled reports page (#3930)
  Create scheduled report with zero objects selectable (#3907)
  Improve the KATalogus `/plugins` endpoint performance (#3892)
  Add bgp.jsonl and bgp-meta.json to .gitignore (#3928)
  Update pre-commit and all hooks (#3923)
  add support for detecting Lame dns delegations on ip ranges (#3899)
  ...
@stephanie0x00
Copy link
Contributor

Current state (16/12) of this PR.

  • Tasks don't seem to run. I did the onboarding and no tasks for boefjes or normalizers show up on the Tasks page. However, some findings are shown on the Findings page (which remain pending).

No tasks on the tasks page:
image

Findings page for this state:
image

boefje-1  | HTTPStatusError: Client error '404 Not Found' for url 'http://scheduler:8000/queues'
boefje-1  | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
boefje-1  | Traceback (most recent call last):
boefje-1  |   File "/app/boefjes/boefjes/app.py", line 89, in _fill_queue
boefje-1  |     queues = self.scheduler_client.get_queues()
boefje-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
boefje-1  |   File "/app/boefjes/boefjes/clients/scheduler_client.py", line 71, in get_queues
boefje-1  |     self._verify_response(response)
boefje-1  |   File "/app/boefjes/boefjes/clients/scheduler_client.py", line 67, in _verify_response
boefje-1  |     response.raise_for_status()
boefje-1  |   File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 763, in raise_for_status
boefje-1  |     raise HTTPStatusError(message, request=request, response=self)
boefje-1  | httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://scheduler:8000/queues'
boefje-1  | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
boefje-1  | HTTP Request: GET http://scheduler:8000/queues "HTTP/1.1 404 Not Found"
boefje-1  | 2024-12-16T11:47:16.426262 [error] Getting the queues from the scheduler failed
boefje-1  | ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
boefje-1  | │ /app/boefjes/boefjes/app.py:89 in _fill_queue                                                    │
boefje-1  | │                                                                                                  │
boefje-1  | │    86 │   │   │   return                                                                         │
boefje-1  | │    87 │   │                                                                                      │
boefje-1  | │    88 │   │   try:                                                                               │
boefje-1  | │ ❱  89 │   │   │   queues = self.scheduler_client.get_queues()                                    │
boefje-1  | │    90 │   │   except HTTPError:                                                                  │
boefje-1  | │    91 │   │   │   # Scheduler is having issues, so make note of it and try again                 │
boefje-1  | │    92 │   │   │   logger.exception("Getting the queues from the scheduler failed")               │
boefje-1  | │                                                                                                  │
boefje-1  | │ ╭────────────────────────────────── locals ──────────────────────────────────╮                   │
boefje-1  | │ │ queue_type = <Queue.BOEFJES: 'boefje'>                                     │                   │
boefje-1  | │ │       self = <boefjes.app.SchedulerWorkerManager object at 0x74d6402b2e10> │                   │
boefje-1  | │ │ task_queue = <AutoProxy[Queue] object, typeid 'Queue' at 0x74d6406533d0>   │                   │
boefje-1  | │ ╰────────────────────────────────────────────────────────────────────────────╯                   │
boefje-1  | │                                                                                                  │
boefje-1  | │ /app/boefjes/boefjes/clients/scheduler_client.py:71 in get_queues                                │
boefje-1  | │                                                                                                  │
boefje-1  | │   68 │                                                                                           │
boefje-1  | │   69 │   def get_queues(self) -> list[Queue]:                                                    │
boefje-1  | │   70 │   │   response = self._session.get("/queues")                                             │
boefje-1  | │ ❱ 71 │   │   self._verify_response(response)                                                     │
boefje-1  | │   72 │   │                                                                                       │
boefje-1  | │   73 │   │   return TypeAdapter(list[Queue]).validate_json(response.content)                     │
boefje-1  | │   74                                                                                             │
boefje-1  | │                                                                                                  │
boefje-1  | │ ╭───────────────────────────────────────── locals ──────────────────────────────────────────╮    │
boefje-1  | │ │ response = <Response [404 Not Found]>                                                     │    │
boefje-1  | │ │     self = <boefjes.clients.scheduler_client.SchedulerAPIClient object at 0x74d6402b2890> │    │
boefje-1  | │ ╰───────────────────────────────────────────────────────────────────────────────────────────╯    │
boefje-1  | │                                                                                                  │
boefje-1  | │ /app/boefjes/boefjes/clients/scheduler_client.py:67 in _verify_response                          │
boefje-1  | │                                                                                                  │
boefje-1  | │   64 │                                                                                           │
boefje-1  | │   65 │   @staticmethod                                                                           │
boefje-1  | │   66 │   def _verify_response(response: Response) -> None:                                       │
boefje-1  | │ ❱ 67 │   │   response.raise_for_status()                                                         │
boefje-1  | │   68 │                                                                                           │
boefje-1  | │   69 │   def get_queues(self) -> list[Queue]:                                                    │
boefje-1  | │   70 │   │   response = self._session.get("/queues")                                             │
boefje-1  | │                                                                                                  │
boefje-1  | │ ╭─────────────── locals ────────────────╮                                                        │
boefje-1  | │ │ response = <Response [404 Not Found]> │                                                        │
boefje-1  | │ ╰───────────────────────────────────────╯                                                        │
boefje-1  | │                                                                                                  │
boefje-1  | │ /usr/local/lib/python3.11/site-packages/httpx/_models.py:763 in raise_for_status                 │
boefje-1  | │                                                                                                  │
boefje-1  | │    760 │   │   }                                                                                 │
boefje-1  | │    761 │   │   error_type = error_types.get(status_class, "Invalid status code")                 │
boefje-1  | │    762 │   │   message = message.format(self, error_type=error_type)                             │
boefje-1  | │ ❱  763 │   │   raise HTTPStatusError(message, request=request, response=self)                    │
boefje-1  | │    764 │                                                                                         │
boefje-1  | │    765 │   def json(self, **kwargs: typing.Any) -> typing.Any:                                   │
boefje-1  | │    766 │   │   return jsonlib.loads(self.content, **kwargs)                                      │
boefje-1  | │                                                                                                  │
boefje-1  | │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
boefje-1  | │ │   error_type = 'Client error'                                                                │ │
boefje-1  | │ │  error_types = {                                                                             │ │
boefje-1  | │ │                │   1: 'Informational response',                                              │ │
boefje-1  | │ │                │   3: 'Redirect response',                                                   │ │
boefje-1  | │ │                │   4: 'Client error',                                                        │ │
boefje-1  | │ │                │   5: 'Server error'                                                         │ │
boefje-1  | │ │                }                                                                             │ │
boefje-1  | │ │      message = "Client error '404 Not Found' for url 'http://scheduler:8000/queues'\nFor     │ │
boefje-1  | │ │                more inf"+76                                                                  │ │
boefje-1  | │ │      request = <Request('GET', 'http://scheduler:8000/queues')>                              │ │
boefje-1  | │ │         self = <Response [404 Not Found]>                                                    │ │
boefje-1  | │ │ status_class = 4                                                                             │ │
boefje-1  | │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
boefje-1  | ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mula Issues related to the scheduler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Combine all schedulers for all organisations
3 participants