cylc server monitor #72
I think for the UI we can actually leverage existing tools. Graphite, Prometheus, Grafana and many other tools can digest this sort of information. Our dashboard could then have thin components that simply use these other libraries, or we could even embed the tools directly in the UI. These tools are also common in cloud deployments, so if the server side can produce its metrics in a format Prometheus understands (for example), users would be able to choose their own monitoring and even alerting solution. Just my two cents, but it's a great idea and should be fun to implement.
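To make the Prometheus idea concrete: Prometheus normally scrapes a plain-text exposition format rather than a JSON document. The sketch below renders per-server samples in that format using only the standard library. The metric names (e.g. `cylc_workflows_running`) and the `server` label are hypothetical, not anything Cylc currently exposes.

```python
from typing import Dict


def render_prometheus(samples: Dict[str, Dict[str, float]]) -> str:
    """Render {metric_name: {server: value}} in Prometheus text format.

    Metric names are hypothetical examples; every metric is emitted as a gauge.
    """
    lines = []
    for metric in sorted(samples):
        lines.append(f"# TYPE {metric} gauge")
        for server, value in sorted(samples[metric].items()):
            # Label values must escape backslash, double quote and newline.
            escaped = (
                server.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            lines.append(f'{metric}{{server="{escaped}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical data: number of running workflows per server.
    print(render_prometheus({"cylc_workflows_running": {"server1": 12, "server2": 7}}), end="")
```

Serving this string over HTTP (for instance from an endpoint on the UI Server) would let any Prometheus-compatible stack, Grafana included, consume it without Cylc-specific code.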
Lots of fun plotting libraries we could use. Interesting point on alerting; the old system does this in the Python backend.
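The old system's alerting code isn't shown in this thread. As a minimal sketch of threshold-based alerting in a Python backend, assuming hypothetical per-server metrics such as `memory_percent` and site-chosen limits:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    metric: str   # hypothetical metric name, e.g. "memory_percent"
    limit: float  # raise an alert when the value exceeds this


def check_alerts(
    server_metrics: Dict[str, Dict[str, float]], thresholds: List[Threshold]
) -> List[str]:
    """Return one message for every metric that is over its limit."""
    alerts = []
    for server in sorted(server_metrics):
        metrics = server_metrics[server]
        for threshold in thresholds:
            value = metrics.get(threshold.metric)
            # Servers that did not report this metric are skipped, not alerted.
            if value is not None and value > threshold.limit:
                alerts.append(
                    f"{server}: {threshold.metric}={value} exceeds {threshold.limit}"
                )
    return alerts
```

The returned messages could then be emailed or pushed to whatever channel a site prefers; if metrics are exported to Prometheus instead, its own alerting rules would replace this logic.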
Grafana etc. are really nice; we should certainly look at using something like that (in due course), since you say a rewrite is needed anyway.
We could potentially keep the old frontend, but it wouldn't take long to rewrite, so let's do it properly! Some screenshots of the old frontend for reference:
Some issues hanging over from the old system, transcribed from the old issue tracker (sticky notes on my desk):
Some sample output from a Python 3 CLI utility which works with the JSON dump files produced by the old system:

```
$ suitetool3 --latest
# 732 rows in dataset
0 Add field Add derived field to the data set.
1 Filter Filter by field value.
2 View Print all data
3 Summary Print the first few rows of data.
4 Count Count unique values for a given field.
5 Debug Insert pdb breakpoint
6 Export Data Export the current dataset as a CSV file.
7 Email Users Send an email to all users present in the dataset
8 Stack Action (undo, export, import)
9 Exit
Choose an action (int): 0
0 suite_dir The FS location of the suite directory.
1 root_dir The FS mount which the suite is installed on.
2 shared_account True if the account is *likely* to be a shared account
3 suite_grep Grep *.rc files against a pattern.
4 diff Diff suites present at another checkpoint.
5 cylc_tags Tuple of taggs for the cylc_version
Choose a field (int): 1
[=============================================================================]
[=============================================================================]
# 732 rows in dataset
0 Add field Add derived field to the data set.
1 Filter Filter by field value.
2 View Print all data
3 Summary Print the first few rows of data.
4 Count Count unique values for a given field.
5 Debug Insert pdb breakpoint
6 Export Data Export the current dataset as a CSV file.
7 Email Users Send an email to all users present in the dataset
8 Stack Action (undo, export, import)
9 Exit
Choose an action (int): 4
Available fields "server, suite_id, user_name, user_id, cylc_version, memory, cpu, run_days, last_activity, suite_dir, root_dir"
Choose field: root_dir
field: root_dir
unique items: 4
frequency
---------
/net/home 413
/net/data 289
/net/spice/scratch 29
/net/spice/project 1
items
-----
/net/data|/net/home|/net/spice/scratch|/net/spice/project
```
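The dump format isn't documented in this thread. Assuming each dump is a JSON array of flat objects keyed by the field names listed above (`server`, `suite_id`, `root_dir`, ...), the Filter and Count actions reduce to a few lines of standard-library Python. The function names here are hypothetical and are not suitetool3's actual code.

```python
import json
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def load_dump(path: str) -> List[Row]:
    """Load a dump file assumed to contain a JSON array of row objects."""
    with open(path) as handle:
        return json.load(handle)


def filter_rows(rows: List[Row], field: str, value: Any) -> List[Row]:
    """Keep only rows whose `field` equals `value` (the Filter action)."""
    return [row for row in rows if row.get(field) == value]


def count_unique(rows: List[Row], field: str) -> Counter:
    """Frequency of each distinct value of `field` (the Count action)."""
    return Counter(row.get(field) for row in rows)


def print_counts(rows: List[Row], field: str) -> None:
    """Print a frequency table similar to the one shown above."""
    counts = count_unique(rows, field)
    print(f"field: {field}")
    print(f"unique items: {len(counts)}")
    # most_common() orders items by descending frequency.
    for item, freq in counts.most_common():
        print(f"{item} {freq}")
```

A rewrite inside the UI Server could run the same aggregations as database queries rather than over in-memory dumps.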
The neatest way to implement this is likely as a JupyterHub service. This would allow us to run the extension with the hub account's privileges if necessary and provide integration with … The service would scrape Cylc processes from …
This is worth a look: someone worked out how to "proxy" Grafana as a JupyterHub service - https://github.com/rcthomas/jupyterhub-prometheus-grafana
(For the record, we're now using the original "exvcylc" monitor at NIWA; it's super helpful.)
Writing this up here as it's uncertain where this code would live.
At the Met Office we have a pool of 1216 servers which suites can run on. To help us keep track of the health of these servers and the usage of Cylc on them, we wrote a tool which provides a web dashboard with:
This is important functionality for larger sites, and there are lots of ways in which it can be improved (e.g. daily job counts).
This code is written in Python 2.6 and has to run in a bare environment, so it is kinda ugly and not especially portable. It needs a rewrite!
We should be able to re-implement this functionality within the Cylc UI/UI-Server infrastructure to provide an admin dashboard. That way this functionality would ship with Cylc and be available to all.
This would involve creating a dashboard for Cylc admins (we could make it accessible to all users). It would require an always-running UI Server under a specified account, which, depending on site specifics, may need certain privileges to be effective. It will also need to maintain a database; sqlite3 is more than sufficient.
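As a sketch of how small that database could be, using only the standard-library `sqlite3` module: the `samples` table (one row per server per scrape), its columns and the helper names are all hypothetical.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    timestamp REAL NOT NULL,       -- seconds since the epoch
    server TEXT NOT NULL,
    n_workflows INTEGER NOT NULL,  -- Cylc scheduler processes seen
    memory_percent REAL NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_sample(
    conn: sqlite3.Connection,
    server: str,
    n_workflows: int,
    memory_percent: float,
    timestamp: Optional[float] = None,
) -> None:
    """Store one scrape result; defaults to the current time."""
    ts = time.time() if timestamp is None else timestamp
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples VALUES (?, ?, ?, ?)",
            (ts, server, n_workflows, memory_percent),
        )


def latest_per_server(conn: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    """Most recent sample for each server, ordered by server name."""
    return conn.execute(
        """
        SELECT s.server, s.n_workflows, s.memory_percent
        FROM samples AS s
        JOIN (SELECT server, MAX(timestamp) AS ts FROM samples GROUP BY server) AS m
          ON s.server = m.server AND s.timestamp = m.ts
        ORDER BY s.server
        """
    ).fetchall()
```

Keeping raw timestamped samples also makes historical views such as the daily job counts mentioned above a matter of `GROUP BY` queries.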
Infrastructure aside, the actual code component is pretty simple:
psutil
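A sketch of the psutil side, assuming Cylc schedulers can be recognised from their command lines. The matching rules below (`cylc-run`, `cylc run`, `cylc restart`, `cylc play`) are a hypothetical heuristic that will depend on the Cylc version and how it is installed. `psutil` is imported inside `scrape` so the matching logic can be used and tested without it.

```python
from typing import Dict, List, Sequence

# Hypothetical heuristic: command-line fragments assumed to identify a scheduler.
SCHEDULER_COMMANDS = ("cylc-run", "cylc-restart")
SCHEDULER_SUBCOMMANDS = ("run", "restart", "play")


def is_cylc_scheduler(cmdline: Sequence[str]) -> bool:
    """Guess whether a process command line belongs to a Cylc scheduler."""
    names = [arg.rsplit("/", 1)[-1] for arg in cmdline]  # strip directories
    if any(name in SCHEDULER_COMMANDS for name in names):
        return True
    # e.g. ["/usr/bin/python3", "/opt/cylc/bin/cylc", "play", "my-flow"]
    for i, name in enumerate(names[:-1]):
        if name == "cylc" and names[i + 1] in SCHEDULER_SUBCOMMANDS:
            return True
    return False


def scrape() -> List[Dict[str, object]]:
    """List Cylc scheduler processes on this host with basic resource usage."""
    import psutil  # third-party: pip install psutil

    found = []
    attrs = ["pid", "username", "cmdline", "memory_percent", "create_time"]
    for proc in psutil.process_iter(attrs):
        info = proc.info  # attributes psutil could not read are set to None
        if info["cmdline"] and is_cylc_scheduler(info["cmdline"]):
            found.append(info)
    return found
```

Reading other users' processes may require the elevated privileges mentioned above, which is one reason for running the scraper under a dedicated account.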
Infrastructure-wise: