# ZenMonitor


ZenMonitor allows for the efficient monitoring of remote processes with minimal use of ERTS Distribution.

## Installation

Add ZenMonitor to your dependencies:

```elixir
def deps do
  [
    {:zen_monitor, "~> 2.1.0"}
  ]
end
```

## Using ZenMonitor

ZenMonitor strives to be a drop-in replacement for Process.monitor/1. To that end, the complexities of how it carries out its task are hidden behind a simple, unified programming interface. All the functions that the caller needs to use have convenient delegates available in the top-level ZenMonitor module. The interface is detailed below.

### ZenMonitor.monitor/1

This is a drop-in replacement for Process.monitor/1 when it comes to processes. It is compatible with the various ways that Process.monitor/1 can establish monitors: it accepts a pid, a name (the atom a local process is registered under), or a {name, node} tuple for a registered process on a remote node. These are defined as the ZenMonitor.destination type.

ZenMonitor.monitor/1 returns a standard reference that can be used to demonitor and can be matched against the reference provided in the :DOWN message.

Similar to Process.monitor/1, the caller is allowed to monitor the same process multiple times. Each monitor is given a unique reference, and every monitor will fire a :DOWN message when the monitored process goes down. Even though the caller can establish multiple monitors, ZenMonitor is designed to handle this efficiently; the only cost is an additional ETS row on the local node and additional processing time at fan-out.
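
A quick sketch of the call shape (the registered name and node below are hypothetical):

```elixir
# A destination can be a pid, a locally registered name, or a {name, node} tuple
ref = ZenMonitor.monitor({:my_worker, :"app@remote-host"})

# ZenMonitor wraps the original down reason in a {:zen_monitor, reason} tuple
receive do
  {:DOWN, ^ref, :process, _pid, {:zen_monitor, reason}} ->
    IO.inspect(reason, label: "monitored process went down")
end
```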

### ZenMonitor.demonitor/2

This is a mostly drop-in replacement for Process.demonitor/2 when it comes to processes. The first argument is the reference returned by ZenMonitor.monitor/1. It accepts a list of option atoms but only honors the :flush option at this time. Passing the :info option is allowed but has no effect; this function always returns true.
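
For example (assuming pid is a process previously being monitored):

```elixir
ref = ZenMonitor.monitor(pid)

# Stop monitoring; :flush also removes any :DOWN message already in the mailbox
true = ZenMonitor.demonitor(ref, [:flush])
```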

### ZenMonitor.compatibility/1

When operating in a mixed environment where some nodes are ZenMonitor compatible and some are not, it may be necessary to check the compatibility of a remote node. ZenMonitor.compatibility/1 accepts any ZenMonitor.destination and will report back one of :compatible or :incompatible for the remote's cached compatibility status.

All remotes start off as :incompatible until a positively acknowledged connection is established. See the ZenMonitor.connect/1 function for more information on connecting nodes.

### ZenMonitor.compatibility_for_node/1

Performs the same operation as ZenMonitor.compatibility/1 but it accepts a node atom instead of a ZenMonitor.destination.
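
A small sketch of both checks (the node and registered names are hypothetical):

```elixir
# Check by destination ({name, node}, pid, or locally registered name)
ZenMonitor.compatibility({:my_worker, :"app@remote-host"})
#=> :compatible or :incompatible (the cached status)

# Or check by node directly
ZenMonitor.compatibility_for_node(:"app@remote-host")
#=> :incompatible until a positive connection has been established
```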

### ZenMonitor.connect/1

Attempts a positive connection with the provided remote node. Connections are established by using the @gen_module's call/4 function to send a :ping message to the process registered under the atom ZenMonitor.Proxy on the remote. If this process responds with the :pong atom, the connection is positively established and the node is marked as :compatible. Any other response or error condition (timeout / noproc / etc.) is treated as a negative acknowledgement.

ZenMonitor.connect/1 is actually a delegate for ZenMonitor.Local.Connector.connect/1; see the documentation there for more information about how connect behaves.
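
A hedged sketch of connecting and then reading the cached status (the node name is hypothetical; see ZenMonitor.Local.Connector.connect/1 for the exact return value):

```elixir
# Attempt a positive connection to the remote node's ZenMonitor.Proxy
ZenMonitor.connect(:"app@remote-host")

# If the remote answered the :ping with :pong, the cached status is now :compatible
ZenMonitor.compatibility_for_node(:"app@remote-host")
#=> :compatible
```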

## Handling Down Messages

Any :DOWN message receivers (most commonly GenServer.handle_info/2 callbacks) that match on the reason should be updated to include an outer {:zen_monitor, original_match} wrapper.

```elixir
def handle_info({:DOWN, ref, :process, pid, :specific_reason}, state) do
  ...
end
```

Should be updated to the following.

```elixir
def handle_info({:DOWN, ref, :process, pid, {:zen_monitor, :specific_reason}}, state) do
  ...
end
```
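
For illustration, a minimal GenServer that monitors a remote process and reacts to its death might look like the following sketch (the module name and the IO.inspect reaction are assumptions, not part of ZenMonitor's API):

```elixir
defmodule MyApp.Watcher do
  use GenServer

  def start_link(target) do
    GenServer.start_link(__MODULE__, target)
  end

  @impl true
  def init(target) do
    # target can be any ZenMonitor.destination
    ref = ZenMonitor.monitor(target)
    {:ok, %{target: target, ref: ref}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, {:zen_monitor, reason}}, %{ref: ref} = state) do
    # The watched process is gone; react here (restart it, clean up, etc.)
    IO.inspect(reason, label: "watched process went down")
    {:noreply, state}
  end
end
```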

## Why?

ZenMonitor was developed at Discord to improve the stability of our real-time communications infrastructure. ZenMonitor improves stability in a couple of different ways.

### Traffic Calming

When a process is being monitored by a large number of remote processes, that process going down can cause both the node hosting the downed process and the node hosting the monitoring processes to be suddenly flooded with a large amount of work. This is commonly referred to as a thundering herd and can overwhelm either node depending on the situation.

ZenMonitor relies on interval batching and GenStage to help calm the deluge into a throttled stream of :DOWN messages that may take more wall clock time to process but has more predictable scheduler utilization and network consumption.

### Message Interspersing

In the inverse scenario, where a single process monitors a large number of remote processes, a systemic failure of many monitored processes can clog the message queue. This can cause other messages sent to the process to back up behind the :DOWN messages.

Here's what a message queue might look like if 100,000 monitors fired due to node failure.

```
+------------------------------------------------+
|    {:DOWN, ref, :process, pid_1, :nodedown}    |
+------------------------------------------------+
|    {:DOWN, ref, :process, pid_2, :nodedown}    |
+------------------------------------------------+
...             snip 99,996 messages          ...
+------------------------------------------------+
| {:DOWN, ref, :process, pid_99_999, :nodedown}  |
+------------------------------------------------+
| {:DOWN, ref, :process, pid_100_000, :nodedown} |
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
...                    etc                     ...
```

The process has to work through the 100,000 :DOWN messages before it can get back to doing work. If processing a :DOWN message is non-trivial, the process can appear effectively unresponsive to callers expecting it to do :work.

ZenMonitor.Local.Dispatcher provides a configurable batch sweeping system that dispatches a fixed demand_amount of :DOWN messages every demand_interval (see the documentation for ZenMonitor.Local.Dispatcher for configuration and defaults). Using ZenMonitor, the message queue would look like this:

```
+------------------------------------------------+
|    {:DOWN, ref, :process, pid_1, :nodedown}    |
+------------------------------------------------+
...             snip 4,998 messages           ...
+------------------------------------------------+
|  {:DOWN, ref, :process, pid_5000, :nodedown}   |
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
...    snip messages during demand_interval    ...
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
|  {:DOWN, ref, :process, pid_5001, :nodedown}   |
+------------------------------------------------+
...             snip 4,998 messages           ...
+------------------------------------------------+
| {:DOWN, ref, :process, pid_10_000, :nodedown}  |
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
...    snip messages during demand_interval    ...
+------------------------------------------------+
|                     :work                      |
+------------------------------------------------+
...                    etc                     ...
```

This means that the process can continue processing work messages while working through more manageable batches of :DOWN messages; this improves the effective responsiveness of the process.
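
If you need to tune the sweep, the batch size and interval are configurable; the sketch below assumes they live under the :zen_monitor application environment and uses illustrative key names, so consult the ZenMonitor.Local.Dispatcher documentation for the options it actually honors and their defaults.

```elixir
import Config

# Hypothetical keys for illustration; see ZenMonitor.Local.Dispatcher for the
# supported configuration and defaults.
config :zen_monitor,
  demand_amount: 5_000,
  demand_interval: 100
```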

### Message Truncation

:DOWN messages include a reason field that can include large stack traces and GenServer state dumps. Large reasons generally don't pose an issue, but in a scenario where thousands of processes are monitoring a process that generates a large reason, the cumulative effect of duplicating that reason for each monitoring process can consume all available memory on a node.

When a :DOWN message is received for dispatch to remote subscribers, the first step is to truncate the message using ZenMonitor.Truncator; see the module documentation for more information about how truncation is performed and what configuration options are supported.

This prevents the scenario where a single process with a large stack trace or large state gets amplified on the receiving node and consumes a large amount of memory.
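
Truncation behaviour is likewise configurable; the sketch below assumes a :zen_monitor application environment key and uses a purely hypothetical name, so check ZenMonitor.Truncator for the options it actually supports.

```elixir
import Config

# Hypothetical key for illustration only; see ZenMonitor.Truncator for the
# real configuration options and defaults.
config :zen_monitor,
  truncation_depth: 3
```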

## Design

ZenMonitor is constructed of two cooperating systems, the Local ZenMonitor System and the Proxy ZenMonitor System. When a process wishes to monitor a remote process, it should inform the Local ZenMonitor System, which will efficiently dispatch the monitoring request to the remote node's Proxy ZenMonitor System.

### Local ZenMonitor System

The Local ZenMonitor System is composed of a few processes, which are managed by ZenMonitor.Local.Supervisor. The processes that comprise the Local ZenMonitor System are described in detail in the following sections.

#### ZenMonitor.Local

ZenMonitor.Local is responsible for accepting monitoring and demonitoring requests from local processes. It will send these requests to the Connector processes for efficient transmission to the responsible ZenMonitor.Proxy processes.

When a monitored process dies, the ZenMonitor.Proxy will send this information in a summary message to the ZenMonitor.Local.Connector process, which will then send down dispatches to ZenMonitor.Local for eventual delivery by the ZenMonitor.Local.Dispatcher.

ZenMonitor.Local is also responsible for monitoring the local interested process and performing clean-up if the local interested process crashes for any reason; this prevents the Local ZenMonitor System from leaking memory.

#### ZenMonitor.Local.Tables

This is a simple process that is responsible for owning shared ETS tables used by various parts of the Local ZenMonitor System.

It maintains two tables, ZenMonitor.Local.Tables.Nodes and ZenMonitor.Local.Tables.References. These tables are public and are normally written to and read from by the ZenMonitor.Local and ZenMonitor.Local.Connector processes.

#### ZenMonitor.Local.Connector

ZenMonitor.Local.Connector is responsible for batching monitoring requests into summary requests for the remote ZenMonitor.Proxy. The Connector handles the actual distribution connection to the remote ZenMonitor.Proxy including dealing with incompatible and down nodes.

When processes go down on the remote node, the Proxy ZenMonitor System will report summaries of these down processes to the corresponding ZenMonitor.Local.Connector.

There will be one ZenMonitor.Local.Connector per remote node with monitored processes.

#### ZenMonitor.Local.Dispatcher

When a remote node or remote processes fail, messages will be enqueued for delivery. The ZenMonitor.Local.Dispatcher is responsible for processing these enqueued messages at a steady and controlled rate.

### Proxy ZenMonitor System

The Proxy ZenMonitor System is composed of a few processes, which are managed by ZenMonitor.Proxy.Supervisor. The processes that comprise the Proxy ZenMonitor System are described in detail in the following sections.

#### ZenMonitor.Proxy

ZenMonitor.Proxy is responsible for handling subscription requests from the Local ZenMonitor System and for maintaining the ERTS Process Monitors on the processes local to the remote node.

ZenMonitor.Proxy is designed to be efficient with local monitors and will guarantee that for any local process there is, at most, one ERTS monitor, no matter how many remote processes and remote nodes are interested in monitoring that process.

When a local process goes down ZenMonitor.Proxy will enqueue a new death certificate to the ZenMonitor.Proxy.Batcher processes that correspond to the interested remotes.

#### ZenMonitor.Proxy.Tables

This is a simple process that is responsible for owning shared ETS tables used by various parts of the Proxy ZenMonitor System.

It maintains a single table, ZenMonitor.Proxy.Tables.Subscribers. This table is used by both the ZenMonitor.Proxy and ZenMonitor.Proxy.Batcher processes.

#### ZenMonitor.Proxy.Batcher

This process has two primary responsibilities: collecting and summarizing death certificates, and monitoring the remote process.

For every remote ZenMonitor.Local.Connector that is interested in monitoring processes on this node, a corresponding ZenMonitor.Proxy.Batcher is spawned that will collect and ultimately deliver death certificates. The ZenMonitor.Proxy.Batcher will also monitor the remote ZenMonitor.Local.Connector and clean up after it if it goes down for any reason.

## Running a Compatible Node

ZenMonitor ships with an Application, ZenMonitor.Application, which starts the overall supervisor, ZenMonitor.Supervisor. This creates the supervision tree outlined below.

```
                                                                            -------------------------
                                                                      +----| ZenMonitor.Local.Tables |
                                                                      |     -------------------------
                                                                      |
                                                                      |     ------------------
                                                                      +----| ZenMonitor.Local |
                                    -----------------------------     |     ------------------
                              +----| ZenMonitor.Local.Supervisor |----|
                              |     -----------------------------     |     -------------       ----------------------------
                              |                                       +----| GenRegistry |--N--| ZenMonitor.Local.Connector |
                              |                                       |     -------------       ----------------------------
                              |                                       |
                              |                                       |     -----------------------------
                              |                                       +----| ZenMonitor.Local.Dispatcher |
                              |                                             -----------------------------
  -----------------------     |
 | ZenMonitor.Supervisor |----|
  -----------------------     |                                             -------------------------
                              |                                       +----| ZenMonitor.Proxy.Tables |
                              |                                       |     -------------------------
                              |                                       |
                              |     -----------------------------     |     ------------------
                              +----| ZenMonitor.Proxy.Supervisor |----+----| ZenMonitor.Proxy |
                                    -----------------------------     |     ------------------
                                                                      |
                                                                      |     -------------       --------------------------
                                                                      +----| GenRegistry |--M--| ZenMonitor.Proxy.Batcher |
                                                                            -------------       --------------------------
```
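
Because the :zen_monitor application starts ZenMonitor.Supervisor for you, adding the dependency is normally all that is required to run a compatible node. If you prefer to run it under your own supervision tree instead, a sketch like the following could work, assuming ZenMonitor.Supervisor exposes a standard child_spec/1:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Starts the Local and Proxy ZenMonitor systems on this node
      ZenMonitor.Supervisor
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```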