docs on tracing
jjallaire committed Dec 26, 2024
1 parent c2ea3e7 commit c53c67e
Showing 4 changed files with 62 additions and 1 deletion.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -2,7 +2,7 @@

## v0.3.54 (26 December 2024)

- [Action tracing](https://github.com/UKGovernmentBEIS/inspect_ai/pull/1038) for diagnosing runs with unterminated actions (e.g. model calls, docker commands, etc.).
- [Tracing](https://inspect.ai-safety-institute.org.uk/tracing.html) for diagnosing runs with unterminated actions (e.g. model calls, docker commands, etc.).
- Provide default timeout/retry for docker compose commands to mitigate unreliability in some configurations.
- Switch to sync S3 writes to overcome unreliability observed when using async interface.
- Task display: Added `--no-score-display` option to disable realtime scoring metrics.
1 change: 1 addition & 0 deletions docs/_quarto.yml
@@ -62,6 +62,7 @@ website:
- interactivity.qmd
- approval.qmd
- eval-logs.qmd
- tracing.qmd
- extensions.qmd


2 changes: 2 additions & 0 deletions docs/llms.txt
@@ -22,5 +22,7 @@
- [Interactivity](https://inspect.ai-safety-institute.org.uk/interactivity.html.md): Covers various ways to introduce user interaction into the implementation of tasks (for example, confirming consequential actions or prompting the model dynamically based on the trajectory of the evaluation).
- [Approval](https://inspect.ai-safety-institute.org.uk/approval.html.md): Approvals enable you to create fine-grained policies for approving tool calls made by models.
- [Eval Logs](https://inspect.ai-safety-institute.org.uk/eval-logs.html.md): Explores how to get the most out of evaluation logs for developing, debugging, and analyzing evaluations.
- [Tracing](https://inspect.ai-safety-institute.org.uk/tracing.html.md): Describes advanced execution tracing tools used to diagnose runtime issues.
- [Extensions](https://inspect.ai-safety-institute.org.uk/extensions.html.md): Describes the various ways you can extend Inspect, including adding support for new Model APIs, tool execution environments, and storage platforms (for datasets, prompts, and logs).


58 changes: 58 additions & 0 deletions docs/tracing.qmd
@@ -0,0 +1,58 @@
---
title: Tracing
---

## Overview

Inspect includes a runtime tracing tool that can be used to diagnose issues that aren't readily observable in eval logs and error messages. Trace logs are written in JSON Lines format and by default include log records from level `TRACE` and up (including `HTTP` and `INFO`).

Trace logs also include explicit enter and exit records around actions that may encounter errors or fail to complete. For example:

1. Model API `generate()` calls
2. Calls to `subprocess()` (e.g. tool calls that run commands in sandboxes)
3. Control commands sent to Docker Compose
4. Writes to log files in remote storage (e.g. S3)
5. Model tool calls
6. Subtasks spawned by solvers

Action logging enables you to observe execution times, errors, and commands that hang and prevent evaluation tasks from terminating. The [`inspect trace anomalies`](#anomalies) command enables you to easily scan trace logs for these conditions.
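Conceptually, this enter/exit pattern works like a context manager that always records how each action ended. The following is a minimal sketch, not Inspect's internal API; the record fields (`event`, `id`, `action`, etc.) are illustrative assumptions:

```python
import time
import uuid
from contextlib import contextmanager

@contextmanager
def trace_action(records: list, action: str, detail: str):
    # Emit an explicit "enter" record up front, and an "exit" record on
    # success, error, or timeout. An action that hangs never produces an
    # "exit" record, which is exactly what anomaly scanning looks for.
    trace_id = uuid.uuid4().hex
    start = time.monotonic()
    records.append({"event": "enter", "id": trace_id,
                    "action": action, "detail": detail})
    try:
        yield
        result = "success"
    except TimeoutError:
        result = "timeout"
        raise
    except BaseException:
        result = "error"
        raise
    finally:
        records.append({"event": "exit", "id": trace_id, "result": result,
                        "duration": time.monotonic() - start})

# usage: wrap an action that might fail or hang
log: list[dict] = []
with trace_action(log, "model", "generate"):
    pass  # e.g. call the Model API here
```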

## Usage

Trace logging does not need to be explicitly enabled—logs for the last 10 top-level evaluations (i.e. CLI commands or scripts that call eval functions) are preserved and written to a data directory dedicated to trace logs. You can list the last 10 trace logs with the `inspect trace list` command:

``` bash
inspect trace list # --json for JSON output
```

Trace logs are written using [JSON Lines](https://jsonlines.org/) format and all but the most recent one are gzip compressed, so reading them requires some special handling. The `inspect trace dump` command encapsulates this and gives you a normal JSON array with the contents of the trace log:

``` bash
inspect trace dump trace.log
```
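If you want to work with trace logs programmatically, the same handling can be done in a few lines of Python. This is a sketch under the assumption that compressed logs carry a `.gz` extension:

```python
import gzip
import json

def read_trace(path: str) -> list[dict]:
    # All but the most recent trace log are gzip compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        # JSON Lines: one JSON object per line.
        return [json.loads(line) for line in f if line.strip()]
```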

## Anomalies

If an evaluation is running and is not terminating, you can execute the following command to list instances of actions (e.g. model API generates, docker compose commands, tool calls, etc.) that are still running:

``` bash
inspect trace anomalies
```

You will first see currently running actions (useful mostly for a "live" evaluation). If you have already cancelled an evaluation, you'll see a list of cancelled actions (most recently completed first), which will often tell you which cancelled action was keeping the evaluation from completing.

Passing no arguments targets the most recent trace log; pass an explicit trace log name to view another log:

``` bash
inspect trace anomalies trace.log.1.gz
```
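Detecting a still-running action amounts to pairing each enter record with its exit and reporting the leftovers. A rough sketch, assuming hypothetical `event` and `id` fields in each record:

```python
def running_actions(records: list[dict]) -> list[dict]:
    # Pair "enter" records with their matching "exit" by id; anything
    # left unpaired was still running (or hung) when the log ended.
    entered: dict[str, dict] = {}
    for r in records:
        if r.get("event") == "enter":
            entered[r["id"]] = r
        elif r.get("event") == "exit":
            entered.pop(r["id"], None)
    return list(entered.values())
```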

### Errors and Timeouts

By default, the `inspect trace anomalies` command prints only currently running or cancelled actions (as these are what you need to diagnose an evaluation that doesn't complete). You can optionally also display actions that ended with errors or timeouts by passing the `--all` flag:

``` bash
inspect trace anomalies --all
```

Note that errors and timeouts are not by themselves evidence of problems, since both occur in the normal course of running evaluations (e.g. model generate calls can return errors that are retried, and Docker or S3 can likewise return retryable errors or time out under heavy load).
