Add HTML output support to web browser tool #959

farrelmahaztra · 2024-12-07T20:45:47Z

This PR contains:

What is the current behavior? (You can also link to an open issue here)

Output from the web browser tool is currently always in the form of accessibility trees. This is great for readability and minimizing token count but excludes important context from websites that aren't a11y-friendly.

For instance, I suspect issues like #636 arise from elements that aren't actually, e.g., <input type="checkbox"> (since attributes like checked already seem to be surfaced in the ATs) but rather things like divs styled to look like checkboxes without the appropriate ARIA attributes. (I'm not sure whether that is the cause of that specific issue, though.)
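
To make the distinction concrete, here is a minimal sketch using only the Python standard library. It is not how the web browser tool builds its accessibility trees; the helper name and the three HTML fragments are hypothetical. It only shows that a native checkbox or a div with role="checkbox" carries machine-readable semantics, while a div that is merely styled like a checkbox does not.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CheckboxFinder(HTMLParser):
    """Collect tags that expose checkbox semantics (native or via ARIA role)."""

    def __init__(self) -> None:
        super().__init__()
        self.semantic_checkboxes: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attr_map = dict(attrs)
        is_native = tag == "input" and attr_map.get("type") == "checkbox"
        has_aria_role = attr_map.get("role") == "checkbox"
        if is_native or has_aria_role:
            self.semantic_checkboxes.append(tag)


def find_semantic_checkboxes(html: str) -> list[str]:
    """Hypothetical helper: return the tags that announce themselves as checkboxes."""
    finder = CheckboxFinder()
    finder.feed(html)
    return finder.semantic_checkboxes


# Hypothetical page fragments, for illustration only.
NATIVE = '<input type="checkbox" checked>'
ARIA_DIV = '<div role="checkbox" aria-checked="true"></div>'
STYLED_DIV = '<div class="fake-checkbox is-checked"></div>'

print(find_semantic_checkboxes(NATIVE))      # ['input']
print(find_semantic_checkboxes(ARIA_DIV))    # ['div']
print(find_semantic_checkboxes(STYLED_DIV))  # []
```

The last case is the one where the checkbox state is only recoverable from the raw HTML (here, a class name), which is the kind of context this PR is trying to surface.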

What is the new behavior?

You can now pass an output format into the web browser tool, like:

    solver=[
        use_tools(web_browser(output_format=CrawlerOutputFormat.HTML)),
        generate(),
    ]

Which will output HTML instead:

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

I don't believe so, as the output format still defaults to accessibility trees.

Other information:

This is my first contribution here so do let me know if I've missed something obvious. I'm also on the Inspect Slack with the same name.

jjallaire · 2024-12-08T21:16:53Z

Hi @farrelmahaztra, thanks for submitting this. If we do go ahead with it I'll leave some comments on the review but before we get to that I'd like to check with @MariaIzobava for her take on this.

One concern I have is that context windows are limited, and HTML web pages can be substantially larger than accessibility trees. We also limit tool output to 16kb, which it seems many pages could overflow. Depending on how complex the HTML is, models might also have a much harder time understanding and navigating it compared to accessibility trees. @MariaIzobava just wondering if you have in-house experience with this you'd want to share. I don't want to add a feature without a strong use case and I certainly don't want to add one that is a footgun!

@farrelmahaztra I'll do a review of the code separately but let's hold off on addressing until we have the discussion here about whether to proceed.

jjallaire

A few comments on the implementation, but the bigger picture is that I'm not entirely sold on whether we should do this. HTML will be harder for humans to do approvals on, spill more distracting stuff into the context window, and in many cases be less semantically obvious to the model in terms of understanding the page content.

I wonder if there are really two different scenarios: navigating the web for semantic content and navigating more interactive web experiences. I wonder for the latter if images might be better than HTML? While images would also put pressure on context windows, some adaptations could be made to be smarter about clipping all but the most recent images (that's what we've started working towards for desktop evals).

Appreciate the PR and we certainly might be convinced to take it, but I'd like to get input from @MariaIzobava and others first as they certainly have more experience than I in this domain!

jjallaire · 2024-12-08T21:18:03Z

src/inspect_ai/tool/_tools/_web_browser/_web_browser.py

@@ -11,43 +12,59 @@
 from inspect_ai.util._store import store


-def web_browser(interactive: bool = True) -> list[Tool]:


For this sort of thing our style is to use typed Literal, so something like format: Literal["at", "html"]
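
A minimal sketch of what the Literal-typed style could look like. The alias OutputFormat and the function render_page are hypothetical stand-ins, not the actual inspect_ai signature; only the values "at" and "html" come from the comment above. A type checker rejects any other value statically, and the runtime check covers callers that bypass type checking.

```python
from typing import Literal, get_args

# Hypothetical alias; only the two values are taken from the review comment.
OutputFormat = Literal["at", "html"]


def render_page(format: OutputFormat = "at") -> str:
    """Hypothetical stand-in for a function that takes an output format."""
    # get_args(OutputFormat) returns ("at", "html"), so the set of valid
    # values is defined in exactly one place.
    if format not in get_args(OutputFormat):
        raise ValueError(f"Unknown output format: {format}")
    return f"rendering as {format}"


print(render_page())        # rendering as at
print(render_page("html"))  # rendering as html
```

Because the Literal lists only the formats that are actually implemented, there is no way to name an unsupported format such as DOM or Pixels without a type error.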

jjallaire · 2024-12-08T21:22:08Z

src/inspect_ai/tool/_tools/_web_browser/_web_browser.py

@@ -57,7 +74,9 @@ def web_browser_go() -> Tool:
    async def execute(url: str) -> str:
        """Navigate the web browser to a URL.

-        Once you have navigated to a page, you will be presented with a web accessibilty tree of the elements on the page. Each element has an ID, which is displayed in brackets at the beginning of its line. For example:
+        Once you have navigated to a page, you will be presented with HTML or a web accessibility tree of the elements on the page.


Here we would need to do dynamic prompting based on the output format chosen. (Note that the API above also implies that DOM and Pixels are supported. I'm not sure if they are, and I'd lean against supporting them for now, but we'd need to prompt for them as well if they were in play.)

Note that dynamic prompting for interactive vs. non-interactive is done here:

inspect_ai/src/inspect_ai/tool/_tools/_web_browser/_web_browser.py, line 43 in cd99abb:

    # start with go tool (excluding interactive docs if necessary)

We'd need to do something similar for the output format. At some point it will become easier to just use ToolDefs for this: https://inspect.ai-safety-institute.org.uk/tools.html#sec-dynamic-tools
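
As a rough sketch of the idea (plain Python, not the actual inspect_ai code or the ToolDef API), the tool description could be assembled from parts keyed on the output format. The function name, the constants, and the HTML wording are hypothetical, and it is an assumption that the element-ID guidance only applies to accessibility trees.

```python
from typing import Literal

# Hypothetical alias and prompt fragments, for illustration only.
OutputFormat = Literal["at", "html"]

BASE_DOC = "Navigate the web browser to a URL."

FORMAT_DOCS = {
    "at": (
        "Once you have navigated to a page, you will be presented with a web "
        "accessibility tree of the elements on the page."
    ),
    "html": (
        "Once you have navigated to a page, you will be presented with the "
        "HTML of the page."
    ),
}

# Assumption: element IDs in brackets exist only in accessibility tree output.
INTERACTIVE_AT_DOC = (
    "Each element has an ID, which is displayed in brackets at the beginning "
    "of its line."
)


def go_description(format: OutputFormat = "at", interactive: bool = True) -> str:
    """Build the prompt text for the chosen output format and mode."""
    parts = [BASE_DOC, FORMAT_DOCS[format]]
    if interactive and format == "at":
        parts.append(INTERACTIVE_AT_DOC)
    return "\n\n".join(parts)


print(go_description("html"))
```

With this shape the model is only ever told about the format it will actually receive, rather than "HTML or a web accessibility tree".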

jjallaire · 2024-12-08T21:24:32Z

src/inspect_ai/tool/_tools/_web_browser/_web_browser.py

-def web_at_viewer(call: ToolCall) -> ToolCallView:
-    # get the web accessiblity tree, if we have it create a view from it
+def web_viewer(call: ToolCall) -> ToolCallView:
+    web_html = store().get(WEB_BROWSER_HTML, "")


Another reason I don't love HTML here is that we end up presenting the entire page, which will get truncated and make it very difficult for human reviewers to see what the model is doing (note that below we can vector in on exactly what is being clicked and show a coherent snippet around it). It's quite important to have human oversight for browser tools, and accessibility trees seem much better suited to this.

jjallaire · 2024-12-08T21:25:51Z

src/inspect_ai/tool/_tools/_web_browser/_web_browser.py

+                store().set(WEB_BROWSER_AT, web_at)
+                return web_at
+            else:
+                raise ValueError(f"Unknown output format: {output_format}")


Note that the enum above implies support for DOM and Pixels, but if the user tries to use these they'll get an error here (not saying we should support them here, just supporting the argument to narrow that enum).

jjallaire · 2024-12-08T21:28:37Z

src/inspect_ai/tool/_tools/_web_browser/_resources/playwright_crawler.py

@@ -285,10 +285,12 @@ def render(self, output_format: CrawlerOutputFormat) -> Any:
          the currently active webpage rendered using given format.
        """
        match output_format:
+            case CrawlerOutputFormat.HTML:
+                return self._page.content()


One other thought: we limit tool output to 16kb to go easy on context windows (tool calls that frequently exceed this will both fill up the context window and sometimes confuse the model with spurious content). One reason to favor accessibility trees is that they are much smaller and semantically clearer, and so better as model input.
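
To illustrate the size concern, here is a small sketch of a byte-budget check. It is not inspect_ai's actual truncation logic: the helper name is hypothetical, and reading "16kb" as 16 * 1024 bytes is an assumption. The sample strings are synthetic.

```python
from __future__ import annotations

# Assumption: "16kb" is interpreted as 16 * 1024 bytes of UTF-8 text.
MAX_TOOL_OUTPUT_BYTES = 16 * 1024


def truncate_output(text: str, limit: int = MAX_TOOL_OUTPUT_BYTES) -> tuple[str, bool]:
    """Hypothetical helper: return (possibly truncated text, was_truncated)."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text, False
    # errors="ignore" drops a multi-byte character that was cut at the boundary.
    return encoded[:limit].decode("utf-8", errors="ignore"), True


# Synthetic inputs: a short accessibility-tree-like line and a large HTML blob.
small_at = '[1] button "Submit"'
large_html = "<div>" * 10_000  # 50,000 bytes

_, at_truncated = truncate_output(small_at)
clipped, html_truncated = truncate_output(large_html)

print(at_truncated)                   # False
print(html_truncated)                 # True
print(len(clipped.encode("utf-8")))   # 16384
```

Anything past the budget is simply lost to the model, so for large pages the tail of the HTML (and whatever the model needed from it) would never be seen.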

Add HTML output support to web browser tool

cd99abb

jjallaire reviewed Dec 8, 2024

farrelmahaztra added 2 commits December 16, 2024 05:42

Merge branch 'main' into feature/add-html-output

fbe3567

Remove enum, use Literal

8801902
