Simple Dashboard Launcher UI (To launch old dashboards) #1147

Open
wants to merge 582 commits into base: main

Conversation

Nanthagopal-Eswaran

Items to add to release announcement:

  • Heading: 🚀 TruLens Dashboard Launcher

It provides a simple UI where the user can select sqlite files from previous runs and launch the TruLens Dashboard. This way, users don't have to worry about logging the results for later and can just open the TruLens-provided Streamlit UI at any time.

🕹️ Usage
Run the following command:

poetry run trulens-eval-dashboard

Features

  1. Simple UI to quickly select different sqlite files and launch TruLens dashboards.
    (screenshot of the launcher UI)

  2. Multiple dashboards can be viewed by specifying different port numbers.

Open Multiple Dashboards

Currently, the tool can open only one dashboard at a time. To open multiple dashboards, a quick workaround is to launch this tool multiple times.
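For context, a minimal sketch of what the launcher effectively does under the hood, assuming the trulens_eval API of this period (the sqlite path and port number below are illustrative, and the exact keyword arguments may differ by version):

```python
from trulens_eval import Tru

# Point TruLens at a sqlite file from a previous run (path is illustrative).
tru = Tru(database_url="sqlite:///runs/eval_2024_05_01.sqlite")

# Launch the standard TruLens Streamlit dashboard on a chosen port.
# Repeating this with a different file and port yields a second dashboard.
tru.run_dashboard(port=8502)
```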

Other details that are good to know but need not be announced:

🌱 Improvements [For Future]

  1. Open multiple dashboards from a single window.
  2. Allow sharing the Streamlit dashboard with others via a shareable link.
  3. Build an executable to avoid the need to install Python and Poetry, and to make the tool more portable.
  4. UI/UX improvements.

@Nanthagopal-Eswaran
Author

We have been using this tool internally since we need to open multiple dashboards side by side and compare them. I thought this might be a common need for TruLens users, so I'm sharing it here.

I know the UI might not be very attractive, but the need is real.

@joshreini1
Contributor

@Nanthagopal-Eswaran would love to understand the need more here. Why do you need to compare multiple dashboards rather than logging the different apps to the same sqlite db and thus comparing the apps in the same dashboard?

@Nanthagopal-Eswaran
Author

Nanthagopal-Eswaran commented May 22, 2024

@Nanthagopal-Eswaran would love to understand the need more here. Why do you need to compare multiple dashboards rather than logging the different apps to the same sqlite db and thus comparing the apps in the same dashboard?

Hi @joshreini1,

There are two main reasons:

Evaluating the changes in different versions of the app
I get that we can use the same db and add to it, but that is the ideal case. What if we want to compare tests executed by different engineers or teams, or automate these tests through GitHub Actions or AzDO pipelines and share only the db through mail?

Sharing the report / re-opening the report
Since it is a Streamlit UI, we always need a small snippet of Python code to load previous results. I initially thought of an exe that I could share with my stakeholders so that they don't have to install Python and the dependencies to open the reports; instead they could launch this tool and open the report.
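(For reference, the kind of snippet meant here is roughly the following sketch; the file name is illustrative and the exact trulens_eval arguments may vary by version.)

```python
from trulens_eval import Tru

# Re-open results from a previous run stored in a sqlite file.
tru = Tru(database_url="sqlite:///previous_run.sqlite")
tru.run_dashboard()
```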

@joshreini1
Contributor

Thanks @Nanthagopal-Eswaran - I hope you don't mind if I drill down a bit more :)

Evaluating the changes in different versions of the app

I get that we can use the same db and add to it, but that is the ideal case. What if we want to compare tests executed by different engineers or teams, or automate these tests through GitHub Actions or AzDO pipelines and share only the db through mail?

Comparing tests executed by different engineers/teams/automation would be better supported by using a shared database to store the results than by tracking a bunch of different sqlite dbs. Adding a shared database for TruLens to log to only requires passing a database URL compatible with SQLAlchemy (docs). Does this seem reasonable?
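(For example, something along these lines; the PostgreSQL URL, credentials, and database name below are placeholders.)

```python
from trulens_eval import Tru

# Log to a shared database instead of a local sqlite file;
# any SQLAlchemy-compatible URL should work here.
tru = Tru(database_url="postgresql://user:password@shared-host:5432/trulens")
```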

Sharing the report / re-opening the report
Since it is a Streamlit UI, we always need a small snippet of Python code to load previous results. I initially thought of an exe that I could share with my stakeholders so that they don't have to install Python and the dependencies to open the reports; instead they could launch this tool and open the report.

This use case seems reasonable; however, I'm not sure it makes sense to support it directly in the package, and it might be better as an internal tool. I would suggest that hosting the dashboard might be easier here (for you and the stakeholder), via an EC2 instance or similar.

@Nanthagopal-Eswaran
Author

Nanthagopal-Eswaran commented May 22, 2024

@joshreini1, Thanks for your insights.

For the first point, I see what you mean. We will have to try out how practical it is, though. I am more worried about the data getting lost or corrupted by mistake, in which case we would have to spend a huge cost to regenerate all the previous reports, since they would all be in a single location. For reference, it takes more than $40 to execute one test run in our case.

And for the second point, yes, it would be good to have this as a separate tool. Feel free to skip this PR, but please have an internal discussion and add it as a separate repo if it is really needed.

But this conversation actually made me realize the main problem here:
you can see the problem with reports being Streamlit apps, right?
As a developer, I can use this tool to view previous results whenever I want. But stakeholders might not need the full details. Is there a way to export the report as a standalone HTML file (similar to pytest-html reports)? It doesn't need a lot of features and details; even the leaderboard alone would be enough. This would also be helpful if we want to run automated tests and send just the leaderboard in an automated mail.

I quickly went through the Streamlit repo and found this issue, which clearly shows the importance of standalone HTML reports: streamlit/streamlit#611

@Nanthagopal-Eswaran changed the title from "Simple Dashboard Launcher UI (To view launch old dashboards)" to "Simple Dashboard Launcher UI (To launch old dashboards)" on May 22, 2024
@joshreini1
Contributor

Thanks @Nanthagopal-Eswaran - definitely understand and agree with your points on the importance of standalone reports. I'll continue to discuss with the team and get back to you once we've got a plan here.

BTW - one additional workaround might be to use tru.get_leaderboard() in a notebook and export that to HTML.
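(Roughly like the following sketch; get_leaderboard returns a pandas DataFrame, and the app_ids argument and output path below are illustrative.)

```python
from trulens_eval import Tru

tru = Tru(database_url="sqlite:///previous_run.sqlite")

# An empty app_ids list returns all apps; the result is a pandas DataFrame,
# so it can be written out as a standalone HTML file for sharing.
leaderboard = tru.get_leaderboard(app_ids=[])
leaderboard.to_html("leaderboard.html")
```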
