[Feature request] Support reporting Polars Dataframes #1356

Open
BlakeJC94 opened this issue Dec 2, 2024 · 1 comment

@BlakeJC94

Proposal Summary

Add support for Polars DataFrames in the following functions:

  • Logger.report_table
  • Model.report_table
  • Artifact.get
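One possible approach for the reporting methods listed above is to normalize Polars input to pandas at each method's entry, so that ClearML's existing pandas code path (including its fillna call) is reused unchanged. This is a minimal sketch under that assumption. as_pandas_table is a hypothetical helper, not part of ClearML. It duck-types on to_pandas(), which Polars DataFrames provide (typically requiring pyarrow) and pandas DataFrames do not.

```python
from typing import Any


def as_pandas_table(table: Any) -> Any:
    """Hypothetical helper: return a pandas-compatible table for reporting.

    Polars DataFrames expose to_pandas(); pandas DataFrames do not, so
    they (and other inputs such as CSV paths) pass through unchanged.
    """
    to_pandas = getattr(table, "to_pandas", None)
    if callable(to_pandas):
        # Polars (or any object with to_pandas, e.g. a pyarrow Table):
        # convert once, then let the existing pandas logic handle it.
        return to_pandas()
    # Already pandas, or a non-DataFrame input handled elsewhere.
    return table
```

Duck typing avoids importing Polars inside ClearML, so Polars stays an optional dependency. A side effect is that pyarrow Tables would be accepted as well. This sketch only covers the reporting direction. Artifact.get would need the opposite conversion (for example with Polars' pl.from_pandas).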

Motivation

ClearML already has great support for reporting tables and saving artifacts from pandas DataFrames. Polars has recently grown in popularity across many data science teams (by one proxy metric, GitHub stars: 30.9k for Polars vs. 43.9k for pandas).

Polars is designed to have an API similar to pandas', but a few subtle differences clash with the parts of ClearML that expect a pandas DataFrame.

I'm happy to take a first pass at a PR to test this concept and gauge how much work a full implementation would take.

@BlakeJC94
Author

BlakeJC94 commented Dec 2, 2024

Taking examples/reporting/pandas_reporting.py as a starting point, the following minimal changes port it to Polars:

diff --git a/pandas_reporting.py b/polars_reporting.py
index 3abe6c3..20ddb86 100644
--- a/pandas_reporting.py
+++ b/polars_reporting.py
@@ -1,5 +1,4 @@
-
-import pandas as pd
+import polars as pl
 
 from clearml import Task, Logger
 
@@ -14,16 +13,15 @@ def report_table(logger, iteration=0):
     # report tables
 
     # Report table - DataFrame with index
-    df = pd.DataFrame(
+    df = pl.DataFrame(
         {
+            "id": ["falcon", "dog", "spider", "fish"],
             "num_legs": [2, 4, 8, 0],
             "num_wings": [2, 0, 0, 0],
             "num_specimen_seen": [10, 2, 1, 8],
         },
-        index=["falcon", "dog", "spider", "fish"],
     )
-    df.index.name = "id"
-    logger.report_table("table pd", "PD with index", iteration=iteration, table_plot=df)
+    logger.report_table("table pl", "PL with index", iteration=iteration, table_plot=df)
 
     # Report table - CSV from path
     csv_url = "https://raw.githubusercontent.com/plotly/datasets/master/Mining-BTC-180.csv"

Running this as-is on the latest version of ClearML produces the following traceback:

Traceback (most recent call last):
  File "/home/blake/Workspace/repos/scratchpad/clearml-polars/polars_reporting.py", line 64, in <module>
    main()
  File "/home/blake/Workspace/repos/scratchpad/clearml-polars/polars_reporting.py", line 53, in main
    report_table(logger)
  File "/home/blake/Workspace/repos/scratchpad/clearml-polars/polars_reporting.py", line 24, in report_table
    logger.report_table("table pl", "PL with index", iteration=iteration, table_plot=df)
  File "/home/blake/Workspace/repos/scratchpad/clearml-polars/.venv/lib/python3.11/site-packages/clearml/logger.py", line 412, in report_table
    reporter_table = table.fillna(str(np.nan))
                     ^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'fillna'. Did you mean: 'fill_nan'?

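So the failure comes down to a subtle API difference: ClearML calls pandas' DataFrame.fillna, which Polars does not provide (Polars splits that functionality into fill_null and fill_nan).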
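Until native support exists, one user-side workaround is to convert the frame to pandas before reporting. The sketch below assumes Polars' DataFrame.to_pandas() (which typically needs pyarrow installed) and the title, series, iteration and table_plot arguments of Logger.report_table as used in the diff. report_polars_table and its index_col parameter are hypothetical names, not ClearML API. Passing index_col restores the named index that the original pandas example set with df.index.name = "id".

```python
from typing import Any, Optional


def report_polars_table(logger: Any, title: str, series: str, df: Any,
                        iteration: int = 0,
                        index_col: Optional[str] = None) -> None:
    """Hypothetical workaround: report a Polars DataFrame via pandas.

    logger is assumed to be a clearml.Logger and df a polars.DataFrame.
    """
    pdf = df.to_pandas()
    if index_col is not None:
        # Polars has no index, so turn the given column back into a
        # (named) pandas index, as the original pandas example had.
        pdf = pdf.set_index(index_col)
    # ClearML now receives a pandas DataFrame, so its fillna call works.
    logger.report_table(title, series, iteration=iteration, table_plot=pdf)
```

For the diff above, the call would be report_polars_table(logger, "table pl", "PL with index", df, iteration=iteration, index_col="id").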

BlakeJC94 changed the title from "Support reporting Polars Dataframes" to "[Feature request] Support reporting Polars Dataframes" on Dec 2, 2024