Dynamical schema #465
-
Question about pandera
Hi, I have a question. I have methods that accept a dataframe plus column names tied to a specific task (such as the target column name for forecasting, or the index column name for time indexing). So I have n predefined columns whose names are dynamic and passed as method arguments. How can I use a validator in such a case?
Replies: 5 comments 2 replies
-
hey @quancore this is a fun question! I've had this use case several times and here are the two solutions I've implemented in the past:
# solution (1): match the dynamic columns with a regex
import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema({
    "target": pa.Column(int),
    "^feature_": pa.Column(int, pa.Check.gt(0), regex=True),
}, index=pa.Index(int, name="time"))

# the index is named "time" so that it satisfies the schema's index check
df = pd.DataFrame({
    "target": [1, 2, 3],
    "feature_1": [3, -1, 5],
    "feature_2": [3, 4, 5],
    "feature_3": [3, 4, 5],
}, index=pd.Index([0, 1, 2], name="time"))

schema(df)  # this should fail due to -1 in "feature_1"

# solution (2): build the schema at runtime from a base schema
base_schema = pa.DataFrameSchema({"target": pa.Column(int)}, index=pa.Index(int, name="time"))

# inputs should be sufficient to derive your finished schema.
# this is a simple example of creating n columns
def create_schema(n):
    return base_schema.add_columns({
        f"feature_{i}": pa.Column(...)  # fill in the dtype and checks you need
        for i in range(n)
    })

class MyClass:
    def method(self, df, n):
        out_df = df  # do stuff with df
        schema = create_schema(n)
        return schema(out_df)

I'd recommend solution (1) for conciseness, but (2) has the benefit of giving you a fixed, fully-specified schema if you care about that. Let me know if this covers your use case!
-
Thank you for the fast answer. Unfortunately, my feature names have no common pattern, so case 1 will not work. For case 2, my index names are also defined at runtime and follow no pattern. Is there any functionality like add_index?
-
hey @quancore, yes, there is the set_index method, which is similar in behavior to pandas.DataFrame.set_index, so you could set
-
also @quancore would you mind if I converted this into a discussion? It may be easier to find in the future.
-
Yes, of course.