RegEx expressions are un-evaluatable due to missing (pc)re imports #421
Right now, what happens? Does your code raise some sort of exception? Looking at the (J)PMML source code, there don't seem to be any restrictions on the RegEx pattern specification: it can be a string literal or a string variable (feature).
The conversion to PMML definitely works:

from pandas import DataFrame
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.preprocessing import ExpressionTransformer

X = DataFrame([["Hello World", "Hello"], ["One Two Three", "Zero"], ["alpha omega", "beta"]], columns = ["sentence", "word"])
print(X)

#transformer = ExpressionTransformer("X[0] + X[1]")
transformer = ExpressionTransformer("re.search(X[1], X[0])")
transformer.n_features_in_ = 2

sklearn2pmml(transformer, "Expression.pmml")

However, there seems to be an imports issue on the Python side:

import re

Xt = transformer.transform(X)
print(Xt)

The above raises an import error regarding the "re" module.
Right now, Python expressions are evaluated in an environment that imports … To make the current example work, it would need to include the … Unfortunately, there is no way to override/customize the module list from the …
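In plain Python, this behaviour would follow from how `eval()` scopes names, assuming the evaluator passes an explicit globals dict (an assumption here, not a description of sklearn2pmml's internals). The sketch below uses a hypothetical `evaluate()` helper to show that importing `re` in the calling script does not make it visible to the evaluated expression; only the modules placed in the environment dict are.

```python
import math
import re  # imported by the caller, like "import re" in the example above


def evaluate(expr, X, env):
    # Hypothetical helper: eval() resolves names only against the dicts it
    # is given. dict(env) copies the environment so that eval() can insert
    # __builtins__ without mutating the caller's dict.
    return eval(expr, dict(env), {"X": X})


row = ["Hello World", "Hello"]
expr = "re.search(X[1], X[0])"

try:
    # Environment with "math" only: the caller's "import re" is not seen
    evaluate(expr, row, {"math": math})
except NameError as exc:
    print("NameError:", exc)

# Environment that also lists "re": the expression now evaluates
print(evaluate(expr, row, {"math": math, "re": re}).group(0))  # Hello
```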
Maybe RegEx is overkill for what I am trying to do. It would be enough to have some sort of …
RegEx is definitely expensive, but it is the only tool for custom string manipulation in PMML.
There is no built-in method for "find index of substring in string" in PMML (analogous to Python's …). Anyway, if string (pre-)processing functionality is available in the form of standalone Python and Java libraries, then it's possible to use the UDF approach: the JPMML-Evaluator library can invoke a third-party Java library function. However, the UDF approach sacrifices portability, and is best avoided if RegExes will do.
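To illustrate the point that a RegEx can stand in for a substring search, here is a Python-only sketch (the function names are hypothetical, and nothing here claims how PMML itself expresses this). When the "pattern" is a data value such as a word or name, it should be escaped so that characters like `.` or `+` match literally rather than acting as RegEx operators.

```python
import re


def find_index(haystack, needle):
    # Python's built-in substring search; returns -1 when absent
    return haystack.find(needle)


def find_index_regex(haystack, needle):
    # The same question answered with a RegEx. re.escape() makes the needle
    # match literally, even if it contains ".", "+", "(" and so on.
    m = re.search(re.escape(needle), haystack)
    return m.start() if m else -1


for haystack, needle in [("Hello World", "World"), ("alpha omega", "beta"), ("a.b", ".")]:
    print(haystack, needle, find_index(haystack, needle), find_index_regex(haystack, needle))
```

Without `re.escape()`, `re.search(".", "a.b")` matches at index 0, because an unescaped `.` matches any character.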
OK, I see.
Sounds good, but how would I achieve this?
See https://github.com/jpmml/sklearn2pmml/blob/0.108.0/sklearn2pmml/preprocessing/__init__.py#L250

Replace this:

expr_func = to_expr_func(self.expr)

With this:

expr_func = to_expr_func(self.expr, modules = ["math", "re"])

The long-term solution is to make the list of modules easily customizable via an ExpressionTransformer constructor parameter.
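The idea of a caller-supplied module list can be sketched in plain Python. `make_expr_func()` below is a hypothetical stand-in, not sklearn2pmml's actual `to_expr_func()`; it only assumes that the module names are imported by name and exposed to the compiled expression as globals.

```python
import importlib


def make_expr_func(expr, modules=("math",)):
    # Hypothetical helper: import each named module and expose it to the
    # expression under its own name (e.g. "re" -> the re module).
    env = {name: importlib.import_module(name) for name in modules}
    code = compile(expr, "<expr>", "eval")

    def expr_func(X):
        # Evaluate against a copy of the environment, with the row bound to X
        return eval(code, dict(env), {"X": X})

    return expr_func


func = make_expr_func("re.search(X[1], X[0]) is not None", modules=["math", "re"])
rows = [["Hello World", "Hello"], ["One Two Three", "Zero"], ["alpha omega", "beta"]]
print([func(row) for row in rows])  # [True, False, False]
```

With the default `modules=("math",)`, the same expression fails with a `NameError` for `re`, which mirrors the situation described in this thread.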
For simple … It doesn't suffer from missing imports, because it has everything hard-coded.
Hello Villu,
I am coming back to a topic that we already discussed here.
I was wondering whether, in the meantime, it has become possible to match one string against another dynamically, e.g. …
Maybe you remember that I wanted to create a binary feature reflecting whether or not the name could be found within the email address.
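On the Python side, the binary "name found in email address" feature could look like the sketch below. The `name_in_email()` helper and the sample values are hypothetical, and whether an equivalent expression converts to PMML through `ExpressionTransformer` is not confirmed here. The name is escaped so that it is matched literally, and the match is case-insensitive because email addresses are usually lowercase.

```python
import re


def name_in_email(name, email):
    # 1 if the name occurs in the email address (case-insensitive), else 0.
    # re.escape() keeps characters such as "." in the name from acting as
    # RegEx wildcards.
    return int(re.search(re.escape(name), email, flags=re.IGNORECASE) is not None)


rows = [("John", "john.doe@example.com"), ("Mary", "m.smith@example.com")]
print([name_in_email(name, email) for name, email in rows])  # [1, 0]
```

For this particular check, plain Python `name.lower() in email.lower()` gives the same result; the RegEx form is shown because RegEx is what PMML offers for custom string matching.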
Thanks in advance!