PMML pipeline not working as expected after version upgrade #434
Your issue's description matches the changes that happened in version 0.103.2 (pay attention to the "Breaking changes" section). So, for starters, you can upgrade to version 0.103.1, the version just before those changes.
I would expect the
In your final pipeline, why do you use The goal (of fixing this issue) should be to make the above statement hold true. That is, there should be no need for an explicit Numpy to Pandas data container conversion operation.
In other words, what's the Python error if you omit these meta-transformers? Is there something wrong in the interaction between (the last-) If it's anything related to the size/shape of Numpy arrays, then you can address those using the
Hi vruusmann, thanks for your quick responses!
On my end, it does seem that the changes in 0.103.2 caused the problems with my pipeline. I see that the
However, the problem here is that the inclusion of this
Something seems to go wrong in the conversion from a Pandas Series to a Numpy array within the transformer.
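To illustrate the Series versus Numpy distinction, here is a minimal pandas-only sketch (the DataFrame and its `text` column are hypothetical; `dtype=object` mirrors the Object dtype described in the issue). Selecting a column by its bare name gives a 1-D Series and a 1-D array, while selecting it with a one-element list gives a 2-D DataFrame and a 2-D array, which is the shape scikit-learn transformers generally expect:

```python
import pandas as pd

# Hypothetical data: a string column with a missing value, stored with Object dtype
df = pd.DataFrame({"text": pd.Series(["Foo", None, "bar"], dtype=object)})

# Bare column name -> 1-D Series -> 1-D Numpy array
series = df["text"]
arr_1d = series.to_numpy()

# One-element list -> 2-D DataFrame -> 2-D Numpy array with a single column
frame = df[["text"]]
arr_2d = frame.to_numpy()

print(type(series).__name__, arr_1d.shape)  # Series (3,)
print(type(frame).__name__, arr_2d.shape)   # DataFrame (3, 1)
```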
The Ah nice find! You are indeed correct about the
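The simplest way to perform "from Pandas to Numpy" data container conversion is using a ColumnTransformer that passes all columns through:

from sklearn.compose import ColumnTransformer

numpyfier = ColumnTransformer([], remainder = "passthrough")
# THIS! Return Numpy arrays, even if Pandas output is configured elsewhere
numpyfier.set_output(transform = "default")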
Looks like It calls Numpy string utility functions without first verifying that the
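A string-normalizing step can guard against this by checking its input before applying any string operation. The following is a hypothetical sketch (the `safe_lowercase` helper is illustrative and is not the sklearn2pmml `StringNormalizer` implementation): it accepts an Object-dtype array that may contain `None` or NaN, lower-cases the string elements one by one, and passes missing values through instead of failing.

```python
import numpy as np


def safe_lowercase(X):
    """Lower-case every string in X, passing missing values (None/NaN) through.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=object)
    out = np.empty(X.shape, dtype=object)
    for idx, value in np.ndenumerate(X):
        if value is None or (isinstance(value, float) and np.isnan(value)):
            out[idx] = None  # keep the missing value instead of calling .lower() on it
        elif isinstance(value, str):
            out[idx] = value.lower()
        else:
            raise TypeError(f"Expected str or missing value, got {type(value).__name__}")
    return out


print(safe_lowercase([["Foo"], [None], ["BAR"]]).ravel().tolist())  # ['foo', None, 'bar']
```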
Ah alright, that makes sense. I tried to make it work but I think I'm doing something wrong. I tried the following:
where
In that case I get the following error message while calling Am I using your suggestion in the wrong way?
It seems like I have found a solution by adding brackets around the input column. So, I replace:
By:
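For reference, the effect of the brackets can be reproduced with scikit-learn alone. In a `ColumnTransformer`, a scalar column name passes the transformer a 1-D column, while a one-element list passes a 2-D, single-column selection. In this sketch a `FunctionTransformer` that records its input shape stands in for `StringNormalizer` (an assumption made so the example runs without sklearn2pmml; the `text` column and the `record_shape` helper are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

seen_shapes = []


def record_shape(X):
    # Stand-in for a real transformer: remember the input shape, return a 2-D array
    seen_shapes.append(np.shape(X))
    return np.asarray(X, dtype=object).reshape(len(X), -1)


X = pd.DataFrame({"text": pd.Series(["Foo", None, "bar"], dtype=object)})

# "text" selects a 1-D Series; ["text"] selects a 2-D single-column DataFrame
for columns in ("text", ["text"]):
    ct = ColumnTransformer([("normalizer", FunctionTransformer(record_shape), columns)])
    ct.fit_transform(X)

print(seen_shapes)  # [(3,), (3, 1)]
```

Wrapping the column name in a list is therefore a way to hand a transformer the 2-D input it expects without an extra container-conversion step such as `DataFrameMapper`.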
Hello,
After upgrading to the latest version (0.110.0), my pipeline no longer works as expected. The version I was using previously, which worked fine, is 0.95.1.
The situation is as follows. I am creating a pipeline to prepare the data, so that I can later train a classification model on it. The relevant part here is:
The following worked fine under version 0.95.1:
Under version 0.110.0, this doesn't work anymore. I get the following error message:
It seems like the StringNormalizer can no longer properly handle the DataFrame column input, since the data type stays Object because the string column contains None values.
A rather ugly solution that seems to work for me is the following, where I make use of DataFrameMapper from sklearn-pandas:
Although this seems to work, I would rather have cleaner code without the dependency on DataFrameMapper. Do you have any suggestions on how to improve this? Thanks!