Parameter values as Apache Arrow objects #353

Open
soininen opened this issue Feb 8, 2024 · 4 comments

@soininen
Collaborator

soininen commented Feb 8, 2024

(This issue is not about the binary blobs we have in the 'value' column of the 'parameter_value' table in the Spine database schema.)

We have been discussing using Apache Arrow as an alternative to the data structures in the parameter_value module. To get things rolling, I thought I could get my hands dirty with Arrow by implementing an equivalent to the parameter_value module which deals with Arrow tables instead of TimeSeries, Maps and whatnot. Initially, this will be more of a technology demo or proof of concept. Also, I am not planning to replace parameter_value, but rather to provide an alternative interface for parsing parameter values.

@soininen soininen added the 0.8 label Feb 8, 2024
@soininen soininen self-assigned this Feb 8, 2024
@manuelma
Collaborator

manuelma commented Feb 8, 2024

Very good, I'm looking forward to seeing the results! I understand you plan to keep the 'public' API from spinedb_api.parameter_value but just change the internals, right?

@soininen
Collaborator Author

soininen commented Feb 8, 2024

I understand you plan to keep the 'public' API from spinedb_api.parameter_value but just change the internals, right?

I am not planning to change parameter_value at all but add a new module next to it. I think we should leave parameter_value as-is for backwards compatibility if we ever make the full switch to Arrow.

The new module (spinedb_api.arrow?) should emulate the interface of parameter_value. I guess the most important functions would be from_database() which returns an Arrow object and to_database() which converts an Arrow object to a binary blob.

@manuelma
Collaborator

manuelma commented Feb 8, 2024

Sounds good! But ParameterValue and its subclasses are also 'public' - do you think it's possible to implement them in arrow?

@soininen
Collaborator Author

soininen commented Feb 8, 2024

But ParameterValue and its subclasses are also 'public' - do you think it's possible to implement them in arrow?

In fact, I am going to drop ParameterValue and just use the Arrow data types. I see no benefit in wrapping working data types in interfaces that do not offer any real improvements and can be considered niche. Client code can then work directly with the standard Arrow API without the need to convert to/from ParameterValue.


2 participants