
Consider whether to use Apache Arrow as the intermediate representation #12

Open
syucream opened this issue Apr 19, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@syucream
Contributor

Columnify uses Apache Arrow Schema/Record as an intermediate representation between various input formats and the output format (currently only Parquet). Arrow is powerful: it offers fast memory access and a columnar representation. But the Go implementation is not complete yet. For example, the Arrow record type doesn't support some types in nested fields, so it's not yet applicable for Columnify. Additionally, the Go implementation of Arrow doesn't support rich data conversion the way PyArrow does. As a result, Columnify currently uses "only Arrow Schema" as necessary intermediate data.

So we have some options for tackling this problem:

  • Remove the Arrow dependency. It's unnecessary now, and reducing dependencies improves the maintainability of this product. The Arrow Schema type is replaceable with Avro Schema or others.
  • Improve Arrow! It's OSS, and we probably have various chances to contribute to the Go Arrow implementation.
  • Just keep the current Columnify implementation and watch activity in the Arrow community.
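To make the first option more concrete, here is a minimal sketch of a library-neutral intermediate schema in plain Go. Everything in it (`FieldType`, `Field`, `Schema`, `SchemaProvider`, `staticProvider`) is hypothetical and not part of Columnify, Arrow, or Avro. The idea, assuming the Parquet writer only needs field names, types, and nullability, is that the writer would depend on a small interface, so an Arrow-backed, Avro-backed, or hand-written schema could sit behind it.

```go
package main

import "fmt"

// FieldType is a hypothetical, library-neutral type tag; it is not taken
// from Columnify, Arrow, or Avro.
type FieldType int

const (
	Int64 FieldType = iota
	Float64
	String
	Bool
)

// Field describes one column of the intermediate schema.
type Field struct {
	Name     string
	Type     FieldType
	Nullable bool
}

// Schema is a minimal stand-in for what arrow.Schema currently provides.
type Schema struct {
	Fields []Field
}

// FieldIndex returns the position of a named field, or -1 if it is absent.
func (s Schema) FieldIndex(name string) int {
	for i, f := range s.Fields {
		if f.Name == name {
			return i
		}
	}
	return -1
}

// SchemaProvider is a hypothetical seam: the Parquet writer would depend on
// this interface, so an Arrow-, Avro-, or hand-written schema could back it.
type SchemaProvider interface {
	IntermediateSchema() (Schema, error)
}

// staticProvider is a trivial implementation used only for illustration.
type staticProvider struct{ s Schema }

func (p staticProvider) IntermediateSchema() (Schema, error) { return p.s, nil }

func main() {
	var p SchemaProvider = staticProvider{Schema{Fields: []Field{
		{Name: "id", Type: Int64},
		{Name: "name", Type: String, Nullable: true},
	}}}
	s, err := p.IntermediateSchema()
	if err != nil {
		panic(err)
	}
	fmt.Println(s.FieldIndex("name"), s.FieldIndex("missing")) // prints "1 -1"
}
```

With a seam like this, swapping Arrow Schema for Avro Schema would mean writing one new `SchemaProvider` rather than touching the writer.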

As a minor point, gocredits doesn't work with the Go Arrow dependency. #4

@t2y t2y added the enhancement New feature or request label Apr 21, 2020
@syucream
Contributor Author

syucream commented Jul 9, 2020

Arrow intermediate records should be memory efficient, so using them would help reduce memory usage! #44
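A rough illustration of why columnar intermediate records can save memory: row-oriented records decoded into `map[string]interface{}` box every value and carry per-row map overhead, while a columnar batch keeps one contiguous, typed slice per field plus a validity slice, similar in spirit to an Arrow record batch. The sketch below uses only the standard library; `Row`, `Columns`, `ToColumns`, and the `id`/`name` fields are hypothetical and not Columnify's or Arrow's actual types.

```go
package main

import "fmt"

// Row is how a decoded input record might look before conversion:
// every value boxed in an interface{} inside a map (hypothetical).
type Row map[string]interface{}

// Columns is a hypothetical columnar batch: one typed slice per field,
// plus a validity slice marking which "name" values are non-null.
type Columns struct {
	IDs        []int64
	Names      []string
	NamesValid []bool
}

// ToColumns transposes rows into columns. A missing or non-string "name"
// is recorded as null via NamesValid; a missing or non-int64 "id" is an error.
func ToColumns(rows []Row) (Columns, error) {
	c := Columns{
		IDs:        make([]int64, 0, len(rows)),
		Names:      make([]string, 0, len(rows)),
		NamesValid: make([]bool, 0, len(rows)),
	}
	for i, r := range rows {
		id, ok := r["id"].(int64)
		if !ok {
			return Columns{}, fmt.Errorf("row %d: id is not int64", i)
		}
		c.IDs = append(c.IDs, id)
		name, ok := r["name"].(string)
		c.Names = append(c.Names, name) // "" placeholder when null
		c.NamesValid = append(c.NamesValid, ok)
	}
	return c, nil
}

func main() {
	rows := []Row{
		{"id": int64(1), "name": "alice"},
		{"id": int64(2)},
	}
	c, err := ToColumns(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.IDs, c.Names, c.NamesValid) // [1 2] [alice ] [true false]
}
```

Arrow goes further (e.g. packed validity bitmaps and shared buffers), so this only sketches the layout difference, not Arrow's actual memory format.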

@syucream
Contributor Author

syucream commented Jul 9, 2020

It can also validate input data against a given schema. #27
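A minimal sketch of what validating input records against a given schema could look like, in plain Go and independent of Arrow. The `FieldType`, `Field`, and `Validate` names are hypothetical. The sketch assumes values have already been decoded into concrete Go types such as `int64` and `string`; note that `encoding/json` decodes numbers into `float64` by default, so a real check would need to account for that.

```go
package main

import "fmt"

// FieldType is a hypothetical type tag for this sketch.
type FieldType int

const (
	Int64 FieldType = iota
	String
)

// Field describes one expected field of an input record.
type Field struct {
	Name     string
	Type     FieldType
	Nullable bool
}

// Validate checks one decoded record against the schema, rejecting missing
// non-nullable fields and values of the wrong Go type. Extra keys that the
// schema does not mention are ignored in this sketch.
func Validate(fields []Field, rec map[string]interface{}) error {
	for _, f := range fields {
		v, present := rec[f.Name]
		if !present || v == nil {
			if f.Nullable {
				continue
			}
			return fmt.Errorf("field %q: required but missing", f.Name)
		}
		var ok bool
		switch f.Type {
		case Int64:
			_, ok = v.(int64)
		case String:
			_, ok = v.(string)
		}
		if !ok {
			return fmt.Errorf("field %q: unexpected type %T", f.Name, v)
		}
	}
	return nil
}

func main() {
	schema := []Field{{Name: "id", Type: Int64}, {Name: "name", Type: String, Nullable: true}}
	fmt.Println(Validate(schema, map[string]interface{}{"id": int64(1)})) // <nil>
	fmt.Println(Validate(schema, map[string]interface{}{"id": "1"}))      // field "id": unexpected type string
}
```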
