Request to implement read_json function #554

Closed
modin-bot opened this issue Apr 15, 2019 · 4 comments
Labels
new feature/request 💬 Requests and pull requests for new features

Comments

@modin-bot

🤖 This is a bot message 🤖

[email protected] has been sent an email requesting parallel implementation for read_json.

Note: Issues are created only once per method.

@devin-petersohn added the "new feature/request 💬" label Apr 15, 2019
@hamx0r

hamx0r commented Apr 20, 2020

To add: even though #715 adds read_json(), it has two limitations:

  1. It reads JSON files only, not strings (i.e., JSON fetched from a web source).
  2. Each row of the JSON file has to be a complete DataFrame row's worth of data (and all columns must be present in every row).

I would like to see the ability to read JSON strings too, or at least some kind of workaround using BytesIO or similar. Currently (Python 3.7.7, modin 0.7.2, ray 0.8.0, pandas 1.0.1), even a simple BytesIO hack fails:

import modin.pandas as pd
from io import BytesIO
j = b"""
{"name": "hamx0r"}
"""
bio = BytesIO(j)
df = pd.read_json(bio, lines=True)
print(df.head())

...results in this error:

...
  File "...modin/engines/base/io/file_reader.py", line 15, in get_path
    if S3_ADDRESS_REGEX.search(file_path):
TypeError: expected string or bytes-like object
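
The traceback suggests the reader passes whatever it receives straight to a regex search, which only accepts `str` or `bytes`. The following is a minimal sketch of that failure mode using only the standard library. `S3_LIKE_REGEX`, `looks_like_s3`, and `looks_like_s3_guarded` are hypothetical stand-ins; modin's real pattern and its eventual fix are not shown in this thread.

```python
import re
from io import BytesIO

# Hypothetical stand-in for modin's S3_ADDRESS_REGEX; the real pattern
# does not appear in the traceback above.
S3_LIKE_REGEX = re.compile(r"s3://(.*?)/(.*)")


def looks_like_s3(path_or_buffer):
    # Unguarded: assumes the argument is always a path string.
    return S3_LIKE_REGEX.search(path_or_buffer) is not None


def looks_like_s3_guarded(path_or_buffer):
    # Guarded: buffers and other non-string inputs are never S3 paths.
    if not isinstance(path_or_buffer, str):
        return False
    return S3_LIKE_REGEX.search(path_or_buffer) is not None


buffer = BytesIO(b'{"name": "hamx0r"}\n')

try:
    looks_like_s3(buffer)
    error_message = None
except TypeError as exc:
    # A file-like object is neither str nor bytes, so re rejects it.
    error_message = str(exc)

guarded_result = looks_like_s3_guarded(buffer)
```

A type check before the regex call is one plausible way to let file-like objects through; whether modin took that route is not stated here.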

The only workaround I've found is to load with regular pandas, then convert the dataframe to Modin:

import pandas
import modin.pandas as pd
j = """[{"name": "hamx0r"}]"""
df = pandas.read_json(j)
df = pd.DataFrame(df)
print(df.head())
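
The same workaround can be wrapped in a small helper, sketched below. `read_json_string` and its `frame_cls` parameter are hypothetical names for illustration, not part of modin or pandas. The text is wrapped in `io.StringIO` so that pandas treats it as a file-like object rather than as a path.

```python
import io

import pandas


def read_json_string(json_text, frame_cls=pandas.DataFrame, **kwargs):
    """Parse JSON text with plain pandas, then wrap the result in frame_cls.

    frame_cls defaults to pandas.DataFrame; pass modin.pandas.DataFrame
    to get a Modin frame, as in the workaround above.
    """
    plain = pandas.read_json(io.StringIO(json_text), **kwargs)
    return frame_cls(plain)


# Usage with Modin (assumes modin is installed):
#   import modin.pandas as pd
#   df = read_json_string('{"name": "hamx0r"}\n', frame_cls=pd.DataFrame, lines=True)
```

Parsing still happens serially in pandas, so this only sidesteps the error; it does not parallelize the read.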

@devin-petersohn
Collaborator

Thanks @hamx0r, this is a bug. Would you be willing to open a bug report for the issue you described so we do not lose track of it? New features and bugs are tracked differently and have different development timeframes. We can fix this much sooner than we can implement all of the read_json functionality.

@hamx0r

hamx0r commented Apr 20, 2020

Done! I wrote up #1379. I'm just getting into modin and appreciate all the hard work!

@mvashishtha
Collaborator

I think read_json now defaults to pandas, but is not implemented in parallel. The default implementation should mean that the bugs in the original post are fixed now.
