Provide conversion from dictionary to DataFrames #591

Closed
rominf opened this issue May 1, 2014 · 5 comments

Comments

rominf commented May 1, 2014

Sometimes it's convenient to work with Arrays that start from zero (or from any other index).

I can use an OrderedDict to get an Array-like structure. The problem is that printing dictionaries is still ugly (JuliaLang/julia#1759), whereas DataFrames print much better. I want to be able to convert a Dict (or OrderedDict) to a DataFrame in one command.

This code works fine for me, but I don't know whether it is any good or where I should put it (in the DataFrames or in the DataStructures code?)

import Base.convert
function convert{K, V}(::Type{DataFrame}, d::Associative{K, V})
    DataFrame(collect(keys(d)), collect(values(d)))
end

or (I saw somewhere that the DataFrame constructor without column names is deprecated):

import Base.convert
function convert{K, V}(::Type{DataFrame}, d::Associative{K, V})
    DataFrame(keys=collect(keys(d)), values=collect(values(d)))
end

Example of usage:

od = OrderedDict()
od[1] = 1.0
od[1.1] = 2.1
convert(DataFrame, od)

Output:

    x1  x2
1   1   1.0
2   1.1 2.1
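
For readers on current Julia: `Associative` was renamed `AbstractDict` in Julia 0.7, and the `convert{K, V}(...)` method syntax was replaced by `where` clauses, so the snippets above will not run unchanged. A minimal sketch of the same two-column idea using only Base follows; `dict_columns` is a hypothetical helper name, not part of DataFrames. The assumption (not checked against any particular DataFrames version) is that the resulting NamedTuple of equal-length vectors can be passed to `DataFrame(cols)` to get the `keys`/`values` frame.

```julia
# Hypothetical helper: split a dictionary into a column of keys and a
# column of values. keys(d) and values(d) iterate in the same order,
# so row i of each column belongs to the same entry.
function dict_columns(d::AbstractDict)
    ks = collect(keys(d))
    vs = collect(values(d))
    return (keys = ks, values = vs)
end

d = Dict{Real,Float64}(1 => 1.0, 1.1 => 2.1)
cols = dict_columns(d)

# Assumption: with DataFrames loaded, DataFrame(cols) builds the table.
```

A plain `Dict` does not promise any particular iteration order, so an `OrderedDict` is still needed if the rows should come out in insertion order.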

UPD: I want to use a Dict instead of an Array because I want to be able to do this kind of thing:

od[1.5] = 42.0  # for storing values of a non-analytic function of one argument

or

# Array-like structure with negative indices
od = OrderedDict()
for i = -5:5
    od[i] = 2*i
end

garborg commented May 1, 2014

Those methods would go in DataFrames, but there are existing DataFrame constructors for Dict and Associative:

od = OrderedDict()
od[:a] = [1, 2]
od[:b] = [2, 3]

# The Associative method is broken -- I'll try to fix it soon
DataFrame(od)
ERROR: no method getindex(KeyIterator{HashDict{Any,Any,Int64}}, Int64)
 in DataFrame at /Users/sean/.julia/v0.3/DataFrames/src/dataframe/dataframe.jl:94

# In the meantime, the method for Dicts works
DataFrame(Dict(od))
2x2 DataFrame
|-------|---|---|
| Row # | a | b |
| 1     | 1 | 2 |
| 2     | 2 | 3 |

# But it won't work for your example because column names are expected to be
#   symbols or convertible to symbols

You could hook up something like this in your own code:

df = DataFrame()
for (k, v) in od
    df[DataFrames.identifier(string(k))] = v
end

But I'm guessing you'll want to go elsewhere for printing Dicts with numeric keys -- DataFrame colnames are required to be valid Julia identifiers, so 1.1 becomes :x1_1.
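
To make the mangling concrete, here is a rough sketch of how a key could be turned into a valid identifier. `to_identifier` is a hypothetical stand-in written for illustration; the actual `DataFrames.identifier` may handle more cases or behave differently.

```julia
# Hypothetical stand-in for an identifier-mangling function.
# Every character that is not a letter, digit, or underscore becomes "_",
# and an "x" is prepended when the result would not start with a letter
# or underscore.
function to_identifier(key)
    s = replace(string(key), r"[^A-Za-z0-9_]" => "_")
    if isempty(s) || !(isletter(first(s)) || first(s) == '_')
        s = "x" * s
    end
    return Symbol(s)
end

to_identifier(1.1)   # :x1_1
to_identifier(:a)    # :a
```

Note that a mapping like this is lossy: under this sketch the keys `1.1` and `"1_1"` would both become `:x1_1`, which is one reason numeric keys make poor column names.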

rominf commented May 1, 2014

I think I didn't make the point of using dictionaries instead of Arrays clear. I've updated the issue. Your solution solves a different problem.

garborg commented May 1, 2014

Thanks @rominf -- you can definitely use the code snippet at the bottom of my last post, though you may not be happy with the way it mangles the "indices" (keys).

The packaged DataFrame(Dict) constructor works as intended for creating DataFrames. That's unfortunate for your OrderedDict{Number, Any}-printing needs, but a use case like that really can't override DataFrame design decisions. I'm sure any work on improving Dict printing in Base would be appreciated; otherwise, you could create your own type or just write your own print method for existing types.
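
As an illustration of the "own print method" route, a small helper along these lines might be enough. This is only a sketch using Base; `print_pairs` is a hypothetical name, and it is written as a separate function rather than a new `show` method so that it does not change how Base types display elsewhere.

```julia
# Hypothetical helper (not part of Base or DataFrames): print a dictionary
# as two aligned columns, leaving the keys untouched.
function print_pairs(io::IO, d::AbstractDict)
    ks = String[string(k) for k in keys(d)]
    vs = String[string(v) for v in values(d)]
    width = maximum(length, ["key"; ks])  # widest entry in the key column
    println(io, rpad("key", width), "  ", "value")
    for (k, v) in zip(ks, vs)
        println(io, rpad(k, width), "  ", v)
    end
end

# Convenience method that prints to standard output.
print_pairs(d::AbstractDict) = print_pairs(stdout, d)

print_pairs(Dict(1.5 => 42.0))
```

Rows appear in the dictionary's own iteration order, so an OrderedDict keeps its insertion order while a plain Dict does not guarantee one.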

rominf commented May 1, 2014

OK, I agree.

rominf closed this as completed May 1, 2014

NaelsonDouglas commented Mar 7, 2018

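function createdataframe(input::Dict)
  # Convert each key to a Symbol for use as a column name, and wrap each
  # value in a one-element vector so the result is a single-row DataFrame.
  parsedinput = Dict()
  for x in keys(input)
    parsedinput[Symbol(x)] = [input[x]]
  end
  return DataFrame(parsedinput)
end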
