This repository has been archived by the owner on Aug 17, 2024. It is now read-only.

[FEATURE] Merge dataframes with different columns #112

Open
ThatIsAPseudo opened this issue Sep 27, 2020 · 2 comments
Comments

@ThatIsAPseudo

Is your feature request related to a problem? Please describe.
I'd like to merge DataFrames with different columns.

Describe the solution you'd like
I'd like to have a df1.merge(df2) way to automatically merge two dataframes, even if a column is in df1 but not in df2, filling it with

Describe alternatives you've considered
Here is a snippet from @lmeyerov that I found on issue #15 and completed; it does just what I want:

function unionDFs(a, b, fill='n/a') {
    // Merge two dataframes with different columns
    const aCols = a.listColumns(); // this line was missing on lmeyerov's original snippet
    const bCols = b.listColumns(); // this line was missing on lmeyerov's original snippet

    const aNeeds = b.listColumns().filter((v) => aCols.indexOf(v) === -1);
    const bNeeds = a.listColumns().filter((v) => bCols.indexOf(v) === -1);

    const a2 = aNeeds.reduce((df, name) => df.withColumn(name, () => fill), a);
    const b2 = bNeeds.reduce((df, name) => df.withColumn(name, () => fill), b);

    return a2.union(b2);
}
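
For readers without the library at hand, the same idea can be sketched on plain arrays of row objects. This is only an illustration under assumptions: `unionRows` is a hypothetical helper, not a dataframe-js function, and the sample rows are made up.

```javascript
// Hypothetical helper, not part of dataframe-js: rows are plain objects.
function unionRows(a, b, fill = 'n/a') {
    // Collect every column name seen in either input, in first-seen order.
    const columns = [...new Set([...a, ...b].flatMap((row) => Object.keys(row)))];

    // Rebuild a row with the full column list, filling the gaps with `fill`.
    const pad = (row) => Object.fromEntries(
        columns.map((name) => [
            name,
            Object.prototype.hasOwnProperty.call(row, name) ? row[name] : fill,
        ])
    );

    // Rows of `a` come first, then rows of `b`.
    return [...a, ...b].map(pad);
}

// Made-up sample data: each side has one column the other lacks.
const left = [{ id: 1, name: 'Ada' }];
const right = [{ id: 2, city: 'Paris' }];

const merged = unionRows(left, right);
// merged[0] is { id: 1, name: 'Ada', city: 'n/a' }
// merged[1] is { id: 2, name: 'n/a', city: 'Paris' }
console.log(merged);
```

Passing `null` as the third argument gives the same shape with `null` in the gaps instead of `'n/a'`.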

Additional context
Current behaviour
[Image: screenshot 2020-09-27 at 16.13.57]

What I'd like
[Image: screenshot 2020-09-27 at 16.16.06]

@ThatIsAPseudo
Author

ThatIsAPseudo commented Sep 27, 2020

A better implementation of the unionDFs snippet:

DataFrame.prototype.merge = function(df2, fill = null) {
    // Merge two dataframes with different columns
    const thisCols = this.listColumns();
    const otherCols = df2.listColumns();

    // Columns each side lacks relative to the other
    const otherNeeds = thisCols.filter((v) => otherCols.indexOf(v) === -1);
    const thisNeeds = otherCols.filter((v) => thisCols.indexOf(v) === -1);

    // Add the missing columns to each side, filled with `fill`
    const other2 = otherNeeds.reduce((df, name) => df.withColumn(name, () => fill), df2);
    const this2 = thisNeeds.reduce((df, name) => df.withColumn(name, () => fill), this);

    // As in the original snippet, the padded df2 is the receiver of union()
    return other2.union(this2);
};

@lachisis

lachisis commented Dec 7, 2020

This bug can be particularly insidious: if one dataframe's columns are a subset of another's, the behavior is inconsistent.

  • If you concatenate the df with fewer columns to the one with all columns, the union will execute without issue.
  • If you concatenate the df with all columns to the one with fewer, then it will fail.

This error is due to the use of an incorrect column comparison. It is still an issue in master:

export function arrayEqual(a, b, byOrder = false) {
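
One way an order-dependent result like this can arise is a comparison that only checks inclusion in one direction. The sketch below illustrates that general pitfall and a symmetric alternative. It does not reproduce the body of `arrayEqual` (which is cut off above); `isSubsetOf` and `sameColumnSet` are hypothetical names used only for illustration.

```javascript
// Hypothetical helpers for illustration; not copied from the library.

// One-directional check: true when every name in `a` also appears in `b`.
function isSubsetOf(a, b) {
    const inB = new Set(b);
    return a.every((name) => inB.has(name));
}

// Symmetric check: both arrays contain exactly the same set of names.
function sameColumnSet(a, b) {
    return isSubsetOf(a, b) && isSubsetOf(b, a);
}

const few = ['id'];
const all = ['id', 'name'];

// The one-directional check depends on argument order.
console.log(isSubsetOf(few, all)); // true
console.log(isSubsetOf(all, few)); // false

// The symmetric check gives the same answer either way.
console.log(sameColumnSet(few, all)); // false
console.log(sameColumnSet(all, few)); // false
```

A schema check built on something like `sameColumnSet` would reject both orderings consistently, which is what a merge helper such as the snippets above would then need to work around by padding columns first.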
