Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption when using db_field #1730

Open
th-erker opened this issue Jan 25, 2018 · 4 comments
Open

Data corruption when using db_field #1730

th-erker opened this issue Jan 25, 2018 · 4 comments
Labels

Comments

@th-erker
Copy link
Contributor

Summary
The mongoengine code makes the implicit assumption that db field names and model field names only overlap if they refer to the same field. If this condition is not satisfied, either by explicit model design (test case 1 and 2) or by garbage/old data in the database (test case 3 and 4), all kinds of data corruption happen.

How to reproduce
Run the attached file db_field_test.txt with python2 -R db_field_test.txt. The expected (bug free) output would be that in all four test cases f, g, h show the same values. But the actual output is like this:

name      x1         x2         y1         y2         z1         z2        
f1        True       None       True       None       True       False     
g1        True       None       None       True       False      None      
h1        True       None       None       True       False      True      

name       x1         x2         y1         y2         z1         z2        
f2         True       None       True       None       True       False     
g2         None       True       None       True       None       False     
h2         True       None       True       None       True       False     

name       w1        
f3         True      
g3         False     

name       w1        
f4         True      
g4         False     

Some bugs are dependent on the order iterators return dictionary items, so several runs might be necessary to see the bugs (and that's the reason for the -R flag).

The test program defines a strict / dynamic document with the following fields

    w1 = fields.BooleanField(db_field='w2')

    x1 = fields.BooleanField(db_field='x2')
    x2 = fields.BooleanField(db_field='x3')

    y1 = fields.BooleanField(db_field='y0')
    y2 = fields.BooleanField(db_field='y1')

    z1 = fields.BooleanField(db_field='z2')
    z2 = fields.BooleanField(db_field='z1')

In each of the four test cases it creates an object f and sets some of the fields to True or False, as shown in the output. The object is saved and loaded again (g). The object h has set the same fields with the same values as f, but using the constructor instead of attribute access (h = Doc(x1=False, ...).
The first and third test cases use strict documents, the second and fourth test cases use a dynamic document.
In the last two test cases, the field 'w1' is set directly in the database to False after f is saved and before g is loaded.

Analysis
In the mongoengine code, two patterns are used which work if the above assumption holds, but break down if not:

  1. Field names can be converted multiple times to db names (or model names).
  2. Model field names and db field names can be mixed in the same data structure.

In base.document.BaseDocument.__init__, field names are converted from db names to model names, but the field names should already be model names. This explains h1. This conversion only happens if the document is strict, therefore h2 does not show the bug.

In base.document.BaseDocument._from_son, field names are converted to db names. Again, this does not make sense, as the SON object uses db names when loading objects from the database. (As this bug is "repaired" by __init__ for strict documents, only g2.x1 shows the wrong value and not g1.x1.)
The names/values are copied to data dictionary, there they are converted to model names by overwriting data in a loop and deleting the db name from it. At this point, the order of the items returned by field.iteritems() matters, as this might result in multiple conversions in the data dictionary. For example y1 is saved in the database as y0. The conversion does not change it, so it is y0 in data. In the loop it is copied to y1. but if field.iteritems() returns y2 after y1, the loop treats y1 as the db name of the model field y2, and therefore copies it to data['y2'].

The 3rd and 4th test case, the (double) conversion to db names in base.document.BaseDocument._from_sonis the culprit. In the database, w1 and w2 exists. The latter by saving f, the former by setting it explicitly. Both are converted to db name w2 in a loop over son.iteritems(). If w1 was last, the wrong data is loaded.
It is remarkable that this is even the case with strict documents (g3), as fields not defined in a strict model are filtered at the last moment.

If both x1 and x2 are set (test case not included) and saved, this conversion of db names (x2 and x3) would both copy to x3, resulting in only x2 is present after loading, with the value of either the original x1 or x2 (depending of the order of son.iteritems()).

@erdenezul
Copy link
Collaborator

could you please add simple failing test case and send pr request?

th-erker added a commit to th-erker/mongoengine that referenced this issue Feb 6, 2018
Problem regarding model/db field name mapping.
@th-erker
Copy link
Contributor Author

th-erker commented Feb 6, 2018

Done: #1741

@erdenezul erdenezul added the Bug label Feb 6, 2018
@erdenezul
Copy link
Collaborator

I think I can able to work on your reported bug.

@dshmatkou
Copy link

dshmatkou commented Mar 7, 2019

I think that the issue described below is related to this:

class A(EmbeddedDocument):
    x = IntField(db_field='fx')

a = A.from_json('{"x": 1, "fx": 2}')
assert a.x == 2

(will raise assertion error)
The pr fixed this #904 started current issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants