Implement a Redis backend storage #106

Merged
merged 21 commits into from
May 25, 2022

@chadell chadell commented Apr 13, 2022

This PR partially addresses issue #57 by abstracting the store for DiffSyncModel objects and adding a Redis backend alongside the existing local in-memory store.

This feature is especially important in a distributed data pipeline, where different workers load data from adapters and the data must then be shared to create the diff. It builds on the proposal in #58.

The implementation moves the previous _data attribute into a LocalStore implementation and defines abstract methods for interacting with the storage, which made it possible to add a second implementation backed by Redis.
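As a rough illustration of the approach described above, here is a minimal sketch of an abstract store with an in-memory implementation. The class and method names (BaseStore, add, get, remove, get_all) and the use of KeyError are assumptions made for this example and are not taken from the PR's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseStore(ABC):
    """Hypothetical storage interface; the method names are illustrative only."""

    @abstractmethod
    def add(self, modelname: str, uid: str, obj: Any) -> None:
        """Store obj under (modelname, uid)."""

    @abstractmethod
    def get(self, modelname: str, uid: str) -> Any:
        """Return the stored object or raise KeyError."""

    @abstractmethod
    def remove(self, modelname: str, uid: str) -> None:
        """Delete the stored object or raise KeyError."""

    @abstractmethod
    def get_all(self, modelname: str) -> List[Any]:
        """Return every stored object of the given model name."""


class LocalStore(BaseStore):
    """In-memory implementation: the nested dict plays the role of the old _data."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def add(self, modelname: str, uid: str, obj: Any) -> None:
        self._data.setdefault(modelname, {})[uid] = obj

    def get(self, modelname: str, uid: str) -> Any:
        return self._data[modelname][uid]

    def remove(self, modelname: str, uid: str) -> None:
        del self._data[modelname][uid]

    def get_all(self, modelname: str) -> List[Any]:
        return list(self._data.get(modelname, {}).values())
```

A Redis-backed class would implement the same four methods against a Redis client, so calling code would not need to know which backend is in use.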

TODO

  • Extend Redis test coverage and refine corner cases
  • Fix basic testing
  • Update Docs (to be done when final approach is agreed)

@chadell chadell changed the title Implement a Redis backend storage [WIP] Implement a Redis backend storage Apr 13, 2022
@chadell chadell changed the title [WIP] Implement a Redis backend storage Implement a Redis backend storage Apr 13, 2022
@chadell chadell marked this pull request as ready for review April 13, 2022 14:43
@chadell chadell requested a review from glennmatthews as a code owner April 13, 2022 14:43
@chadell chadell mentioned this pull request Apr 24, 2022

chadell commented May 2, 2022

@glennmatthews I found an issue with this PR, specifically in the handling of children. Today we call add_child and remove_child directly on the DiffSyncModel. This works well when everything runs in memory, but with another store, such as Redis, these changes to the in-memory object are not propagated to the backend store.

For instance, from example 01:

...
                self.add(device)
                site.add_child(device)
...

In this example, site is no longer updated in the store. We would need to call self.update(site) after this change, but I'm concerned about the backwards-compatibility implications: if we forget the update, the code only works for local memory.

...
                self.add(device)
                site.add_child(device)
                self.update(site)
...

I have tested it, and calling self.update() after add_child works well in both cases. So my proposal is to at least update the examples to use the pattern with the update.

Looking forward to your thoughts
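The behavior described in this comment can be reproduced with a small, self-contained sketch. It does not use DiffSync or Redis: Site and CopyingStore are hypothetical stand-ins, and copy.deepcopy is assumed here as a substitute for the serialization a Redis-backed store would perform when writing an object.

```python
import copy
from typing import Dict, List


class Site:
    """Hypothetical stand-in for a DiffSyncModel with children."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.devices: List[str] = []

    def add_child(self, device_name: str) -> None:
        # Mutates only this in-memory object.
        self.devices.append(device_name)


class CopyingStore:
    """Stand-in for a remote backend: it keeps copies, not references."""

    def __init__(self) -> None:
        self._items: Dict[str, Site] = {}

    def add(self, obj: Site) -> None:
        self._items[obj.name] = copy.deepcopy(obj)

    def update(self, obj: Site) -> None:
        # Overwrite the stored copy with the object's current state.
        self._items[obj.name] = copy.deepcopy(obj)

    def get(self, name: str) -> Site:
        return copy.deepcopy(self._items[name])


store = CopyingStore()
site = Site("nyc")
store.add(site)

site.add_child("nyc-rtr-01")
stale_devices = store.get("nyc").devices  # the stored copy has not seen the child

store.update(site)
fresh_devices = store.get("nyc").devices  # the stored copy now includes the child
```

With a store that holds references, as the local in-memory one does, the explicit update is redundant, which is why omitting it goes unnoticed until a different backend is used.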

pyproject.toml
@@ -23,6 +23,8 @@ packaging = "^21.3"
colorama = {version = "^0.4.3", optional = true}
# For Pydantic
dataclasses = {version = "^0.7", python = "~3.6"}
redis = "*"
Collaborator

  1. Is there a known minimum version of redis that's required?
  2. Can we make this into an optional/extras dependency, i.e. so that pip install diffsync doesn't pull in redis (and only supports local storage as a result), but pip install diffsync[redis] does?

Collaborator Author

good idea

"""Init method for RedisStore."""
super().__init__(*args, **kwargs)

if url and host and port:
Collaborator

Perhaps?

Suggested change
if url and host and port:
if url and (host or port):

Collaborator Author

If we do it like this, I would have to remove the default port from the kwargs, which I thought would be convenient. But it could also be confusing... not sure.

Collaborator

OK, perhaps just if url and host then?
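To make the outcome of this thread concrete, here is a minimal sketch of the check the reviewers converge on: reject the call only when both url and host are given, so that a default port can remain in the keyword arguments. The function name resolve_redis_target, the returned dict shape, and the "localhost" fallback are assumptions for illustration; 6379 is the standard Redis port.

```python
from typing import Any, Dict, Optional

DEFAULT_REDIS_PORT = 6379  # standard Redis port


def resolve_redis_target(
    url: Optional[str] = None,
    host: Optional[str] = None,
    port: int = DEFAULT_REDIS_PORT,
) -> Dict[str, Any]:
    """Hypothetical helper: decide how to connect from url/host/port arguments."""
    # port always has a value because of its default, so it cannot be part
    # of the conflict check; only url and host are mutually exclusive.
    if url and host:
        raise ValueError("Pass either 'url' or 'host', not both.")
    if url:
        return {"url": url}
    # Assumption for this sketch: fall back to localhost when no host is given.
    return {"host": host or "localhost", "port": port}
```

Testing `url and host and port` with a defaulted port is equivalent to testing `url and host`, while `url and (host or port)` would always fire whenever a url is passed, which is the concern raised about removing the default.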

@@ -72,6 +72,10 @@ def get_by_uids(
"""
raise NotImplementedError

def _remove_item(self, modelname: str, uid: str):
Collaborator

I don't think I like having a private but abstract method. If this is something subclasses are responsible for implementing, it shouldn't be private, IMHO.

Collaborator Author

yes, I remember a previous conversation about this public/private question :)

self.remove(obj=child_obj, remove_children=remove_children)
except ObjectNotFound:
# Since this is "cleanup" code, log an error and continue, instead of letting the exception raise
self._log.error(f"Unable to remove child {child_id} of {modelname} {uid} - not found!")
Collaborator

It occurs to me we should probably be taking advantage of structlog better on this error message. Perhaps something like:

Suggested change
self._log.error(f"Unable to remove child {child_id} of {modelname} {uid} - not found!")
self._log.error(f"Unable to remove child element as it was not found!", child_type=child_type, child_id=child_id, parent_type=modelname, parent_id=uid)

Comment on lines 7 to 8
from redis import Redis
from redis.exceptions import ConnectionError as RedisConnectionError
Collaborator

Don't we need some logic here to handle the case where redis isn't installed?
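One common way to handle this, sketched below under assumptions, is to guard the import and fail with a clear message only when the Redis-backed class is actually instantiated. The REDIS_AVAILABLE flag, the constructor signature, and the default URL are hypothetical; the diffsync[redis] extra is the name proposed earlier in this review, not a confirmed package option.

```python
try:
    from redis import Redis

    REDIS_AVAILABLE = True
except ImportError:  # the optional redis package is not installed
    Redis = None
    REDIS_AVAILABLE = False


class RedisStore:
    """Hypothetical sketch of a store that needs the optional redis package."""

    def __init__(self, url: str = "redis://localhost:6379/0") -> None:
        if not REDIS_AVAILABLE:
            # Fail at construction time so that importing the module stays safe.
            raise ImportError(
                "The redis package is required for RedisStore. "
                "Install it with: pip install diffsync[redis]"
            )
        # Redis.from_url builds a client; no connection is opened until a command runs.
        self._client = Redis.from_url(url)
```

With this pattern, users who only need local storage can import the package without redis installed, and the error appears only if they try to use the Redis backend.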

@chadell chadell requested a review from glennmatthews May 20, 2022 08:05