Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ldap caching and move to the Zitadel v2 API #79

Merged
merged 15 commits into from
Dec 5, 2024

Conversation

tlater-famedly
Copy link
Contributor

@tlater-famedly tlater-famedly commented Oct 30, 2024

Closes: #24
Closes: #53
Closes: #30

WIP because there are some things to clear up still. The tests will all fail here, but with some fudging of the sources almost all tests pass.

There are a few things that need to be discussed/covered:

Multi-source config/test suite

Previously, we wanted to support specifying multiple sources simultaneously, since that came somewhat for free. With the removal of the cache, however, only the UKT source uses email addresses to identify users, and in fact only it has a concept of "removed" users, which means that it fundamentally needs to be handled separately from the other sources. Trying to support multi-source definitions is much more tricky with this limitation, so I believe we should not do so.

This is the cause of most of the remaining failures; the test suite was written around the assumption that we can just dump all kinds of sources into one big config file, and for getting this over the line I've ignored also rewriting the test suite so far, instead testing by only having the respective source actively usable in code for the time being.

External user ID encoding

Since we universally use the external user IDs to uniquely identify users now, and we actually pull data from Zitadel instead of just pushing, we now need to re-ingest the user IDs we've written to the nickname field. Unfortunately, because of ldap3's odd behavior of silently converting byte-typed values to strings when they can be, but otherwise returning bytes, we need to base64-encode these IDs.

So far we've just encoded any values that were bytes, and left strings alone - alas, apparently you can end up with strings randomly being valid base64 values (e.g. starttls). This means that we do not have a way to distinguish between IDs that were encoded and IDs that were not - when we get unlucky, this can make it impossible to uniquely identify a user, since decoding stringly-typed IDs can sometimes result in valid values.

Fixing this in the current implementation is easy (simply prefix encoded strings with a prefix, or encode all values, since UTF-8 strings are also valid byte strings), but backwards compatibility is a problem, since users with encoded IDs already exist in production.

It's possible that there is no way around runs in which some users will be deleted and immediately re-created.

Zitadel version

Due to the various recent Zitadel bugs, we need to update the Zitadel version in this PR. I've currently gone with the latest (2.64.1), but maybe we want another version?


Remaining tasks:

@tlater-famedly tlater-famedly requested a review from a team as a code owner October 30, 2024 11:53
@emgrav
Copy link
Member

emgrav commented Oct 30, 2024

It's possible that there is no way around runs in which some users will be deleted and immediately re-created.

For UKE we'll need to make sure to test this with a dry run

src/zitadel.rs Outdated Show resolved Hide resolved
@tlater-famedly
Copy link
Contributor Author

It's possible that there is no way around runs in which some users will be deleted and immediately re-created.

For UKE we'll need to make sure to test this with a dry run

The UKE source should be unaffected, because it doesn't need to parse external IDs from Zitadel. But yeah, probably prudent to do dry runs for the first sync after updating in general, this is a pretty big change.

emgrav
emgrav previously approved these changes Oct 31, 2024
src/lib.rs Show resolved Hide resolved
@emgrav
Copy link
Member

emgrav commented Oct 31, 2024

Remaining before this can be merged:

  1. Some tests need to be refactored to no longer combine multiple sources.
  2. We need to document the potential quirks of external user ID encoding, including instructions for infra on what to look for in a dry run to identify a problem before it happens.

@jannden
Copy link
Contributor

jannden commented Nov 3, 2024

There is a clash between CSV and LDAP at this point. When we import CSV, it wrongly deletes Zitadel users not present in the CSV (or rather users that it couldn't match).
A) Is that a desired behavior?
B) The old and new users are compared with their external_user_id which won't match if we mix CSV and LDAP, because they use different external_user_id. CSV uses email addresses for that and LDAP uses user_id attribute pulled from LDAP for that. Shall we add another field in CSV to explicitly add a user_id that has the potential to match LDAP user_id?

@tlater-famedly
Copy link
Contributor Author

There is a clash between CSV and LDAP at this point. When we import CSV, it wrongly deletes Zitadel users not present in the CSV (or rather users that it couldn't match).

A) Is that a desired behavior?

Ideal behavior would of course be that switching between LDAP/CSV is possible, but this is not a use case we actually currently have, so while not ideal, it's probbaly fine for this to happen.

B) The old and new users are compared with their external_user_id which won't match if we mix CSV and LDAP, because they use different external_user_id. CSV uses email addresses for that and LDAP uses user_id attribute pulled from LDAP for that. Shall we add another field in CSV to explicitly add a user_id that has the potential to match LDAP user_id?

Yes, this is already a planned feature, see #75. We should do that as a separate task, though, as mixing the sources will for now be unsupported.

We should probably make sure with product that this is indeed fine, however.

@jannden
Copy link
Contributor

jannden commented Nov 4, 2024

Okay, in that case, we can merge the updated tests from: #80

Copy link

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 92.47788% with 34 lines in your changes missing coverage. Please review.

Project coverage is 92.01%. Comparing base (483af5f) to head (519ddba).
Report is 15 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/lib.rs 86.40% 17 Missing ⚠️
src/zitadel.rs 90.96% 16 Missing ⚠️
src/main.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
+ Coverage   86.76%   92.01%   +5.25%     
==========================================
  Files           7        8       +1     
  Lines        1352     1277      -75     
==========================================
+ Hits         1173     1175       +2     
+ Misses        179      102      -77     
Files with missing lines Coverage Δ
src/config.rs 97.76% <100.00%> (+1.45%) ⬆️
src/sources/csv.rs 99.25% <100.00%> (+5.87%) ⬆️
src/sources/ldap.rs 97.85% <100.00%> (+2.16%) ⬆️
src/sources/ukt.rs 87.19% <ø> (+0.46%) ⬆️
src/user.rs 100.00% <100.00%> (+10.29%) ⬆️
src/main.rs 0.00% <0.00%> (ø)
src/zitadel.rs 92.55% <90.96%> (+17.09%) ⬆️
src/lib.rs 86.40% <86.40%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 483af5f...519ddba. Read the comment docs.

emgrav
emgrav previously approved these changes Nov 4, 2024
@tlater-famedly tlater-famedly force-pushed the tlater/remove-cache branch 6 times, most recently from edd027e to 6f6db25 Compare November 4, 2024 15:46
jannden
jannden previously approved these changes Nov 4, 2024
@emgrav
Copy link
Member

emgrav commented Nov 5, 2024

When we merge this we should tag a new version and notify Niklas

@lukaslihotzki-f lukaslihotzki-f mentioned this pull request Nov 5, 2024
src/user.rs Outdated Show resolved Hide resolved
tests/e2e.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jannden jannden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things I noticed, meaning as points of discussion.

src/main.rs Outdated Show resolved Hide resolved
src/sources/csv.rs Outdated Show resolved Hide resolved
src/user.rs Outdated Show resolved Hide resolved
src/user.rs Show resolved Hide resolved
jannden
jannden previously approved these changes Nov 18, 2024
Copy link
Contributor

@jannden jannden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for manual tests now.

@tlater-famedly tlater-famedly force-pushed the tlater/remove-cache branch 4 times, most recently from 1e099d5 to b9d9048 Compare December 5, 2024 07:42
@tlater-famedly tlater-famedly changed the title WIP: Remove ldap caching and move to the Zitadel v2 API Remove ldap caching and move to the Zitadel v2 API Dec 5, 2024
@tlater-famedly tlater-famedly force-pushed the tlater/remove-cache branch 2 times, most recently from be32f1b to c00730f Compare December 5, 2024 08:05
@tlater-famedly tlater-famedly merged commit 519ddba into main Dec 5, 2024
5 checks passed
@tlater-famedly tlater-famedly deleted the tlater/remove-cache branch December 5, 2024 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get rid of "cache" concept Allow non-SSO deployments Switch to new API
3 participants