Backup to S3 using IRSA does not work #4006

Open
jeffgus opened this issue Oct 1, 2024 · 4 comments

Comments

@jeffgus

jeffgus commented Oct 1, 2024

Overview

I'm unable to get the backup to S3 to work with a service account and IAM role (IRSA).

Environment

  • Platform: OpenShift (ROSA)
  • Platform Version: 4.14.x
  • PGO Image Tag: registry.connect.redhat.com/crunchydata/crunchy-postgres
  • Postgres Version: 15
  • Storage: gp3

Steps to Reproduce

  • Create an IAM role in AWS with a trust relationship.
  • Make sure that the ServiceAccounts are annotated.
  • Set repo2-s3-key-type = web-id.
  • Set the bucket name, region, and endpoint.
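
For reference, the trust relationship from the first step can be sketched as below. This is only an illustration of the usual IRSA layout: the account ID, OIDC provider host, namespace, and ServiceAccount names are placeholders, not values from this cluster, and `irsa_trust_policy` is a hypothetical helper. The list includes `default` because later comments in this thread suggest the repo host runs under that ServiceAccount.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_accounts):
    """Build an IAM trust policy that lets the named ServiceAccounts assume the role."""
    subjects = [
        f"system:serviceaccount:{namespace}:{name}" for name in service_accounts
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                # The condition key is the OIDC provider host followed by ":sub".
                "Condition": {"StringEquals": {f"{oidc_provider}:sub": subjects}},
            }
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    policy = irsa_trust_policy(
        account_id="123456789012",
        oidc_provider="oidc.example.com/cluster-id",
        namespace="postgres",
        service_accounts=["my-cluster-instance", "my-cluster-pgbackrest", "default"],
    )
    print(json.dumps(policy, indent=2))
```

If a ServiceAccount that actually runs pgbackrest is missing from the `sub` list, STS would be expected to reject the request even though other pods in the namespace work.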

I set s3.conf to be:

[global]
repo2-retention-full = 14
repo2-retention-full-type = time
repo2-s3-key-type = web-id

I'm not sure if these settings belong in the s3.conf file or the main config file. I've tried both.

EXPECTED

pgbackrest should be able to find the token to communicate with the S3 bucket.

ACTUAL

I get one of two errors. The first says that the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE env vars are missing. If I override the metadata for all ServiceAccounts and edit the repo-host StatefulSet to set the serviceAccountName, that error goes away. It is replaced with:

command terminated with exit code 29: ERROR: [029]: unable to find child 'AssumeRoleWithWebIdentityResult':0 in node 'ErrorResponse'

Logs

command terminated with exit code 31: ERROR: [031]: option 'repo2-s3-key-type' is 'web-id' but 'AWS_ROLE_ARN' and 'AWS_WEB_IDENTITY_TOKEN_FILE' are not set

or

command terminated with exit code 29: ERROR: [029]: unable to find child 'AssumeRoleWithWebIdentityResult':0 in node 'ErrorResponse'
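
The second error suggests that STS answered with an ErrorResponse document rather than an AssumeRoleWithWebIdentityResult, so the actual reason is inside a response that pgbackrest does not print. A minimal sketch for pulling the code and message out of such a response follows, assuming the usual AWS query-API error layout (an Error element containing Code and Message). The sample payload is illustrative, not captured from this cluster, and `parse_sts_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative payload only; a real response would carry its own code and message.
SAMPLE_ERROR = """\
<ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <Error>
    <Type>Sender</Type>
    <Code>AccessDenied</Code>
    <Message>Not authorized to perform sts:AssumeRoleWithWebIdentity</Message>
  </Error>
  <RequestId>example-request-id</RequestId>
</ErrorResponse>
"""


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sts_error(body):
    """Return {'Code': ..., 'Message': ...} for an ErrorResponse, else None."""
    root = ET.fromstring(body)
    if _local_name(root.tag) != "ErrorResponse":
        return None
    fields = {}
    for elem in root.iter():
        name = _local_name(elem.tag)
        if name in ("Code", "Message"):
            fields[name] = (elem.text or "").strip()
    return fields


if __name__ == "__main__":
    print(parse_sts_error(SAMPLE_ERROR))
```

An AccessDenied code at this point would usually point at the role's trust relationship rather than at the bucket permissions.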

Additional Information

This is similar to #3135 and #3472, but these issues are old and things have changed.

I tried to tweak the role trust relationship rule and it doesn't seem to make a difference. I can run a container with awscli with the same serviceAccount and it works fine.

I can also try to run pgbackrest on the repo-node manually. It fails to back up properly (which is expected), but it DOES communicate with S3 and creates the backup.info file.

What is the correct configuration for this to work?

@jvincze84

Hi,
We have the same (or a similar) issue.
We are running OKD on AWS.
OKD Version: 4.15.0-0.okd-2024-03-10-010116

Log:

time="2024-10-29T13:20:06Z" level=info msg="crunchy-pgbackrest starts"
time="2024-10-29T13:20:06Z" level=info msg="debug flag set to false"
time="2024-10-29T13:20:06Z" level=info msg="backrest backup command requested"
time="2024-10-29T13:20:06Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1]"
time="2024-10-29T13:20:07Z" level=info msg="output=[]"
time="2024-10-29T13:20:07Z" level=info msg="stderr=[ERROR: [031]: option 'repo1-s3-key-type' is 'web-id' but 'AWS_ROLE_ARN' and 'AWS_WEB_IDENTITY_TOKEN_FILE' are not set\n]"
time="2024-10-29T13:20:07Z" level=fatal msg="command terminated with exit code 31"

But these environment variables are in place; I checked in a debug pod. I also checked the web identity token and role with the AWS CLI, and I was able to upload files to the bucket.
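
A check like this is only meaningful when it runs in the same container that executes pgbackrest, since a separate debug pod may run under a different ServiceAccount. A small sketch of such a check is below. `web_identity_problems` is a hypothetical helper, and it assumes a Python interpreter is available in that container, which may not be the case for the repo host image.

```python
import os
from typing import List, Mapping

# The two variables named in pgbackrest's error 031.
REQUIRED = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def web_identity_problems(env: Mapping[str, str]) -> List[str]:
    """List what is missing for web-id authentication in the given environment."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    # The variable can be set while the projected token file is absent or unreadable.
    if token_file and not os.access(token_file, os.R_OK):
        problems.append(f"token file {token_file} is not readable")
    return problems


if __name__ == "__main__":
    found = web_identity_problems(os.environ)
    for line in found or ["web identity settings look complete"]:
        print(line)
```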

Can somebody help? It seems that the error message is misleading and there are other issues behind the scenes.
But without a proper log message we cannot continue debugging.

Thanks,
Jvincze84

@jeffgus
Author

jeffgus commented Nov 1, 2024

I think the issue is how the backup runs. When I set the annotation, the CronJob runs with AWS_ROLE_ARN, etc. set. When I remove the "volume" from the S3 repo definition, the operator complains:

Stanza not created for \"repo2\" as specified for a scheduled backup

I don't think S3 repos should have a volume section, but I can't make the operator write out the config without one. When the repo has a volume, the backup interacts with the repo host, which does NOT have AWS_ROLE_ARN set.

@zhangluva

It looks to me like the problem is that the repoHost sts uses the default service account, which does not have the annotation. It should work if you add the annotation to the default service account.
Looking at the code, nothing appears to specify the serviceAccount. I think it should simply use the service account created for pgbackrest jobs if available.

@jvincze84

It looks to me like the problem is that the repoHost sts uses the default service account, which does not have the annotation. It should work if you add the annotation to the default service account. Looking at the code, nothing appears to specify the serviceAccount. I think it should simply use the service account created for pgbackrest jobs if available.

You are absolutely right.
The problem is with the repo host and the default SA. After annotating the SA and restarting the repo host, everything works fine.
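
For anyone landing here later, the annotation step can be sketched as follows. The snippet only builds the merge patch and the matching kubectl command. The namespace and role ARN are placeholders, the helper names are hypothetical, and it assumes the cluster uses the standard `eks.amazonaws.com/role-arn` annotation for IRSA.

```python
import json

# Annotation key read by the pod identity webhook (assumed to be in use here).
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def service_account_patch(role_arn):
    """Build a merge patch that adds the IRSA annotation to a ServiceAccount."""
    return {"metadata": {"annotations": {IRSA_ANNOTATION: role_arn}}}


def kubectl_patch_command(namespace, service_account, role_arn):
    """Return the kubectl argv that applies the patch; nothing is executed here."""
    patch = json.dumps(service_account_patch(role_arn))
    return [
        "kubectl", "patch", "serviceaccount", service_account,
        "-n", namespace, "--type", "merge", "-p", patch,
    ]


if __name__ == "__main__":
    # Placeholder namespace and role ARN; "default" is the SA discussed above.
    cmd = kubectl_patch_command(
        "postgres", "default", "arn:aws:iam::123456789012:role/example-backup-role"
    )
    print(" ".join(cmd))
```

The repo host pod still has to be restarted afterwards, because the environment variables and token are only injected when a pod is created.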

Thank you very much for the help.


3 participants