Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce api call to token service endpoint #364

Closed
GregoireW opened this issue Jul 23, 2024 · 25 comments · Fixed by #368 or #373
Closed

Reduce api call to token service endpoint #364

GregoireW opened this issue Jul 23, 2024 · 25 comments · Fixed by #368 or #373
Labels
enhancement New feature or request

Comments

@GregoireW
Copy link

TL;DR

When we use this action with workload identity there is a high number of call made to the token exchange api endpoint.

image

By default the exchange token endpoint is rate limited to 6k per minutes. As I got some issue with this limit, I tracked down the usage, and this action is one of the few culprit I found.

Detailed design

No response

Additional information

I know this action is using the official @google-cloud/storage library, and I'm pretty sure it is something to be fix on this library but I hope it is ok to open this issue here.

@GregoireW GregoireW added the enhancement New feature or request label Jul 23, 2024
@sethvargo
Copy link
Member

Hi @GregoireW - can you please provide your action.yml so I can reproduce this on my end? Are you seeing that audit log once per object upload? Are you using google-github-actions/auth?

@GregoireW
Copy link
Author

Hello,

There is not a 1-1 api call - file to be transferred, it may be linked to the parallel things but not sure.

Here a simple test I use ( I tried direct workload identity and SA workload identity, the result is the same ) :

      - uses: actions/checkout@v4
        id: checkout

      - name: generate
        run: |
          mkdir site
          for i in {1..250} ; do 
            head -c 1M </dev/urandom > site/test-$i.txt ; 
          done;

      - id: 'auth'
        name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/auth@v2'
        with:
          workload_identity_provider: projects/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
          service_account: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

      - name: Publish report
        uses: google-github-actions/upload-cloud-storage@v2
        with:
          path: ./site
          destination: zzzzzzzzzzzzzzzzzzzzz
          parent: false
          process_gcloudignore: false
          gzip: false

I tried to modify this action to use the transfer manager ( main...GregoireW:upload-cloud-storage:main ) , but no luck there apparently

@sethvargo
Copy link
Member

Hi @GregoireW - I did some digging, and I think this is happening in the GCS library and/or Google Auth library itself. I have three theories from tracing the code:

  1. The default library creates a new AuthClient on each API call, which doesn't do token caching. Can you try the branch google-github-actions/upload-cloud-storage@sethvargo/auth and send debug logs and see if the number of auth requests goes down?

  2. I don't see any locking in the GoogleAuth library, so it's possible that all 100 requests are happening simultaneously, all mint a token, and all cache the token. That would be an upstream bug.

  3. I tried tracing the code in GoogleAuth, but it's quite complex. I do see functionality for caching the returned token, but I'm unclear on the scope to which that is cached and how the cache is invalidated. It's possible this is an upstream bug or misunderstanding.

@GregoireW
Copy link
Author

Hi @sethvargo

The google-github-actions/upload-cloud-storage@sethvargo/auth as the same issue as the v2 branch.

My guess was also a missing lock issue on the authentication side, If I found time this week I will try to override the google authent.

On a side note, if you use "gcloud storage cp ..." you have the same issue :/

@sethvargo
Copy link
Member

On a side note, if you use "gcloud storage cp ..." you have the same issue :/

In that case, I definitely think it's an upstream bug sadly. Are you able to file a bug against the google-auth library? Noting that it happens in multiple paths is helpful. We shouldn't be exchanging an STS token on each file upload.

@GregoireW
Copy link
Author

I started with the auth js lib, will see what will be the result.

In the meantime I put a lock on the authentication ( https://github.com/GregoireW/upload-cloud-storage/blob/7a3d155590ac7d56d02e7c150065170c27de9366/src/client.ts#L305-L311 ) but clearly it slow down upload ( but at least I got only 2 token exchange api call )

I tried to set something like "if alreadyAuthenticated return super.request else lock (super.request)" when the first file are uploaded it is ok, only 1-2 api call to token exchange, then when authentication is done and the lock is not there anymore the number of call increase again. so ...

@sethvargo
Copy link
Member

I tried to set something like "if alreadyAuthenticated return super.request else lock (super.request)" when the first file are uploaded it is ok, only 1-2 api call to token exchange, then when authentication is done and the lock is not there anymore the number of call increase again. so ...

Yea, you really need the lock around the call to the auth token, not the call to the object upload (but you don't have access to those internals at this scope).

@GregoireW
Copy link
Author

I refined the lock usage:

https://github.com/GregoireW/upload-cloud-storage/blob/ab0fd3cf9ce46216ecbefeead2072fd4194f271b/src/client.ts#L305-L321

With this I lock the authent only on the first call the number of call stay at 2, and the total duration is close to what it was without lock so until new thing, I guess I will stay with that.

@danielbankhead
Copy link

danielbankhead commented Jul 31, 2024

This should be resolved as of auth v9.13.0:

@sethvargo
Copy link
Member

sethvargo commented Aug 1, 2024

Thanks @danielbankhead - I think we need @google-cloud/storage to pull in that update first though. Do you know when the next release is scheduled by chance?

@danielbankhead
Copy link

It should be automatically pulled in as it is a minor release (no manual update required). This can be verified via npm ls google-auth-library after an npm install.

@sethvargo
Copy link
Member

Ah great, thank you!

@GregoireW
Copy link
Author

Hum... unfortunately, the issue is not resolved with the latest version. The number STS api call is still high.

@sethvargo
Copy link
Member

Hmm - the new version definitely pulled in the latest version.

@danielbankhead
Copy link

I double-checked on the library side and I’m unable to reproduce on the latest version with the auth schemes mentioned. Is there any way you can confirm the auth client in use?

@GregoireW
Copy link
Author

The package lock reference [email protected]

a npm explain google-auth-library give:

npm explain google-auth-library
[email protected]
node_modules/google-auth-library
  google-auth-library@"^9.6.3" from @google-cloud/[email protected]
  node_modules/@google-cloud/storage
    @google-cloud/storage@"^7.12.0" from the root project

If I compile this action on my computer, thebuild is exactly the same, so it is [email protected] that is in use.

@danielbankhead
Copy link

Thanks for that information. Would it be possible to determine the resolved type of AuthClient? This can be checked via GoogleAuth#getClient from the google-auth-library.

@sethvargo sethvargo reopened this Aug 9, 2024
@GregoireW
Copy link
Author

I did this in the actions:

const auth = new GoogleAuth({
      projectId: opts?.projectID,
      universeDomain: opts?.universe,
    });
    auth.getClient().then((client) => {
      console.log('The created client is: ', client);
    });
    const options: StorageOptions = {
      authClient: auth,
      ..............

The log I get :

The created client is:  IdentityPoolClient {
    _events: [Object: null prototype] {},
    _eventsCount: 0,
    _maxListeners: undefined,
    credentials: {},
    eagerRefreshThresholdMillis: 300000,
    forceRefreshOnFailure: false,
    universeDomain: 'googleapis.com',
    projectId: null,
    quotaProjectId: undefined,
    transporter: DefaultTransporter {
      instance: Gaxios {
        agentCache: Map(0) {},
        defaults: {},
        interceptors: [Object]
      }
    },
    cloudResourceManagerURL: URL {
      href: 'https://cloudresourcemanager.googleapis.com/v1/projects/',
      origin: 'https://cloudresourcemanager.googleapis.com/',
      protocol: 'https:',
      username: '',
      password: '',
      host: 'cloudresourcemanager.googleapis.com',
      hostname: 'cloudresourcemanager.googleapis.com',
      port: '',
      pathname: '/v1/projects/',
      search: '',
      searchParams: URLSearchParams {},
      hash: ''
    },
    stsCredential: StsCredentials {
      clientAuthentication: undefined,
      crypto: NodeCrypto {},
      tokenExchangeEndpoint: 'https://sts.googleapis.com/v1/token',
      transporter: DefaultTransporter { instance: [Gaxios] }
    },
    scopes: undefined,
    cachedAccessToken: null,
    audience: '//iam.googleapis.com/projects/123456789123456789/locations/global/workloadIdentityPools/my-pool/providers/github',
    subjectTokenType: 'urn:ietf:params:oauth:token-type:jwt',
    workforcePoolUserProject: undefined,
    serviceAccountImpersonationUrl: 'https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/[email protected]:generateAccessToken',
    serviceAccountImpersonationLifetime: 3600,
    configLifetimeRequested: false,
    projectNumber: '123456789123456798',
    supplierContext: {
      audience: '//iam.googleapis.com/projects/123456789123456798/locations/global/workloadIdentityPools/my-pool/providers/github',
      subjectTokenType: 'urn:ietf:params:oauth:token-type:jwt',
      transporter: DefaultTransporter { instance: [Gaxios] }
    },
    credentialSourceType: 'url',
    subjectTokenSupplier: UrlSubjectTokenSupplier {
      url: 'https://pipelinesghubeus2.actions.githubusercontent.com/xxxxxxxxxxxxxxxxxxxxxxxxx/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/yyyyyyyyyyyyyyyyyyyy/jobs/zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2Fprojects%2F123456789123456798%2Flocations%2Fglobal%2FworkloadIdentityPools%2Fmy-pool%2Fproviders%2Fgithub',
      formatType: 'json',
      subjectTokenFieldName: 'value',
      headers: {
        Authorization: '***'
      },
      additionalGaxiosOptions: { retry: true, retryConfig: [Object] }
    },
    [Symbol(shapeMode)]: false,
    [Symbol(kCapture)]: false
  }

@danielbankhead
Copy link

I've completed a deep dive on this and I'm still unable to reproduce after the fix. However, I do see a path were it would attempt to request a new token each request; do you mind checking the expiry_date value of the generated tokens? You can do so by modifying the provided snippet to:

auth.getClient().then((client) => {
  console.log('The created client is: ', client);
  client.on('tokens', (token) => console.dir({expiry_date: token.expiry_date});
  client.getAccessToken();
});

If the expiry is within 5 minutes it would be eagerly refreshed each time:

@GregoireW
Copy link
Author

I got Thu, 15 Aug 2024 06:18:49 GMT { expiry_date: 1723706329000 } so it means the expiry date is in one hour.

@danielbankhead
Copy link

Thanks, with that I believe we're dealing with the library creates a new AuthClient on each API call case mentioned above: #364 (comment)

I don't see much else we can do on the auth lib side to improve the experience.

@sethvargo
Copy link
Member

@danielbankhead is there anything I can do in this GitHub Action to re-use the auth client on API calls?

@GregoireW
Copy link
Author

in the authorizeRequest / request (and other) function in the auth library there is an await getClient(). There is no lock there, so multiple client can be created at the same time.

I did set the authentication manually in the storage object:

https://github.com/GregoireW/upload-cloud-storage/blob/4ac2eeba706588c37334cedb60c564701c0c204f/src/client.ts#L165-L171

then I force an await before launching upload:

https://github.com/GregoireW/upload-cloud-storage/blob/4ac2eeba706588c37334cedb60c564701c0c204f/src/client.ts#L231-L233

it reduces the number of API call to 2.

@danielbankhead is locking getClient is something possible on oauth side ? or should we put something like I did on client side ?

@danielbankhead
Copy link

@danielbankhead is locking getClient is something possible on oauth side ? or should we put something like I did on client side ?

That's an excellent point, I'll work on this on the auth side shortly. We haven't had this issue for any other AuthClient type, which made this very difficult to catch.

@danielbankhead
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
3 participants