Add RAM and threads options to init action #738

cklin · 2021-09-13T23:04:14Z

This PR adds ram and threads inputs to the init step of the Action to limit resource use of CodeQL extractors.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Confirm the readme has been updated if necessary.
Confirm the changelog has been updated if necessary.

edoardopirovano

Mostly looks good, some comments below.

edoardopirovano · 2021-09-15T10:24:48Z

.github/workflows/pr-checks.yml

+
+      - name: Run init
+        run: |
+          runner/dist/codeql-runner-linux init --ram=230 --threads=1 --repository $GITHUB_REPOSITORY --languages java --github-url $GITHUB_SERVER_URL --github-auth ${{ github.token }}


I think I'd slightly prefer this to use something other than 1 for the threads as 1 could very easily be a default in which case we wouldn't really be checking anything. Maybe 3 as that seems very unlikely to ever be the default value.

This is a decision that I struggled with.

The main challenge here is that the number of threads used defaults to (and is capped at) the number of CPU cores available on the machine. GitHub action runners have 2 CPUs, so both the default and the cap would be 2. Passing any value larger than 1 means that the test will break if it runs on a system with different number of CPUs.

I would love to hear your thoughts. For example, maybe we could assume that the runner will always have at least 2 CPUs and have two tests with threads set to 1 and 2?

Aha, this is a good point. As I mentioned below, I'm actually not sure that capping this is a good idea, but for now let's leave this test as is 🙂

src/init-action.ts

src/runner.ts

pr-checks/checks/extractor-ram-threads.yml

edoardopirovano · 2021-09-15T10:32:24Z

src/util.ts

- * Get the codeql `--threads` value specified for the `threads` input.
- * If no value was specified, all available threads will be used.
+ * Get the value of the codeql `--threads` flag specified for the `threads`
+ * input. If no value was specified, all available threads will be used.
 *
 * The value will be capped to the number of available CPUs.


I don't know how I feel about capping this. I note that the RAM is not capped and the user can request more RAM than is available on the machine thus causing an OOM. On the other hand, the threads are capped even though using more threads than there are cores wouldn't cause us to crash (just give suboptimal performance). Additionally, there are valid reasons to have more threads than cores - for instance in CPUs with hyperthreading there is a benefit to having 2x the number of threads as cores.

If we do indeed want to cap the number of threads, it should definitely be documented in the user-facing description of the threads input rather than in a comment on the code.

I want to note that this PR refactors the implementation of getMemoryFlag() and getThreadsFlag() but does not change their behavior. What you say makes sense; perhaps we can continue the discussion (and invite wider participation that just the two of us) outside this PR?

I agree that we should invite a wider discussion on this outside the PR as it is technically a breaking behaviour change - some users might be passing 99 in and relying on the capping which would cause them to experience massive performance issues if we remove it.

Given this wasn't introduced by this PR, I don't think we need to block on it.

cklin · 2021-09-15T15:51:49Z

While we are here, there is a code scanning warning that we have the same inputs (ram and threads) in different actions (init and analyze) with different help texts. In this case, they are indeed different:

The init inputs limit resource usage during extraction, and
The analyze inputs limit resource usage during query execution

Do you think there is more we can do to simplify or to clarify these different inputs?

edoardopirovano · 2021-09-15T16:29:46Z

Regarding your final point, I think I agree with Code Scanning that having the same option name of different steps and serving a different purpose may invite user confusion. It's too late to change the options on analyze as these are shipped but maybe these new options could be extractor-{ram,threads} instead?

adityasharad · 2021-09-15T17:44:37Z

Which threads argument gets passed to database finalize?
I'm not sure users will know (or need to know) the context about these values going to extractors, so another option is to weaken the custom query and continue to explain carefully in the options how they are different for the two commands.

edoardopirovano · 2021-09-15T17:50:01Z

finalize shares the arguments in the analyze step.

A further option - make init the place where these should be specified and then use the ones from init everywhere when they are available (storing them in the config for the analyze step to use later).

Obviously for backwards compatibility we still keep the inputs on analyze in the code, but we amend our documentation to remove all mention of them.

adityasharad · 2021-09-15T18:16:22Z

That sounds good. We could respect the analyze arguments if they are provided separately (might be handy for debugging), but if they are absent just respect the values passed to init.

cklin · 2021-09-15T18:38:47Z

A further option - make init the place where these should be specified and then use the ones from init everywhere when they are available (storing them in the config for the analyze step to use later).

Sounds like a plan. I will revise this PR to do so.

Obviously for backwards compatibility we still keep the inputs on analyze in the code, but we amend our documentation to remove all mention of them.

Can you point me to the documentation that need to be amended, just so that I do not miss it?

edoardopirovano · 2021-09-15T18:45:11Z

Can you point me to the documentation that need to be amended, just so that I do not miss it?

Huh, I'm actually not sure. I expected the README in this repository to mention it, but it does not.

In the github/docs-internal repository, I expected the page corresponding to https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning to mention this but actually it seems that it also does not.

@adityasharad Any ideas where we might be mentioning this?

adityasharad · 2021-09-15T18:48:03Z

That is the right place. I think we have never actually documented ram and threads outside this repo, though that is more by omission than by design.

cklin · 2021-10-19T23:38:59Z

New snapshot. I think this PR is ready for another look.

The main change is that the analyze action and runner command now by default use the same RAM and threads options as the init action and runner command. Explicit RAM and threads settings for analyze still take precedence.

For action, the option values are passed using core.exportVariable()
For runner, the option values are passed using the tracer environment file codeql-env.json

Settings propagation for the analyze action is tested with newly added unit tests. Settings propagation for the analyze runner command has no tests and could use a more careful review.

edoardopirovano

Mostly looking good, a few comments below.

edoardopirovano · 2021-10-20T14:35:40Z

init/action.yml

+    description: Override the amount of memory in MB to be used by CodeQL extractors. By default, almost all the memory of the machine is used.
+    required: false
+  threads:
+    description: The number of threads to be used by CodeQL extractors.


I think this description and the one above should clarify that this is also the threads and RAM used for analysis unless they are overidden. Also, we explain what the default for ram is but we do not explain the default value of threads - I think we should say something about that. (I haven't got to that bit of the code yet, but I assume it defaults to the number of cores on the runner?)

I updated the description based on your excellent suggestions.

edoardopirovano · 2021-10-20T14:38:49Z

analyze/action.yml

@@ -29,7 +29,7 @@ inputs:
    required: false
    default: "false"
  threads:
-    description: The number of threads to be used by CodeQL.
+    description: The number of threads to be used by CodeQL. By default, this action will use the same number of threads as the init action.


Here and above, instead of just saying used by CodeQL, I would say used by CodeQL for database finalization and query execution, to distinguish from the values in init that are used for extraction.

That is a good suggestion, and I changed the description accordingly.

edoardopirovano · 2021-10-20T14:41:25Z

src/runner.ts

+    const jsonEnvFile = path.join(config.tempDir, codeqlEnvJsonFilename);
+    return JSON.parse(fs.readFileSync(jsonEnvFile).toString("utf-8"));
+  } catch (err) {
+    return {};


We do we now silently fail if we can't load this file? If this is indeed a deliberate change and is necessary, I think we should have a comment here explaining why.

Good catch! I removed the catch block from the extracted loadTracerEnvironment() function so that the behavior of importTracerEnvironment() remains unchanged.

edoardopirovano · 2021-10-20T14:42:22Z

src/runner.ts

@@ -53,11 +55,19 @@ function getToolsDir(userInput: string | undefined): string {

 const codeqlEnvJsonFilename = "codeql-env.json";

+function loadTracerEnvironment(config: Config): any {


I think there should be a more specific return type than any that we can use here. I expect it to always be a map from strings to strings?

That is a good point. I changed the return type to { [name: string]: string }.

edoardopirovano · 2021-10-20T14:44:48Z

src/runner.ts

+  )
+  .option(
+    "--threads <number>",
+    "The number of threads to be used by CodeQL extractors."


The same comments I made to the Action's input descriptions also apply here. Note that since the runner is being deprecated, I personally think we can omit adding these flags to it - we're committed to maintaining it till March but I don't think we need to be adding new functionality to it.

I updated the runner option help text to match the descriptions of the action inputs.

Your point about the runner being deprecated is well taken, and I will keep that in mind for any future work in this area. Thanks!

edoardopirovano · 2021-10-21T00:10:26Z

Thanks for addressing my comments! It's my understanding (@adityasharad can confirm) that we don't want to merge anything until we've bumped the default CLI version to 2.6.4 and released that to v1.

I think this is good to go after that, though.

adityasharad

A few clarifying questions, overall looks sensible. No objection to merging into main, though we can hold off on v1 until the upcoming CLI release.

adityasharad · 2021-10-21T00:18:54Z

CHANGELOG.md

@@ -2,7 +2,7 @@

 ## [UNRELEASED]

-No user facing changes.
+- The `init` step of the Action now supports `ram` and `threads` inputs to limit resource use of CodeQL extractors. [#738](https://github.com/github/codeql-action/pull/738)


Should this mention that the analyze action will default to the settings from the init action?

That is a good point—I added that to the changelog entry.

analyze/action.yml

src/runner.ts

adityasharad · 2021-10-21T00:29:40Z

analyze/action.yml

@@ -18,7 +18,9 @@ inputs:
    required: false
    default: "brutal"
  ram:
-    description: Override the amount of memory in MB to be used by CodeQL. By default, almost all the memory of the machine is used.
+    description: >-


Since the input descriptions here are going to vary between the two actions, we should add an exception to the custom query that's producing the code scanning warning on this file.

I edited queries/inconsistent-action-input.ql to exclude inputs named ram or threads from the query.

adityasharad

Nice! Thanks for the clear explanations and option descriptions.

aeisenberg

Generally good. Just a suggestion on the input description.

aeisenberg · 2021-10-21T22:49:24Z

analyze/action.yml

+      The number of threads that can be used by CodeQL for database finalization and query execution.
+      By default, this action will use the same number of threads as previously set in the "init" action.
+      If the "init" action also does not have an explicit "threads" input, this action will use all the
+      hardware threads available in the system.


Maybe good to specify what the default values are. Something like this?

Suggested change

hardware threads available in the system.

hardware threads available in the system. The default runners on GitHub Actions have two cores.

Yes, that would indeed be very useful to users. I added the default value for each platform to the description. Thank you for the suggestion!

aeisenberg · 2021-10-21T22:49:55Z

init/action.yml

+  threads:
+    description: >-
+      The number of threads that can be used by CodeQL extractors.
+      By default, CodeQL extractors will use all the hardware threads available in the system.


Same comment as above.

I added the default value for each platform to the description. Thank you for the suggestion!

aeisenberg · 2021-10-22T16:15:31Z

Not sure why the update dependencies workflow failed. I'm triggering it again by adding the label.

cklin · 2021-10-22T16:17:49Z

Not sure why the update dependencies workflow failed. I'm triggering it again by adding the label.

What I was puzzling over was why that workflow ran in the first place when the Update dependencies label was absent. Shouldn't the absence of the label prevent the workflow from running in the first place?

aeisenberg · 2021-10-22T16:31:41Z

Shouldn't the absence of the label prevent the workflow from running in the first place?

This workflow runs on:

  pull_request_target:
    types: [opened, synchronize, reopened, ready_for_review, labeled]

so, it should run in this case, only it should be a noop.

edoardopirovano · 2021-10-22T16:37:28Z

Shouldn't the absence of the label prevent the workflow from running in the first place?

This workflow runs on:
  pull_request_target:
    types: [opened, synchronize, reopened, ready_for_review, labeled]
so, it should run in this case, only it should be a noop.

Hmm, something definitely did go wrong in this run, though, because it continued and failed even though there was no label: https://github.com/github/codeql-action/actions/runs/1372901091

I think I must've broken something in #789...

cklin · 2021-10-22T16:37:36Z

Shouldn't the absence of the label prevent the workflow from running in the first place?

This workflow runs on:
  pull_request_target:
    types: [opened, synchronize, reopened, ready_for_review, labeled]
so, it should run in this case, only it should be a noop.

I was more wondering why

codeql-action/.github/workflows/update-dependencies.yml

Line 10 in a35be9c

    
           if: contains(github.event.pull_request.labels.*.name, 'Update dependencies') && ${{ github.event.pull_request.head.repo.full_name == 'github/codeql-action' }}

did not inhibit the update job, which I would have expected it to when the Update dependencies label was absent.

edoardopirovano · 2021-10-22T16:42:17Z

Let's see if #790 fixes it... It's annoying there's no way to test that workflow in PRs because it runs from main.

cklin · 2021-10-22T16:42:36Z

Hmm, something definitely did go wrong in this run, though, because it continued and failed even though there was no label: https://github.com/github/codeql-action/actions/runs/1372901091

It looks like for the if conditional you should either omit ${{ ... }} entirely or use it to cover the entire conditional.
https://docs.github.com/en/actions/learn-github-actions/expressions

cklin force-pushed the cklin/extractor-ram-threads-options branch 14 times, most recently from 82fa535 to 0606941 Compare September 14, 2021 21:06

cklin marked this pull request as ready for review September 14, 2021 21:54

cklin requested a review from a team as a code owner September 14, 2021 21:54

edoardopirovano reviewed Sep 15, 2021

View reviewed changes

cklin force-pushed the cklin/extractor-ram-threads-options branch from 0606941 to 5b67955 Compare September 15, 2021 15:52

cklin requested a review from edoardopirovano September 15, 2021 15:53

cklin marked this pull request as draft September 22, 2021 15:14

cklin force-pushed the cklin/extractor-ram-threads-options branch from 5b67955 to e5d6137 Compare October 12, 2021 21:44

cklin force-pushed the cklin/extractor-ram-threads-options branch from e5d6137 to 9a31e5b Compare October 19, 2021 22:03

cklin marked this pull request as ready for review October 19, 2021 23:39

cklin requested a review from adityasharad October 19, 2021 23:39

edoardopirovano reviewed Oct 20, 2021

View reviewed changes

cklin force-pushed the cklin/extractor-ram-threads-options branch from b1f2597 to 5533fe9 Compare October 20, 2021 18:23

cklin requested a review from edoardopirovano October 20, 2021 18:31

adityasharad reviewed Oct 21, 2021

View reviewed changes

cklin force-pushed the cklin/extractor-ram-threads-options branch from 5533fe9 to 69cc9ec Compare October 21, 2021 22:20

adityasharad approved these changes Oct 21, 2021

View reviewed changes

aeisenberg approved these changes Oct 21, 2021

View reviewed changes

cklin force-pushed the cklin/extractor-ram-threads-options branch from 69cc9ec to 36e3455 Compare October 21, 2021 23:34

aeisenberg approved these changes Oct 22, 2021

View reviewed changes

cklin force-pushed the cklin/extractor-ram-threads-options branch from 36e3455 to c1a269b Compare October 22, 2021 15:58

aeisenberg added the Update dependencies Trigger PR workflow to update dependencies label Oct 22, 2021

github-actions bot removed the Update dependencies Trigger PR workflow to update dependencies label Oct 22, 2021

cklin force-pushed the cklin/extractor-ram-threads-options branch from c1a269b to aa96b93 Compare October 28, 2021 22:08

Add RAM and threads options to init action

70b730e

cklin force-pushed the cklin/extractor-ram-threads-options branch from aa96b93 to 70b730e Compare October 28, 2021 22:10

cklin merged commit 4293754 into main Oct 28, 2021

cklin deleted the cklin/extractor-ram-threads-options branch October 28, 2021 22:38

This was referenced Nov 1, 2021

Merge main into v1 #802

Closed

Merge main into v1 #805

Merged

		@@ -53,11 +55,19 @@ function getToolsDir(userInput: string \| undefined): string {

		const codeqlEnvJsonFilename = "codeql-env.json";

		function loadTracerEnvironment(config: Config): any {

	hardware threads available in the system.
	hardware threads available in the system. The default runners on GitHub Actions have two cores.

Add RAM and threads options to init action #738

Add RAM and threads options to init action #738

Conversation

cklin commented Sep 13, 2021 • edited Loading

Merge / deployment checklist

edoardopirovano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cklin commented Sep 15, 2021

edoardopirovano commented Sep 15, 2021

adityasharad commented Sep 15, 2021

edoardopirovano commented Sep 15, 2021 • edited Loading

adityasharad commented Sep 15, 2021

cklin commented Sep 15, 2021

edoardopirovano commented Sep 15, 2021

adityasharad commented Sep 15, 2021

cklin commented Oct 19, 2021

edoardopirovano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

edoardopirovano commented Oct 21, 2021

adityasharad left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adityasharad left a comment

Choose a reason for hiding this comment

aeisenberg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aeisenberg commented Oct 22, 2021

cklin commented Oct 22, 2021

aeisenberg commented Oct 22, 2021 • edited Loading

edoardopirovano commented Oct 22, 2021

cklin commented Oct 22, 2021

edoardopirovano commented Oct 22, 2021

cklin commented Oct 22, 2021

cklin commented Sep 13, 2021 •

edited

Loading

edoardopirovano commented Sep 15, 2021 •

edited

Loading

aeisenberg commented Oct 22, 2021 •

edited

Loading