Skip to content
This repository has been archived by the owner on Jan 5, 2024. It is now read-only.

Add 'cache: true' option to enable caching #11

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

omelkonian
Copy link
Contributor

  • Caches output.cabal-store and ./dist-newstyle

  • Currently hashFiles is not exposed in the actions/cache API,
    hence the use of folder-hash.hashElements

@omelkonian omelkonian marked this pull request as draft December 18, 2020 11:11
@hazelweakly
Copy link
Collaborator

I like it so far! If we're going to have a generic cache: true option, it should work for stack as well as cabal out of the box.

For stack, output.stack-root would be what was cached (which I just realized I never properly put inside the action.yml file... sigh).

Then I suppose there's the question of what sort of API to expose to allow people to customize things. Something that will likely be important is doing the minimal amount of work while not being useless about it. (So, for example, running cabal freeze in CI is something I see people do a lot. I don't understand it because it seems to completely defeats the purpose of a freeze file; does this not break repositories where freeze files are checked in?).

I'll comment on the PR with a few more specific thoughts. Thanks for doing a lot of work on this!

setup/action.yml Outdated
@@ -23,6 +23,9 @@ inputs:
stack-setup-ghc:
required: false
description: 'If specified, enable-stack must be set. Will run stack setup to install the specified GHC'
cache:
required: false
description: 'If specified, automatically caches the cabal-related folders.'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A catch-all cache option should work for all the build tools the action supports, so having the description reflect that would be less confusing.

@@ -4,6 +4,8 @@ import {getOpts, getDefaults, Tool} from './opts';
import {installTool} from './installer';
import type {OS} from './opts';
import {exec} from '@actions/exec';
import * as c from '@actions/cache';
import {hashElement} from 'folder-hash';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be better to use folder-hash or to copy in the exact code that the github runner uses for its hashFiles function? I'm leaning towards the latter since it would (in theory) preserve hash compatibility if that code ever gets migrated/exposed in actions/toolkit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that sounds more robust, I have copied hashFiles, but had to change it slightly to take normal arguments.

if (!opts.stack.enable) await exec('cabal update');
});
if (opts.cabal.enable) core.info('Loading cache...');
await exec('cabal', ['freeze'], {silent: true});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to want to run cabal freeze by default. Perhaps we should check for a cabal.project.freeze and a stack.yaml.lock by default, but allow users to override this by specifying an option like cache-keys that works exactly as key + restore-keys does on actions/cache now.

Then we can just split on newlines for the cache-keys and keep the standard os-$ghc-$hash-$sha logic as the default. This also has the benefit of being.

We will also probably want to support something like cache-paths in case some people don't want to cache dist-newstyle in addition to cabal-root (or don't want to cache stack-root, or...)

if (!opts.stack.enable) await exec('cabal update');
});

if (cacheHit && cacheHit != keys[0]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is != intentional here? (Just curious since I've got strict equality burned into my brain at this point)

  * Support both cabal and stack

  * Copy `hashFiles` until exposed exposed in the actions/cache API

  * Saving cache as a post-script

  * Additional action inputs: `cache-paths` and `cache-keys`

  * Add `output.stack-root` in `action.yml`
const stackSetupGhc = (inputs['stack-setup-ghc'] || '') !== '';
const stackEnable = (inputs['enable-stack'] || '') !== '';
const isEnabled = (s: string): boolean => (inputs[s] || '') !== '';
const readList = (s: string): string[] => (inputs[s] || '').split('\n');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we filter empty lines out here? I think (but haven't verified) that

key: |
 a
 b

would result in ['', 'a', 'b'] otherwise.

core.setOutput('stack-root', stackRoot);
if (os === 'win32') core.exportVariable('STACK_ROOT', 'C:\\sr');

if (opts.cache) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably want to be fairly careful here. Is there a plan for how the caching behavior should work if the defaults aren't overridden vs if the defaults are overridden?

In particular, I'm not sure it's completely intuitive to have "enabling the cache" result in running commands or otherwise creating files/directories you might not otherwise expect.

I also don't actually know how globbing and then accessing the first item of the array works if there's no stack.yaml.lock file. Does it throw an error? Would the **/stack.*.lock potentially grab way too many files and/or the wrong file(s) if you had git submodules checked out and one of your submodules had a stack.yaml.lock file but your stack project did not?


if (opts.cache) {
const matches = await glob
.create('**/cabal.*.freeze')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd question: How does this work if

  1. A haskell project uses cabal and the project is both an exe and a library.
  2. Said project has a cabal.project.freeze file as per exe recommendations, but names it something different by default so that building with cabal build can be tested to work if the project is used as a library.

What if the project does not want to cache the dist-newstyle directory and only wants to cache the cabal store? It's a reasonable want for some projects. Examples include: for a while, macOS was suffering dylib issues when caching dist-newstyle and so I had to turn off caching dist-newstyle until that was resolved; it can take up a lot of extra space with not much extra benefit, especially if you're using submodules for dependencies instead of "vendoring" them with a cabal.project file (you pay the penalty of having the submodules bloat dist-newstyle as well as not having mtime work well so you frequently have to rebuild all of them anyway even if it is cached; but if you use cabal.project to vendor them, they get built and stored in the cabal.store, making caching dist-newstyle more effective).

dist-newstyle is also not guaranteed to remain the same name forever. So there's that...

@larskuhtz
Copy link

(So, for example, running cabal freeze in CI is something I see people do a lot. I don't understand it because it seems to completely defeats the purpose of a freeze file; does this not break repositories where freeze files are checked in?).

One reason to run cabal freeze in CI is to include the freeze file in the artifacts for reproducible builds and documentation purposes. So, one builds and tests against the most recent versions from Hackage, but one can roll back to the latest working state if something breaks.

@lisanna-dettwyler
Copy link
Contributor

(So, for example, running cabal freeze in CI is something I see people do a lot. I don't understand it because it seems to completely defeats the purpose of a freeze file; does this not break repositories where freeze files are checked in?).

One reason to run cabal freeze in CI is to include the freeze file in the artifacts for reproducible builds and documentation purposes. So, one builds and tests against the most recent versions from Hackage, but one can roll back to the latest working state if something breaks.

@larskuhtz That does sound really useful! It could also allow for freeze files to not be used by default unless the build fails, then it's retried with the last known good one. If it's a pull request it could offer to update the file in the branch too if building from a fresh one works. And it's easy to check if there's a freeze file present already and not mess with it.

Comment on lines +99 to +104
keys: opts.cacheKeys || [
keys.join('-'),
keys.slice(0, 4).join('-') + '-',
keys.slice(0, 3).join('-') + '-',
keys.slice(0, 2).join('-') + '-'
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be careful about the caching growing unbounded as time goes on. More info here: https://markkarpov.com/post/github-actions-for-haskell-ci.html#comment-6035135802

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is designed to grow unbounded because github actions automatically limits it to the last 5G in size. It might theoretically make sense to let people override the key but if they want a very specific set of behavior, they'll probably end up writing their own usage of caching anyway.

@andreasabel andreasabel changed the title Add 'cabal: true' option to enable caching Add 'cache: true' option to enable caching Dec 22, 2022
@andreasabel andreasabel added the re: cache Concerning caching label Dec 22, 2022
@andreasabel andreasabel mentioned this pull request Mar 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
re: cache Concerning caching
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants