Optimize the linter #3076
It takes several minutes to run the linter. Is there any way to speed that up?

It seems like a lot of time is spent checking out the entire almanac repo. Would it be possible/faster to only check out the files that changed?
The linter does this:
3 is actually the quickest! It's the first two that are problematic. On the first, the Super-Linter is one huge Docker image. This has positives and negatives:
The Mega-Linter is a fork that produces several Docker images, so you can pick the smallest one that has all the linters you need and skip, say, the DotNet linter that we don't care about. But I'd rather stick with the original GitHub-supported one. The other alternative is to install the individual linters ourselves. That would work fine for the JS and Python ones since we already have those set up (sqlfluff is Python, by the way, and we do have it in our requirements.txt), but it means we have to figure out changed files ourselves (another thing the Super-Linter handles for us) - there's a rough sketch of that below.

On the second issue, our repo is just too damn big. I suspect this is due to the PDF changes at each release. PDFs, as binary files, aren't really suited to Git. We have a lot of them, and every release where we update the PDFs adds 300 MB to the repo as we check in new versions 😔 I don't think there is a way to keep only the latest versions in Git - we don't really need the history of these files. Nor do I think there is a way to purge the old PDFs from Git history (git never forgets!). To be honest, in hindsight, storing these PDFs in this Git repo was a bad idea for this reason. Maybe we should build them at release time and upload them to our CDN? But we haven't figured out the security model for that yet. And even then we can't purge the old ones. We should look at this soon though, as it's only going to get worse and worse - especially as we add more PDFs.
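For illustration only - not how anything is currently wired up - here is roughly what doing that changed-file detection ourselves could look like for the SQL linting, assuming `origin/main` as the base ref, sqlfluff installed from our requirements.txt, and paths without spaces:

```bash
# Sketch: lint only the SQL files changed relative to the base branch,
# instead of letting the Super-Linter work out the changed files for us.
changed_sql=$(git diff --name-only --diff-filter=d origin/main...HEAD -- '*.sql')

if [ -n "$changed_sql" ]; then
  # Word splitting on the list is intentional (assumes no spaces in paths).
  sqlfluff lint $changed_sql
else
  echo "No SQL files changed - skipping sqlfluff."
fi
```

The same bookkeeping would have to be repeated for every other linter, which is exactly what the Super-Linter currently saves us from.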
I don't think that's possible with git? Though it looks like we could potentially use the new-ish …
Not supported by the GitHub Actions checkout action yet: actions/checkout#680 😔 Not sure what the difference is between that and just using …
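Since both comments above are truncated, here is a hedged sketch of the two checkout strategies that seem to be under comparison - a shallow clone versus a blobless partial clone - expressed as plain git commands rather than actions/checkout inputs (the upstream URL is inferred from the fork linked later in the thread):

```bash
# Shallow clone: only the latest commit is downloaded, with no history at all.
git clone --depth=1 https://github.com/HTTPArchive/almanac.httparchive.org.git almanac-shallow

# Blobless partial clone: full commit and tree history, but file contents
# (including the large PDFs) are fetched lazily, only when actually checked out.
git clone --filter=blob:none https://github.com/HTTPArchive/almanac.httparchive.org.git almanac-partial
```

The practical difference is that the partial clone keeps history-based operations working (blobs are fetched on demand), whereas a shallow clone simply has no history to query.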
Looks like it goes from 3,500 MB down to 700 MB then - see the fork. I see the predeploy workflow generates all the PDFs anyway, so there's no need to store PDFs in the repo. P.S. It would be really great to get this done for dev container build speed - #3404.
I would love to get rid of the PDFs from this repo. They are massive, and they only grow with every release where we generate the PDFs (and we generate ALL the PDFs - even if we only need one). Unfortunately, at the moment we generate the PDFs in a GitHub Action, add them to the repo, and then release from our PCs (where we upload them to our CDN). This is good because it means we don't need the PDF software on our machines. With your suggested change above, we lose that ability, as the PDFs are permanently gone from the repo. It would be lovely if we could somehow mark a folder as only requiring the latest entry and not a full history, but IIRC git doesn't allow you to do that? So the options are:
Regarding the options mentioned:
Seems both 2 and 4 actually make sense: make generation and preview easily available, and store action-generated files in dedicated storage.
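The numbered options themselves aren't quoted above, so purely as an illustration of "store action-generated files in dedicated storage", attaching the generated PDFs to a GitHub release via the GitHub CLI might look something like this (the tag and directory are hypothetical, and the real destination could just as well be the CDN):

```bash
# Sketch: publish generated PDFs as release assets instead of committing them.
TAG="2022.11.0"          # hypothetical release tag
PDF_DIR="static/pdfs"    # hypothetical output directory for the generated PDFs

# --clobber replaces assets that already exist on the release with the same name.
gh release upload "$TAG" "$PDF_DIR"/*.pdf --clobber
```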
Some conclusions from the discussion above:
Assuming the changes above, my estimate for the repo checkout step is <20s (down from ~3 min).
Oh, that's a bit annoying that we have to do step 5 :-( I presume this will keep all the git history apart from the PDFs?
You can see a preview of these steps in this fork: https://github.com/max-ostapenko/almanac.httparchive.org
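The exact steps live in the linked comment, but as a minimal sketch of the history-rewrite part - assuming git filter-repo and a hypothetical PDF directory, not necessarily the exact commands used - it would look something like this:

```bash
# Sketch: rewrite history so the PDFs are gone from every commit.
# git filter-repo wants to operate on a fresh clone.
git clone https://github.com/HTTPArchive/almanac.httparchive.org.git almanac-rewrite
cd almanac-rewrite

# Drop everything under the (hypothetical) PDF directory from all of history.
git filter-repo --path static/pdfs --invert-paths

# filter-repo deliberately removes the 'origin' remote, so it has to be
# re-added before force-pushing the rewritten branches and tags.
git remote add origin https://github.com/HTTPArchive/almanac.httparchive.org.git
git push origin --all --force
git push origin --tags --force
```

After a rewrite like this, old objects can still be reachable via GitHub's caches and existing forks, which is why the documentation suggests contacting GitHub support to clear them.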
@max-ostapenko I was going through the steps in #3076 (comment) but after doing the …
Is that expected and do I just need to re-add it?
I don't remember this happening. Otherwise I think I would have written it down.
@tunetheweb yes, I missed it. Please add it manually and proceed.
OK that's done now. It also seems to have pushed the other branches. Do we need to do the "Contact GitHub to remove cache" step? Or will the cache be flushed soon enough?
It's what their documentation suggested ;)
Super quick now. Completes in about a minute. The testing of the website still takes time as we lint all the generated HTML and run Lighthouse, but it's still under 10 mins. Thanks @max-ostapenko - great improvements!