Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: getHashedStaticPath, append hashes to URLs for cache busting #431

Closed
wants to merge 110 commits into from

Conversation

gurgunday
Copy link
Member

@gurgunday gurgunday commented Feb 1, 2024

This in my opinion can be a killer feature — it essentially lets the user set a very high maxAge by decorating the Fastify instance with a method that returns the file's path plus a unique hash that identifies its content

If an asset's content changes, it will still be referenceable via the same relative path while the client will see a new hash, which means the cache will bust

@gurgunday gurgunday marked this pull request as draft February 1, 2024 15:11
@gurgunday gurgunday marked this pull request as ready for review February 1, 2024 19:23
Copy link
Member

@jsumners jsumners left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand things correctly, you're building an unbounded Map of hashes to resolved file paths?

index.js Outdated Show resolved Hide resolved
@gurgunday gurgunday requested a review from a team February 1, 2024 21:29
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
@gurgunday gurgunday requested a review from mcollina February 5, 2024 15:43
Copy link
Member

@airhorns airhorns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offhand, do you know why folks seem to go for URLs like /assets/index-deadbeef.css instead of /assets/index.css?hash=deadbeef ? The latter seems like it'd be a bit easier to work with, since the unhashed and hashed paths would be the same.

README.md Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
lib/hash.js Outdated Show resolved Hide resolved
lib/hash.js Show resolved Hide resolved
await server.register(fastifyStatic, {
root: new URL('./public', import.meta.url).pathname,
prefix: '/assets/',
hash: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be annoying about names, this is take-it-or-leave-it feedback, but just reading this code without reading the readme, I feel like the option hash: true could mean a variety of different things. I'd probably guess it means "
"add an e-tag header", not that it allows fancy path building for CDN busting hashes.

types/index.d.ts Outdated Show resolved Hide resolved
@Uzlopak
Copy link
Contributor

Uzlopak commented Feb 5, 2024

I am seeing this PR in my notifications for days and did not comment on it, because I first did not understand the purpose. Also because I was not requested as reviewer I did not invest time to understand it. But I think that I get it now, despite not knowing what the current state of this PR is. So if you implemented it somehow and I claim something wrong, then dont get sad, because I did not read the code.

Somehow I think i saw this technique like 15 years ago, where we did this in php. So basically you reference a file by a hash and expose it via html or so. The browser caches the file very long reducing loading the file again and again. I guess comparable with webpacks bundled files, where it is adding a hash to the filename. Which is actually better than just having a hash, as you still can now what file you are loading.

Actually I think this is a neat function, but it is problematic at startup as it has potentially to hash alot of files for potential nothing? If I would implement it, I would write it in a way, that the call to getHashedPath will result in a hashing of the specified file and store it in a lookup or how you implement it. But the issue here is, that you would need to hash the necessary file at server start.
So maybe it should have a prepare functionality, where you pass all files you consider will be relevant for this feature. Maybe also register directly a route for those files. and then calling getHashedPath should only look up the path and if it does not have that file, it should fallback to the original file or throw an error.

Like I said already, I did not read your code, but this is what I think...

@gurgunday
Copy link
Member Author

Thanks for the input, @Uzlopak

Yeah, I didn't request a review because I know you're busy improving node and undici these days, didn't want to prevent you from doing that ;)

And I do apologize for this PR's slow and constantly pingy state, I didn't handle it in the best way since it started with an aha moment 🤣

The premise is very simple: we want to use a very long maxAge so that the server is pinged less, and even better, if we're using a solution like Cloudflare, we want the assets to get cached for life for speedy delivery

However, when we modify files, we want that to be reflected and so we append a hash to them (like what other tools do)

Because the startup time will increase when this feature is used, I tried adding a way to pregenerate the hashes (which is what took a little while to implement in an acceptable way and kept pinging the watchers)

Anyway, I'm doing a few more touches and the PR will be fully ready for review, then I was thinking of requesting one from you as well

@gurgunday
Copy link
Member Author

gurgunday commented Feb 5, 2024

Offhand, do you know why folks seem to go for URLs like /assets/index-deadbeef.css instead of /assets/index.css?hash=deadbeef ? The latter seems like it'd be a bit easier to work with, since the unhashed and hashed paths would be the same.

I cannot do ?hash= because it affects route registry, however appending to the end will be faster actually, you are right

If we don't modify anything, it is actually doable and it LGTM honestly

PTAL

@gurgunday gurgunday changed the title feat: hash assets feat: hash, generate hashed routes for each asset Feb 5, 2024
@gurgunday
Copy link
Member Author

gurgunday commented Feb 6, 2024

Applied all your suggestions except changing the name of the option (hash), as I don't know what else we can call it

@gurgunday gurgunday requested a review from airhorns February 6, 2024 10:54
@gurgunday gurgunday changed the title feat: hash, generate hashed routes for each asset feat: getHashedAsset, append hashes to URLs for cache busting Feb 6, 2024
@gurgunday gurgunday changed the title feat: getHashedAsset, append hashes to URLs for cache busting feat: getHashedStaticPath, append hashes to URLs for cache busting Feb 6, 2024
@gurgunday
Copy link
Member Author

@mcollina now incremental adoption is possible as assets are both referenceable by their original paths and with their hash using the decorated method

@gurgunday gurgunday requested a review from Uzlopak February 6, 2024 20:00
@airhorns
Copy link
Member

airhorns commented Feb 6, 2024

Ah I remembered one reason why folks put the hash in the filename instead of as a query param: it convinces CDNs that the resources really are different, as some of them are configured by default to ignore query strings. Cloudflare by default respects the query string, but I can imagine some folks use UTM params and whatnot that they really do want ignored from a caching perspective. Not sure exactly what that means we should do. What do webpack / vite do by default? https://developers.cloudflare.com/cache/how-to/set-caching-levels/

@gurgunday
Copy link
Member Author

gurgunday commented Feb 6, 2024

Yeah, but normally this should indeed bust the cache in terms of standards behavior:

The "cache key" is the information a cache uses to choose a response and is composed from, at a minimum, the request method and target URI used to retrieve the stored response; the method determines under which circumstances that response can be used to satisfy a subsequent request. However, many HTTP caches in common use today only cache GET responses and therefore only use the URI as the cache key.

The URI includes the query parameter, so it should be fine in modern browsers

The added benefit of doing it this way is that we can incrementally adopt this option because the URLs of the assets stay the same, just the hashed versions returned by the decorated function are different

@gurgunday
Copy link
Member Author

I released https://github.com/gurgunday/hasset, which scans the assets folder and auto replaces the occurrences of the files with their hashed versions

I no longer see a raison d'etre for this PR, feel free to continue the work if interested

@gurgunday gurgunday closed this Feb 7, 2024
@mcollina
Copy link
Member

mcollina commented Feb 7, 2024

That's very close to the path I recommended you to take :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants