Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add php.ini option to disable stat cache #17178

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

billynoah
Copy link

#5894 (comment)

... Just copy the changes from the diff. Make the RFC. See how you get on. Best of luck!

@billynoah
Copy link
Author

billynoah commented Dec 16, 2024

TBH, I have no idea what an RFC is in this context. I've recreated Kevin Lyda's branch based on the diff. If someone can point me in the right direction I can spend a bit more time on next steps.

@Girgias
Copy link
Member

Girgias commented Dec 16, 2024

An RFC in this context is a PHP RFC, that needs to be published on the wiki: https://wiki.php.net/rfc and discussed on the PHP internals mailing list.

Zend/zend_virtual_cwd.h Outdated Show resolved Hide resolved
Co-authored-by: Gina Peter Banyard <[email protected]>
@billynoah
Copy link
Author

billynoah commented Dec 16, 2024

An RFC in this context is a PHP RFC, that needs to be published on the wiki: https://wiki.php.net/rfc and discussed on the PHP internals mailing list.

That might be a bit over my head. Is it absolutely necessary for such a change? @bukka is this something you can help with?

@lyda
Copy link

lyda commented Dec 16, 2024

The PHP RFC process is described here: https://wiki.php.net/rfc/howto

@Girgias
Copy link
Member

Girgias commented Dec 16, 2024

@Crell maybe you want to write an RFC for this, considering you got bitten by it recently?

@Crell
Copy link
Contributor

Crell commented Dec 17, 2024

I don't know enough to say if the statcache is good or bad in 2025. I just want it documented better. And I'm generally not a fan of ini settings to begin with, as it just means one more hidden variable I have to keep track of, and code around.

So I'd be more in favor of removing it outright if it's shown to not actually be helpful, but I don't know if it's helpful.

@billynoah
Copy link
Author

billynoah commented Dec 17, 2024

My personal opinion is that if an app is making so many system calls that this cache is actually beneficial, then they can easily implement their own. For the rest of us it just gives unpredictable and often incorrect data. I'm not a fan of behind the scenes hidden caching that I can't turn off.

Since I am just looking for a way to turn it off, copying the diff from the other PR seemed like potentially the fastest way forward. If axing the stat cache is an better option how can we make that happen?

@lyda
Copy link

lyda commented Dec 17, 2024

Here's an example that shows the goofiness that is the stat cache. Note that it only caches a single path.

<?php

function all_the_stats($filename, $message) {
  $seen = false;
  if (@lstat($filename)) {
    print("lstat: $message.\n");
    $seen = true;
  }
  if (@stat($filename)) {
    print("stat: $message.\n");
    $seen = true;
  }
  if (!$seen) {
    print("all_the_stats: stat calls for $filename both failed.\n");
  }
}

$me = str_replace(getcwd(), ".", __FILE__);

print("test: touch kitten.\n");
system("touch kitten");
all_the_stats("kitten", "kitten exists");

print("\ntest: rm kitten.\n");
system("rm kitten");
all_the_stats("kitten", "kitten exists (it shouldn't)");

print("\ntest: stat /dev/null/nope.\n");
printf("stat(/dev/null/nope): %s.\n", @stat("/dev/null/nope")? "exists": "doesn't exist");
all_the_stats("kitten", "kitten exists (it shouldn't)");

print("\ntest: is_file $me.\n");
printf("is_file($me): %s.\n", @is_file($me)? "is true": "is false");
all_the_stats("kitten", "kitten exists (it shouldn't)");

print("\ntest: lstat $me.\n");
printf("lstat($me): %s.\n", @lstat($me)? "exists": "doesn't exist");
all_the_stats("kitten", "kitten exists (it shouldn't)");

Output:

test: touch kitten.
lstat: kitten exists.
stat: kitten exists.

test: rm kitten.
lstat: kitten exists (it shouldn't).
stat: kitten exists (it shouldn't).

test: stat /dev/null/nope.
stat(/dev/null/nope): doesn't exist.
lstat: kitten exists (it shouldn't).
stat: kitten exists (it shouldn't).

test: is_file ./test.php.
is_file(./test.php): is true.
lstat: kitten exists (it shouldn't).

test: lstat ./test.php.
lstat(./test.php): exists.
all_the_stats: stat calls for kitten both failed.

A call to is_file on a valid file won't clear the cache. Nor will a call to stat on a file that doesn't exist. But if you call stat for a valid file, the cache for kitten is cleared. This is... surprising... for users.

So yes, I'd vote to nuke it. But there was no consensus to just give users an option to remove this "feature".

@Crell
Copy link
Contributor

Crell commented Dec 17, 2024

The missing piece here would be benchmarks showing that it's not helpful, or not sufficiently helpful to justify the silliness it causes. Then an Internals thread discussion, followed by an RFC.

I have no idea how much pushback the idea would get; there would very likely be people asking about different OSes, OS versions, SSD vs HDD, etc. So coming in with robust benchmarks from the start would be very helpful.

@jimwins
Copy link
Member

jimwins commented Dec 17, 2024

It might help to go back and identify what problem(s) the stat cache was meant to address in the first place. My memory is that it hails from the era when we counted system calls as an important performance concern, so the cache helped with situations where you checked if a file existed but then wanted to access the mtime/ctime/etc. It's when the very performance-conscious thought about include vs. include_once. My gut feeling is that it has probably long outlived its purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants