Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: replace kPathSeparator with std::filesystem #53063

Merged
merged 1 commit into from
Jun 30, 2024

Conversation

anonrig
Copy link
Member

@anonrig anonrig commented May 20, 2024

We don't need kPathSeparator anymore since std::filesystem::path::preferred_separator exists.

cc @nodejs/cpp-reviewers @nodejs/fs

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/security-wg

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 20, 2024
src/path.cc Outdated Show resolved Hide resolved
src/path.cc Outdated Show resolved Hide resolved
src/path.h Outdated Show resolved Hide resolved
@anonrig anonrig force-pushed the remove-kpathseparator branch from 85b1a5a to 3b9bb49 Compare May 28, 2024 01:12
@anonrig anonrig requested review from jasnell, targos and atlowChemi May 28, 2024 01:12
src/node_report.h Outdated Show resolved Hide resolved
src/node_file.h Outdated Show resolved Hide resolved
src/inspector_profiler.h Outdated Show resolved Hide resolved
src/env.h Outdated Show resolved Hide resolved
@targos
Copy link
Member

targos commented May 28, 2024

Error on Windows:

D:\a\node\node\src\compile_cache.cc(208,47): error C2676: binary '+': 'std::string' does not define this operator or a conversion to a type acceptable to the predefined operator [D:\a\node\node\libnode.vcxproj]

@anonrig anonrig force-pushed the remove-kpathseparator branch 5 times, most recently from 699210a to deef67e Compare June 13, 2024 19:30
Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once CI is green

@anonrig anonrig added request-ci Add this label to start a Jenkins CI on a PR. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Jun 13, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jun 13, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

return res + "*";
}
return res + node::kPathSeparator + "*";

return (std::filesystem::path(res) / "*").string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test it, but from std::filesystem::path::operator/= it might cause an unexpected replacement:

// Where "//host" is a root-name
path("//host")  / "foo" // the result is      "//host/foo" (appends with separator)
path("//host/") / "foo" // the result is also "//host/foo" (appends without separator)
 
// On POSIX,
path("foo") / ""      // the result is "foo/" (appends)
path("foo") / "/bar"; // the result is "/bar" (replaces)

Refs: https://en.cppreference.com/w/cpp/filesystem/path/append.

Possibly, the "*" append won't cause any error, but I would be more confident if we use path::preferred_separator here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace it, but FYI, we can use concat and it will always work correctly.

#include <iostream>
#include <filesystem>

int main() {
  auto path = std::filesystem::path("foo").concat("/bar");
  printf("path -> %s\n", path.string().c_str());
  return 0;
}

Prints:

path -> foo/bar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it would cause more CPU cycles though. IMO the + concat is easier

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm changing this. Thank you for the review.

Copy link
Member Author

@anonrig anonrig Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately there has been a complication. On Windows, preferred_separator returns "wchar_t*". Therefore, it is not possible to make a simple addition of wchar_t and std::string.

The solution to fix this is:

#ifdef _WIN32
      // This is required for Windows since prefered separator is "wchar_t".
      auto separator = std::filesystem::path::preferred_separator;
      return res + std::string(separator.begin(), separator.end()) + "*";
#else
      return res + std::filesystem::path::preferred_separator + "*";
#endif
    }

But since we are talking about potential solutions, I thought what about just converting this whole function to use std::filesystem, rather than libuv, and make the whole code a 5 line function?

std::string WildcardIfDir(const std::string& res) noexcept {
  auto path = std::filesystem::path(res);
  auto file_status = std::filesystem::status(path);
  if (file_status.type() == std::filesystem::file_type::directory) {
    path /= "*";
  }
  return path.string();
}

I personally prefer the std::filesystem, but wanted to hear @RafaelGSS @tniessen and @jasnell's thoughts on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I think the first change is better. I think we'll want to carefully consider switching too much to std::filesystem until we can be certain it's implemented consistently across all the platforms we support (unless you're thinking of using this variation only on Windows)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not change it to the former either as std::filesystem is very likely to perform syscalls and the Permission Model operates on non-existing paths and it would be more difficult to handle. That's why I used std::string_view

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig requested a review from mcollina June 19, 2024 22:49
@anonrig anonrig force-pushed the remove-kpathseparator branch from 118b263 to 0c5afb1 Compare June 21, 2024 19:52
@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Jun 21, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jun 21, 2024
@nodejs-github-bot
Copy link
Collaborator

auto path = std::filesystem::path(res);
auto file_status = std::filesystem::status(path);
if (file_status.type() == std::filesystem::file_type::directory) {
path /= "*";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I previously mentioned ::operator/ might cause some unexpected results.

@aduh95
Copy link
Contributor

aduh95 commented Jun 30, 2024

What's the status here? Is this ready to land?

@anonrig
Copy link
Member Author

anonrig commented Jun 30, 2024

What's the status here? Is this ready to land?

If @RafaelGSS's comment is a non-blocker, it is ready to land.

Copy link
Member

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments aren't non-blocking. I can change it later if needed. LGTM.

@anonrig anonrig added the commit-queue Add this label to land a pull request using GitHub Actions. label Jun 30, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Jun 30, 2024
@nodejs-github-bot nodejs-github-bot merged commit c1dc307 into nodejs:main Jun 30, 2024
54 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in c1dc307

aduh95 pushed a commit that referenced this pull request Jul 12, 2024
PR-URL: #53063
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
@aduh95 aduh95 mentioned this pull request Jul 12, 2024
aduh95 pushed a commit that referenced this pull request Jul 16, 2024
PR-URL: #53063
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>
@joyeecheung
Copy link
Member

joyeecheung commented Sep 18, 2024

This PR seems to be mostly incorrect - it's passing utf-8 encoded chars into the std::filesystem::path constructors directly, which on Windows, expect the chars to be encoded in UTF-16 (so they suffer from the same bug as #54476)

@joyeecheung
Copy link
Member

joyeecheung commented Sep 19, 2024

Another regression came up: #54991

I think we should consider reverting most of the changes here and other similar std::filesystem::path changes because:

  1. Most of the times, the path content comes from UTF8 encoded data in a char array - either some environment variable, or CLI flag, or converted from a JS string/buffer
  2. In most cases, the path will end up being passed into libuv, which also expects paths encoded in UTF8

Now on POSIX systems, it's fine to pretend that std::string/char[] and std::filesystem::path can be converted from each other automatically, but on Windows the native encoding is UTF-16 which means in 1, they must be converted into std::u8string/char8_t before being passed into the std::filesystem::path constructor (which leads to an overhead in UTF8 -> UTF16 transcoding), and in 2, they must be converted from path.u8string() into std::string/char[] (which leads to another UTF16 -> UTF8 transcoding) and should not be just path.string(). This not only incurs unnecessary overhead, but is also very easy to get overlooked and lead to the crashes that have been reported recently.

I think we should just keep it simple and prefer to always use std::string/char to carry UTF-8-encoded data for paths, unless it's a path it will later be passed into other std::filesystem APIs.

@richardlau
Copy link
Member

From a Release POV Node.js 22 is due to become LTS next month (29 Oct) which likely means we'll only have at most two more current releases (possibly only one) for Node.js 22. The latest a Node.js 22 current release can go out is 15 Oct because we want "two weeks in current" (usual Release WG LTS policy) for commits in LTS (the 29 Oct release).

I'm +1 for reverting the changes.

@anonrig
Copy link
Member Author

anonrig commented Sep 19, 2024

I'm +1 on reverting as well. It has done more harm than good.

@anonrig
Copy link
Member Author

anonrig commented Sep 19, 2024

I'll open a PR to revert it soon.

@joyeecheung
Copy link
Member

By the way, I think we can still keep the original intent of the replacement, just do it as constexpr char kPathSeparator = std::filesystem::path::preferred_separator and drop the ifdefs (I am not 100% sure this is equivalent, so up to you if you want to just revert it as-is).

@targos targos added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Sep 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants