Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix unicode problems with verbose log #1165

Closed
wants to merge 2 commits into from

Conversation

IIIMADDINIII
Copy link

Running under Windows 11 and using Powershell i noticed, that unicode characters are not displayed correctly when using verbose logging.
For example the "✔" Symbol for the verbose doen message is displayed like "Ô£ö" on my system.
I noticed in the NodeJs Source code that process.stderr.write and writeFileSync are using different subsystems and it seems that writeFileSync is not playing nicely with terminals.
This Pull Request fixes the Problem for me.

@ehmicky
Copy link
Collaborator

ehmicky commented Oct 27, 2024

Hi @IIIMADDINIII,

Which character encoding are you using in Powershell?

The write needs to happen synchronously. Otherwise lines might be out-of-order. For example, in verbose mode, the command needs to be printed before its output. When the write is done asynchronously, there is a chance the subprocess stdout will start being printed before the command is printed on stderr.

Unfortunately, process.stderr.write() is not guaranteed to be synchronous. This is the reason behind using fs.writeFileSync(2, ...).

Now it does seem that process.stderr is using node:tty when connected to a terminal, while fs.writeFileSync() is not. This could impact how Node.js handles some unusual environments.

But whatever the solution is, it needs to guarantee that logs are printed in order. This PR would break this for the other users.

@IIIMADDINIII
Copy link
Author

I Use UTF8:

PS > [System.Text.Encoding]::Default.EncodingName
Unicode (UTF-8)

I Understand that the output of logs need to stay in order.
I now tested it in VSCode, Terminal App and git mingw console which all show the problem.
I suspect that writeFileSync each byte is handles on its own, while with process.stderr.write the terminal understands there is more coming and should still wait for more data.

Looking through the NodeJs Sources it seems that process.stderr.write is sync as long it is pointing to console on windows.
The underlying StreamBase Write only starts an async write if it can not write it within one write operation sream_base.cc.
On Windows with an console stream it always ends up at uv__tty_try_write. It always returns that all bytes have been written if no error happens.

I also tried to call process.stderr.write with a huge string (100Mb) and it still returns true signaling it was able to write all data without filling a buffer and needing to wait until all data is drained (Node Docu).
On linux this returns false with large strings.

Would it be ok to only use process.stderr.write when running under windows and process.stderr.isTTY returns true?
I did not notice any logs which are printed out of order with this fix on windows.

@ehmicky
Copy link
Collaborator

ehmicky commented Oct 27, 2024

Thanks for looking deeper into this.

Looking through the NodeJs Sources it seems that process.stderr.write is sync as long it is pointing to console on windows.

The Node.js documentation seems to indicate the opposite.

process.stdout and process.stderr differ from other Node.js streams in important ways:
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX

Would it be ok to only use process.stderr.write when running under windows and process.stderr.isTTY returns true?
I did not notice any logs which are printed out of order with this fix on windows.

Lines being printed out-of-order happens on a race condition. In most cases, it actually does not happen. We only found out the problem in the automated tests since they are running hundreds of tests at once, which creates some contention on the stdout/stderr writing underlying logic and buffering. In particular, it happened on Windows since GitHub actions are slower with that OS. So lines can be printed out-of-order on Windows, even though it's hard to reproduce.

Also, since verbose logs always print on stderr, this only happens when the subprocess is printing to stdout and using stdout: 'inherit'.

The original PR and issue for this problem: #600, #598

For example, the following might reproduce it, if some command line like $ node -p index is printed after the subprocess output (which would be that index).

import {execa} from 'execa';

await Promise.all(Array.from({length: 1000}, (_, index) =>
  execa({verbose: 'full', stdout: 'inherit'})`node -p ${index}`));

For example the "✔" Symbol for the verbose doen message is displayed like "Ô£ö" on my system.

Going to the original problem, it does look to me there might some character encoding problem there. is U+2714 codepoint. In UTF-8, those are 3 bytes: 0xe2, 0x9c, 0x94. The above three characters might be representations of those bytes in a different character encoding. I checked with Latin1 and Windows-1252 and it does not seem to match, but it could be a different one. I understand your system is showing UTF8, but this does make me wonder whether another character encoding might be involved there?

That being said, it's also very possible that the problem is due to writeFileSync() not using node:tty internally, as you suggest. Looking at the underlying logic used by node:tty, it does seem to do a bunch of UTF-8 to UTF-16 conversion. That logic is only present on Windows, not on Unix (since Unix uses UTF-8 not UTF-16).

It can be argued that writing weird characters on some Windows setup is a bigger problem than printing the command line before the first stdout line (in some specific instances). But ideally, it'd be nice to fix both problems, instead of replacing one by the other.

@IIIMADDINIII
Copy link
Author

I just learned some new stuff. Powershell has more than one encoding setting.
I Found out that the "Ô£ö" is corresponding to Codepage 850 which my system is as ([Console]::OutputEncoding)[https://learn.microsoft.com/en-us/dotnet/api/system.console.outputencoding?view=net-8.0] (Encoding which is used to output to the console). This is different to the $OutputEncoding setting (which is utf8) which is used between powershell and nodejs.
If i set [Console]::OutputEncoding to utf8 the checkmark is displayed correctly.
Changing the system default encoding to utf8 is not recommended, because many programs still use the old behaviour.
As i understood, when an app is utf8 capable, SetConsoleOutputCP should be used to set the console to utf8, but this is not exposed by nodejs.
Using the process.stdout.write uses ConsoleWriteW wich allways uses UTF-16LE to write to the Console independent of the current [Console]::OutputEncoding.

The tests in #600 use a pipe and not tty to write the data. so my proposed change would not influence this.
You can test the type by reading process.stderr._type.
The pipes also have a problem with encoding when the codepage is not utf8 but that is not for now.
I have run your suggested test without a problem. Also i have made some variations (example: increasing log output so to force it to take longer).
The node -p index verbose output is always printed before command output.
I think the nodejs documentation is not correct or outdated.

I understand if you do not want to merge this solution because it is a bit hacky.
But i am pretty certain that process.stderr.write is sync when process.platform === 'win32' && process.stderr.isTTY.
I will also look into a possible solution from the side of nodejs.
Thank you for your input to diagnose this problem further.

@ehmicky
Copy link
Collaborator

ehmicky commented Oct 28, 2024

I Found out that the "Ô£ö" is corresponding to Codepage 850 which my system is as

I should have checked this one too! I actually used to write about this.

The tests in #600 use a pipe and not tty to write the data. so my proposed change would not influence this.

Also, our test runner Ava is actually not running in a TTY (process.stderr.isTTY is false), so it's always a little tricky to test TTY-specific code with it.


Thanks for all this additional work.

We actually do not have a specific automated test (despite have 5000 of them :) ) that ensures verbose logs are never printed out-of-order. We were just relying on the fact that some automated tests used to randomly fail.

I just tried adding such a test, and it made me realize using synchronous I/O (through fs.writeFileSync()) actually does not seem to solve the out-of-order problem. So, considering it creates issues with Windows and is poorer performance-wise, there's no reason not to always use process.stderr.write() instead (or even better IMHO, console.warn()), even when not on Windows, or when not in a TTY.

Unfortunately, that means this out-of-order problem is still there, but we haven't received any bug report about it so far, so I guess its race condition only triggers under very specific circumstances (such as running lots of subprocesses in parallel at once, while using both verbose and stdout: 'inherit'). Based on that, I have opened #1167, which should solve everything. Should we close this PR in favor of #1167?

@IIIMADDINIII
Copy link
Author

I tested #1167 on my system, and it works as expected.
Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants