Failed to open file with unicode name longer than 4 #3014

Closed
TroyDanielFZ opened this issue May 7, 2021 · 7 comments

@TroyDanielFZ

ctags --options=NONE 比较长的测试文件.cpp

The file is encoded in cp936. When the filename is short, such as 文件.cpp, or is a long ASCII name, it works well.
However, when the filename consists of more Unicode characters, it fails, sometimes for names of 3, 4 or more characters.
I can't figure out why. Maybe this is a bug?

@k-takata
Member

k-takata commented May 7, 2021

Please use the issue template.
What is the ctags version? What is your OS? How do you get the executable?

@TroyDanielFZ
Author

TroyDanielFZ commented May 8, 2021

Sorry for duplicate submission due to network issues.

The name of the parser:

The command line you used to run ctags:

ctags --options=NONE 比较长的测试文件.cpp
ctags --options=NONE 比较长的测试文.cpp
ctags --options=NONE 比较长的测试.cpp
ctags --options=NONE 比较长的测.cpp
ctags --options=NONE 比较长的.cpp
ctags --options=NONE 比较长.cpp
ctags --options=NONE 比较.cpp
ctags --options=NONE 比.cpp

All the input files are empty.

No output is produced, i.e. no tags file is generated.

Outputs in the command prompt are:

e:\temp\>ctags --options=NONE 比较长的测试文件.cpp
ctags: Notice: No options will be read from files or environment
ctags: Warning: cannot open input file "姣旇緝闀跨殑娴嬭瘯鏂囦欢.cpp" : No such file or directory

e:\temp\>ctags --options=NONE 比较长的测试文.cpp
ctags: Notice: No options will be read from files or environment
ctags: Warning: cannot open input file "姣旇緝闀跨殑娴嬭瘯鏂?cpp" : No space left on device

e:\temp\>ctags --options=NONE 比较长的测试.cpp
ctags: Notice: No options will be read from files or environment
ctags: Warning: cannot open input file "姣旇緝闀跨殑娴嬭瘯.cpp" : No such file or directory

e:\temp\>ctags --options=NONE 比较长的测.cpp
ctags: Notice: No options will be read from files or environment
ctags: Warning: cannot open input file "姣旇緝闀跨殑娴?cpp" : No space left on device

e:\temp\>ctags --options=NONE 比较长的.cpp
ctags: Notice: No options will be read from files or environment

e:\temp\>ctags --options=NONE 比较长.cpp
ctags: Notice: No options will be read from files or environment

e:\temp\>ctags --options=NONE 比较.cpp
ctags: Notice: No options will be read from files or environment

e:\temp\>ctags --options=NONE 比.cpp
ctags: Notice: No options will be read from files or environment

For filenames of up to 4 characters (without the extension), it works well, while for filenames longer than 4 characters it fails, and the filename it tries to open appears garbled.
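The garbled name in the warnings looks like the UTF-8 bytes of the filename displayed through the console's cp936 code page. The following is a minimal sketch of that guess in Python, not something ctags itself runs; the variable names are made up for illustration.

```python
# Sketch: reproduce the garbled name by decoding UTF-8 bytes as cp936.
# Assumption: ctags prints the name as UTF-8 bytes and the console
# interprets those bytes using cp936.

name = "比较长的测试文件.cpp"

utf8_bytes = name.encode("utf-8")     # 8 CJK characters * 3 bytes + ".cpp"
garbled = utf8_bytes.decode("cp936")  # what a cp936 console would display

seen_in_warning = "姣旇緝闀跨殑娴嬭瘯鏂囦欢.cpp"  # copied from the warning above

print(garbled)
print(garbled == seen_in_warning)

# The garbling is lossless here: re-encoding as cp936 gives back the UTF-8 bytes.
restored = garbled.encode("cp936").decode("utf-8")
print(restored)
```

If the guess holds, the name itself reaches ctags intact and only its display is wrong, so the garbled text is a symptom rather than the cause of the failure.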

The version of ctags:

Universal Ctags 5.9.0(745ac2f5), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Mar 12 2021, 01:05:28
  URL: https://ctags.io/
  Optional compiled features: +win32, +wildcards, +regex, +internal-sort, +unix-path-separator, +iconv, +option-directory, +xpath, +json, +interactive, +yaml, +case-insensitive-filenames, +packcc

How do you get ctags binary:

Download from ctags-win32

Platform:

Microsoft Windows Version 1909 (OS Build 18363.1500)

@k-takata
Member

It seems that msvcrt.dll's lstat() doesn't handle UTF-8 filenames correctly even if the UTF-8 code page is enabled by the manifest file.
Using the ucrt64 environment on MSYS2 seems to solve it.

BTW, I found another issue in the MinGW builds: MANUAL_GLOBBING is not defined.

@k-takata
Member

It seems that this is caused by a limitation of the FindFirstFileA() API.

The filename 比较长的.cpp fits the 8.3 filename format, so no short filename is created for it.
On the other hand, the filename 比较长的测.cpp doesn't fit the 8.3 format, so a short filename like 比较长~1.cpp is created.
The problem is that 比较长~1.cpp is 12 bytes in cp936, but it is 15 bytes in UTF-8.
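To see how this lines up with the eight test commands, here is a rough Python sketch. It assumes, following the reasoning above, that a name "fits" 8.3 when its base is at most 8 bytes and its extension at most 3 bytes in cp936. The real Windows rules have more conditions, and `fits_8dot3` is a hypothetical helper, not a Windows or ctags API.

```python
import os

def fits_8dot3(filename: str, codepage: str = "cp936") -> bool:
    """Simplified check: base <= 8 bytes and extension <= 3 bytes in codepage."""
    base, ext = os.path.splitext(filename)
    ext = ext[1:]  # drop the leading dot
    return len(base.encode(codepage)) <= 8 and len(ext.encode(codepage)) <= 3

# The filenames from the report, longest first.
names = [
    "比较长的测试文件.cpp",
    "比较长的测试文.cpp",
    "比较长的测试.cpp",
    "比较长的测.cpp",
    "比较长的.cpp",
    "比较长.cpp",
    "比较.cpp",
    "比.cpp",
]

results = [(name, fits_8dot3(name)) for name in names]
for name, fits in results:
    status = "fits 8.3" if fits else "gets a short alias"
    print(f"{name}: {status}")
```

Under this simplification, the four names that ctags failed to open are exactly the ones that need a short alias.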

The cAlternateFileName member in the WIN32_FIND_DATAA structure has room for only 14 bytes, so the FindFirstFileA() API fails.
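The byte counts can be checked with a few lines of Python. This is only an illustration of the arithmetic. The constant and helper names are invented here, and the extra byte for the terminating NUL is an assumption about how the buffer is filled.

```python
SHORT_NAME = "比较长~1.cpp"
ALT_NAME_BUFFER = 14  # bytes available in cAlternateFileName, as stated above

def bytes_needed(name: str, encoding: str) -> int:
    """Bytes needed to store name in the given encoding, plus one NUL terminator."""
    return len(name.encode(encoding)) + 1

cp936_needed = bytes_needed(SHORT_NAME, "cp936")  # 3 * 2 + 6 + 1
utf8_needed = bytes_needed(SHORT_NAME, "utf-8")   # 3 * 3 + 6 + 1

print("cp936:", cp936_needed, "fits" if cp936_needed <= ALT_NAME_BUFFER else "too long")
print("UTF-8:", utf8_needed, "fits" if utf8_needed <= ALT_NAME_BUFFER else "too long")
```

The short name fits when the ANSI code page is cp936, but overflows the buffer once the process code page is switched to UTF-8.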

The lstat() function in msvcrt.dll calls the FindFirstFileA() API directly. So, the mingw32/mingw64 environments are affected by this problem.
However, the lstat() function in Universal CRT converts the filename to UTF-16 then calls the wide version of the FindFirstFile() API. So, the ucrt64/clang64 environments don't have this problem.
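A small Python sketch of why the UTF-16 path avoids the overflow. It assumes the wide structure (WIN32_FIND_DATAW) reserves 14 UTF-16 code units for the short name; that figure comes from the Windows API documentation rather than from this thread, and the helper name is made up.

```python
SHORT_NAME = "比较长~1.cpp"
ALT_NAME_UNITS_W = 14  # assumed: WCHAR cAlternateFileName[14] in WIN32_FIND_DATAW

def utf16_units(name: str) -> int:
    """Number of UTF-16 code units needed for name (without the terminator)."""
    return len(name.encode("utf-16-le")) // 2

# What a UTF-8 aware CRT would do conceptually: bytes in, UTF-16 out.
utf8_input = SHORT_NAME.encode("utf-8")
wide_name = utf8_input.decode("utf-8").encode("utf-16-le")

units_needed = utf16_units(SHORT_NAME) + 1  # plus terminating NUL
print("UTF-16 code units needed:", units_needed)
print("fits" if units_needed <= ALT_NAME_UNITS_W else "too long")
```

Each of these CJK characters is a single UTF-16 code unit, so the short name takes the same number of units as it has characters and stays well within the limit.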

A possible workaround on the user-side is to disable the 8.3 filenames by using the fsutil 8dot3name command.

@TroyDanielFZ
Author

Thanks for your detailed explanation. I just tried with clang64, the problem doesn't show up any more.
And for this

The problem is that 比较长~1.cpp is 12 bytes in cp936, but it is 15 bytes in UTF-8.

Is it possible by converting the filename(s) with iconvlib to cp936 to solve this?

@k-takata
Member

I have confirmed that removing the short filenames in the current directory using fsutil 8dot3name strip . fixes the issue.

Is it possible by converting the filename(s) with iconvlib to cp936 to solve this?

No. The correct way to fix this is to convert the filenames to UTF-16 and use the Wide APIs.
However, it spoils the advantage of using the UTF-8 manifest that we don't need to use the Wide APIs explicitly...

k-takata added a commit to k-takata/ctags-win32 that referenced this issue Jun 5, 2022
Use CLANG{32,64} to use UCRT for the runtime library.
Hopefully, this solves universal-ctags/ctags#3014.

Note that msys2.org doesn't provide UCRT32 environment now.
k-takata added a commit to k-takata/ctags-win32 that referenced this issue Jun 6, 2022
Use CLANG{32,64} to use UCRT for the runtime library.
Hopefully, this solves universal-ctags/ctags#3014.

Note that msys2.org doesn't provide UCRT32 environment now.
@k-takata
Member

I added CLANG{32,64} binaries to ctags-win32. So, this shouldn't occur when they are used. Closing.
(Actually, I couldn't reproduce this with the latest MINGW{32,64} binaries either, but I don't know why.)
