Error when special unicode characters are used in source code on Windows #350

Closed
profesia-company opened this issue Jan 9, 2023 · 4 comments · Fixed by #385

Comments

@profesia-company

profesia-company commented Jan 9, 2023

Describe the bug
This happens only on Windows; on Linux it works fine. We use local characters from the Slovak language in our source code. When we try to format the source code, an error is raised. One character that triggers it is 'ň'.

To Reproduce
Try to format this SQL command on Windows:
select 'ň' as ch

Expected behavior
The document is formatted and sqlfmt raises no error on Windows.

Actual behavior
`C:\Users\226545\Documents\dwh-dbt> sqlfmt .\models\profesia_bi\marts\tmp\tmp_order_offers_new.sql

Traceback (most recent call last):
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\Scripts\sqlfmt.exe\__main__.py", line 7, in <module>
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlfmt\cli.py", line 168, in sqlfmt
    report = api.run(files=matched_files, mode=mode, callback=progress_callback)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlfmt\api.py", line 65, in run
    results = _format_many(files, cache, mode, callback=callback)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlfmt\api.py", line 176, in _format_many
    results.extend((map(format_func, cache_misses)))
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlfmt\api.py", line 210, in _format_one
    source = _read_path_or_stdin(path)
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlfmt\api.py", line 247, in _read_path_or_stdin
    source = f.read()
  File "C:\Users\226545\AppData\Local\Programs\Python\Python39\lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x88 in position 4254: character maps to <undefined>`

Additional context
What is the output of sqlfmt --version?
sqlfmt, version 0.14.3

@tconbeer
Owner

tconbeer commented Jan 9, 2023

We're using Python's built-in open to read and decode your file. Without an explicit encoding, that uses your OS's default encoding; on *nix systems, that's typically utf-8, but on your Windows machine it's read from the configured "code page," which in your case is cp1250. That is a legacy Windows code page for Central European languages; utf-8 is much more common today.

The file seems to have been encoded in utf-8, which is why Linux users can open it without issue. I might recommend that the Windows user(s) change their default code page to utf-8, since that's really the standard now? How to change.
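The mismatch described above can be reproduced on any OS by decoding the UTF-8 bytes of the reported query with cp1250 explicitly. This is a minimal sketch, not sqlfmt code; `try_decode` is a hypothetical helper written only for this illustration.

```python
# Bytes of the SQL from the report, as a UTF-8 editor would save them.
raw = "select 'ň' as ch".encode("utf-8")

# 'ň' (U+0148) is stored as the two bytes 0xC5 0x88 in UTF-8.
assert "ň".encode("utf-8") == b"\xc5\x88"


def try_decode(data: bytes, encoding: str) -> str:
    """Hypothetical helper: return the decoded text, or describe the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad_byte = data[exc.start]
        return f"{type(exc).__name__}: byte 0x{bad_byte:02x} at position {exc.start}"


utf8_result = try_decode(raw, "utf-8")      # decodes cleanly
cp1250_result = try_decode(raw, "cp1250")   # 0x88 has no mapping in cp1250

print(utf8_result)
print(cp1250_result)
```

The failing byte is 0x88, the same byte named in the traceback; only the position differs because the reported file is longer.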

Black goes out of its way to detect the encoding of a file before reading it, but that only works because Python has a standard for declaring file encodings. No such standard exists for SQL. Black's impl.
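To illustrate the kind of declaration Python source files can carry, here is a small sketch using the standard library's `tokenize.detect_encoding`, which reads a PEP 263 coding comment. It is not a copy of Black's implementation, and the sample byte strings are made up for this example.

```python
import io
import tokenize

# A Python file that declares its own encoding in the first line (PEP 263).
# The byte 0xF2 is 'ň' in cp1250.
py_source = b"# -*- coding: cp1250 -*-\nname = '\xf2'\n"
py_encoding, _ = tokenize.detect_encoding(io.BytesIO(py_source).readline)

# A SQL file has no equivalent declaration, so the same function can only
# fall back to its default when handed SQL text.
sql_source = "select 'ň' as ch\n".encode("utf-8")
sql_encoding, _ = tokenize.detect_encoding(io.BytesIO(sql_source).readline)

print(py_encoding)   # encoding taken from the coding comment
print(sql_encoding)  # default, since nothing is declared
```

For the Python file the declared encoding is enough to decode the rest of the file correctly; for SQL the result is only a default, not a detection.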

I could consider catching this error and retrying with utf-8, but then I would still always write files in utf-8, anyway, which could cause other issues for windows users when they try to read the formatted files with cp1250?
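The catch-and-retry idea could look roughly like the sketch below. It is an assumption about one possible shape, not what sqlfmt does; `read_text_with_fallback` and its `default_encoding` parameter are hypothetical names introduced for this example.

```python
import locale
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def read_text_with_fallback(
    path: Path, default_encoding: Optional[str] = None
) -> Tuple[str, str]:
    """Hypothetical helper: try the locale's encoding first, then utf-8.

    Returns the decoded text and the encoding that worked, so a caller
    could decide which encoding to use when writing the file back.
    """
    encoding = default_encoding or locale.getpreferredencoding(False)
    try:
        return path.read_text(encoding=encoding), encoding
    except UnicodeDecodeError:
        # The locale's code page could not decode the bytes; retry as utf-8.
        return path.read_text(encoding="utf-8"), "utf-8"


# Simulate a Windows machine configured for cp1250 reading a utf-8 file.
with tempfile.TemporaryDirectory() as tmp:
    sql_file = Path(tmp) / "query.sql"
    sql_file.write_bytes("select 'ň' as ch\n".encode("utf-8"))
    text, used = read_text_with_fallback(sql_file, default_encoding="cp1250")

print(used)
print(text, end="")
```

A retry like this only helps when the first decode fails. Many utf-8 byte sequences are also valid cp1250, so some files would decode without an error but with the wrong characters, and the write-side concern raised above still applies.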

@tconbeer tconbeer changed the title from "Error when special unicode characters are used in source code" to "Error when special unicode characters are used in source code on Windows" Jan 9, 2023
@profesia-company
Author

Thank you for the fast answer. We have changed the default code page to utf-8 on our Windows computers and it is working fine. So this issue can be closed from my point of view.

@tconbeer
Owner

Good to hear. Thanks for following up.

@cmcnicoll

Got a similar error for English (United States) locale. Fixed the error by enabling "Beta: Use Unicode UTF-8 for worldwide language support" in Region Settings. Thanks!
