-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix byte/string handling for Python 3 #19
base: main
Are you sure you want to change the base?
Conversation
Add bytestring_output_allocate_size, and use for appropriate functions. This means that you don't need to set SWIG_PYTHON_STRICT_BYTE_CHAR, which means that you can pass strings to functions that accept them whilst also retrieving byte data objects without having to convert to/from unicode representations. The functions this is applied to are: get_integerarray_as_string get_image_as_string get_image_fs_as_string get_image_sf_as_string get_real_image_as_string get_real_image_fs_as_string get_real_image_sf_as_string get_3d_image_as_string get_3d_image_fs_as_string get_3d_image_sf_as_string get_real_3d_image_as_string get_real_3d_image_fs_as_string get_real_3d_image_sf_as_string get_realarray_as_string
Even without SWIG_PYTHON_STRICT_BYTE_CHAR - on python 3, previously you had a choice of everything being a bytes - in which case string arguments have to be encoded - or nothing is, in which case you need to convert binary data to/from unicode in order to pass in/out of functions. This applies this to the following functions: set_integerarray set_integerarray_wdims set_integerarray_wdims_fs set_integerarray_wdims_sf set_realarray set_realarray_wdims set_realarray_wdims_fs set_realarray_wdims_sf set_image set_image_fs set_image_sf set_real_image set_real_image_fs set_real_image_sf set_3d_image set_3d_image_fs set_3d_image_sf set_real_3d_image set_real_3d_image_fs set_real_3d_image_sf
Also, ensure consistency of LF/CRLF line endings, and remove trailing spaces.
See discussion about this same issue: cctbx/cctbx_project#475 |
Except for the Windows issue, this looks good, but with increasing use by small molecule people on windows, |
To be clear - with this branch the windows issue is that I don't know if there is an equivalent for rewriting the I did run into different windows issues however, as I described in cctbx/staged-recipes#1 (comment) - mainly to do with |
Dear Nicholas,
Independent of the formalities of the March HDRMX meeting, let's try to
settle as much as we can by then.
For windows there are three approaches: work with native windows 10,
work with MSYS2, or work with WSL2.
Right now I would give priority to making sure the native linux debian and
ubuntu cbflib kits work under
python2 and python3 in WSL2 debian and ubuntu.
I have never had much luck with cbflib, hdf5 and native windows 10.
For Macs we should at least be sure what we have is working under
catalina.
Regards,
Herbert
…On Sun, Jan 31, 2021 at 7:22 AM Nicholas Devenish ***@***.***> wrote:
To be clear - with this branch the windows issue is that I don't know if
there is an equivalent for rewriting the pycbf.py, which is probably fine
as long as nobody wants to regenerate from the literate programming sources
from scratch, which I'm not sure anyone has done in quite a while.
I did run into different windows issues however, as I described in cctbx/staged-recipes#1
(comment) <cctbx/staged-recipes#1 (comment)>
- mainly to do with cl.exe support for optional language features (most
notably variable-length arrays) and a slightly hacky attempt at fixing in
the unmerged branch ***@***.***:winchanges
<ndevenish@refactor...ndevenish:winchanges>
- taking the minimally intrusive option instead of reworking the relevant
functions (or using goto!). I need to polish this up and make a PR/issue
for discussion though; haven't spent much time on cbflib/conda since
November. (I think we had plans to try to discuss general cbflib issues
with you at the march HDRMX meeting but not sure if anything has moved on
that yet)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#19 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABB6EAJH7APYEKSIDE7YPN3S4VDPFANCNFSM4T6VCGFQ>
.
|
This would also solve #18 - a person claiming to be the current (then) maintainer of libcbf on debian seems to have run into this issue |
This is the patches from the PR dials/cbflib#19
Problem: All
char *
fields, input and output are mapped tostr
. This works on Python 2, but on Python 3 causes problems that your binary data is now encoded in a string and needs to be converted viaencode('utf-8', errors='surrogateescape')
(see SWIG docs). This can be worked around by settingSWIG_PYTHON_STRICT_BYTE_CHAR
in the build - but now allchar *
fields are bytes, meaning that you need to encode any strings that you are passing into pycbf (see e.g. cctbx/dxtbx@44b6d72).This commit fixes this. When using Python 3, functions that return data will return it as a python
bytes
object, and those that accept data will accept a pythonbytes
object. Everything else will acceptstr
.The output/swig files have been regenerated from the nuweb sources, and the test/
make_pycbf.py
converted to print functions so that they will work on python 3 or 2 (using__future__
, so Python 2.6.0+ required for regeneration now).Because of the nature of the generation, that I've generated in SWIG 4.0, that the output files had previously all been generated with CRLF line endings, and that the base sources haven't been regenerated from the HTML for a while, the total diff is quite large and hard to read. The important parts are in:
bytes
objects for data input, andbytes
objects for dataThe one additional caveat is that the
pycbf.py
doesn't now work in python 2 without adding the line# coding: utf-8
to the top. This isn't because of these changes, but becausedoc/CBFlib.html
now uses unicode characters for e.g.Å
, these are carried over the the.py
file in regeneration. I've made this change manually here, under the presumption that people probably aren't going to regenerate, but this could be fixed by adding:to the
linux.sh
file. I don't know the equivalent for windows though, and don't know how important this use case is, so have left it rather than making the platforms work differently.