-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Regex dependency #201
Conversation
💯 I have a pet peeve with static RegExs. They add a lot of overhead compared to a simple FSM. |
Thank you for keeping on working on this issue @RossSmyth! The call for regex support is also imminent with #170 but in looks like we can get away there without regex or regex-lite there as well. After a quick look at HWIDs and the MODALIAS strings from #170, I'm wondering if we actually need to deal with word boundaries according to Perl regex or if we could use a simpler definition in our context? I'm not opposed to a solution based on |
Yeah I was thinking the same, but I personally do not know what that subset is. The Windows documentation on HWID isn't really that great. |
I found the note
in the documentation for device identification strings. Looks like what we are doing here is not spec'd. 😄 But something done by other libraries as well (like for example pySerial, nusb). The common pattern seen is:
pySerial uses a word regex on the last path element for checking for a composite device and not This procedure works with the example device identification strings from pyserial/pyserial#283 (comment). After looking into this a bit more into the history of our code, regex-matching for word characters seems to be a relic from #47. It looks like it survived subsequent changes to this mechanism when implementing/cleaning up support for retrieving serial numbers for composite devices like #141 recently. From my my perspective, we good to go with just using the entire last path element of the actual USB device as serial number and without the check for word characters. |
Okie dokie. I'll modify it so. |
648bb98
to
529e7eb
Compare
Thinking about how the serial number is matched, we currently make the serial number an optional field, even from the parent device if part of a composite device. We do not test this behavior though. So I wonder what the semantics are supposed to be.
I'll stare at pyserial to see if I can divine what they do. |
Ok, my Python skills are quite rusty but I think it's staring right in the face in your link. I believe it is supposed to be that if the child device doesn't have a serial number, the parent must, or else pyserial will throw an exception OR stack overflow because it seems to just infinitely recurse I'll change this in a later PR though as it is a bit orthogonal. |
As far as I've overseeing things here: This is is the way. Serial numbers for USB devices are provided by the device and are optional. In terms of USB descriptors the serial number is a property of the USB device and not the interface. In case of a composite device there is one serial number per device and the interfaces are numbered. This is also related to #178 and we are maybe using the wrong serial number for composite FTDI devices: The suffix seems to be generated by the FTDI driver and according to pyserial/pyserial#283 (comment) (COM96 and COM134), the parent device provides the right serial number. |
If you look at So yeah, I agree. Always optional. |
Remove regex as the capture that are done are not too complex. Just implement our own capturing.
Recursion depth is limited here.
Yes. No need to change/fix everything here. We just should end up at least en par with the beforehand state. |
Like in: No serial if the actual device is nested too deep down in the device stack or there is none? The latter is the case for older FTDI devices like the FT2232D which requires an external EEPROM for storing serial number and other configuration data.
Great to be on the same page! 😄 |
I would like to keep the actual wrong serial numbers returned in #203 and not return a longer number which is also not the devices serial number. Switching from Locally testing the changes shows no difference in the output for my set of test hardware. From my perspective, we are good to go. Thank you very much for your perseverance with this PR! |
I used it because it is semantically identical to the pyserial regex. But we can preserve the PERL_WORD semantics instead, that works for me. |
Rust str are indexed by byte, so doing .take().count() is incorrect since Unicode values are often multiple bytes. This fixes it and adds tests for the edge cases.
Oops forgot to actually add the changes 😄 It did not handle actual Unicode in the serial number properly if we are to accept that. The previous implementation would index it by byte. For example the test serial I added, |
This includes throwing some random data at HwidMatches::new with the help of quickcheck.
Thank you for finally remembering and adding the last fix! |
Thanks! |
Regex being the slowest dependency for my entire project just to do a couple matches has been really bugging me. So I just recreated it.
The main draw back is the Unicode table. I just tried to recreate the semantics of the regex, and this is what it specified (any word character).
This is even lighter weight than regex-lite so I would say this succeeds #168, plus there is no MSRV junk to worry about (wrt dependencies).
I don't personally believe there is any maintenance burden to it since the matching is pretty straightforward.
I also added a test to test that we select the second part of serial numbers that are split by "&" as specified in one issue. That test seems to have been updated so that it no longer did that after parent IDs were added.