normalize unicode during export / import #1085
@oPromessa Trying to figure this one out. As I said in the original comment, there are 2 things I hate dealing with in programming: dates and unicode! Sorry for the long post -- these notes are to help me figure out how to fix this.
Unicode strings can take one of 4 different normalization forms: NFC, NFD, NFKC, and NFKD.
osxphotos uses
I created a photo and gave it a keyword. From the osxphotos REPL:
>>> import unicodedata
>>> import photoscript
>>> keyword = get_selected()[0].keywords[0]
>>> keyword
'Cão'
>>> unicodedata.is_normalized("NFC", keyword)
True
>>> photo = photoscript.Photo(uuid=get_selected()[0].uuid)
>>> photo.keywords
['Cão']
>>> unicodedata.is_normalized("NFC", photo.keywords[0])
True
But when I paste the same keyword using
>>> import unicodedata
>>> import photoscript
>>> get_selected()[0].keywords[0]
'Cão'
>>> keyword = get_selected()[0].keywords[0]
>>> keyword
'Cão'
>>> unicodedata.is_normalized("NFC", keyword)
True
>>> photo = photoscript.Photo(uuid=get_selected()[0].uuid)
>>> photo.keywords[0]
'Cão'
>>> unicodedata.is_normalized("NFC", photo.keywords[0])
False
>>> unicodedata.is_normalized("NFD", photo.keywords[0])
True
When writing to files, osxphotos uses Lines 286 to 292 in 986010e
However, internally, osxphotos uses Lines 381 to 390 in 986010e
osxphotos/osxphotos/_constants.py Lines 21 to 22 in 986010e
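To make the difference between the two REPL sessions above concrete, here is a minimal sketch using only the standard library. It builds the NFC and NFD forms of the same keyword and shows that they look identical but compare unequal until both are normalized to the same form. The variable names are illustrative and not taken from osxphotos.

```python
import unicodedata

# "Cão" written with an escape so the literal's own form is unambiguous
nfc = unicodedata.normalize("NFC", "C\u00e3o")  # precomposed ã (U+00E3)
nfd = unicodedata.normalize("NFD", "C\u00e3o")  # a (U+0061) + combining tilde (U+0303)

# Show the code points behind each visually identical string
print([f"U+{ord(ch):04X}" for ch in nfc])
print([f"U+{ord(ch):04X}" for ch in nfd])

# A plain comparison treats them as different strings
print(nfc == nfd)

# Normalizing both sides to the same form makes them compare equal
print(unicodedata.normalize("NFC", nfd) == nfc)
```

This is why two keywords can appear as duplicates: any lookup that compares raw strings sees two different values.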
It appears the system preserves whatever format was used when reading from the command line, as is demonstrated by the following simple script:
uni.py:
import sys
import unicodedata

if __name__ == "__main__":
    text = sys.argv[1]
    for form in ["NFC", "NFD", "NFKC", "NFKD"]:
        print(form, unicodedata.is_normalized(form, text))
❯ python uni.py Cão
NFC True
NFD False
NFKC True
NFKD False
❯ python
Python 3.11.2 (v3.11.2:878ead1ac1, Feb 7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.normalize("NFD", "Cão")
'Cão'
^^^ Copy this code and paste into command line
❯ python uni.py Cão
NFC False
NFD True
NFKC False
NFKD True
So in osxphotos/osxphotos/cli/import_cli.py Lines 354 to 368 in 986010e
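Since the command line passes text through in whatever form it was typed or pasted, one option is to normalize each argument at the point where it is parsed. The sketch below uses `argparse` only to stay within the standard library. The helper name `nfc_text` and the choice of NFC as the target form are assumptions for illustration, not what `import_cli.py` actually does.

```python
import argparse
import unicodedata


def nfc_text(value: str) -> str:
    """Hypothetical argparse type callable: normalize an argument to NFC."""
    return unicodedata.normalize("NFC", value)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="uni")
    # type= runs on the raw string before it is stored on the namespace
    parser.add_argument("text", type=nfc_text)
    return parser


# Simulate pasting the NFD form of the keyword on the command line
args = build_parser().parse_args(["Ca\u0303o"])
for form in ["NFC", "NFD", "NFKC", "NFKD"]:
    print(form, unicodedata.is_normalized(form, args.text))
```

With the conversion done at the boundary, the rest of the program should only ever see one form, regardless of how the user entered the text.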
I've started a unicode_refactor branch to work on this. The first part, pulling the unicode and platform-specific functions out of utils.py into separate modules, is done. Now I need to create a map of all the places where unicode conversion needs to happen and determine what to do in each case.
On macOS 13.4, creating new data (keywords, titles, descriptions) in Photos uses
>>> import unicodedata
>>> unicodedata.is_normalized("NFC", get_selected()[0].keywords[1])
True
>>> unicodedata.is_normalized("NFC", get_selected()[0].title)
True
>>> unicodedata.is_normalized("NFC", get_selected()[0].description)
True
* Began refactoring for improving unicode handling
* Added platform and unicode modules
* Added tests for unicode utilities
* Fixed unicode tests for linux
* Fixed duplicate album name with --add-to-album
* Fixed test for linux
* Fix for duplicate unicode keywords, see #907, #1085
Strings containing unicode (e.g. keywords) need to be normalized in all round trips with the Photos library.
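One way to apply normalization consistently on every round trip is a single helper that accepts the shapes metadata usually comes in: a string, a list or tuple of strings, or `None`. The following is a sketch under that assumption. `normalize_text` and its NFC default are hypothetical and are not the functions in the osxphotos unicode module.

```python
import unicodedata


def normalize_text(value, form="NFC"):
    """Hypothetical helper: normalize a str, list, tuple, or None to one form."""
    if value is None:
        return None
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with every element normalized
        return type(value)(normalize_text(item, form) for item in value)
    raise TypeError(f"unsupported type for normalization: {type(value)!r}")


# A title and a keyword list as they might arrive from an external file
title = normalize_text("Ca\u0303o na praia")
keywords = normalize_text(["Ca\u0303o", "praia"])
print(unicodedata.is_normalized("NFC", title))
print(all(unicodedata.is_normalized("NFC", k) for k in keywords))
```

Calling one helper on everything read from or written to the library keeps the choice of form in a single place.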
Hi there. It worked very well.
But the same issue is now showing up with keywords.
While exporting, I added keywords to the files (via XMP), e.g. "Cão". On osxphotos import, reading them back via exiftool caused the same issue, now with keywords: they are duplicated, visually the same name but one in NFD and the other in NFC.
Workaround:
--keyword "{keyword|function:fixunicode.py::fixunicode}"
with and without --merge-keywords on the osxphotos import, but it does not seem to change the keywords!
Originally posted by @oPromessa in #907 (reply in thread)
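The duplicate-keyword symptom described above can be avoided by comparing keywords in a single normalization form when merging. This is a minimal sketch. `merge_keywords` is a hypothetical helper, and NFC as the canonical form is an assumption. It does not use the osxphotos template system or reproduce how `--merge-keywords` is implemented.

```python
import unicodedata


def merge_keywords(existing, incoming, form="NFC"):
    """Hypothetical helper: merge two keyword lists, treating NFC/NFD variants as one."""
    seen = set()
    merged = []
    for keyword in [*existing, *incoming]:
        normalized = unicodedata.normalize(form, keyword)
        if normalized not in seen:
            seen.add(normalized)
            merged.append(normalized)
    return merged


# The library already holds the NFC form; the imported file carries the NFD form
library_keywords = ["C\u00e3o"]
file_keywords = ["Ca\u0303o", "Gato"]
print(merge_keywords(library_keywords, file_keywords))
```

The merged list contains one entry per visible keyword, in first-seen order, rather than one entry per byte sequence.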