Fix tests on some platforms #124

vbarbaresi · 2021-10-29T06:55:24Z

For Python 3.10: install a protobuf compiler for the pycld3 build.
It turns out that pycld3 doesn't have prebuilt wheels for Python 3.10 yet, so we need the tools to build it.
I just saw that you already opened an issue for that: "Wheel building fails for Python nightly (3.10)", bsolomon1124/pycld3#18.
Using % 256 == 0 in the tests was actually masking failures: a return code of 256 meant that the binary wasn't found.
Explicitly call ./bin/trafilatura or ./Scripts/trafilatura so that we don't have to rely on the binary being in the PATH.
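
To illustrate why the modulo check could hide failures, here is a minimal sketch, not code from this PR. It relies on the documented POSIX behaviour of os.system, which returns a wait status with the exit code stored in the high byte, so a normal exit always gives a multiple of 256. The helper names returncode_of and modulo_check_passes are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys


def returncode_of(exit_code: int) -> int:
    """Exit code reported by subprocess.run for a child that exits with exit_code."""
    child = [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    return subprocess.run(child).returncode


def modulo_check_passes(exit_code: int) -> bool:
    """Old-style check `os.system(...) % 256 == 0` for a shell exiting with exit_code.

    On POSIX the wait status of a normal exit is exit_code << 8, so the
    modulo is 0 whatever the exit code was. Windows returns the exit code
    directly, so the result differs there.
    """
    status = os.system(f"exit {exit_code}")
    return status % 256 == 0
```

On a POSIX system, modulo_check_passes(3) is True even though the command failed, while returncode_of(3) returns 3 and an assert on `== 0` would fail as intended.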

Also use the more modern subprocess.run() instead of os.system.
We could collect stdout to test more things, but it didn't behave the same on all platforms, so I left that out and only check the return code.
A weird encoding issue happened on Windows in processes spawned by multiprocessing.Pool (and only in this case).
I could reproduce it by running export PYTHONIOENCODING='cp1252' before the tests.
Setting PYTHONIOENCODING='utf-8' during the tests fixes the issue.
Maybe we should set it in the application if someone reports encoding issues on Windows. I didn't want to fix something that wasn't broken, so I just fixed the tests.
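
The effect of PYTHONIOENCODING can be reproduced without trafilatura. The sketch below is an illustration under assumptions, not the test code from this PR: it starts a child interpreter that prints two CJK characters, which cp1252 cannot encode, and reports the child's exit code. The names CHILD_CODE and run_child are hypothetical.

```python
import os
import subprocess
import sys

# Source code for the child process: prints two characters outside cp1252.
# The escapes are resolved by the child, so the command line stays ASCII.
CHILD_CODE = "print('\\u4e2d\\u6587')"


def run_child(io_encoding: str) -> int:
    """Run the child with PYTHONIOENCODING set and return its exit code."""
    env = os.environ.copy()  # keep the inherited environment
    env["PYTHONIOENCODING"] = io_encoding
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode
```

With run_child("cp1252") the child should stop with a UnicodeEncodeError and a non-zero exit code, whereas run_child("utf-8") should exit with 0.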


codecov-commenter · 2021-10-29T06:58:25Z

Codecov Report

Merging #124 (25d502b) into master (776d706) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #124   +/-   ##
=======================================
  Coverage   94.63%   94.63%           
=======================================
  Files          19       19           
  Lines        2648     2648           
=======================================
  Hits         2506     2506           
  Misses        142      142


sourcery-ai · 2021-10-29T11:44:36Z

Sourcery Code Quality Report

✅ Merging this PR will increase code quality in the affected files by 0.12%.

Quality metrics	Before	After	Change
Complexity	1.59 ⭐	1.62 ⭐	0.03 👎
Method Length	198.33 😞	203.00 😞	4.67 👎
Working memory	7.73 🙂	7.74 🙂	0.01 👎
Quality	58.85% 🙂	58.97% 🙂	0.12% 👍

Other metrics	Before	After	Change
Lines	346	357	11

Changed files	Quality Before	Quality After	Quality Change
tests/cli_tests.py	58.85% 🙂	58.97% 🙂	0.12% 👍

Here are some functions in these files that still need a tune-up:

File	Function	Complexity	Length	Working Memory	Quality	Recommendation
tests/cli_tests.py	test_sysoutput	5 ⭐	288 ⛔	10 😞	48.67% 😞	Try splitting into smaller methods. Extract out complex expressions
tests/cli_tests.py	test_cli_pipeline	1 ⭐	662 ⛔	9 🙂	50.89% 🙂	Try splitting into smaller methods
tests/cli_tests.py	test_parser	2 ⭐	322 ⛔	6 ⭐	59.64% 🙂	Try splitting into smaller methods
tests/cli_tests.py	test_input_filtering	0 ⭐	236 ⛔	7 🙂	62.65% 🙂	Try splitting into smaller methods

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


vbarbaresi · 2021-10-29T12:15:18Z

tests/cli_tests.py

    # input directory walking and processing
-    assert os.system('trafilatura --inputdir "tests/resources/"') % 256 == 0
+    result = subprocess.run([trafilatura_bin, '--inputdir', RESOURCES_DIR]).returncode == 0


oops I just realized that I removed the assert here @adbar
I initially wanted to assert on result but I changed my mind and didn't add the assert back

should be fixed in 07216a3

error on Windows for the line that wasn't part of the tests...

oh right, it's the same issue that I fixed in the multiprocessing pool workers. So adding PYTHONIOENCODING to the run command's environment should fix it:

subprocess.run([...], env={"PYTHONIOENCODING": "utf-8"})

But I'll try on a Windows machine this weekend.
I want to see whether it should be set globally in the app or whether it's just a test issue with the GitHub worker configuration.
I suspect we have to fix it globally and not just in the tests.
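
One caveat about the env= one-liner above: a mapping passed as env replaces the child's whole environment instead of extending it, and the subprocess documentation notes that it must then provide every variable the program needs (for example SystemRoot on Windows in some cases). A possible variant, shown only as a sketch and not as the change that was eventually made, merges the override into the inherited environment. The wrapper name run_with_utf8_io is hypothetical.

```python
import os
import subprocess
import sys


def run_with_utf8_io(args, **kwargs):
    """Run a command with PYTHONIOENCODING=utf-8 added to the inherited environment."""
    # Copy os.environ so PATH and the other inherited variables survive,
    # then override only the I/O encoding.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    return subprocess.run(args, env=env, **kwargs)
```

For example, run_with_utf8_io([sys.executable, "-c", "import sys; print(sys.stdout.encoding)"], stdout=subprocess.PIPE) lets the child report the encoding it actually uses for stdout.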

vbarbaresi added 3 commits October 28, 2021 21:36

try to fix tests on Windows by using subprocess.run and not relying on environment PATH

fbc43d2

try to fix Python 3.10 build by installing protobuf-compilerr

af91ac4

Fix windows tests: find trafilatura bin in Scripts and change PYTHONIOENCODING

25d502b

tests: Python 3.10.0 → 3.10

c076991

adbar merged commit 6e33579 into adbar:master Oct 29, 2021


adbar added a commit that referenced this pull request Oct 29, 2021

tests: fix post-PR #124

07216a3
