Fix tests on some platforms #124

Merged
merged 4 commits into from
Oct 29, 2021
vbarbaresi
Contributor

  • For Python 3.10: install a protobuf compiler for the pycld3 build.
    It turns out that pycld3 doesn't have prebuilt wheels for Python 3.10 yet, so we need the tools to build it.
    I just saw that you already opened an issue for that: "Wheel building fails for Python nightly (3.10)", bsolomon1124/pycld3#18.

  • Using % 256 == 0 in the tests was actually masking failures: a return code of 256 meant that the binary wasn't found.
    Explicitly call ./bin/trafilatura or ./Scripts/trafilatura so that we don't have to rely on the binary being in the PATH.

    Also use the more modern subprocess.run() instead of os.system().
    We could collect stdout to test more things, but it didn't behave the same on all platforms, so I left that out and check the return code only.

  • A weird encoding issue occurred on Windows in processes spawned by multiprocessing.Pool (and only in that case).
    I could reproduce it by running export PYTHONIOENCODING='cp1252' before the tests.
    Setting PYTHONIOENCODING='utf-8' during the test fixes the issue.
    Maybe we should set it in the application if someone reports encoding issues on Windows. I didn't want to fix something that wasn't broken, so I only fixed the tests.
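
To illustrate the second point: on Linux and most Unix-like systems, os.system() returns a wait status in which a normal exit code sits in the high byte, so every normal exit code yields a multiple of 256 and the old check passes regardless. subprocess.run() reports the exit code directly. The following is only a minimal sketch of that difference; posix_wait_status and run_exit_code are hypothetical helpers for illustration and are not part of the test suite.

```python
import subprocess
import sys


def posix_wait_status(exit_code: int) -> int:
    """Hypothetical helper: the value os.system() typically reports on
    Linux for a process that exited normally with `exit_code`."""
    return exit_code << 8


# The old check passes for success (0), generic failure (1) and the
# shell's conventional "command not found" code (127) alike.
masked = [posix_wait_status(code) % 256 == 0 for code in (0, 1, 127)]


def run_exit_code(args):
    """Run a command without a shell and return its real exit code."""
    return subprocess.run(args).returncode


# Use the current interpreter as a stand-in for the CLI under test.
failing = run_exit_code([sys.executable, "-c", "import sys; sys.exit(3)"])
passing = run_exit_code([sys.executable, "-c", "pass"])
```

With this approach a test can assert `returncode == 0` and a missing or failing binary is no longer hidden by the modulo.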

codecov-commenter commented Oct 29, 2021

Codecov Report

Merging #124 (25d502b) into master (776d706) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #124   +/-   ##
=======================================
  Coverage   94.63%   94.63%           
=======================================
  Files          19       19           
  Lines        2648     2648           
=======================================
  Hits         2506     2506           
  Misses        142      142           

Last update 776d706...25d502b.

Contributor

sourcery-ai bot commented Oct 29, 2021

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.12%.

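| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 1.59 ⭐ | 1.62 ⭐ | 0.03 👎 |
| Method Length | 198.33 😞 | 203.00 😞 | 4.67 👎 |
| Working memory | 7.73 🙂 | 7.74 🙂 | 0.01 👎 |
| Quality | 58.85% 🙂 | 58.97% 🙂 | 0.12% 👍 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 346 | 357 | 11 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| tests/cli_tests.py | 58.85% 🙂 | 58.97% 🙂 | 0.12% 👍 |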

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| tests/cli_tests.py | test_sysoutput | 5 ⭐ | 288 ⛔ | 10 😞 | 48.67% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| tests/cli_tests.py | test_cli_pipeline | 1 ⭐ | 662 ⛔ | 9 🙂 | 50.89% 🙂 | Try splitting into smaller methods |
| tests/cli_tests.py | test_parser | 2 ⭐ | 322 ⛔ | 6 ⭐ | 59.64% 🙂 | Try splitting into smaller methods |
| tests/cli_tests.py | test_input_filtering | 0 ⭐ | 236 ⛔ | 7 🙂 | 62.65% 🙂 | Try splitting into smaller methods |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.



adbar merged commit 6e33579 into adbar:master on Oct 29, 2021

# input directory walking and processing
assert os.system('trafilatura --inputdir "tests/resources/"') % 256 == 0
result = subprocess.run([trafilatura_bin, '--inputdir', RESOURCES_DIR]).returncode == 0
Contributor Author

Oops, I just realized that I removed the assert here, @adbar.
I initially wanted to assert on result, but I changed my mind and didn't add the assert back.

Owner

should be fixed in 07216a3

Owner

error on Windows for the line that wasn't part of the tests...

Contributor Author

vbarbaresi, Oct 29, 2021

Oh right, it's the same issue that I fixed in the multiprocessing pool workers. So adding PYTHONIOENCODING to the environment of the run command should fix it:

subprocess.run([...], env={"PYTHONIOENCODING": "utf-8"})

But I'll try on a Windows machine this weekend.
I want to see whether it should be set globally in the app or whether it's just a test issue with the GitHub worker configuration.
I suspect we have to fix it globally and not just in the tests.
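
One caveat about the env suggestion above: subprocess.run() uses the env mapping as the child's entire environment, so passing only PYTHONIOENCODING drops PATH and, on Windows, variables such as SystemRoot that the child may need. A hedged sketch of one way around that is to copy the parent environment and override the single variable; run_with_utf8_io is a hypothetical helper name, not something from the trafilatura test suite.

```python
import os
import subprocess
import sys


def run_with_utf8_io(args):
    """Hypothetical helper: run a command with PYTHONIOENCODING forced to
    UTF-8 while keeping the rest of the parent environment intact."""
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    return subprocess.run(args, env=env, capture_output=True, text=True)


# Ask a child interpreter which encoding it uses for stdout.
result = run_with_utf8_io(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"]
)
child_encoding = result.stdout.strip()
```

Whether the variable belongs in the tests only or in the application itself is the open question raised in the comment; this sketch covers the test-side variant only.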

Owner

ok!

adbar added a commit that referenced this pull request Oct 29, 2021
3 participants