Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boringssl --> OpenSSL 3.2 #59870

Merged
merged 30 commits into from
Apr 16, 2024
Merged

Conversation

rschu1ze
Copy link
Member

@rschu1ze rschu1ze commented Feb 12, 2024

Like #56398, but OpenSSL 3.2. Made a new PR so that it doesn't become too messy.

Background:

  • OpenSSL seems to become more popular (compared to boringssl) with the improvements done after Heartbleed. As both codebases diverge, ClickHouse had to introduce hacks to force submodules to link against boringssl when upstream supported only OpenSSL. These hacks will become obsolete with this PR.
  • Should enable FIPS 140-3 with TLS 1.3 (Be able to deploy Clickhouse in FIPS-compliant way #34238) in the long run. BoringSSL doesn't seem to have a TLS 1.3 FIPS compliant version (aka. "boringcrypto"). It is a bit tricky though... as per Feb 2024, OpenSSL 3.0.8 and 3.0.9 are FIPS-140-2 validated (announcement), OpenSSL 3.1.2 is under FIPS-140-3 validation (announcement), and OpenSSL 3.2 appears too new so no validation was seemingly started yet. Also see Be able to deploy Clickhouse in FIPS-compliant way #34238.
  • OpenSSL (no specific version requirement) is a prerequisite for upgrading Azure SDK from 1.7 to 1.8 (see here)
  • We specifically need to go with OpenSSL 3.2 instead of 3.0 or 3.1 because of this mess constraint, i.e. all data stored with AES_128_GCM_SIV and AES_256_GCM_SIV codecs would no longer be decrypt-able.

Details:

  • Prior to this PR, ClickHouse linked against boringssl pulled in as submodule. An unsupported alternative was to link OpenSSL statically (ENABLE_OPENSSL) via a submodule or dynamically (ENABLE_OPENSSL AND ENABLE_OPENSSL_DYNAMIC) against the OS-provided OpenSSL, usually a FIPS-certified one.
  • e85adb9 removes the boringssl submodule and 8b259d5 switches the build to the statically linked OpenSSL version.
  • That broke the sparse checkout. Fixed with 096e358
  • Next replace the OpenSSL submodule (pointing to upstream) by a forked one (bb906e0, b61ee06)
  • The previous OpenSSL submodule was 3.0, the new one is 3.2. This breaks the build for all platforms. To fix it, one needs to create new CMake descriptions from OpenSSL's makefile (for detailed steps, see the beginning of contrib/openssl-cmake/CMakeLists.txt + Add OpenSSL option to Clickhouse #43991 (comment)).
  • 124b882 fixes x86, e13bd77 fixes ARM, f00b73c fixes PPC, 0b50318 fixes macOS x86 and ARM, ea4fdb3 fixes RISC-V, and b6528e6 rage-disables s390x because boringssl --> OpenSSL 3.2 #59870 (comment) and boringssl --> OpenSSL 3.2 #59870 (comment) ... I could not figure out how to make the build work (collateral damage, sorry IBM). We should enable it back later, see boringssl --> OpenSSL 3.2 #59870 (comment). Lots of trash accumulated while re-enabling the platforms, the garbage is removed with 9499484.
  • Some tests gave different error codes and could be fixed easily: fa74ae4, ee588b6, d93d4de
  • Glibc compat check in CI complained OpenSSL 3.2 needs a too-new glibc, fixed with cb5db8c and cdeef25
  • Tests for CompressionCodecEncrypted failed with the most obscure errors. I remember debugging for hour, reading scattered tutorial fragments before finding out that GCM_SIV requires to call the API in a slightly different way. Fixed with 7e5432f. This commit also provides backward compatibility for s390x. dd73ccf is a necessary refactoring to make that all work.
  • Integration test test_dictionaries_all_layouts_separate_sources failed with obscure "unexpected EOF" errors. By pure luck, I found SSL Library Error: error:0A000126:SSL routines::unexpected eof while reading openssl/openssl#22690 (comment) and could fix that: 55d35d8
  • More fixes for msan failures: 08c4600 979e06c
  • And fixes for tsan: bdaace3 This disables atomics in OpenSSL but I don't think that's a performance problem.
  • And fixes for leaksan: e06a9c3, 164cd58
  • 01747_system_session_log_long was fixed by 5d37a45. The root cause for this was a bug in libpq (PostgreSQL, here) which was fixed upstream in the meantime. Backported the fix into ClickHouse's stone age copy of libpq.
  • Finally, there were some data races detected by tsan in integration tests. These point to problems with session caching in OpenSSL. Force-disabled it for now with d30b48f and will check further.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-ch-test-poll robot-ch-test-poll added pr-not-for-changelog This PR should not be mentioned in the changelog submodule changed At least one submodule changed in this PR. labels Feb 12, 2024
@robot-ch-test-poll
Copy link
Contributor

robot-ch-test-poll commented Feb 12, 2024

This is an automated comment for commit 9d2301f with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check nameDescriptionStatus
A SyncThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS⏳ pending
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Mergeable CheckChecks if all other necessary checks are successful❌ failure
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc❌ failure
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors❌ failure
Successful checks
Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests✅ success
PR CheckThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success

@UnamedRus
Copy link
Contributor

One of the reason of going to boringSSL was performance speedup (30x) for encryption functions #11844 (comment)

@rschu1ze
Copy link
Member Author

The azure submodule worked with boringssl only by chance, and the 1.8 upgrade no longer builds at all with boringssl. That's a blocker.

I assume that the performance measurements were done against OpenSSL 1.x, perhaps runtimes improved in the meantime.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 12, 2024

Status update:

  • Applying b442b22 to this PR and using cipher "AES-128-GCM-SIV" instead of "AES-128-GCM" produces the expected result. So this will resolve boringssl ⟶ OpenSSL 3.0 #56398 (comment)

  • Re-generating CMake build descriptions after the upgrade was daunting (to put it nicely). I am tempted to do the same exercise only for ARM, and use much simpler build descriptions for all other platforms (e.g. no assembly).

@rschu1ze
Copy link
Member Author

readelf dies for me when running locally. Not clear how to reproduce the existing (which should be fixed by 9eb2ecf) and the new problem in the compatibility check.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 12, 2024

Applying b442b22 to this PR and using cipher "AES-128-GCM-SIV" instead of "AES-128-GCM" produces the expected result. This will resolve #56398 (comment)

Not true. Encryption works and produces the right result, decryption throws an error with generic error message error:00000000:lib(0)::reason(0) (ironically meaning "no error" when you google this error). No clue yet why, probably because we are now in "SIV Mode" (manpage) and the calls to OpenSSL need to be changed in "subtle ways", as the manpage explains so eloquently. Could not find examples or tutorials for this mode anywhere online.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 12, 2024

Aforementioned manpage states in section SIV Mode:

EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_AEAD_SET_TAG, taglen, tag);
Sets the expected tag (the Synthetic Initialization Vector) to taglen bytes from tag.
This call is only legal when decrypting data and must be made before any data is
processed (e.g. before any EVP_DecryptUpdate() calls). For SIV mode the taglen must be
16.

Calling EVP_CIPHER_CTX_ctrl() before EVP_DecryptUpdate() did the trick, decompression now works.
(It is still terrifying to not get a proper error message for such types of API misuse.)

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 12, 2024

Encryption works and produces the right result, decryption throws an error with generic error message error:00000000:lib(0)::reason(0) (ironically meaning "no error" when you google this error).

I am pretty sure that the error handling in our existing code calling OpenSSL is wrong (this leads to different actual and expected error codes for a negative test in integration test test_replicated_merge_tree_encryption_codec). Basically, OpenSSL stores errors in a queue, so when the 5th call fails, one needs to loop through and ignore four previous errors using ERR_get_error() OR clear the queue before each call using ERR_clear_error(). We previously only called ERR_get_error() which gets the earliest error. The code was only exercised on s390/x and thus untested.

On the hand, doing ERR_clear_error() before each call to OpenSSL functions also doesn't produce better error codes, something else seems wrong.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 13, 2024

Two more integration tests with wrong expected error coded fixed, trying now to restore the non-x86 builds.

@rschu1ze
Copy link
Member Author

trying now to restore the non-x86 builds.

For ARM, I tried the cross-compiling instructions mentioned here #43991 (comment). ./Configure generates the makefile but then mixes compiler calls (which fail) with header-generating steps, such that not all headers needed for copy into the platform-specific build directories exist. I guess the compiler errors are because I don't have the necessary toolchain set up. Need to dig deeper ...

For Arm, I'll also try on my EC2 ARM instance.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 13, 2024

ARM builds are fixed now, checking now why integration test test_dictionaries_all_layouts_separate_sources fails.

@rschu1ze
Copy link
Member Author

Need to work on other stuff with a deadline until Friday.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 22, 2024

PPC builds should work now.

Tried to fix RISC-V builds locally but when I try to cross-compile ClickHouse, I run into:

/usr/bin/riscv64-linux-gnu-ld.bfd: /home/ubuntu/repo/ch2/cmake/linux/../../contrib/sysroot/linux-riscv64/lib/riscv64-linux-gnu/librt.a(timer_create.o): undefined reference to symbol '__stack_chk_guard@@GLIBC_2.27
/usr/bin/riscv64-linux-gnu-ld.bfd: /home/ubuntu/repo/ch2/cmake/linux/../../contrib/sysroot/linux-riscv64/lib/riscv64-linux-gnu/./ld-linux-riscv64-lp64d.so.1: error adding symbols: DSO missing from command line
lang++-17: error: linker command failed with exit code 1 (use -v to see invocation)

plus the build is ultra slow locally.

Also tried to fix s390x builds but my attempt to cross-compile locally (without the boring-to-openssl transition) failed with weird linker errors.. Additionally, a standalone openssl (which is needed to generate platform-specifc files) refuses to compile on LE systems (like x86) for BE systems (s390x). Luckily, it generated at least a few critical files, other ones (which seem very similar across platforms) I could copy from other platforms.

EDIT: The RISC-V build should be working now. s390/x can hopefully be fixed by looking at CI output.

@rschu1ze
Copy link
Member Author

rschu1ze commented Feb 22, 2024

Yes, RISC-V works.

Darwin ARM builds now compile and link successfully locally (with actual darwin-aarch64 platform directory unlike #56398 which wrongly used linux-aarch64). However, when I start clickhouse-server on my Mac I get

~/repo/ClickHouse/build/programs (be-less-boring-32 *>) $ chs
dyld[93091]: dyld cache '(null)' not loaded: syscall to map cache into shared region failed
dyld[93091]: Library not loaded: /usr/lib/libiconv.2.dylib
  Referenced from: <C296BB2F-0441-3025-B1C6-D1AA4E4243B3> /Users/robert/repo/ClickHouse/build/programs/clickhouse
  Reason: tried: '/usr/lib/libiconv.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/lib/libiconv.2.dylib' (no such file), '/usr/lib/libiconv.2.dylib' (no such file, no
dyld cache)
Abort trap: 6

which does not look related but I'll make a master build to double-check.

I also generated similar platform files for darwin-x86_64 and there is a chance it works, but I did not test locally yet (cross-compiling for Mac is troublesome locally)

EDIT: The same error appears for standard master builds on Mac --> unrelated.

EDIT EDIT: we also have compat builds for x86_64 (no SSE3 or higher) and ARM (v8.0). Both builds are green but since there is quite some assembly used (see build descriptions), there is a chance that any of these will regardless fail at runtime with ILLEGAL INSTRUCTION because the assembly contains more recent instructions. Unfortunately, it is difficult to test for me I don't have access to such ancient hardware and disabling specific SIMD instruction sets on AWS EC2 instances does not seem possible.

@rschu1ze rschu1ze force-pushed the be-less-boring-32 branch 2 times, most recently from 11d2ccb to 0f3103f Compare February 23, 2024 10:41
@rschu1ze
Copy link
Member Author

Darwin builds are now really green.

Regarding compat check:

Regarding s390x (here):

  • I followed the documented steps with a master build (w/o this PR) and linking fails with clang++-17: error: invalid linker name in argument '-fuse-ld=mold'. I then installed mold (apt install mold) and linking fails with mold: unknown -m argument: elf64_s390. What am I doing wrong here? @bkuschel @larryluogit Perhaps you have an idea or updated build instructions?

Will continue working on this PR once #59516 is merged.

@rschu1ze rschu1ze force-pushed the be-less-boring-32 branch from dc58ea2 to cf43db2 Compare March 5, 2024 13:12
@rschu1ze
Copy link
Member Author

rschu1ze commented Mar 5, 2024

Regarding s390x (here):

* I followed the [documented steps](https://clickhouse.com/docs/en/development/build-cross-s390x) with a `master` build (w/o this PR) and linking fails with `clang++-17: error: invalid linker name in argument '-fuse-ld=mold'`. I then installed `mold` (`apt install mold`) and linking fails with `mold: unknown -m argument: elf64_s390`. What am I doing wrong here? @bkuschel @larryluogit  Perhaps you have an idea or updated build instructions?

Temporarily disabled s390/x.

@Algunenano
Copy link
Member

I followed the documented steps with a master build (w/o this PR) and linking fails with clang++-17: error: invalid linker name in argument '-fuse-ld=mold'. I then installed mold (apt install mold) and linking fails with mold: unknown -m argument: elf64_s390. What am I doing wrong here? @bkuschel @larryluogit Perhaps you have an idea or updated build instructions?

@yakov-olkhovskiy is the one that hardcoded mold

@rschu1ze
Copy link
Member Author

rschu1ze commented Mar 5, 2024

I guess you mean the mold linker was hardcoded here: https://github.com/ClickHouse/ClickHouse/pull/59870/files#diff-cc10102f8b1db393176bb4ef3536718e0d91ca8b6c086e088953b0d951732c9bR82
If I have time, I can check if I get s390x working with that, but there is a chance that it will stay disabled in CI until this PR is merged.

@yakov-olkhovskiy
Copy link
Member

yakov-olkhovskiy commented Mar 5, 2024

Linux s390x build was added here:
#53181
mold was the only linker working there...
please notice, that mold should be that specific version as I remember

@rschu1ze rschu1ze force-pushed the be-less-boring-32 branch from d520ea2 to 47919cb Compare March 11, 2024 14:08
@rschu1ze
Copy link
Member Author

rschu1ze commented Mar 11, 2024

Regarding compat check:

* About [*wip* boringssl --> OpenSSL 3.2 #59870 (comment)](https://github.com/ClickHouse/ClickHouse/pull/59870#issuecomment-1939445522)): It turned out 1. I was wrongly using `BuilderDebRelease` (which uses musl) as template for local build to run the compat check. It should have been `BuilderBinRelease` 2. With this, `readelf` from `binutils` still died, but installing `elfutils` and using `eu-readutils` instead worked.

* Picking [9eb2ecf](https://github.com/ClickHouse/ClickHouse/commit/9eb2ecfe02e2be6447502f3cad9d316096190e5e) into this PR made the ARM compatibility check green. Interestingly, x86 one is still red: https://s3.amazonaws.com/clickhouse-test-reports/59870/0f3103f494b8b7ccd7da62ff9d3cc23932c42a7b/compatibility_check__amd64_.html with two other too-new symbols `sendmmsg@GLIBC_2.14` and `recvmmsg@GLIBC_2.12`.

* Not clear yet what causes this, one candidate are new `BIO_s_dgram_pair()`/ `BIO_s_dgram_mem()` calls added with OpenSSL 3.2. (see their [changelog](https://github.com/openssl/openssl/blob/master/CHANGES.md)).

Fixed now (--> fb00fa6). Interestingly, the same problem with old glibc-s appeared already earlier: openssl/openssl#22036

@rschu1ze rschu1ze marked this pull request as ready for review March 11, 2024 17:20
@rschu1ze rschu1ze force-pushed the be-less-boring-32 branch from 979e06c to d30b48f Compare April 8, 2024 11:16
@hanfei1991 hanfei1991 self-assigned this Apr 8, 2024
Copy link
Member

@hanfei1991 hanfei1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest LGTM. Awesome work!

@@ -1,5 +1,5 @@
#!/usr/bin/env bash
# Tags: deadlock
# Tags: deadlock, no-tsan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember I could not get this test and 01393_benchmark_secure_port.sh to run locally. I just saw that they consistently failed in tsan builds. The log files from CI were also not helpful ... so I disabled them for now :-( Not great, but I'll check as a follow up.

docs/en/development/build-cross-s390x.md Show resolved Hide resolved
@rschu1ze rschu1ze enabled auto-merge April 15, 2024 21:30
@rschu1ze rschu1ze added this pull request to the merge queue Apr 16, 2024
Merged via the queue into ClickHouse:master with commit 01c55e6 Apr 16, 2024
227 of 234 checks passed
@rschu1ze rschu1ze deleted the be-less-boring-32 branch April 16, 2024 07:05
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 16, 2024
@Algunenano
Copy link
Member

Algunenano commented Apr 16, 2024

FYI, I'm seeing tons of errors when building from scratch:

[271/13662] Generating ../openssl/crypto/bn/rsaz-3k-avx512.s
sh: 0: Illegal option --
sh: 0: Illegal option - 
[272/13662] Generating ../openssl/crypto/bn/rsaz-2k-avx512.s
sh: 0: Illegal option --
sh: 0: Illegal option - 
[273/13662] Generating ../openssl/crypto/aes/aesni-sha256-x86_64.s
sh: 0: Illegal option --
sh: 0: Illegal option - 
[274/13662] Generating ../openssl/crypto/bn/rsaz-4k-avx512.s
sh: 0: Illegal option -

Verbose:

[379/13662] cd /mnt/ch/ClickHouse/build_public/contrib/openssl-cmake && /usr/bin/env perl /mnt/ch/ClickHouse/contrib/openssl/crypto/sha/asm/sha512-x86_64.pl /mnt/ch/ClickHouse/build_public/contrib/openssl/crypto/sha/sha256-x86_64.s
sh: 0: Illegal option --
sh: 0: Illegal option - 

Do we need a specific version of perl? Do we even need to generate those files instead of pushing them directly to the submodule?

@Algunenano
Copy link
Member

It seems it needs CC to be declared externally:

perl -d:Trace /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl /mnt/ch/ClickHouse/build_public/contrib/openssl/crypto/bn/rsaz-2k-avx512.s
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:31: $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:32: $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:34: $win64=0; $win64=1 if ($flavour =~ /[nm]asm|mingw64/ || $output =~ /\.asm$/);
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:34: $win64=0; $win64=1 if ($flavour =~ /[nm]asm|mingw64/ || $output =~ /\.asm$/);
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:35: $avx512ifma=0;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:37: $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:37: $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:38: ( $xlate="${dir}x86_64-xlate.pl" and -f $xlate ) or
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:42: if (`$ENV{CC} -Wa,-v -c -o /dev/null -x assembler /dev/null 2>&1`
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:47: if (!$avx512ifma && $win64 && ($flavour =~ /nasm/ || $ENV{ASM} =~ /nasm/) &&
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:52: if (!$avx512ifma && `$ENV{CC} -v 2>&1`
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:65: open OUT,"| \"$^X\" \"$xlate\" $flavour \"$output\""
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:67: *STDOUT=*OUT;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:69: if ($avx512ifma>0) {{{
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:718: }}} else {{{                # fallback for old assembler
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:719: $code.=<<___;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:719: $code.=<<___;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:742: $code =~ s/\`([^\`]*)\`/eval $1/gem;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:743: print $code;
>> /mnt/ch/ClickHouse/contrib/openssl/crypto/bn/asm/rsaz-2k-avx512.pl:744: close STDOUT or die "error closing STDOUT: $!";
sh: 0: Illegal option --
sh: 0: Illegal option -

I guess cmake should take care of that

@HarryLeeIBM
Copy link
Contributor

Has anybody tried to build the latest with cmake .. -DENABLE_OPENSSL_DYNAMIC=1? It gives error:

/usr/bin/x86_64-linux-gnu-ld: /usr/bin/x86_64-linux-gnu-ld: DWARF error: invalid or unhandled FORM value: 0x25
contrib/openssl-cmake/CMakeFiles/crypto.dir/__/openssl/engines/e_dasync.c.o: in function `v_check':
e_dasync.c:(.text.v_check+0x0): multiple definition of `v_check'; /usr/bin/x86_64-linux-gnu-ld: DWARF error: invalid or unhandled FORM value: 0x25
contrib/openssl-cmake/CMakeFiles/crypto.dir/__/openssl/engines/e_capi.c.o:e_capi.c:(.text.v_check+0x0): first defined here

@rschu1ze
Copy link
Member Author

rschu1ze commented Apr 18, 2024

Hm, indeed, this reproduces on x86 with a minimal CMake invocation: cmake -DENABLE_OPENSSL_DYNAMIC=1 -DENABLE_EMBEDDED_COMPILER=0 -DENABLE_LIBRARIES=0 -DCMAKE_BUILD_TYPE=Debug -S . -B build.

@HarryLeeIBM
Copy link
Contributor

Regarding CH build for openssl dynamic linking issue, a PR has been raised:62888

@HarryLeeIBM HarryLeeIBM mentioned this pull request Apr 29, 2024
30 tasks
azat added a commit to azat/ClickHouse that referenced this pull request May 9, 2024
…e sentry)

The problem is tha openssl registers OPENSSL_cleanup() as atexit
handler, which called before destroying of SentryWriter, so to avoid
this problem, let's destroy it explicitly.

<details>

<summary>stack trace example</summary>

    Thread 2 (Thread 0x7ffff54006c0 (LWP 24847) "clickhouse-serv"):
    0  ___pthread_rwlock_rdlock (rwlock=0x0) at pthread_rwlock_rdlock.c:26
    1  0x00000000164c18a9 in CRYPTO_THREAD_read_lock (lock=0x0) at threads_pthread.c:93
    2  0x000000001642e6b9 in int_err_get_item (d=0x7ffff53f74e0) at err.c:192
    ...
    7  ossl_connect_common (cf=0x7ffff7812c80, data=0x7ffff70a4c00, nonblocking=bool_true, done=0x7ffff53f834c) at openssl.c:4486
    ...
    17 curl_easy_perform (data=data@entry=0x7ffff70a4c00) at easy.c:787
    18 0x000000000b4c3854 in sentry__curl_send_task (_envelope=<optimized out>, _state=0x7ffff7074300) at sentry_transport_curl.c:225
    19 0x000000000b4ba880 in worker_thread (data=0x7ffff70e5500) at sentry_sync.c:262

    Thread 1 (Thread 0x7ffff7cb2c80 (LWP 24842) "clickhouse-serv"):
    5  0x000000000b4bb0e2 in sentry__cond_wait_timeout (cv=0x7ffff70e5540, mutex=0x7ffff70e5570, msecs=250) at sentry_sync.h:332
    6  sentry__bgworker_shutdown (bgw=0x7ffff70e5500, timeout=2000) at sentry_sync.c:412
    7  0x000000000b4b3e95 in sentry_close () at sentry_core.c:238
    8  0x000000000b4a5f1f in SentryWriter::~SentryWriter (this=0x7ffff71a1240) at SentryWriter.cpp:147
    9  std::__1::default_delete<SentryWriter>::operator()[abi:v15000](SentryWriter*) const (this=0x7ffff70e5568, __ptr=0x7ffff71a1240) at unique_ptr.h:48
    10 std::__1::unique_ptr<SentryWriter, std::__1::default_delete<SentryWriter> >::reset[abi:v15000](SentryWriter*) (this=0x7ffff70e5568, __p=0x0) at unique_ptr.h:305
    11 std::__1::unique_ptr<SentryWriter, std::__1::default_delete<SentryWriter> >::~unique_ptr[abi:v15000]() (this=0x7ffff70e5568) at unique_ptr.h:259
    12 0x00007ffff7de62e6 in __run_exit_handlers (status=0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
    13 0x00007ffff7de642e in __GI_exit (status=<optimized out>) at exit.c:138
    14 0x00007ffff7dccd51 in __libc_start_call_main (main=main@entry=0x6111c20 <main(int, char**)>, argc=argc@entry=13, argv=argv@entry=0x7fffffffb718) at libc_start_call_main.h:74
    15 0x00007ffff7dcce0c in __libc_start_main_impl (main=0x6111c20 <main(int, char**)>, argc=13, argv=0x7fffffffb718, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffb708) at libc-start.c:360

    (gdb) p req.body
    $7 = 0x7ffff7816000 "{\"dsn\":\"...\"}\n{\"type\":\"session\",\"length\":190}\n{\"init\":true,\"sid\":\"...\",\"status\":\"exited\",\"errors\":0,\"started\":\"2024-05-08T20:29:23.253Z\",\"duration\":17.213,\"attrs\":{\"release\":\"24.5\",\"environment\":\"test\"}}"

</details>

P.S. Likely started happens after conversion to OpenSSL (ClickHouse#59870).

Signed-off-by: Azat Khuzhin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo submodule changed At least one submodule changed in this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants