Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add machinery to remove unecessary dynamic symbols at link time #222

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ezra-varady
Copy link
Collaborator

@ezra-varady ezra-varady commented Nov 9, 2023

This attempts to address #202. It adds a cmake target for an ld version script that places all symbols not explicitly marked with PGDLL_EXPORT in the local section. It finds the functions that need to be marked as global automatically so it shouldn't require developers to mark their functions in a suppression script. I believe that --version-script should behave the same way in lld though there isn't any documentation so this may only work with gnu tools. I would have preferred to set all visibility to hidden then manually mark functions to be exported in source since this is more transparent to developers, however libstdc++ doesn't allow this

example symbol exports with suppression script

nm -D --defined-only lantern.so
000000000000acf2 T cos_dist@@LANTERN
000000000000ad5a T hamming_dist@@LANTERN
000000000000a77e T hnsw_handler@@LANTERN
000000000000ac8a T l2sq_dist@@LANTERN
0000000000000000 A LANTERN
000000000000ae71 T lantern_internal_failure_point_enable@@LANTERN
000000000000ae25 T lantern_internal_validate_index@@LANTERN
000000000000ac66 T ldb_generic_dist@@LANTERN
000000000000ace5 T pg_finfo_cos_dist@@LANTERN
000000000000ad4d T pg_finfo_hamming_dist@@LANTERN
000000000000a771 T pg_finfo_hnsw_handler@@LANTERN
000000000000ac7d T pg_finfo_l2sq_dist@@LANTERN
000000000000ae64 T pg_finfo_lantern_internal_failure_point_enable@@LANTERN
000000000000ae18 T pg_finfo_lantern_internal_validate_index@@LANTERN
000000000000ac59 T pg_finfo_ldb_generic_dist@@LANTERN
000000000000adae T pg_finfo_vector_l2sq_dist@@LANTERN
00000000000101ab T _PG_fini@@LANTERN
000000000000ff6e T _PG_init@@LANTERN
0000000000009afa T Pg_magic_func@@LANTERN
000000000000adbb T vector_l2sq_dist@@LANTERN

EDIT: apparently Darwin uses its own linker so this isn't compatible with osx builds. Since this will be a relatively small percentage of deployments long term I think it's acceptable to just export more symbols on mac

EDIT2: sanitizer failure is unrelated to changes. It passed on the previous run, seems there's some minor non-determinism in the index that fails in that test


# build version script
# this script will place everything except undefined symbols and
echo "LANTERN {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of "LANTERN" here? is it the version name? the section name in the linker script?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the version name, I wasn't sure if it made sense to use a specific version

@ezra-varady
Copy link
Collaborator Author

It seems the reason for namespace std:: always being visible is that various C++ language features require this to work properly across shared library boundaries. Since lantern will never be linked against, and postgres, as well as the extension itself are both written in pure C, I believe for our specific use case it's safe to hide these symbols. I think the C++ components should have the correct symbols available when they're linked into usearch, and because usearch will, in this case, only ever be called by pure C, we can safely throw out the symbols. Nothing from inside usearch will ever interact with a C++ runtime outside of the library. The descriptions I've found are terse and ambiguous though. I'll try to figure out how to trigger an exception in usearch to sanity check this

I changed the version name to LANTERN_V1. Since nothing actually links against lantern, we shouldn't have to have multiple ABI versions, but in case we do this name will be safe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants