[WIP] Archive variants #52

Open
wants to merge 2 commits into
base: develop

alandefreitas
Member

@alandefreitas alandefreitas commented Oct 26, 2023

Add extra archive variants such as docs-only and source-only. These variants can reduce expenses with JFrog download bandwidth, provide users with archives that are simpler to use, and provide docs-only archives for the website.

The MakeBoostDistro.py script includes parameters to determine what types of files should be included in the distribution. All other functions are adapted to handle these requirements accordingly.

fix #50

@alandefreitas alandefreitas force-pushed the archive-vars branch 4 times, most recently from 902b2b6 to 28e6048 Compare October 27, 2023 00:28
@alandefreitas
Member Author

@sdarwin Should we start iterating on this again?

@sdarwin
Contributor

sdarwin commented Jan 10, 2024

For each of the new archives (docs-only and source-only), run a recursive diff (`diff -r dir1/ dir2/`) comparing the new results to a traditional archive such as the ones on https://boostorg.jfrog.io/artifactory/main/develop/

  • For a source-only archive, the 'boost' folder should be identical to the traditional one, right? When I last checked a couple of months ago, I think there were some very minor differences in that folder, but I would have to re-check.

  • Similarly in docs-only, doc-related folders would be identical.
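The recursive comparison described above could also be scripted. The following is a minimal sketch using Python's standard `filecmp` module, not part of MakeBoostDistro.py; the function name `diff_trees` and its return format are hypothetical choices made for illustration.

```python
import filecmp
import os


def diff_trees(left, right, rel=""):
    """Recursively compare two directory trees, similar in spirit to `diff -r`.

    Returns a sorted list of (kind, relative_path) tuples, where kind is
    'only-left', 'only-right', or 'differs'. (Hypothetical helper.)
    """
    # ignore=[] disables the default ignore list (RCS, CVS, tags, ...).
    cmp = filecmp.dircmp(left, right, ignore=[])
    results = []
    for name in cmp.left_only:
        results.append(("only-left", os.path.join(rel, name)))
    for name in cmp.right_only:
        results.append(("only-right", os.path.join(rel, name)))
    # Names present on both sides but with different types (file vs. directory).
    for name in cmp.common_funny:
        results.append(("differs", os.path.join(rel, name)))
    # dircmp itself compares files shallowly (by stat signature), so re-check
    # the common files byte by byte.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False
    )
    for name in mismatch + errors:
        results.append(("differs", os.path.join(rel, name)))
    # Descend into directories that exist on both sides.
    for name in cmp.common_dirs:
        results.extend(
            diff_trees(
                os.path.join(left, name),
                os.path.join(right, name),
                os.path.join(rel, name),
            )
        )
    return sorted(results)
```

To check only the header tree, one could call it on the two `boost/` folders, for example `diff_trees("new/boost", "old/boost")`; an empty list would mean the folders are identical.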

@sdarwin
Contributor

sdarwin commented Mar 21, 2024

The really interesting variant is source-only.

In terms of the website, I don't believe it's worth the complexity to have a docs-only bundle. Cloud storage is not expensive (a few dollars), and continuing to use the "full" archives provides redundancy and simplicity. The web files are full backup copies of each release.

Docs-only is around 30% smaller. However, generating and uploading packages with both FULL and DOCS-ONLY (two packages instead of one) increases the total storage size! That's worse, not better. It also increases the amount of code to debug and maintain. Almost nobody would need to download a docs-only bundle, and if they did, the full archive serves the purpose. I propose commenting out the functionality of docs-only. Don't generate such an archive. Otherwise, make an argument for why docs-only should be kept.
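The storage argument can be checked with a line of arithmetic. This is only a sketch: the function name `total_storage` is hypothetical, sizes are expressed relative to the full archive, and the 30% figure is the approximate saving quoted above.

```python
def total_storage(full_size, docs_only_saving=0.30, publish_docs_only=True):
    """Total size stored per release, in the same units as full_size.

    docs_only_saving is the fraction by which the docs-only archive is
    smaller than the full archive (about 0.30 per the comment above).
    """
    docs_only_size = full_size * (1.0 - docs_only_saving)
    if publish_docs_only:
        # Both archives are stored side by side.
        return full_size + docs_only_size
    return full_size
```

With the full archive normalized to 1.0, publishing both archives stores roughly 1.7 units per release instead of 1.0, which is the increase the comment describes.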

In terms of source-only, how about these other choices.

  • Just out of curiosity, to learn: what would the effect be of going even further and also removing the entire libs/ and tools/ directories, leaving only the boost/ directory? Does that break everything?

or

  • Within each library, completely remove the doc/ folder. Currently the script is selective: inside the doc/ folder it leaves so-called "code" such as a Jamfile or a .hpp file, but removes quickbook, html, odg, and image files. Of what use is the remaining Jamfile once the quickbook files and images are gone? Who would ever need it for anything? At that point, the few remaining stray files are useless.

Consider other folders such as test/ and example/. In those cases, "code" remains, but anything that isn't code, such as txt, tar, json, and README files, is gone. That means the examples and the tests are probably 50% broken. Who is going to try to use example/ or test/ when files are missing?

Leaving things in a half-way useless state isn't helpful. At that point, why not go even further? Either have all of test/ or none of it, but not broken tests where some critical .json files are missing so the tests won't run.

Another option is what Peter has started doing in GitHub Releases: boost-1.85.0.beta1-b2-nodocs.tar.gz. The -nodocs archive probably has all files, with nothing removed. The only difference is that the "docs" haven't been generated, which saves around 25MB of space. But there won't be any controversy about the contents of the archive, since it contains everything.

What are viable options that could be published with minimal controversy or confusion for end users, and that continue to work as expected? Ideally the option has a simple story/explanation and is also useful to the developer.

  1. Leave everything in place but don't "build" the docs, as in Peter's -nodocs archive. However, this is almost too easy, and the storage savings are not large.

or

  2. A number of software projects out in the world keep their documentation in a separate git repository. That is clear and understandable. If we strip out all doc/ folders from all libraries and from the top level, but leave everything else intact, it doesn't break "tests", "examples", or anything else. Delete all doc/ folders; nothing else is modified. The archive size should be quite small.

or

  3. Include the boost/ folder and basically nothing else, for maximum reduction. Consider that on Ubuntu, the libboost-all-dev package will install "boost". Examining the results of installing libboost-all-dev, it includes /usr/include/boost/beast and /usr/include/boost/url but NOT anything from src/ such as url/src/segments_encoded_view.cpp. Therefore, the package corresponds to boost/ only.

or

  4. None of the above.

There could be a case to be made for a "source-only" (option 2) and a "minimal" (option 3), although having multiple choices adds complexity. If we agree about a strategy then I could send a message to the mailing list asking for their feedback. No rush, let's think about it.
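To make the difference between option 2 and option 3 concrete, each can be written as a predicate over paths relative to the Boost root. This is a minimal sketch under stated assumptions: the function names and the `DOC_DIR_NAMES` set are hypothetical, the thread mentions both "doc/" and "docs/" so both are treated as documentation folders, and the sketch does not account for Jamfiles elsewhere in the tree that may reference doc/ files.

```python
from pathlib import PurePosixPath

# Assumption: directories with either of these names hold documentation.
DOC_DIR_NAMES = {"doc", "docs"}


def keep_source_only(path):
    """Option 2: drop everything under any doc/ folder, keep the rest.

    `path` is relative to the Boost root, e.g. "libs/url/doc/index.html".
    Note that a regular file named exactly "doc" would also be dropped.
    """
    return not any(part in DOC_DIR_NAMES for part in PurePosixPath(path).parts)


def keep_minimal(path):
    """Option 3: keep only what lives under the top-level boost/ folder."""
    parts = PurePosixPath(path).parts
    return len(parts) > 0 and parts[0] == "boost"
```

For example, `keep_source_only("libs/url/test/data.json")` is true because option 2 leaves tests intact, while `keep_minimal("libs/url/src/segments_encoded_view.cpp")` is false because option 3 keeps headers only.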

@alandefreitas
Member Author

> I don't believe it's worth the complexity to have a docs-only bundle

I agree. I think Peter asked for it after I did the source-only.

> Just out of curiosity, to learn, what would the effect be of going even further, and removing the entire libs/ and tools/ directories also. Only leaving the boost/ directory. Does that break everything?

Users always need b2 from tools. Libs contains the source files so we also need it.

> Within each library, completely remove the doc/ folder. Currently it is being selective, and inside the doc/ folder, leaving so-called "code" such as a Jamfile or a .hpp file, but removing quickbook, html, odg, images. Of what use is the remaining Jamfile? Who would ever need that for anything? When quickbooks and images are gone. At that point, a few remaining stray files are useless.

Mmmm... IIRC, I think it does that because Jamfiles outside doc refer to this doc file and things break. On the other hand, I think I did remove the tests from the release somehow (I don't remember if that's still in the source-only variant).

The reason I wanted to remove test was because it was an extreme case. One or two libraries contain tests that take most of the space in the whole release. I'll look at the release again with something like wiztree.
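A quick way to see which libraries' test folders dominate the release, without a GUI tool, is to sum file sizes per library. This is a rough sketch, not something from the PR: the function names are hypothetical, and it assumes the usual `libs/<name>/test` layout, so nested libraries such as those under `libs/numeric/` would not be found.

```python
import os


def dir_size(path):
    """Sum the sizes in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_subdirs(boost_root, subdir="test", top=10):
    """Return up to `top` (size_in_bytes, library) pairs, largest first."""
    libs = os.path.join(boost_root, "libs")
    sizes = []
    for lib in sorted(os.listdir(libs)):
        candidate = os.path.join(libs, lib, subdir)
        if os.path.isdir(candidate):
            sizes.append((dir_size(candidate), lib))
    sizes.sort(reverse=True)
    return sizes[:top]
```

Running `largest_subdirs("boost_root")` on an extracted release would list the heaviest test folders first; passing `subdir="doc"` or `subdir="example"` gives the same view for other folders.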

> Consider other folders such as test/ and example/. In those cases, "code" remains, but anything that isn't code such as txt, tar, json, README files are gone. That means the examples and the tests are probably 50% broken. Who is going to try to use examples/ or tests/ when files are missing?

I think the main problem here was Jamfiles referencing these folders which breaks the build process when they don't exist. I think I'll try to come up with a script that works directly on top of the release from the website to filter these files. Then we can experiment more easily with it.
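A script that works on top of a published release could stream the existing tarball and copy only the members a filter accepts. The following is a minimal sketch of that idea using the standard `tarfile` module; `filter_archive` and `no_doc_dirs` are hypothetical names, and the code assumes the archive has a single top-level directory (such as `boost_X_Y_Z/`) that is stripped before the filter is consulted.

```python
import tarfile


def no_doc_dirs(rel_path):
    """Example filter (hypothetical): reject anything inside a doc/ folder."""
    return "doc" not in rel_path.split("/")


def filter_archive(src_path, dst_path, keep):
    """Copy members of src_path into a new .tar.gz when keep(rel_path) is true.

    rel_path is the member name without its first path component.
    Returns the number of members written.
    """
    kept = 0
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            # Drop the assumed top-level directory before filtering.
            _, _, rel = member.name.partition("/")
            if rel and not keep(rel):
                continue
            # Only regular files carry data; other members are header-only.
            fileobj = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, fileobj)
            kept += 1
    return kept
```

Because the filter is just a function of the relative path, different variants can be tried against the same downloaded release and each result can then be built with b2 to see whether anything breaks.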

> it saves around 25MB

Yes. That's nice but I was looking for something more extreme. Like the complete thing being around 20MB. That would be nice even in CI because you could just download everything instead of going through depinst.py.

> What are viable options that could be published with minimal controversy or confusion to end-users

Yes. The steps you proposed are a good idea. I'll do some more experiments locally. I can work on the filters then try to build everything with b2 and keep doing that to ensure nothing breaks.

Development

Successfully merging this pull request may close these issues.

docs-only and source-only archives
2 participants