Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve --heap-size-hint arg handling #48050

Merged
merged 15 commits into from
Feb 23, 2024

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Dec 30, 2022

Previously, --heap-size-hint would silently ignore many flavors of "bad" input, parsing things like "3PB" as 3 bytes. This change makes it significantly less permissive, erroring unless it can parse a number (still relying on the C sscanf %Lf format specifier there) with an optional unit (case-insensitive, either with or without the trailing b). Also test it.

Previously, --heap-size-hint would silently ignore many flavors of "bad" input, parsing things like "3gigs" as 3 bytes. This change makes it significantly less permissive, erroring unless it can parse a number (still relying on the C `sscanf` `%Lf` format specifier there) with an optional unit (case-insensitive, either with or without the trailing `b`). Also test it.
@mbauman mbauman added the backport 1.9 Change should be backported to release-1.9 label Dec 30, 2022
@KristofferC KristofferC mentioned this pull request Jan 2, 2023
41 tasks
src/jloptions.c Outdated Show resolved Hide resolved
src/jloptions.c Outdated Show resolved Hide resolved
@KristofferC
Copy link
Member

Is this ready to go?

@KristofferC KristofferC mentioned this pull request Jan 17, 2023
35 tasks
test/cmdlineargs.jl Outdated Show resolved Hide resolved
test/cmdlineargs.jl Outdated Show resolved Hide resolved
@KristofferC KristofferC mentioned this pull request Feb 20, 2023
50 tasks
@KristofferC KristofferC mentioned this pull request Mar 7, 2023
52 tasks
@oscardssmith
Copy link
Member

Is this ready to merge?

@KristofferC KristofferC mentioned this pull request Apr 9, 2023
26 tasks
@KristofferC KristofferC mentioned this pull request May 8, 2023
51 tasks
@KristofferC KristofferC mentioned this pull request Jun 6, 2023
36 tasks
@KristofferC KristofferC mentioned this pull request Jul 11, 2023
35 tasks
KristofferC added a commit that referenced this pull request Aug 18, 2023
Backported PRs:
- [x] #47782 <!-- Generalize Bool parse method to AbstractString -->
- [x] #48634 <!-- Remove unused "deps" mechanism in internal sorting
keywords [NFC] -->
- [x] #49931 <!-- Lock finalizers' lists at exit -->
- [x] #50064 <!-- Fix numbered prompt with input only with comment -->
- [x] #50474 <!-- docs: Fix a `!!! note` which was miscapitalized -->
- [x] #50516 <!-- Fix visibility of assert on GCC12/13 -->
- [x] #50635 <!-- `versioninfo()`: include build info and unofficial
warning -->
- [x] #49915 <!-- Revert "Remove number / vector (#44358)" -->
- [x] #50781 <!-- fix `bit_map!` with aliasing -->
- [x] #50845 <!-- fix #50438, use default pool for at-threads -->
- [x] #49031 <!-- Update inference.md -->
- [x] #50289 <!-- Initialize prev_nold and nold in gc_reset_page -->
- [x] #50559 <!-- Expand kwcall lowering positional default check to
vararg -->
- [x] #49582 <!-- Update HISTORY.md for `DelimitedFiles` -->
- [x] #50341 <!-- invokelatest docs should say not exported before 1.9
-->
- [x] #50525 <!-- only check that values are finite in `generic_lufact`
when `check=true` -->
- [x] #50444 <!-- Optimize getfield lowering to avoid boxing in some
cases -->
- [x] #50523 <!-- Avoid generic call in most cases for getproperty -->
- [x] #50860 <!-- Add `Base.get_extension` to docs/API -->
- [x] #50164 <!-- codegen: handle dead code with unsafe_store of FCA
pointers -->
- [x] #50568 <!-- `Array(::AbstractRange)` should return an `Array` -->
- [x] #50871 <!-- macOS: Don't inspect dead threadtls during exception
handling. -->

Need manual backport:
- [ ] #48542 <!-- Add docs on task-specific buffering using
multithreading -->
- [ ] #50591 <!-- build: fix various makefile bugs -->


Non-merged PRs with backport label:
- [ ] #50842 <!-- Avoid race conditions with recursive rm -->
- [ ] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [ ] #50663 <!-- Fix Expr(:loopinfo) codegen -->
- [ ] #49716 <!-- Update varinfo() docstring signature -->
- [ ] #49713 <!-- prevent REPL from erroring in numbered mode in some
situations -->
- [ ] #49573 <!-- Implement jl_cpu_pause on PPC64 -->
- [ ] #48726 <!-- fix macro expansion of property destructuring -->
- [ ] #48642 <!-- Use gc alloc instead of alloc typed in lowering -->
- [ ] #48183 <!-- Don't use pkgimage for package if any includes fall in
tracked path for coverage or alloc tracking -->
- [ ] #48050 <!-- improve `--heap-size-hint` arg handling -->
- [ ] #47615 <!-- Allow threadsafe access to buffer of type inference
profiling trees -->
src/jloptions.c Outdated Show resolved Hide resolved
@fingolfin
Copy link
Member

@mbauman do you still have plans to update this, taking the existing comments into account?

@vtjnash
Copy link
Member

vtjnash commented Feb 8, 2024

Also possibly related to fixes in #52979 and #50843

@mbauman
Copy link
Member Author

mbauman commented Feb 8, 2024

Oh yeah, this was a bit of a fly-by code dump here. Let's see if we can quickly get this back into shape. I've removed the debugging test code (tests pass locally; I don't recall whether I was trying to debug something in CI?). And I've changed the docs to match with #50480.

This looking better?

src/jloptions.c Outdated Show resolved Hide resolved
src/jloptions.c Outdated Show resolved Hide resolved
g = 1024m
t = 1024g
@testset "--heap-size-hint=$str" for (str, val) in [("1", 1), ("1e7", 1e7), ("2.5e7", 2.5e7), ("1MB", 1m), ("2.5g", 2.5g), ("1e4kB", 1e4k),
("1e100", typemax(UInt64)), ("1e500g", typemax(UInt64)), ("1e-12t", 1), ("500000000b", 500000000)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, clearly I did not look closely enough before: I thought this was about adding support for inputs like 2.5g which seems reasonable to me

But is a heap size hint of the form 1e-12t or 1e100 really something we want to / should support?

I guess it would be more work to not support them, though. Ah well.

Copy link
Member Author

@mbauman mbauman Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't change the numeric syntaxes supported here — it just adds tests for them.

The big change here isn't about the supported numeric values, but rather the byte unit. Before Julia would silently discard/ignore unrecognized units. After this PR, Julia errors. That's the big improvement.

Currently:

▶ julia --heap-size-hint=2.5GB -E "Base.JLOptions().heap_size_hint"
0x0000000000000002

▶ julia --heap-size-hint=2.5Gibberish -E "Base.JLOptions().heap_size_hint"
0x0000000000000002

▶ julia --heap-size-hint=2.5Gobbledeegook -E "Base.JLOptions().heap_size_hint"
0x0000000000000a00

After:

▶ ./julia --heap-size-hint=2.5GB -E "Base.JLOptions().heap_size_hint"
0x00000000a0000000

▶ ./julia --heap-size-hint=2.5Gibberish -E "Base.JLOptions().heap_size_hint"
ERROR: julia: invalid argument to --heap-size-hint (2.5Gibberish)

▶ ./julia --heap-size-hint=2.5Gobbledeegook -E "Base.JLOptions().heap_size_hint"
ERROR: julia: invalid argument to --heap-size-hint (2.5Gobbledeegook)

@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Feb 12, 2024
@vtjnash
Copy link
Member

vtjnash commented Feb 13, 2024

Apparently something is wrong with _imp__utf8proc_tolower. It seems like we must have linked it since utf8proc_charwidth exists, but this other one is missing for some reason.

src/jloptions.c Outdated Show resolved Hide resolved
@vtjnash vtjnash added backport 1.11 Change should be backported to release-1.11 and removed backport 1.9 Change should be backported to release-1.9 labels Feb 19, 2024
@vtjnash
Copy link
Member

vtjnash commented Feb 19, 2024

Also needs a rebase to fix some conflicting doc changes

static char ascii_tolower(char c)
{
if ('A' <= c && c <= 'Z')
return c - 'A' + 'a';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, let me recap: this PR changed things to use tolower, which has issues, so it was suggested to use utf8proc_tolower instead which then was replaced by this hand-written function...

How about my original suggestion to just stick with what the code did before this PR, and simply have two cases in the switch for each letter? That seems by far simplest and is guaranteed to have no UTF-8 related issues or anything else and requires no additional explanation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yeah. But this works too, and there are a couple extra cases now, including one that is not in a switch statement, so it is not entirely obvious to me that it is significantly better one way or the other; just a style preference.

@vtjnash vtjnash merged commit 138aba7 into JuliaLang:master Feb 23, 2024
5 of 7 checks passed
@mbauman mbauman deleted the mb/heap_size_hint_cleanup branch February 23, 2024 22:11
@mbauman mbauman removed the merge me PR is reviewed. Merge when all tests are passing label Feb 23, 2024
tecosaur pushed a commit to tecosaur/julia that referenced this pull request Mar 4, 2024
Previously, `--heap-size-hint` would silently ignore many flavors of
"bad" input, parsing things like "3PB" as 3 bytes. This change makes it
significantly less permissive, erroring unless it can parse a number
(still relying on the C `sscanf` `%Lf` format specifier there) with an
optional unit (case-insensitive, either with or without the trailing
`b`). Also test it.
KristofferC pushed a commit that referenced this pull request Mar 6, 2024
Previously, `--heap-size-hint` would silently ignore many flavors of
"bad" input, parsing things like "3PB" as 3 bytes. This change makes it
significantly less permissive, erroring unless it can parse a number
(still relying on the C `sscanf` `%Lf` format specifier there) with an
optional unit (case-insensitive, either with or without the trailing
`b`). Also test it.

(cherry picked from commit 138aba7)
@KristofferC KristofferC mentioned this pull request Mar 6, 2024
60 tasks
mkitti pushed a commit to mkitti/julia that referenced this pull request Mar 7, 2024
Previously, `--heap-size-hint` would silently ignore many flavors of
"bad" input, parsing things like "3PB" as 3 bytes. This change makes it
significantly less permissive, erroring unless it can parse a number
(still relying on the C `sscanf` `%Lf` format specifier there) with an
optional unit (case-insensitive, either with or without the trailing
`b`). Also test it.
KristofferC added a commit that referenced this pull request Mar 17, 2024
Backported PRs:
- [x] #39071 <!-- Add a lazy `logrange` function and `LogRange` type -->
- [x] #51802 <!-- Allow AnnotatedStrings in log messages -->
- [x] #53369 <!-- Orthogonalize re-indexing for FastSubArrays -->
- [x] #48050 <!-- improve `--heap-size-hint` arg handling -->
- [x] #53482 <!-- add IR encoding for EnterNode -->
- [x] #53499 <!-- Avoid compiler warning about redefining jl_globalref_t
-->
- [x] #53507 <!-- update staled `Core.Compiler.Effects` documentation
-->
- [x] #53408 <!-- task splitting: change additive accumulation to
multiplicative -->
- [x] #53523 <!-- add back an alias for `check_top_bit` -->
- [x] #53377 <!-- add _readdirx for returning more object info gathered
during dir scan -->
- [x] #53525 <!-- fix InteractiveUtils call in Base.runtests on failure
-->
- [x] #53540 <!-- use more efficient `_readdirx` for tab completion -->
- [x] #53545 <!-- use `_readdirx` for `walkdir` -->
- [x] #53551 <!-- revert "Add @create_log_macro for making custom styled
logging macros (#52196)" -->
- [x] #53554 <!-- Always return a value in 1-d circshift! of
abstractarray.jl -->
- [x] #53424 <!-- yet more atomics & cache-line fixes on work-stealing
queue -->
- [x] #53571 <!-- Update Documenter to v1.3 for inventory writing -->
- [x] #53403 <!-- Move parallel precompilation to Base -->
- [x] #53589 <!-- add back `unsafe_convert` to pointer for arrays -->
- [x] #53596 <!-- build: remove extra .a file -->
- [x] #53606 <!-- fix error path in `precompilepkgs` -->
- [x] #53004 <!-- Unexport with, at_with, and ScopedValue from Base -->
- [x] #53629 <!-- typo fix in scoped values docs -->
- [x] #53630 <!-- sroa: Fix incorrect scope counting -->
- [x] #53598 <!-- Use Base parallel precompilation to build stdlibs -->
- [x] #53649 <!-- precompilepkgs: package in boths deps and weakdeps are
in fact only weak -->
- [x] #53671 <!-- Fix bootstrap Base precompile in cross compile
configuration -->
- [x] #52125 <!-- Load Pkg if not already to reinstate missing package
add prompt -->
- [x] #53602 <!-- Handle zero on arrays of unions of number types and
missings -->
- [x] #53516 <!-- permit NamedTuple{<:Any, Union{}} to be created -->
- [x] #53643 <!-- Bump CSL to 1.1.1 to fix libgomp bug -->
- [x] #53679 <!-- move precompile workload back from Base -->
- [x] #53663 <!-- add isassigned methods for reinterpretarray -->
- [x] #53662 <!-- [REPL] fix incorrectly cleared line after completions
accepted -->
- [x] #53611 <!-- Linalg: matprod_dest for Diagonal and adjvec -->
- [x] #53659 <!-- fix #52025, re-allow all implicit pointer casts in
cconvert for Array -->
- [x] #53631 <!-- LAPACK: validate input parameters to throw informative
errors -->
- [x] #53628 <!-- Make some improvements to the Scoped Values
documentation. -->
- [x] #53655 <!-- Change tbaa of ptr_phi to tbaa_value  -->
- [x] #53391 <!-- Default to the medium code model in x86 linux -->
- [x] #53699 <!-- Move `isexecutable, isreadable, iswritable` to
`filesystem.jl` -->
- [x] #41232 <!-- Fix linear indexing for ReshapedArray if the parent
has offset axes -->
- [x] #53527 <!-- Enable analyzegc checks for try catch and fix found
issues -->
- [x] #52092 
- [x] #53682 <!-- Increase build precompilation -->
- [x] #53720 
- [x] #53553 <!-- typeintersect: fix `UnionAll` unaliasing bug caused by
innervars. -->

Contains multiple commits, manual intervention needed:
- [ ] #53305 <!-- Propagate inbounds in isassigned with CartesianIndex
indices -->

Non-merged PRs with backport label:
- [ ] #53736 <!-- fix literal-pow to return the right type when the base
is -1 -->
- [ ] #53707 <!-- Make ScopedValue public -->
- [ ] #53696 <!-- add invokelatest to on_done callback in bracketed
paste -->
- [ ] #53660 <!-- put Logging back in default sysimage -->
- [ ] #53509 <!-- revert moving "creating packages" from Pkg.jl -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #51928 <!-- Styled markdown, with a few tweaks -->
- [ ] #51816 <!-- User-themable stacktraces -->
- [ ] #51811 <!-- Make banner size depend on terminal size -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
@KristofferC KristofferC removed the backport 1.11 Change should be backported to release-1.11 label Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants