Remove trailing whitespace in .c , .cpp and .h files #2
Perl doesn't use pull requests, patches should be filed using perlbug. Trailing whitespace removal is also not considered important, but merely noise...
I think lots of whitespaces bring bad code quality to the Perl source code...
Trailing whitespace has nothing to do with quality...
Thank you very much for the patch, @xatier. This GitHub repository is just a mirror of the main repository at http://perl5.git.perl.org. But some folks like to clone from GitHub because it has a fast content delivery network. As @seveas said, patches for Perl can be submitted with perlbug. I don't know if the Perl5 Porters have a policy about trailing whitespace. But I share your feelings -- code that looks good tends to run good [sic]. I'm sure the Porters would be happy to consider your patch.
In the op tree, a statement consists of a nextstate/dbstate op (of class cop) followed by the contents of the statement. This cop is created after the statement has been parsed. So if you have nested statements, the outermost statement has the highest sequence number (cop_seq). Every sub (including BEGIN blocks) has a sequence number indicating where it occurs in its containing sub.

So

    BEGIN { } #1
    # seq 2
    {
      # seq 1
      ...
    }

is indistinguishable from

    # seq 2
    {
      BEGIN { } #1
      # seq 1
      ...
    }

because the sequence number of the BEGIN block is 1 in both examples.

By reserving a sequence number at the start of every block and using it once the block has finished parsing, we can do this:

    BEGIN { } #1
    # seq 1
    {
      # seq 2
      ...
    }

    # seq 1
    {
      BEGIN { } #2
      # seq 2
      ...
    }

and now B::Deparse can tell where to put the blocks.

PL_compiling.cop_seq was unused, so this is where I am stashing the pending sequence number.
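The numbering change described above can be illustrated with a toy model. This is only a sketch: the `demo_*` names are hypothetical, a single counter stands in for the real parser state, and it assumes a BEGIN block simply records the number the next cop would receive. It is not the code in perl's parser.

```c
#include <stdio.h>

/* Toy parser state: the next sequence number to hand out. */
typedef struct { unsigned next_seq; } demo_parser;

/* Assumption: a BEGIN block records the number the next cop would get. */
static unsigned demo_begin_seq(const demo_parser *p) { return p->next_seq; }

/* Old scheme: a block's cop is numbered only after its contents are parsed. */
static unsigned demo_old_begin_before_block(void)
{
    demo_parser p = { 1 };
    unsigned begin = demo_begin_seq(&p);  /* BEGIN { } ahead of the block */
    unsigned inner = p.next_seq++;        /* inner statement: seq 1 */
    unsigned block = p.next_seq++;        /* enclosing block: seq 2 */
    (void)inner; (void)block;
    return begin;
}

static unsigned demo_old_begin_inside_block(void)
{
    demo_parser p = { 1 };
    unsigned begin = demo_begin_seq(&p);  /* BEGIN { } inside the block */
    unsigned inner = p.next_seq++;        /* inner statement: seq 1 */
    unsigned block = p.next_seq++;        /* enclosing block: seq 2 */
    (void)inner; (void)block;
    return begin;
}

/* New scheme: the block reserves its number as soon as it opens. */
static unsigned demo_new_begin_before_block(void)
{
    demo_parser p = { 1 };
    unsigned begin = demo_begin_seq(&p);  /* sees 1 */
    unsigned block = p.next_seq++;        /* block reserves seq 1 */
    unsigned inner = p.next_seq++;        /* inner statement: seq 2 */
    (void)inner; (void)block;
    return begin;
}

static unsigned demo_new_begin_inside_block(void)
{
    demo_parser p = { 1 };
    unsigned block = p.next_seq++;        /* block reserves seq 1 */
    unsigned begin = demo_begin_seq(&p);  /* sees 2 */
    unsigned inner = p.next_seq++;        /* inner statement: seq 2 */
    (void)inner; (void)block;
    return begin;
}
```

In this model the two old-scheme functions return the same number, which is the ambiguity the commit message describes, while the new-scheme functions return different numbers for the two layouts.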
[DELTA] 1.25 2016-11-17 - Reduce memory usage by only loading Config if needed and not importing from Carp. Based on PR #2 from J. Nick Coston.
At this maximal level of debugging output, it displays the top 3 state stack entries each time it pushes, but with no obvious indication that a push is occurring. This commit changes this output:

    | 1| Setting an EVAL scope, savestack=9,
    | 2| #4 WHILEM_A_max
    | 2| #3 WHILEM_A_max
    | 2| #2 CURLYX_end yes
    0 <abcdef> <g> | 2| 4:POSIXD[\w](5)

to be this (which includes the word "push" and extra indentation for the stack dump):

    | 1| Setting an EVAL scope, savestack=9,
    | 2| push #4 WHILEM_A_max
    | 2|      #3 WHILEM_A_max
    | 2|      #2 CURLYX_end yes
    0 <abcdef> <g> | 2| 4:POSIXD[\w](5)

Also, replace curd (current depth) var with a positive integer offset (i) var, to avoid signed/unsigned mixing problems.
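The last point, walking the stack with a non-negative offset instead of a signed depth variable, might look roughly like the sketch below. The function name and its parameters are hypothetical and not taken from the regex engine; the sketch only shows that an unsigned offset `i` counted down from the top never has to be compared against a signed value.

```c
#include <stddef.h>

/* Collect the entry numbers of up to max_show entries, starting at the
 * top of a stack that currently holds `depth` entries (numbered from 1).
 * Every variable is unsigned, so no signed/unsigned comparison occurs. */
static size_t demo_top_entries(size_t depth, size_t max_show, size_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < max_show && i < depth; i++)
        out[n++] = depth - i;  /* i is the offset down from the top */
    return n;
}
```

With a depth of 4 and a limit of 3 this yields entries 4, 3 and 2, matching the `#4`, `#3`, `#2` lines in the dump.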
[DELTA]

1.57 2017-01-22 rurban
----
  * Todo the t/exec.t test 2 on cygwin.
  * Fixed/Todo the t/decrypt.t test 7 utf8 failures. Skip with non UTF-8 locale.

1.56 2017-01-20 rurban
----
  * add binmode to the decrypt/encr,decrypt sample scripts
  * add utf8-encoded testcase to t/decrypt.t [cpan #110921]. use -C
  * stabilized some tests, add diag to sometimes failing sh tests
  * moved filter-util.pl to t/
  * fixed INSTALLDIRS back to site since 5.12 [gh #2]
  * fixed exec/sh test races using the same temp. filenames
  * reversed this Changes file to latest first
  * added Travis CI
Commit v5.27.8-405-gf548aeca98 from a year ago tweaked this timing-sensitive test script to reduce false positives. However, we're still seeing the occasional failure of test 2 in smokes, so tweak the timing a little further.
Fixed the issue with warning
On some platforms, building a -DDEBUGGING perl triggers the following compiler warnings:

    In file included from locale.c:385:
    locale.c: In function ‘S_bool_setlocale_2008_i’:
    locale.c:2494:29: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 7 has type ‘int’ [-Wformat=]
         "bool_setlocale_2008_i: creating new object" \
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    perl.h:4983:33: note: in definition of macro ‘DEBUG__’
         DEBUG_PRE_STMTS a; DEBUG_POST_STMTS \
                         ^
    locale.c:2493:7: note: in expansion of macro ‘DEBUG_L’
         DEBUG_L(PerlIO_printf(Perl_debug_log, \
         ^~~~~~~
    locale.c:2523:17: note: in expansion of macro ‘DEBUG_NEW_OBJECT_FAILED’
         DEBUG_NEW_OBJECT_FAILED(category_names[index], new_locale,
         ^~~~~~~~~~~~~~~~~~~~~~~
    In file included from locale.c:322:
    config.h:4052:18: note: format string is defined here
     #define U32uf "lu" /**/

This is because the code tries to format __LINE__ with a varargs function using %"LINE_Tf".

Things are slightly tricky here because in a varargs function, no type context is available, so the format string absolutely has to match the intrinsic type of each argument.

The __LINE__ macro expands to a simple (decimal) integer constant. According to C, such a constant has type int if its value fits, otherwise unsigned int if it fits, otherwise long int, etc. None of the *.c files in the perl distribution exceed 32767 lines (the minimum INT_MAX required by C), so even on ancient 16-bit systems, our __LINE__ will always be of type int.

The %"LINE_Tf" format is designed to match a line_t argument, not int. (On some platforms, line_t is defined as unsigned long and incompatible with int for formatting purposes.) Therefore it is an error to use %"LINE_Tf" with __LINE__.

One way to fix this is to convert the argument to match the format string:

    "... %" LINE_Tf " ...", (line_t)__LINE__

The other way is to change the format string to match the (int) argument:

    "... %d ...", __LINE__

I chose option #2 because it is by far the most common way to output __LINE__ elsewhere in the perl source.
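The two fixes can be compared side by side in a small standalone sketch. `demo_line_t` and `DEMO_LINE_TF` are hypothetical stand-ins for perl's `line_t` and `LINE_Tf`, whose real definitions are platform dependent; here they are assumed to be `unsigned long` and `"lu"`, the case that triggers the warning.

```c
#include <stdio.h>

/* Hypothetical stand-ins: assume line_t is unsigned long on this platform. */
typedef unsigned long demo_line_t;
#define DEMO_LINE_TF "lu"

/* Option 1: keep the line_t format and cast the int argument to match it. */
static int demo_format_with_cast(char *buf, size_t size, int line)
{
    return snprintf(buf, size, "line %" DEMO_LINE_TF, (demo_line_t)line);
}

/* Option 2: keep the int argument and use the matching %d format. */
static int demo_format_with_int(char *buf, size_t size, int line)
{
    return snprintf(buf, size, "line %d", line);
}
```

Calling either function with `__LINE__` as the last argument produces the same text; the difference is only whether the argument or the format string is adjusted so that the two agree.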
Valid, parseable, and sane prototypes are tiny in char len and often fit on 1 hand. Original commit referred to mitigating sloppy XS code with random white space. Majority of XS code will have perfectly clean prototype strings. Do not SV Alloc, PV Alloc, MEXTENDPUSH, memcpy(), and a lot more free()s in scope _dec(), for clean strings.

Even for dirty but parsable prototypes, they will be tiny. Therefore use a tiny stack buffer for the dirty semi-hot path to remove overhead. Fuzzing, junk, abuse can OOM die in newSV()/malloc() if needed, same as in the prior version of the code.

Use newSV(len) and POK_off; the SV head is private to us, and it is a waste to bookkeep SVPV details. SAVEFREEPV() was not used, because the previous code did mortal, and not SAVEFREEPV(), so keep using mortal. One day after testing by the public, maybe SAVEFREEPV() is the smarter choice here, but KISS in this commit.

The size of the stack buffers should probably be 8 or 16 bytes to cover legit prototype strings. I made the buffers larger, simply because I can, and there is no machine code size difference on x86/x64 between 16 and the numbers picked.

The numbers are from 2 different binary analysis tools of perl541.dll on x64 Windows, -O1, MSVC 2022. The numbers are "width" or "size" or "overhead" in bytes of the C stack frames of the 3 callers of S_strip_spaces. My rationale is, by keeping the width of the C stack frame under 0xFF bytes, x86_op+stk_reg+imm8 encoding is emitted by CCs, instead of x86_op+stk_reg+imm32 encoding, which is larger. So the math is 0xFF-current_frame_size-(5*ptrs). -(5*ptrs) accounts for future P5P C auto vars or changes to the C code of the 3 callers, and whatever GCC vs Clang vs each CC build number uniqueness, so x86/x64 CCs only use stk_reg+imm8_offset instructions and never resort to writing out 32b offsets in machine code. As a guesstimate, "/2" the stack frame width for i386 CPUs.

Perl_cv_ckproto_len_flags() has 6 args, therefore it's ineligible for Win64's 4 register __fastcall ABI, and args 5 and 6 must be read off the C stack per ABI. So even if small U8-U64 C auto vars are at the "top" of the C stack, and reached with +imm8 operands, obv the CC still has to write 2 "lone" read(+imm32) ops to read args 5 and 6. There are tricks to optimize out +imm32 to reach incoming args, but that's for a CC vendor talk.

Anyways, 0x10/16 or 0x20/32 is the realistic buffer size. The higher lengths here are simply because, in theory, #1 avoid malloc() always, #2 no perf, runtime, or machine code size diff between 0x20 and my numbers.

2 different tools were used. I picked the "larger" numbers C stack size report number, to make the cleanedproto buffers even smaller, so this "CC only uses +imm8 op to r/w C stack" optimization lasts for years, not 1 build number of GCC/VC/LLVM.

statistics

    Perl_ck_entersub_args_proto  0x88/0x48
    yyl_subproto                 0x28/0x20
    Perl_cv_ckproto_len_flags    0x68/0x30

S_strip_spaces() was added in d16269d 6/24/2013 5:58:46 PM Remove spaces from a (copy of) a proto when used. The logic that *CUT*
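The small-stack-buffer idea can be sketched outside of perl as follows. This is an illustration only: the `demo_*` names and the 32-byte buffer size are assumptions, and plain `malloc()`/`free()` stand in for the mortal SV that the commit message says the real code uses as its fallback.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SMALL_PROTO_BUF 32  /* assumed size, not the commit's numbers */

/* Copy src into out without whitespace; out must hold len + 1 bytes. */
static size_t demo_strip_spaces(const char *src, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (!isspace((unsigned char)src[i]))
            out[n++] = src[i];
    }
    out[n] = '\0';
    return n;
}

/* Returns 1 if the prototypes match once whitespace is ignored, 0 if they
 * differ, and -1 if a heap allocation for an oversized string failed. */
static int demo_protos_equal(const char *a, const char *b)
{
    char stack_a[DEMO_SMALL_PROTO_BUF], stack_b[DEMO_SMALL_PROTO_BUF];
    size_t la = strlen(a), lb = strlen(b);
    /* Common case: the string fits the stack buffer and nothing is allocated. */
    char *pa = la < sizeof stack_a ? stack_a : malloc(la + 1);
    char *pb = lb < sizeof stack_b ? stack_b : malloc(lb + 1);
    int result = -1;

    if (pa && pb) {
        size_t na = demo_strip_spaces(a, la, pa);
        size_t nb = demo_strip_spaces(b, lb, pb);
        result = (na == nb && memcmp(pa, pb, na) == 0);
    }
    if (pa != stack_a) free(pa);  /* free(NULL) is a no-op */
    if (pb != stack_b) free(pb);
    return result;
}
```

Short prototypes such as `"$ $"` never touch the allocator; only unusually long input takes the slower heap path, which mirrors the trade-off the commit message argues for.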
…sv.c

- svfix.pl is quick throwaway garbage done in 20 mins and probably doesn't regen sv_inline.h properly, and copy pasting/replace regexps were used anyway to fix up the code. It can be rewritten correctly tho and put in as an official regen.pl script.

How to finish this fix, options:

- set and unset sv_type as a CPP macro, then #include a header 17 times that contains ONLY Perl_newSV_type() and its mortal() sister, creating 17*2 static inline fns (basically what I did here). Code is steppable. Extra ms'es of I/O build times perf degrade debate may or may not come up; I hope the CC has a sane in-memory cache for .h files and doesn't go back to the kernel
- or put the entire Perl_newSV_type() fnc in a #define

      #define PETS blah(foo(myarg)) + \
          cat(dog(fur)) + \
          laser(ball(toy))

  then execute that macro 17 times
- or write 17 Perl_newSV_type() copies into sv_inline.h with /regen.pl infrastructure (fastest for build speed core and build speed CPAN, and code is C dbg steppable)
- OR is a dedicated "sv_newg.h" for regen.pl needed? Does the master Perl_newSV_type() template live in a .pl or a .h?

I don't have an opinion for or against the concept of sv_inline.h; just have 5-10 hand-written, sv-type-specific versions of Perl_newSV_type(). It's a cheap gimmick fix to keep all 17 types together mashed with if/else/switch in 1 func and expect bug-free perfection from LTO engines of various C compilers, and expecting perfection from a single vendor LTO engine is very against the spirit of portable code.

- todo ideas: turn those super long #define ==?:==?:==?: into char array/struct initializers, stored in macros, one faux-string per each column of struct body_details. Use that macro as a C auto stk rw array initializer, then do the

      U32 len [3] = "\x01\x02\x03"[sv_type];

  or

      U8 sizes [3] = {1,2,3}; U32 len = sizes[sv_type];

  which in perl core would look like

      U32 arena_size = SVDB_AR_SZ_DECL; U32 len = arena_size[sv_type];

  Maybe VC will optimize those since no global memory is used.
Only Perl_newSV_typeX() needs this. In this commit, static inline Perl_newSV_typeX(pTHX_ const svtype type) is the ONLY Perl_newSV_type*() variant that takes an arbitrary svtype arg. This is the fallback for gv_pvn_add_by(), since I couldn't "const" that call to newSV_type() cuz gv_pvn_add_by() is the only place in the whole core that takes a random SV type number.

Internals of Perl_newSV_typeX() are trashy. Here is an example: MSVC DID not turn this into a jump table but instead 17 test/cond_jump ops.

    v5 = (char *)S_new_body(v2);
    v6 = 40i64;
    if ( v2 == 15 ) v6 = 136i64;
    if ( v2 == 14 ) v6 = 104i64;
    if ( v2 == 13 ) v6 = 104i64;
    if ( v2 == 12 ) v6 = 32i64;
    if ( v2 == 11 ) v6 = 40i64;
    if ( v2 == 10 ) v6 = 80i64;
    if ( v2 == 9 ) v6 = 48i64;
    if ( v2 == 8 ) v6 = 224i64;
    if ( v2 == 7 ) v6 = 48i64;
    if ( v2 == 6 ) v6 = 32i64;
    if ( v2 == 5 ) v6 = 24i64;
    if ( v2 == 4 ) v6 = 40i64;
    if ( v2 == 3 ) v6 = 16i64;
    if ( v2 == 2 ) v6 = 0i64;
    if ( v2 == 1 ) v6 = 0i64;
    memset(v5, 0, v6 & -(signed __int64)(v2 != 0));

Solution is to move Perl_newSV_typeX() to sv.c, and let it be struct body_details driven, cuz its only purpose is when newSV_type() absolutely CAN NOT be constant folded (random number input). It only has 1 caller in core.

S_new_body() properly const folded away in 99% of cases except for TWO callers, Perl_newSV_typeX() and Perl_make_trie(). Perl_make_trie()'s failure to inline is bizarre, since Perl_make_trie() internally does "v9 = S_new_body(SVt_PVAV);" and DID inline Perl_newSV_typeSVt_PVAV() !!! and therefore Perl_make_trie() has the AV field initing/nulling code.
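The difference between the compare chain and a table-driven lookup can be sketched in isolation. The sizes below are copied from the decompiled snippet (type 0 is masked to 0 there, and type 16 keeps the default of 40); the `demo_*` names are hypothetical, and the array is a stand-in for the idea of driving the lookup from struct body_details, not the real structure.

```c
#include <stddef.h>

enum { DEMO_SVTYPE_COUNT = 17 };

/* Body sizes indexed by type number, taken from the decompiled output. */
static const unsigned char demo_body_size[DEMO_SVTYPE_COUNT] = {
    0, 0, 0, 16, 40, 24, 32, 48, 224, 48, 80, 40, 32, 104, 104, 136, 40
};

/* What the compiler emitted: one compare per type, then a mask for type 0. */
static size_t demo_size_by_chain(unsigned type)
{
    size_t size = 40;
    if (type == 15) size = 136;
    if (type == 14) size = 104;
    if (type == 13) size = 104;
    if (type == 12) size = 32;
    if (type == 11) size = 40;
    if (type == 10) size = 80;
    if (type == 9)  size = 48;
    if (type == 8)  size = 224;
    if (type == 7)  size = 48;
    if (type == 6)  size = 32;
    if (type == 5)  size = 24;
    if (type == 4)  size = 40;
    if (type == 3)  size = 16;
    if (type == 2)  size = 0;
    if (type == 1)  size = 0;
    return type != 0 ? size : 0;
}

/* Table-driven alternative: one bounds check and one indexed load. */
static size_t demo_size_by_table(unsigned type)
{
    return type < DEMO_SVTYPE_COUNT ? demo_body_size[type] : 0;
}
```

Both functions agree for every type from 0 to 16, but the table version needs no per-type branch when the type is only known at run time.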
Here is the "optimized" contents of S_new_body(), it's junk performance/design-wise (but runtime correct/no bugs)

    void **__fastcall S_new_body(svtype sv_type)
    {
      svtype v1; // er9
      __int64 v2; // rbx
      void **result; // rax
      signed int v4; // ecx
      signed __int64 v5; // rax

      v1 = sv_type;
      v2 = sv_type;
      result = (void **)PL_body_roots[sv_type];
      if ( !result )
      {
        v4 = 4080;
        if ( v1 == 15 ) v4 = 3264;
        if ( v1 == 14 ) v4 = 2080;
        if ( v1 == 13 ) v4 = 4056;
        if ( v1 == 12 ) v4 = 4064;
        if ( v1 == 11 ) v4 = 4080;
        if ( v1 == 10 ) v4 = 4080;
        if ( v1 == 9 ) v4 = 4080;
        if ( v1 == 8 ) v4 = 4032;
        if ( v1 == 7 ) v4 = 4080;
        if ( v1 == 6 ) v4 = 3296;
        if ( v1 == 5 ) v4 = 3424;
        if ( v1 == 4 ) v4 = 3120;
        if ( v1 == 3 ) v4 = 3536;
        if ( v1 == 2 ) v4 = 0;
        if ( v1 == 1 ) v4 = 0;
        v5 = 40i64;
        if ( v1 == 15 ) v5 = 136i64;
        if ( v1 == 14 ) v5 = 104i64;
        if ( v1 == 13 ) v5 = 104i64;
        if ( v1 == 12 ) v5 = 32i64;
        if ( v1 == 11 ) v5 = 40i64;
        if ( v1 == 10 ) v5 = 80i64;
        if ( v1 == 9 ) v5 = 48i64;
        if ( v1 == 8 ) v5 = 224i64;
        if ( v1 == 7 ) v5 = 48i64;
        if ( v1 == 6 ) v5 = 32i64;
        if ( v1 == 5 ) v5 = 24i64;
        if ( v1 == 4 ) v5 = 40i64;
        if ( v1 == 3 ) v5 = 16i64;
        if ( v1 == 2 ) v5 = 0i64;
        if ( v1 == 1 ) v5 = 0i64;
        result = (void **)Perl_more_bodies(v1, v5 & -(signed __int64)(v1 != 0), v4 & (unsigned int)-(v1 != 0));
      }
      PL_body_roots[v2] = *result;
      return result;
    }

------------ disassembly view of S_new_body() ------------

    cmp r9d, 0Fh
    lea edi, [rbp+28h]
    mov r8d, 0FF0h
    lea r11d, [rbp+20h]
    mov edx, 0CC0h
    lea r10d, [rbp+30h]
    mov ecx, r8d
    mov eax, r9d
    cmovz ecx, edx
    cmp r9d, 0Eh
    mov edx, 820h
    cmovz ecx, edx
    cmp r9d, 0Dh
    lea edx, [r8-18h]
    cmovz ecx, edx
    cmp r9d, 0Ch
    lea edx, [r8-10h]
    cmovz ecx, edx
    cmp r9d, 0Bh
    lea edx, [r8-30h]
    cmovz ecx, r8d
    cmp r9d, 0Ah
    cmovz ecx, r8d
    cmp r9d, 9
    cmovz ecx, r8d
    cmp r9d, 8
    cmovz ecx, edx
    cmp r9d, 7
    mov edx, 0CE0h
    cmovz ecx, r8d
    cmp r9d, 6
    cmovz ecx, edx
    cmp r9d, 5
    mov edx, 0D60h
    cmovz ecx, edx
    cmp r9d, 4
    mov edx, 0C30h
    cmovz ecx, edx
    cmp r9d, 3
    mov edx, 0DD0h
    cmovz ecx, edx
    cmp r9d, 2
    lea edx, [rdi+60h]
    cmovz ecx, ebp
    cmp r9d, 1
    cmovz ecx, ebp
    neg eax
    sbb eax, eax
    and eax, ecx
    mov ecx, r9d
    mov r8d, eax
    cmp r9d, 0Fh
    mov eax, edi
    cmovz eax, edx
    cmp r9d, 0Eh
    lea edx, [rbp+68h]
    cmovz eax, edx
    cmp r9d, 0Dh
    cmovz eax, edx
    cmp r9d, 0Ch
    lea edx, [rbp+50h]
    cmovz eax, r11d
    cmp r9d, 0Bh
    cmovz eax, edi
    cmp r9d, 0Ah
    cmovz eax, edx
    cmp r9d, 9
    mov edx, 0E0h
    cmovz eax, r10d
    cmp r9d, 8
    cmovz eax, edx
    cmp r9d, 7
    lea edx, [rbp+18h]
    cmovz eax, r10d
    cmp r9d, 6
    cmovz eax, r11d
    cmp r9d, 5
    cmovz eax, edx
    cmp r9d, 4
    lea edx, [rbp+10h]
    cmovz eax, edi
    cmp r9d, 3
    cmovz eax, edx
    cmp r9d, 2
    cmovz eax, ebp
    cmp r9d, 1
    cmovz eax, ebp
    neg ecx
    mov ecx, r9d
    sbb rdx, rdx
    and rdx, rax
    call Perl_more_bodies

----------------------

17 test ops and 17 conditional_move_constant_8_bits ops

Solution: turn S_new_body() back into a macro so no CC ever tries to ref-inline it. It was a macro before the sv_inline.h branch was merged.

TODO add XSApitest.xs that the world's longest macros are identical to the master correct copy (struct body_details).

byte size drops from before these 3 commits to this "success commit"

    mp.exe    0x1241AC-0x1224EC=7360  0x19D3D8-0x19B8E8=6896
    p541.dll  0x154886-0x1532A6=5600  0x1AA19E-0x1A862E=7024

BEFORE

    Dump of file ..\miniperl.exe
    SECTION HEADER #1
      .text name
      1241AC virtual size
    SECTION HEADER #2
      .rdata name
      19D3D8 virtual size

    Dump of file ..\perl541.dll
    SECTION HEADER #1
      .text name
      154886 virtual size
    SECTION HEADER #2
      .rdata name
      1AA19E virtual size

AFTER

    Dump of file ..\perl541.dll
    SECTION HEADER #1
      .text name
      1532A6 virtual size
    SECTION HEADER #2
      .rdata name
      1A862E virtual size

    Dump of file ..\miniperl.exe
    SECTION HEADER #1
      .text name
      1224EC virtual size
    SECTION HEADER #2
      .rdata name
      19B8E8 virtual size
Add builtin::getcwd #3 gcc syntax fixes

- win32_get_childdir(): skip the strlen() because GetCurrentDirectoryA() gives it to us, handle 32KB paths if encountered. It is an infinite retry loop since it's been reported that, in multithreading, GCD()/CWD can change and get longer on our OS thread between overflow attempt #1 and correct-size attempt #2, because the CWD val is a race cond. So all overflow conditions must trigger realloc. If the whole C stack is used up with alloca() and infinite retry, a SEGV is good. Win API/UNICODE_STRING struct is hard coded to USHORT/SHORT. If GetCurrentDirectoryA() returns above 65KB, a SEGV is good. If GetCurrentDirectoryA()'s impl is doing {return buflen+1;}, which is an API violation, the OS is damaged, a SEGV is good. The race retry will never realistically trigger more than 1x ever; 2x rounds through the retry loop might happen after a few centuries, semi-guess.
- CPerlHost::GetChildDir(void) is TODO now that m_pvDir->GetCurrentDirectoryA/W have correct retvals.
- perllib.c: silence warnings, return debug code accidentally removed in "Merge WinCE and Win32 directories -- Initial patch" 7bd379e 4/27/2006 7:30:00 PM
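The retry loop can be sketched portably by abstracting the directory query behind a callback. Everything named `demo_*` is hypothetical. The callback is assumed to follow the documented GetCurrentDirectoryA() convention: it returns the length written (excluding the terminating NUL) when the buffer is large enough, and the required size (including the NUL) when it is not. The sketch grows a heap buffer with `realloc()` rather than using `alloca()` as the commit message describes.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed callback convention, modelled on GetCurrentDirectoryA(). */
typedef size_t (*demo_query_fn)(size_t buflen, char *buf, void *ctx);

/* Query repeatedly until the answer fits, because the value may grow
 * between the "too small" reply and the next attempt. */
static char *demo_get_dir(demo_query_fn query, void *ctx, size_t *len_out)
{
    size_t cap = 16;
    char *buf = malloc(cap);

    while (buf) {
        size_t got = query(cap, buf, ctx);
        if (got == 0) {                 /* the query itself failed */
            free(buf);
            return NULL;
        }
        if (got < cap) {                /* it fit: got is the string length */
            *len_out = got;
            return buf;
        }
        /* Too small: grow to the reported size (or by at least one byte). */
        cap = got > cap ? got : cap + 1;
        char *bigger = realloc(buf, cap);
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
    }
    return NULL;
}

/* Fake provider whose directory gets longer after the first call,
 * imitating another thread changing the CWD mid-query. */
typedef struct { const char *first; const char *later; int calls; } demo_fake_cwd;

static size_t demo_fake_query(size_t buflen, char *buf, void *ctx)
{
    demo_fake_cwd *fake = ctx;
    const char *cur = (fake->calls++ == 0) ? fake->first : fake->later;
    size_t need = strlen(cur) + 1;
    if (need > buflen)
        return need;
    memcpy(buf, cur, need);
    return need - 1;
}
```

With the fake provider, the first attempt overflows, the second overflows again because the directory grew in between, and the third succeeds, which is the scenario the retry loop exists for.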
I removed some trailing whitespaces with the following shell commands.