[CPUID] Add ISA entries for A64FX and M1 #44194

giordano · 2022-02-16T00:41:13Z

On A64FX we get

julia> Base.BinaryPlatforms.CPUID.cpu_isa()
Base.BinaryPlatforms.CPUID.ISA(Set(UInt32[0x00000016, 0x00000004, 0x00000006, 0x00000007, 0x0000000c, 0x00000008]))

which corresponds to the following set

julia> @eval Base.BinaryPlatforms.CPUID JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm, JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_sve
(0x00000008, 0x00000007, 0x0000000c, 0x00000004, 0x00000006, 0x00000016)

which is literally the armv8.2-a+crypto set + JL_AArch64_sve.
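As a quick check of that claim, here is a minimal sketch that uses only the numeric feature IDs printed above. It assumes the armv8.2-a+crypto set consists of exactly the five non-SVE features named in this comment; that list is taken from the statement above, not copied from base/cpuid.jl, and the variable names are made up for illustration.

```julia
# Feature IDs as printed by the REPL above (Julia's internal numbering).
lse, crc, rdm, aes, sha2, sve =
    UInt32(0x08), UInt32(0x07), UInt32(0x0c), UInt32(0x04), UInt32(0x06), UInt32(0x16)

# Set reported by cpu_isa() on the A64FX node, copied from the output above.
detected = Set(UInt32[0x00000016, 0x00000004, 0x00000006, 0x00000007, 0x0000000c, 0x00000008])

# Assumption: the armv8.2-a+crypto features are the five listed in this comment.
armv8_2a_crypto = Set(UInt32[lse, crc, rdm, aes, sha2])

# Everything detected beyond armv8.2-a+crypto.
extra = setdiff(detected, armv8_2a_crypto)

println(issubset(armv8_2a_crypto, detected))  # is the base set fully covered?
println(extra == Set(UInt32[sve]))            # is SVE the only addition?
```

Both checks should print `true` if the detected set is the base set plus SVE and nothing else.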

I'd even go as far as removing armv8.4-a+crypto+sve: I don't think there is any existing CPU at the moment with all those capabilities, so it isn't very useful there.

Also, what about backporting to v1.6 and v1.7?

gbaraldi · 2022-02-16T01:53:36Z

Why add a separate ISA entry just for the a64fx? Is it needed somewhere upstream? Also, since I think only the a64fx has that ISA combination, I guess you could add fullfp16 too. Besides that, LGTM.

staticfloat · 2022-02-16T03:45:05Z

> I'd even go as far as removing armv8.4-a+crypto+sve: I don't think there is any existing CPU at the moment with all those capabilities, so it isn't very useful there.

Agreed; it's probably more useful to define something like armv8.5-a+fullfp16+fp16ml as an Apple M1 spec. Unfortunately, our CPU autodetection code doesn't work yet, so we'd need to improve that on the C side, in order for this to have an effect on the M1.

gbaraldi · 2022-02-16T12:09:38Z

The M1 feature detection has been merged (#41924), so that should be working right now; it just needs the ISA entry.

giordano · 2022-02-16T12:26:08Z

> Why add a separate ISA entry just for the a64fx? Is it needed somewhere upstream?

This is mainly used in BinaryBuilder to target specific microarchitectures. We definitely don't want to target all CPUs out there, but we need prototypes of somewhat relevant CPU families (like for the Intel chips above).

> Also, since I think only the a64fx has that ISA combination, I guess you could add fullfp16 too.

It doesn't have it?

For the record, with #41924, on the base M1 I get

julia> Base.BinaryPlatforms.CPUID.cpu_isa()
Base.BinaryPlatforms.CPUID.ISA(Set(UInt32[0x00000004, 0x00000006, 0x00000007, 0x00000014, 0x0000000c, 0x00000008, 0x00000017]))

which corresponds to the set

julia> @eval Base.BinaryPlatforms.CPUID JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_crc, JL_AArch64_dotprod, JL_AArch64_rdm, JL_AArch64_lse, JL_AArch64_fp16fml
(0x00000004, 0x00000006, 0x00000007, 0x00000014, 0x0000000c, 0x00000008, 0x00000017)

For reference, features enabled by Apple Clang on this CPU are

"target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+v8.5a,+zcm,+zcz"

I'll update the PR later

gbaraldi · 2022-02-16T13:08:44Z

> It doesn't have it?

julia/src/processor_arm.cpp, line 328 in 1a3da30:

    constexpr auto fujitsu_a64fx = armv8_2a | get_feature_masks(sha2, fullfp16, sve, complxnum);

If so, then this is wrong, though I checked upstream LLVM and it agrees with this definition.

giordano · 2022-02-16T13:38:53Z

I mean, I showed above what we detect with cpu_isa(), and fullfp16

julia/src/features_aarch64.h, line 20 in 1a3da30:

    JL_FEATURE_DEF(fullfp16, 9, 0) // HWCAP_FPHP

isn't there, so at the very least we can't detect it.

Also, for reference, these are the features enabled by the Fujitsu compiler in clang mode (based on LLVM 7):

"target-features"="+crc,+crypto,+fp-armv8,+lse,+neon,+ras,+rdm,+sve,+v8.2a"

fullfp16 isn't here either (while it is on the M1 for Apple Clang, see above)

yuyichao · 2022-02-16T13:43:31Z

LLVM claims that it has fullfp16: https://github.com/llvm/llvm-project/blob/97c151de3de0266b896bb01e98b005fb31f6d3cd/llvm/lib/Target/AArch64/AArch64.td#L984-L986

giordano · 2022-02-16T14:03:49Z

Alright:

julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_fullfp16)
true

The ARM C/C++ compiler reference says that sve implies fullfp16 (at least for their compiler, but that probably means it's true in general).

I think we need to refine

julia/base/cpuid.jl, line 97 in 1a3da30:

    all_features = last(last(get(ISAs_by_family, normalize_arch(String(Sys.ARCH)), "" => [ISA(Set{UInt32}())]))).features

which at the moment assumes the ISAs are listed in order of increasing capabilities, which is true for our x86_64 family, but not for aarch64, which is more varied.
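To illustrate the ordering problem, here is a minimal sketch with a hypothetical, shortened ISA table. The feature IDs come from this thread, but the table contents and all names (`isas`, `last_features`, `all_features`) are assumptions for illustration, not the contents of base/cpuid.jl. It contrasts taking the last entry with taking the union over all entries, which is one possible order-independent alternative.

```julia
# Hypothetical, shortened table of name => feature set pairs.
# The A64FX-like entry comes last but does not contain aes (0x04).
isas = [
    "armv8.0-a"        => Set{UInt32}(),
    "armv8.2-a+crypto" => Set(UInt32[0x08, 0x07, 0x0c, 0x04, 0x06]),
    "a64fx-like"       => Set(UInt32[0x08, 0x07, 0x0c, 0x06, 0x16]),
]

# Approach that assumes the last entry is a superset of all the others.
last_features = last(last(isas))

# Order-independent alternative: union of the features of every entry.
all_features = mapreduce(last, union, isas; init = Set{UInt32}())

println(UInt32(0x04) in last_features)  # aes is missed by the "last entry" approach
println(UInt32(0x04) in all_features)   # aes is kept by the union
```

With a table like this one, the "last entry" approach silently drops aes, while the union keeps every feature that any entry mentions.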

giordano · 2022-02-16T14:12:01Z

For completeness, these are all the features we can detect on A64FX, among all those we know:

julia> using Base.BinaryPlatforms.CPUID

julia> aarch64_features = filter!(n -> startswith(String(n), "JL_AArch64"), (names(CPUID; all=true)));

julia> filter!(x -> last(x), [(feat, CPUID.test_cpu_feature(getfield(CPUID, feat))) for feat in aarch64_features])
11-element Vector{Tuple{Symbol, Bool}}:
 (:JL_AArch64_aes, 1)
 (:JL_AArch64_ccpp, 1)
 (:JL_AArch64_complxnum, 1)
 (:JL_AArch64_crc, 1)
 (:JL_AArch64_fullfp16, 1)
 (:JL_AArch64_lse, 1)
 (:JL_AArch64_rdm, 1)
 (:JL_AArch64_sha2, 1)
 (:JL_AArch64_sve, 1)
 (:JL_AArch64_v8_1a, 1)
 (:JL_AArch64_v8_2a, 1)

giordano · 2022-02-16T20:53:47Z

base/cpuid.jl

+@eval function cpu_isa()
+    return ISA(Set{UInt32}(feat for feat in $(ALL_FEATURES[normalize_arch(String(Sys.ARCH))]) if test_cpu_feature(feat)))
+end


I realised we can avoid always recomputing the list of features for the current architecture and just inline it at precompile time. On my laptop, before:

julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 48 evaluations.
 Range (min … max):  895.604 ns … 90.301 μs  ┊ GC (min … max): 0.00% … 98.34%
 Time  (median):     984.552 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.153 μs ±  2.866 μs  ┊ GC (mean ± σ):  9.56% ±  3.80%
  896 ns          Histogram: frequency by time          2 μs <
 Memory estimate: 1.41 KiB, allocs estimate: 17.

after:

julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (min … max):  679.871 ns … 18.916 μs  ┊ GC (min … max): 0.00% … 89.46%
 Time  (median):     745.200 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   849.687 ns ± 709.196 ns  ┊ GC (mean ± σ):  4.12% ±  4.95%
  680 ns       Histogram: log(frequency) by time       1.97 μs <
 Memory estimate: 848 bytes, allocs estimate: 7.

Nice! Fewer allocations, always a good thing!

With the latest version:

julia> @benchmark Base.BinaryPlatforms.CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 196 evaluations.
 Range (min … max):  480.342 ns … 12.980 μs  ┊ GC (min … max): 0.00% … 94.89%
 Time  (median):     527.505 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   598.004 ns ± 670.168 ns  ┊ GC (mean ± σ):  6.65% ±  5.69%
  480 ns       Histogram: log(frequency) by time       1 μs <
 Memory estimate: 848 bytes, allocs estimate: 7.

I believe it's a bit faster because the new version collects only the features we are interested in, instead of all of those for the given architecture, so we're just doing fewer iterations. The new version is also closer in spirit to what we're currently doing.
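The definition-time inlining discussed in this review thread relies on `@eval` interpolation. The following is a rough, self-contained sketch of that mechanism only: `feature_table`, `probe`, `detect_dynamic`, and `detect_inlined` are hypothetical stand-ins (the real code uses the tables in base/cpuid.jl and `test_cpu_feature`), and the "odd IDs are present" rule is invented so the example runs on any machine.

```julia
# Hypothetical feature table, keyed by architecture name.
feature_table = Dict(
    "aarch64" => UInt32[0x04, 0x06, 0x07],
    "x86_64"  => UInt32[0x01, 0x02],
)

# Hypothetical host probe: pretend that odd-numbered features are present.
probe(feat::UInt32) = isodd(feat)

# Looks the architecture up in the table on every call.
detect_dynamic(arch) = Set{UInt32}(f for f in feature_table[arch] if probe(f))

# `$(...)` is evaluated once, when this definition is evaluated, and the
# resulting vector is spliced into the method body, so no lookup happens
# at call time.
@eval detect_inlined() = Set{UInt32}(f for f in $(feature_table["aarch64"]) if probe(f))

println(detect_inlined() == detect_dynamic("aarch64"))
```

Both functions return the same set here; the difference is only in when the table lookup happens.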

giordano · 2022-02-17T00:06:31Z

@yuyichao do you know whether A64FX requires aes? The Fujitsu compiler uses +crypto for this chip, which should imply aes. However, I'm on a cluster where

julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)
false

while on Fugaku I get

julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)
true

Also, https://github.com/llvm/llvm-project/blob/97c151de3de0266b896bb01e98b005fb31f6d3cd/llvm/lib/Target/AArch64/AArch64.td#L984-L986 lists only sha2. Fugaku has the 2.0/2.2 GHz chips, while the other cluster has the 1.8 GHz ones, so I was wondering whether the 1.8 GHz chips might lack aes.

yuyichao · 2022-02-19T09:29:07Z

> do you know whether A64FX requires aes? The Fujitsu compiler uses +crypto for this chip, which should imply aes. However, I'm on a cluster where

As for the spec I only know as much as the llvm and gcc target feature set says....

An independent way to check the feature set would be LD_SHOW_AUXV=1 /bin/true, which should agree with /proc/cpuinfo.

> Fugaku has the 2.0/2.2 GHz chips, while the other cluster has the 1.8 GHz ones, so I was wondering whether the 1.8 GHz chips might lack aes.

I have no idea, but do the two have the same midr? Their values should be available under /sys/devices/system/cpu/cpu<n>/regs/identification/ (or /proc/cpuinfo). It's possible for different chip variants to have different feature sets, though I sure hope they're at least distinguishable from the cpuid...
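For readers who want to interpret a midr_el1 value, a minimal sketch of splitting it into its fields follows. The bit layout is the standard MIDR_EL1 one (implementer in bits 31:24, variant in 23:20, architecture in 19:16, part number in 15:4, revision in 3:0); the function name `decode_midr` is made up for illustration, and the sample value is the one reported for both clusters later in this thread.

```julia
# Split a MIDR_EL1 value into its architectural fields.
function decode_midr(midr::UInt64)
    return (
        implementer  = UInt8((midr >> 24) & 0xff),   # bits 31:24
        variant      = UInt8((midr >> 20) & 0xf),    # bits 23:20
        architecture = UInt8((midr >> 16) & 0xf),    # bits 19:16
        part         = UInt16((midr >> 4) & 0xfff),  # bits 15:4
        revision     = UInt8(midr & 0xf),            # bits 3:0
    )
end

# Value reported by midr_el1 on both Isambard and Fugaku in this thread.
fields = decode_midr(0x00000000461f0010)
println(fields)
```

The implementer (0x46), variant (0x1), part (0x001), and revision (0) fields line up with the corresponding /proc/cpuinfo lines shown in the thread.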

giordano · 2022-02-19T13:13:09Z

Isambard:

$ cat /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
0x00000000461f0010
$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0xffffa28f0000
AT_??? (0x33): 0x1270
AT_HWCAP:        415fe7
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0xaaaaaf000040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0xffffa2900000
AT_FLAGS:        0x0
AT_ENTRY:        0xaaaaaf0016e0
AT_UID:          415400694
AT_EUID:         415400694
AT_GID:          415400694
AT_EGID:         415400694
AT_SECURE:       0
AT_RANDOM:       0xffffd9c9a258
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64
$ head -n8 /proc/cpuinfo
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x001
CPU revision    : 0
$ julia -E 'using Base.BinaryPlatforms.CPUID; CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)'
false

So it looks like AES is indeed not here on this chip?

For comparison, on Fugaku:

$ cat /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
0x00000000461f0010
$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0x400000070000
AT_??? (0x33): 0x1270
AT_HWCAP:        415fff
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0xaaaaaaaa0040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0x400000000000
AT_FLAGS:        0x0
AT_ENTRY:        0xaaaaaaaa16e0
AT_UID:          14463
AT_EUID:         14463
AT_GID:          14026
AT_EGID:         14026
AT_SECURE:       0
AT_RANDOM:       0xffffffffe1f8
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64
$ head -n8 /proc/cpuinfo
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x001
CPU revision    : 0
$ julia -E 'using Base.BinaryPlatforms.CPUID; CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)'
true
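The two AT_HWCAP values above can be compared bit by bit. The sketch below decodes them using the Linux arm64 HWCAP bit order (bit 0 is fp, bit 1 is asimd, and so on, with names as they appear in /proc/cpuinfo); the helper `decode_hwcap` and the variable names are hypothetical, and only the first 23 flags are listed.

```julia
# Names of the first arm64 HWCAP flags, indexed by bit position (bit 0 first).
hwcap_names = ["fp", "asimd", "evtstrm", "aes", "pmull", "sha1", "sha2", "crc32",
               "atomics", "fphp", "asimdhp", "cpuid", "asimdrdm", "jscvt", "fcma",
               "lrcpc", "dcpop", "sha3", "sm3", "sm4", "asimddp", "sha512", "sve"]

# Return the names of all flags whose bit is set in `hwcap`.
decode_hwcap(hwcap::Integer) =
    [name for (bit, name) in enumerate(hwcap_names) if ((hwcap >> (bit - 1)) & 1) == 1]

isambard = decode_hwcap(0x415fe7)  # AT_HWCAP from the Isambard output
fugaku   = decode_hwcap(0x415fff)  # AT_HWCAP from the Fugaku output

# Flags present on Fugaku but not on Isambard.
println(setdiff(fugaku, isambard))  # → ["aes", "pmull"]
```

The decoded lists match the Features lines from /proc/cpuinfo on each machine, so the auxiliary vector and cpuinfo agree: the only difference between the two systems is aes and pmull.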

yuyichao · 2022-02-19T16:42:26Z

Does seem like it... But I suspect it's mainly a question for Fujitsu. Their online documentation shows aes instructions in the performance section but doesn't seem to mention anywhere whether support for them is conditional. Nor does it mention the values of the system registers the way the one from ARM does...

giordano · 2022-02-19T17:10:50Z

Ok, thanks for confirming it, then I'll remove AES, as it appears not to be always there (and LLVM doesn't seem to require it either).

* [CPUID] Rework how current ISA is determined
* [CPUID] Add ISA entry for A64FX
* [CPUID] Add ISA entry for Apple Silicon M1
* [CPUID] Simplify collection of full set of features for architecture
* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it

(cherry picked from commit f45b6ad)

giordano · 2022-03-25T22:39:05Z

For the record, other people have observed the same differences between A64FX on Isambard and Fugaku: archspec/archspec-json#23. At least I'm glad it isn't just Julia 🙂

giordano requested a review from staticfloat February 16, 2022 00:41

giordano added the system:arm (ARMv7 and AArch64) label Feb 16, 2022

KristofferC added the backport 1.6, backport 1.7, and backport 1.8 labels Feb 16, 2022

giordano force-pushed the mg/cpuid-a64fx-isa branch 2 times, most recently from 201fc6c to d485afb Compare February 16, 2022 20:34

[CPUID] Rework how current ISA is determined

763146e

giordano force-pushed the mg/cpuid-a64fx-isa branch from d485afb to 2703058 Compare February 16, 2022 20:38

giordano force-pushed the mg/cpuid-a64fx-isa branch from bf87ed4 to d94c647 Compare February 16, 2022 23:24

giordano added 3 commits February 16, 2022 23:43

[CPUID] Add ISA entry for A64FX

8abda40

[CPUID] Add ISA entry for Apple Silicon M1

2b44fcc

[CPUID] Simplify collection of full set of features for architecture

22eb7a9

giordano force-pushed the mg/cpuid-a64fx-isa branch from d94c647 to 22eb7a9 Compare February 16, 2022 23:45

staticfloat approved these changes Feb 16, 2022

giordano changed the title from "[CPUID] Add ISA entry for A64FX" to "[CPUID] Add ISA entries for A64FX and M1" Feb 17, 2022

KristofferC mentioned this pull request Feb 18, 2022

release-1.8: Backports for julia 1.8-beta1/2 #44237

Merged

33 tasks

giordano mentioned this pull request Feb 19, 2022

BLIS Append 64_ Suffix to All F77 Exported JuliaPackaging/Yggdrasil#4463

Merged

[CPUID] Remove AES from A64FX ISA, not all chips appear to have it

c5e2d4f

KristofferC mentioned this pull request Feb 19, 2022

release-1.6: Backports for 1.6.6 #43735

Merged

50 tasks

giordano merged commit f45b6ad into JuliaLang:master Feb 20, 2022

giordano deleted the mg/cpuid-a64fx-isa branch February 20, 2022 16:17

KristofferC mentioned this pull request Feb 23, 2022

Backports for 1.7.3 #44189

Merged

40 tasks

KristofferC removed the backport 1.8 label Feb 24, 2022

giordano mentioned this pull request Mar 25, 2022

A64FX processor in Isambard 2 not recognized by archspec archspec/archspec-json#23

Closed

KristofferC removed the backport 1.6 label May 16, 2022

KristofferC removed the backport 1.7 label May 26, 2022

giordano mentioned this pull request Dec 6, 2022

Remove pmull feature from A64FX archspec/archspec-json#59

Merged
