fixup: cpu: x64: improve tbb bnorm realtime inference performance
Realtime inference with hyper-threading (HT) enabled can use up to 8 threads.
kwiersch authored and vpirogov committed Nov 7, 2022
1 parent d43c70d commit 4fd5ab2
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/cpu/x64/jit_uni_tbb_batch_normalization.cpp
@@ -2169,7 +2169,7 @@ struct driver_t : public c_compatible {
         dim_t total_size = size_src_dst + size_stats_ss_tensors;

         // Try to create at least nthr_ chunks for realtime inference
-        const int n_chunks_min = nthr_ <= 4 ? nstl::min(4, nthr_) : 1;
+        const int n_chunks_min = nthr_ <= 8 ? nthr_ : 1;
         const size_t l2_per_core = platform::get_per_core_cache_size(2);
         dim_t n_chunks
                 = nstl::max<dim_t>(n_chunks_min, total_size / l2_per_core);
