Speedup C encoder up to 100x #256
Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Apple M1 Pro
* The M1 Pro results were corrected, since the previous results were affected by a bug.
@DagAgren Are you interested in these improvements?
I also improved decoder performance about 14 times using the same techniques: caching cos values, caching linearTosRGB values, and unrolling loops. This raises decoding throughput from 6 Mpx/s to 86 Mpx/s on M1.
These changes also introduce a very minor difference in the output. Nothing the human eye could notice, just different binary output.
The method I use to measure performance is the following:
diff --git forkSrcPrefix/C/encode_stb.c forkDstPrefix/C/encode_stb.c
index 811ca00006b45eaa829bfd267904ac0d0c647884..a95c6a2ff96ee7cdaa9d1b35ef28b063161cf01d 100644
--- forkSrcPrefix/C/encode_stb.c
+++ forkDstPrefix/C/encode_stb.c
@@ -4,6 +4,7 @@
#include "stb_image.h"
#include <stdio.h>
+#include <time.h>
const char *blurHashForFile(int xComponents, int yComponents,const char *filename);
@@ -38,6 +39,14 @@ const char *blurHashForFile(int xComponents, int yComponents,const char *filenam
const char *hash = blurHashForPixels(xComponents, yComponents, width, height, data, width * 3);
+ #define TIMES 30
+ clock_t start = clock();
+ for (int i = 0; i < TIMES; i++) {
+ hash = blurHashForPixels(xComponents, yComponents, width, height, data, width * 3);
+ }
+ double avg_ms = (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC / TIMES;
+ printf("Average time over %d executions: %.3f ms\n", TIMES, avg_ms);
+
stbi_image_free(data);
return hash;
diff --git forkSrcPrefix/C/decode_stb.c forkDstPrefix/C/decode_stb.c
index dab164e1eaf1a7199a751a5e13f6da7099027bd2..3514f53e6f91dc41253429ea07e594893d536598 100644
--- forkSrcPrefix/C/decode_stb.c
+++ forkDstPrefix/C/decode_stb.c
@@ -3,6 +3,8 @@
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_writer.h"
+#include <time.h>
+
int main(int argc, char **argv) {
if(argc < 5) {
fprintf(stderr, "Usage: %s hash width height output_file [punch]\n", argv[0]);
@@ -34,6 +36,15 @@ int main(int argc, char **argv) {
freePixelArray(bytes);
+ #define TIMES 30
+ clock_t start = clock();
+ for (int i = 0; i < TIMES; i++) {
+ uint8_t * tmpbytes = decode(hash, width, height, punch, nChannels);
+ freePixelArray(tmpbytes);
+ }
+ double avg_ms = (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC / TIMES;
+ printf("Average time over %d executions: %.3f ms\n", TIMES, avg_ms);
+
fprintf(stdout, "Decoded blurhash successfully, wrote PNG file %s\n", output_file);
return 0;
}
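The benchmark above prints a time per execution in milliseconds, while the throughput figures are quoted in Mpx/s. As a rough sketch, assuming throughput counts the pixels of a single image per run, the conversion could look like this (`mpx_per_second` is a hypothetical helper, not part of the PR):

```c
#include <assert.h>

/* Hypothetical helper: megapixels per second for one width x height image
   processed in ms_per_run milliseconds.
   pixels / (ms / 1000) / 1e6 simplifies to pixels / (ms * 1000). */
static double mpx_per_second(int width, int height, double ms_per_run) {
    assert(ms_per_run > 0.0);
    return ((double)width * height) / (ms_per_run * 1000.0);
}
```

For example, a 1000x1000 image processed in 1000 ms corresponds to 1 Mpx/s.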
@DagAgren How can I get your attention?
@DagAgren please note that
All changes are split into independent commits, some of which are optional.
In addition to the performance improvements, there are these changes:
* M_PI in sources, ensure it is defined in math.h.
* blurhash_encoder executable (in line with the blurHashForPixels function)
* Makefile to avoid heavy encode_stb recompilation on each change.
Benchmarks are in the comment.