
SPO600 - Project - Stage Three

This is the last stage of my SPO600 project. Since I don't have results suitable for upstreaming, I am going to wrap up my project and do a more thorough technical analysis of my results.

First of all, I am going to summarize what I did for my project. (If you want to go over the details, you can see my previous posts.)
I picked a software package called SSDUP, a traffic-aware SSD burst buffer for HPC systems. I noticed that it uses three different MurmurHash3 hash functions: the first two are optimized for x86 platforms and the third is optimized for x64 platforms. I also noticed that it is compiled with 'gcc -std=gnu99'. To make these three hash functions easier to handle, I split them into three separate files and tested each of them separately on an AArch64 system and an x86_64 system, as sketched below.
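Here is a minimal sketch of the kind of timing harness used for these tests. It assumes the reference MurmurHash3_x86_32 signature; the buffer size and iteration count are illustrative placeholders rather than the exact values from my runs.

/* bench_x86_32.c - minimal timing sketch (illustrative values only) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Reference-style prototype; the hash function itself lives in its own file. */
void MurmurHash3_x86_32(const void *key, int len, uint32_t seed, void *out);

int main(void)
{
    const int len = 4096;             /* size of each key (illustrative)     */
    const long iterations = 1000000;  /* number of hash calls (illustrative) */
    uint8_t *buf = malloc(len);
    uint32_t hash = 0;

    for (int i = 0; i < len; i++)     /* fill the buffer with some data      */
        buf[i] = (uint8_t)(i * 31 + 7);

    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        MurmurHash3_x86_32(buf, len, (uint32_t)i, &hash);
    clock_t end = clock();

    printf("hash = %08x, time = %.3f s\n",
           hash, (double)(end - start) / CLOCKS_PER_SEC);

    free(buf);
    return 0;
}

Compiling this together with the file that contains MurmurHash3_x86_32 (with and without -O3) and comparing the reported times gives numbers like the ones in the tables below.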

Since the professor said my stage two results were hard to read, I am going to show them again in table format.

First hash function (MurmurHash3_x86_32): with the -O3 option it runs about 802% faster than without any optimization option.

                                without -O3    with -O3
  No code changes               14.117         1.572
  Code changes: i+i and len     14.035         N/A
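(To be clear about what the percentage means: assuming it is computed as (time without -O3 − time with -O3) / time with -O3, the table above gives (14.117 − 1.572) / 1.572 ≈ 7.98, i.e. roughly an 800% speedup; the small difference from the quoted 802% is likely just rounding or run-to-run variation. The same calculation applies to the other two tables.)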

Second hash function (MurmurHash3_x86_128): with the -O3 option it runs about 891% faster than without any optimization option.

                                without -O3    with -O3
  No code changes               13.332         1.338
  Code changes: i+i and len     13.543         N/A

Third hash function (MurmurHash3_x64_128): with the -O3 option it runs about 523% faster than without any optimization option, and the code changes make it about 0.04% faster than the unmodified version.

                                without -O3    with -O3
  No code changes               8.179          1.315
  Code changes: i+i and len     8.137          N/A

All of the tests were first run on an AArch64 system. My first optimization step was to compile my benchmark program with the -O3 option. The first two hash functions, which are optimized for x86 platforms, showed a significant performance improvement. The third hash function, which is optimized for x64 platforms, also improved with -O3, but by a noticeably smaller margin than the first two. My second optimization step was to change some code in the third function (the 'i+i and len' changes in the tables), which made it about 0.04% faster than the unmodified code; a simplified sketch of this style of change follows below.
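To illustrate what that kind of change looks like, here is a simplified sketch based on the body loop of the reference MurmurHash3_x64_128, where the 128-bit blocks are normally indexed with i*2 + 0 and i*2 + 1 and the block count comes from len / 16. The sketch assumes the 'i+i' change rewrites the i*2 index as an addition and the 'len' change rewrites the division as a shift; it is an illustration of the idea, not the exact diff from stage two, and the real mixing steps are omitted.

#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the reference getblock64(): read the i-th 64-bit
   block from p. Illustrative only. */
static uint64_t getblock64(const uint64_t *p, int i)
{
    uint64_t v;
    memcpy(&v, p + i, sizeof v);
    return v;
}

/* Sketch of the MurmurHash3_x64_128 body loop, showing the style of
   index/length rewrite referred to as "i+i and len" in the tables above.
   The real rotation/multiplication mixing steps are left out. */
static void body_loop_sketch(const void *key, int len, uint64_t *h1, uint64_t *h2)
{
    const uint64_t *blocks = (const uint64_t *)key;
    const int nblocks = len >> 4;          /* len / 16 rewritten as a shift */

    for (int i = 0; i < nblocks; i++) {
        const int base = i + i;            /* i*2 rewritten as i + i */
        uint64_t k1 = getblock64(blocks, base + 0);
        uint64_t k2 = getblock64(blocks, base + 1);

        *h1 ^= k1;                         /* placeholder for the real mixing */
        *h2 ^= k2;                         /* placeholder for the real mixing */
    }
}

At higher optimization levels the compiler typically performs this kind of strength reduction on its own, so hand edits like this mostly matter for unoptimized builds.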

Afterward, I ran the benchmark program on an x86_64 system, and it also showed a significant performance improvement when compiled with the -O3 option. However, the improvement for the third function on the AArch64 system was not as large as on the x86_64 platform. As a result, compiling with the -O3 option produces the best performance on both platforms and is the most effective optimization in this project.

In our final project, the project will split into 3 stages. This is the first stage of my SPO600 course project. In this stage, we are given a task to find an open source software package that includes a CPU-intensive function or method that compiles to machine code. After I chose the open source software package, I will have to benchmark the performance of the software function on an AArach64 system. When the benchmark job is completed, I will have to think about my strategy that attempts to optimize the hash function for better performance on an AArch64 system and identify it, because those strategies will be used in the second stage of the project. With so many software, I would say picking software is the hardest job in the project, which is the major reason it took me so long to get this post going. But after a lot of research, I picked a software called SSDUP , it is a traffic-aware SSD burst buffer for HPC systems. You can find the source code over here: https://github.com/CGC...