
SPO600 - Project - Stage Three

This is the last stage of my SPO600 project. Since I don't have results suitable for upstreaming, I am going to wrap up my project and do a more thorough technical analysis of my results.

First of all, I am going to summarize what I did for my project. (If you want to go over the details, you can see my previous posts.)
I picked a software package called SSDUP, a traffic-aware SSD burst buffer for HPC systems. I noticed that it uses three different MurmurHash3 hash functions: the first two are optimized for x86 platforms and the third is optimized for x64 platforms. I also noticed that it is built with 'gcc -std=gnu99'. To make the three hash functions easier to handle, I split them into three files and tested each one separately on an AArch64 system and an x86_64 system.
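To give an idea of how each split-out function can be benchmarked on its own, here is a minimal timing sketch. It is only an illustration: the file name, buffer size, and iteration count are placeholders rather than the exact workload from my earlier posts, and it assumes the standard MurmurHash3_x86_32 prototype from the original MurmurHash3 sources.
-------------------------------------------------
/* bench_x86_32.c - a minimal timing sketch for one split-out hash function.
   Link it against the file that contains MurmurHash3_x86_32().
   The buffer size and iteration count are placeholders, not the exact
   workload from my stage two benchmark. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Standard MurmurHash3 prototype (32-bit x86 variant). */
void MurmurHash3_x86_32(const void *key, int len, uint32_t seed, void *out);

#define BUF_SIZE   4096        /* bytes hashed per call (placeholder) */
#define ITERATIONS 10000000L   /* number of hash calls (placeholder)  */

int main(void)
{
    uint8_t *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return 1;
    for (int i = 0; i < BUF_SIZE; i++)
        buf[i] = (uint8_t)i;               /* deterministic input data */

    uint32_t hash = 0;
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (long i = 0; i < ITERATIONS; i++)
        MurmurHash3_x86_32(buf, BUF_SIZE, (uint32_t)i, &hash);

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;

    printf("last hash = %08x, elapsed = %.3f seconds\n",
           (unsigned)hash, elapsed);
    free(buf);
    return 0;
}
-------------------------------------------------
The same harness can be reused for MurmurHash3_x86_128 and MurmurHash3_x64_128 by swapping the prototype and using a 128-bit output buffer, since those variants write a 128-bit hash.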

Since the professor said my results in stage two were hard to read, I am going to show the results again in table format.

For the first hash function (MurmurHash3_x86_32), the -O3 build is about 802% faster (roughly 9x) than the build without the optimization option:
                              without -O3 option    with -O3 option
No code changes               14.117                1.572
Code changes: i+i and len     14.035                N/A

For the second hash function (MurmurHash3_x86_128), the -O3 build is about 891% faster (roughly 10x) than the build without the optimization option:
                              without -O3 option    with -O3 option
No code changes               13.332                1.338
Code changes: i+i and len     13.543                N/A

For the third hash function (MurmurHash3_x64_128), the -O3 build is about 523% faster (roughly 6x) than the build without the optimization option, and the code changes make the unoptimized build about 0.5% faster (8.137 vs 8.179):
                              without -O3 option    with -O3 option
No code changes               8.179                 1.315
Code changes: i+i and len     8.137                 N/A
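For reference, the percentages above correspond to the usual relative-improvement calculation: (time without -O3 − time with -O3) / (time with -O3) × 100. Using the first table as an example, (14.117 − 1.572) / 1.572 × 100 ≈ 798%, which is in line with the roughly 800% improvement quoted above.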

All of the tests above were first completed on an AArch64 system. My first step in optimizing the hash functions was to compile my benchmark program with the -O3 compilation option. The first two hash functions, which are optimized for x86 platforms, show a significant improvement in performance with -O3. The third hash function, which is optimized for x64 platforms, also improves with -O3, but by a smaller amount than the first two (about 523% versus roughly 800-900%). My second step was to change some code in the third function (the i+i and len changes in the tables above), which made the unoptimized build about 0.5% faster than the unchanged code.
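For anyone who wants to reproduce the comparison, the two builds can be produced with commands along these lines. The file names are placeholders for my split-out sources; the only flags that matter for the comparison are the project's -std=gnu99 and the -O3 option being tested:
-------------------------------------------------
# build without optimization options (the baseline case)
gcc -std=gnu99 bench_x64_128.c MurmurHash3_x64_128.c -o bench_x64_128

# build with -O3
gcc -std=gnu99 -O3 bench_x64_128.c MurmurHash3_x64_128.c -o bench_x64_128_O3

# time both runs
time ./bench_x64_128
time ./bench_x64_128_O3
-------------------------------------------------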

Afterward, I ran the benchmark program on an x86_64 system, and the results show that compiling with the -O3 option also gives a significant improvement in performance there. However, the improvement for the third function on the AArch64 system is not as large as on the x86_64 platform. As a result, compiling with the -O3 option produces the best performance for all three functions on both platforms, and it is the most effective optimization in this project.
