
Lab 3

In this lab, we use assembly language to complete three parts.
1. To get familiar with assembly language, we create a loop that prints "Hello World!" 10 times. This part is quite easy. Here is the source code for the x86_64 assembler:
------------------------------------------------------
.text
.globl    _start

start = 0                       /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                        /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    mov     $start,%r15         /* loop index */
    mov     %r15,%r10

loop:
    /* ... body of the loop ... do something useful here ... */

    movq    $len,%rdx           /* message length */
    movq    $msg,%rsi           /* message location */
    movq    $1,%rdi             /* file descriptor stdout */
    movq    $1,%rax             /* syscall sys_write */
    syscall

    inc     %r15                /* increment index */
    cmp     $max,%r15           /* see if we're done */
    jne     loop                /* loop if we're not */

    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall sys_exit */
    syscall

.section .rodata
msg:    .ascii       "Hello World!\n"
        len = . - msg
------------------------------------------------------

The output is shown below:
------------------------------------------------------
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
------------------------------------------------------

2. The second part is harder: we have to improve the assembly loop so that it prints "Loop: 0" through "Loop: 9". Here is the source code I wrote for the x86_64 assembler:
------------------------------------------------------
.text
.globl    _start

stdout = 1
start = 0                        /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                         /* loop exits when the index hits this number (loop condition is i<max) */
pos = 6                          /* position of character to be replaced in string */
pos2 = 7                         /* position of newline character */

_start:
    mov     $start,%r15          /* loop index */

loop:
    /* ... body of the loop ... */
    mov     $len,%rdx            /* message length */
    mov     $48,%r14             /* move immediate value 48 (ascii zero) to r14 */
    mov     $10,%r12             /* move immediate value 10 (newline) to r12 */
    add     %r15,%r14            /* add r15 (loop index) to r14 */
    mov     $pos,%r13            /* move value of pos to r13 */
    mov     $pos2,%r11           /* move value of pos2 to r11 */
    mov     $msg,%rsi            /* message location */
    add     %rsi,%r13            /* add rsi (message location) to r13 and store in r13 */
    add     %rsi,%r11            /* add rsi (message location) to r11 and store in r11 */
    mov     %r14b,(%r13)         /* store the low byte of r14 (the digit) at the address in r13; a full 64-bit store would write past the end of msg */
    mov     %r12b,(%r11)         /* store the low byte of r12 (the newline) at the address in r11 */
    mov     $stdout,%rdi         /* file descriptor stdout */
    mov     $1,%rax              /* syscall sys_write */
    syscall

    inc     %r15                /* increment index */
    cmp     $max,%r15           /* see if we're done */
    jne     loop                /* loop if we're not */

    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall sys_exit */
    syscall
.data

msg:     .ascii  "Loop: x\n"
.set len, . - msg               /* current memory location minus value of label msg */
------------------------------------------------------

And I got the output:
------------------------------------------------------
Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9
------------------------------------------------------

3. The final part of this lab is more complicated. We were asked to create a loop in assembly language that prints "Loop:  0" through "Loop: 29". Here is the source code I wrote in x86_64 assembly:
------------------------------------------------------
.text
.global  _start

start = 0 /* loop index starting value */
max = 30 /* loop exits when the index hits this number */

_start:
    mov     $start,%r15         /* set loop index */

loop:
    mov     $0,%rdx             /* clear register */
    mov     %r15,%rax           /* store dividend in register */
    mov     $10,%r10            /* store divisor in register */
    div     %r10                /* divide */
    mov     %rax,%r11           /* store quotient in register */
    mov     %rdx,%r12           /* store remainder in register */
    cmp     $0,%r11             /* check if quotient is 0 */
    je      next                /* jump if quotient is 0 */
    add     $0x30,%r11          /* convert to ascii */
    mov     %r11b,msg+6         /* place quotient in msg */

next:
    add     $0x30,%r12          /* convert to ascii */
    mov     %r12b,msg+7         /* place remainder in msg */
    mov     $len,%rdx           /* message length */
    mov     $msg,%rsi           /* message location */
    mov     $1,%rdi             /* file descriptor stdout */
    mov     $1,%rax             /* syscall sys_write */
    syscall                     /* syscall */

    inc     %r15                /* increment index */
    cmp     $max,%r15           /* check if index is at max */
    jne     loop                /* loop if index not at max */

    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall sys_exit */
    syscall                     /* syscall */

.data

msg:    .ascii      "Loop:   \n"
.set len , . - msg
------------------------------------------------------

I also wrote an Aarch64 version:
------------------------------------------------------
.text
.global _start

start = 0                               /* loop index starting value */
max = 30                                /* loop exits when the index hits this number */

_start:
        mov     x3,start                /* set loop index */

loop:
        mov     x9,10                   /* store divisor in register */
        udiv    x10,x3,x9               /* divide and store quotient in register */
        msub    x11,x9,x10,x3           /* store remainder in register */
        adr     x1,msg                  /* message location */
        cmp     x10,0                   /* check if quotient is 0 */
        b.eq    next                    /* jump if quotient is 0 */
        add     x12,x10,0x30            /* convert to ascii and store in register */
        strb    w12,[x1,6]              /* place quotient in msg */

next:
        add     x13,x11,0x30            /* convert to ascii and store in register */
        strb    w13,[x1,7]              /* place remainder in msg */
        mov     x2,len                  /* message length */
        mov     x0,1                    /* file descriptor stdout */
        mov     x8,64                   /* syscall sys_write */
        svc     0                       /* syscall */

        add     x3,x3,1                 /* increment index */
        cmp     x3,max                  /* check if index is at max */
        b.ne    loop                    /* loop if index not at max */

        mov     x0,0                    /* exit status */
        mov     x8,93                   /* syscall sys_exit */
        svc     0                       /* syscall */

.data

msg: .ascii "Loop:   \n"
.set len, . - msg
------------------------------------------------------

Both versions produce the same output:
------------------------------------------------------
Loop:  0
Loop:  1
Loop:  2
Loop:  3
Loop:  4
Loop:  5
Loop:  6
Loop:  7
Loop:  8
Loop:  9
Loop: 10
Loop: 11
Loop: 12
Loop: 13
Loop: 14
Loop: 15
Loop: 16
Loop: 17
Loop: 18
Loop: 19
Loop: 20
Loop: 21
Loop: 22
Loop: 23
Loop: 24
Loop: 25
Loop: 26
Loop: 27
Loop: 28
Loop: 29
------------------------------------------------------

Overall, this lab was pretty difficult for me. Assembly language is much more complicated than other languages such as C, C++, or Java: to print a number, the program has to convert each digit to its ASCII code itself and store it into the string before writing the string out.
What is the difference between x86_64 assembly language and AArch64 assembly language?
In my opinion, the two are not very different, but I would say x86_64 is easier to code. In x86_64 assembly (in the AT&T syntax used above), most instructions take two operands and store the result in the second one: add %r15,%r14 adds r15 to r14 and leaves the sum in r14. In AArch64, the result goes into the first operand and the values come from the second and third: add x12,x10,0x30 adds 0x30 to x10 and stores the sum in x12.
