This lab explores single instruction, multiple data (SIMD) vectorization and the auto-vectorization capabilities of the GCC compiler. For readers who are not familiar with vectorization, the article "Automatic vectorization" is a good introduction.
In this lab, we write a short program that:
-Creates two 1000-element integer arrays
-Fills them with random numbers in the range -1000 to +1000
-Sums those two arrays element by element into a third array
-Sums the third array
-Prints the result
Here is the source code I wrote:
------------------------------------------------------
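#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main(void){
    int sum = 0;            /* accumulator must start at zero */
    int arr1[1000];
    int arr2[1000];
    int arr3[1000];

    srand((unsigned)time(NULL));

    /* fill both arrays with random values in the range -1000..+1000 */
    for(int i=0; i<1000; i++){
        arr1[i] = rand() % 2001 - 1000;
        arr2[i] = rand() % 2001 - 1000;
    }
    /* element-wise sum into the third array */
    for(int i=0; i<1000; i++){
        arr3[i] = arr1[i] + arr2[i];
    }
    /* total of the third array */
    for(int i=0; i<1000; i++){
        sum += arr3[i];
    }
    printf("Sum is: %d\n", sum);
    return 0;
}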
------------------------------------------------------
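I use the command 'gcc -O3 -o lab4 lab4.c' to compile this program.
But how do we know the compiler actually vectorized it? Check GCC's article on vectorization for the details.
Vectorization is enabled by default at the -O3 optimization level, and adding -fopt-info-vec to the command line makes GCC report which loops it vectorized. The definitive check is the generated machine code itself.
Disassembling the binary (for example with objdump -d lab4) shows these instructions in <main>: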
------------------------------------------------------
Disassembly of section .text:
0000000000400560 <main>:
400560: d285e410 mov x16, #0x2f20 // #12064
400564: cb3063ff sub sp, sp, x16
400568: d2800000 mov x0, #0x0 // #0
40056c: a9007bfd stp x29, x30, [sp]
400570: 910003fd mov x29, sp
400574: a90153f3 stp x19, x20, [sp, #16]
400578: 529a9c74 mov w20, #0xd4e3 // #54499
40057c: a9025bf5 stp x21, x22, [sp, #32]
400580: 72a83014 movk w20, #0x4180, lsl #16
400584: f9001bf7 str x23, [sp, #48]
400588: 910103b6 add x22, x29, #0x40
40058c: 913f83b5 add x21, x29, #0xfe0
400590: 5280fa33 mov w19, #0x7d1 // #2001
400594: d2800017 mov x23, #0x0 // #0
400598: 97ffffd6 bl 4004f0 <time@plt>
40059c: 97ffffe9 bl 400540 <srand@plt>
4005a0: 97ffffdc bl 400510 <rand@plt>
4005a4: 9b347c01 smull x1, w0, w20
4005a8: 9369fc21 asr x1, x1, #41
4005ac: 4b807c21 sub w1, w1, w0, asr #31
4005b0: 1b138020 msub w0, w1, w19, w0
4005b4: 510fa000 sub w0, w0, #0x3e8
4005b8: b8376ac0 str w0, [x22, x23]
4005bc: 97ffffd5 bl 400510 <rand@plt>
4005c0: 9b347c01 smull x1, w0, w20
4005c4: 9369fc21 asr x1, x1, #41
4005c8: 4b807c21 sub w1, w1, w0, asr #31
4005cc: 1b138020 msub w0, w1, w19, w0
4005d0: 510fa000 sub w0, w0, #0x3e8
4005d4: b8376aa0 str w0, [x21, x23]
4005d8: 910012f7 add x23, x23, #0x4
4005dc: f13e82ff cmp x23, #0xfa0
4005e0: 54fffe01 b.ne 4005a0 <main+0x40> // b.any
4005e4: d283f002 mov x2, #0x1f80 // #8064
4005e8: 8b0203a1 add x1, x29, x2
4005ec: d2800000 mov x0, #0x0 // #0
4005f0: 3ce06ac0 ldr q0, [x22, x0]
4005f4: 3ce06aa1 ldr q1, [x21, x0]
4005f8: 4ea18400 add v0.4s, v0.4s, v1.4s
4005fc: 3ca06820 str q0, [x1, x0]
400600: 91004000 add x0, x0, #0x10
400604: f13e801f cmp x0, #0xfa0
400608: 54ffff41 b.ne 4005f0 <main+0x90> // b.any
40060c: 4f000400 movi v0.4s, #0x0
400610: aa0103e0 mov x0, x1
400614: d285e401 mov x1, #0x2f20 // #12064
400618: 8b0103a1 add x1, x29, x1
40061c: 3cc10401 ldr q1, [x0], #16
400620: 4ea18400 add v0.4s, v0.4s, v1.4s
400624: eb01001f cmp x0, x1
400628: 54ffffa1 b.ne 40061c <main+0xbc> // b.any
40062c: 4eb1b800 addv s0, v0.4s
400630: 90000000 adrp x0, 400000 <_init-0x4b8>
400634: 91208000 add x0, x0, #0x820
400638: 0e043c01 mov w1, v0.s[0]
40063c: 97ffffc5 bl 400550 <printf@plt>
400640: f9401bf7 ldr x23, [sp, #48]
400644: a94153f3 ldp x19, x20, [sp, #16]
400648: 52800000 mov w0, #0x0 // #0
40064c: a9425bf5 ldp x21, x22, [sp, #32]
400650: d285e410 mov x16, #0x2f20 // #12064
400654: a9407bfd ldp x29, x30, [sp]
400658: 8b3063ff add sp, sp, x16
40065c: d65f03c0 ret
------------------------------------------------------
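SCALAR (NOT VECTORIZED) INSTRUCTIONS:
------------------------------------------------------
4005a4: 9b347c01 smull x1, w0, w20
4005c0: 9b347c01 smull x1, w0, w20
------------------------------------------------------
smull is not a SIMD instruction: it is a scalar signed multiply long that works on the general-purpose registers (x1, w0, w20), not on the v registers. Together with the asr, sub and msub that follow it, it computes rand() % 2001 by multiplying by the constant 0x4180d4e3 (built in w20 by the mov/movk pair) instead of dividing. The fill loop stays scalar because every element needs its own call to rand().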
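VECTORIZED:
------------------------------------------------------
4005f8: 4ea18400 add v0.4s, v0.4s, v1.4s
40060c: 4f000400 movi v0.4s, #0x0
400620: 4ea18400 add v0.4s, v0.4s, v1.4s
40062c: 4eb1b800 addv s0, v0.4s
------------------------------------------------------
The first add v0.4s adds four 32-bit elements of arr1 and arr2 at once (loaded with ldr q0 and ldr q1, stored with str q0). The loop advances by 0x10 bytes (four ints) until it reaches 0xfa0 (4000 bytes), so it runs 250 times instead of 1000. For the final sum, movi clears four partial sums in v0, the second add v0.4s accumulates four elements of arr3 per iteration, and addv adds the four lanes together into s0 before the result is passed to printf.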
The following references explain how to tell whether a program was vectorized by looking for instructions that use the SIMD vector registers (v0 to v31, which appear in the listing as q0, v0.4s, s0 and so on): https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/a64_simd_vector.html