
CSAPP 3.11 Floating-Point Code

Jiho Kim

📝 Detailed Summary

  • Floating-Point Code
    • How does the floating-point architecture work?
      • How floating-point values are stored and accessed
      • Instructions that operate on floating-point data
      • Conventions for passing floating-point values to functions and returning them
      • Conventions for which registers are preserved across function calls
    • SSE registers are 128 bits wide, AVX registers 256 bits, AVX-512 registers 512 bits.
    • Assembly that mentions %xmm0, %xmm1, and so on is using these newer media registers.
      • They are completely separate from %rax, %rbx, %rsp, … %r15 and form their own register file.
      • Integers use two's complement while floats use a sign/exponent/fraction encoding, so the arithmetic is done by a separate unit.
        • SIMD is possible too!
  • 3.11.1 Floating-Point Movement and Conversion Operations
    • There are instructions such as vmovss / vmovsd / vmovaps / vmovapd.
    • Optimization guidelines recommend 4-byte alignment for 32-bit data and 8-byte alignment for 64-bit data, but the scalar moves still work on unaligned addresses.
    • For the most part they behave like the integer mov instructions.
    • GCC uses the scalar moves (vmovss, vmovsd) only between an XMM register and memory.
      • To copy between two XMM registers it uses vmovaps or vmovapd.
      • The "a" in these names stands for "aligned".
    • float float_mov(float v1, float *src, float *dst){
          float v2 = *src;
          *dst = v1;
          return v2;
      }
    • The code above compiles to the following.
    • v1 in %xmm0, src in %rdi, dst in %rsi
      float_mov:
        vmovaps %xmm0, %xmm1      # Copy v1
        vmovss  (%rdi), %xmm0     # Read v2 from src
        vmovss  %xmm1, (%rsi)     # Write v1 to dst
        ret                       # Return v2 in %xmm0
    • Both vmovaps and vmovss show up in this listing.
    • float -> integer (vcvttss2si, vcvttsd2si): the value is truncated, i.e. rounded toward zero.
    • integer -> float (vcvtsi2ss, vcvtsi2sd): curiously, these take three operands.
      • The second operand only affects the upper bytes of the result, so it can be ignored here.
        • It supplies the bits that are copied into the upper part of the destination register.
        • In most code the second operand and the third (destination) operand are the same register.
    • single <-> double conversion
      • GCC generates somewhat odd-looking code.
      • The goal is to keep the stale upper bits of the destination from being reused.
        • False dependency
          • An instruction that merges its result with the old upper bits would have to wait for that old value, so the upper bits are simply overwritten.
        • The same holds for both single -> double and double -> single.
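    • The conversion rules above can be checked from C. The following is a minimal sketch; the function names are made up for illustration, and the instruction names in the comments are what a compiler targeting AVX would plausibly emit, not a guarantee.

```c
/* float -> int: C semantics truncate toward zero, which is the
   behavior of the truncating conversions (e.g. vcvttss2si). */
int float_to_int(float x) {
    return (int)x;
}

/* int -> double: exact for every 32-bit int (e.g. vcvtsi2sd). */
double int_to_double(int i) {
    return (double)i;
}

/* float -> double: widening never loses information. */
double float_to_double(float x) {
    return (double)x;
}
```

    • Note that float_to_int(-2.7f) yields -2, not -3: truncation moves toward zero rather than toward negative infinity.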
  • 3.11.2 Floating-Point Code in Procedures
    • Aargh, procedures.
      • As before, floating-point arguments are passed to functions, and results returned from them, in XMM registers.
    • x86-64 follows these conventions:
      • Up to eight floating-point arguments are passed in %xmm0–%xmm7; any further ones go on the stack.
      • A floating-point return value comes back in %xmm0.
      • All XMM registers are caller-saved, so the callee may overwrite any of them without saving it first.
      • double f1(int x, double y, long z);
      • In this example, x is in %edi, y in %xmm0, and z in %rsi.
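    • To make the register assignment concrete, here is a sketch that gives f1 a hypothetical body (the notes only show its prototype) and adds a second, made-up function f2. The register comments assume the System V x86-64 convention described above.

```c
/* Hypothetical body for the f1 prototype.
   x arrives in %edi, y in %xmm0, z in %rsi;
   the double result is returned in %xmm0. */
double f1(int x, double y, long z) {
    return (double)x + y + (double)z;
}

/* Integer/pointer and floating-point arguments are numbered
   independently: p in %rdi, a in %xmm0, b in %xmm1, n in %rsi. */
double f2(const double *p, float a, double b, long n) {
    return p[0] * (double)a + b * (double)n;
}
```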
  • 3.11.3 Floating-Point Arithmetic Operations
    • double funct(double a, float x, double b, int i){
          return a*x - b/i;
      }
    • The code above compiles to the following.
    • a in %xmm0, x in %xmm1, b in %xmm2, i in %edi
      funct:
        # The following two instructions convert x to double
        vunpcklps %xmm1, %xmm1, %xmm1
        vcvtps2pd %xmm1, %xmm1
        vmulsd    %xmm0, %xmm1, %xmm0    # Multiply a by x
        vcvtsi2sd %edi, %xmm1, %xmm1     # Convert i to double
        vdivsd    %xmm1, %xmm2, %xmm2    # Compute b/i
        vsubsd    %xmm2, %xmm0, %xmm0    # Subtract from a*x
        ret                              # Return
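    • The same computation can be written with every implicit conversion spelled out, one C statement per assembly step. This is only an illustrative rewrite (funct_steps and its local names are not from the book); it should compute the same value as funct.

```c
double funct_steps(double a, float x, double b, int i) {
    double xd   = (double)x;   /* vunpcklps + vcvtps2pd: float -> double */
    double prod = a * xd;      /* vmulsd */
    double id   = (double)i;   /* vcvtsi2sd: int -> double */
    double quot = b / id;      /* vdivsd */
    return prod - quot;        /* vsubsd */
}
```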
  • 3.11.4 Defining and Using Floating-Point Constants
    • Unlike integer operations, AVX floating-point operations cannot take immediate values as operands.
      • Instead, the compiler allocates and initializes storage for each constant, and the code reads the value from memory.
      • double cel2fahr(double temp){
            return 1.8 * temp + 32.0;
        }
      • A function like the one above compiles to the following.
      • double cel2fahr(double temp), temp in %xmm0
        cel2fahr:
          vmulsd .LC2(%rip), %xmm0, %xmm0   # Multiply by 1.8
          vaddsd .LC3(%rip), %xmm0, %xmm0   # Add 32.0
          ret
        .LC2:
          .long 3435973837                  # Low-order 4 bytes of 1.8
          .long 1073532108                  # High-order 4 bytes of 1.8
        .LC3:
          .long 0                           # Low-order 4 bytes of 32.0
          .long 1077936128                  # High-order 4 bytes of 32.0
      • The code reads 1.8 from the location labeled .LC2 and 32.0 from .LC3.
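    • The two .long values per constant are simply the low and high 32-bit halves of the IEEE 754 double. A small sketch to verify this; double_from_words is a hypothetical helper, and it assumes double is a 64-bit IEEE 754 type that shares byte order with uint64_t.

```c
#include <stdint.h>
#include <string.h>

/* Reassemble a double from the two 32-bit words emitted by the
   compiler (low-order word first, as in the .LC2/.LC3 listings). */
double double_from_words(uint32_t low, uint32_t high) {
    uint64_t bits = ((uint64_t)high << 32) | low;
    double d;
    memcpy(&d, &bits, sizeof d);  /* reinterpret the bits, no conversion */
    return d;
}
```

    • In hex the words for 1.8 are 0x3FFCCCCC (high) and 0xCCCCCCCD (low), and the high word for 32.0 is 0x40400000.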
  • 3.11.5 Using Bitwise Operations in Floating-Point Code
    • Bitwise operations can be applied to floating-point registers as well!
      • vxorps, vxorpd, vandps, vandpd
    • But why would anyone do bitwise operations on floats?
      • To set a register to zero
        • XOR it with itself
      • To negate a value or take its absolute value
        • Flip or clear the sign bit (the MSB)
      • Expressions such as (x<0 ? 0:x), i.e. ReLU
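    • The sign-bit tricks can be imitated in portable C by operating on the bit pattern of a float. This is a sketch with made-up helper names, assuming a 32-bit IEEE 754 float; a compiler would typically do the same thing directly in an XMM register with vandps or vxorps and a mask constant.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
static uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static float bits_float(uint32_t u) {
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Absolute value: AND away the sign bit. */
float abs_bits(float x) {
    return bits_float(float_bits(x) & 0x7FFFFFFFu);
}

/* Negation: XOR flips the sign bit. */
float neg_bits(float x) {
    return bits_float(float_bits(x) ^ 0x80000000u);
}

/* Zeroing: anything XORed with itself is all zero bits, i.e. +0.0f. */
float zero_bits(float x) {
    uint32_t u = float_bits(x);
    return bits_float(u ^ u);
}
```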
  • 3.11.6 Floating Point Comparison Operations
    • Numbers have to be compared somehow.
      • ucomiss / ucomisd
      • ucomiss S1, S2 compares S2 against S1 (based on S2 − S1).
      • As with integer comparisons, they set condition codes: here ZF, CF, and PF.
      • If either operand is NaN, the result is unordered and all three flags are set!
  • 3.11.7 Observations about Floating-Point Code
    • AVX2 floating-point code turns out to resemble integer code, but with a much wider variety of instructions and formats.
    • Code can also run faster by operating on packed data in parallel.
      • Recent versions of GCC can do this automatically (auto-vectorization).
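    • As a rough illustration of a loop that is a good candidate for packed execution, consider the sketch below (vec_add is a made-up example, not from the book). With flags such as -O3 -mavx2, GCC may compile the loop body into packed instructions like vaddps, but whether it actually does depends on the compiler version and options.

```c
#include <stddef.h>

/* Element-wise addition. The restrict qualifiers promise that the
   arrays do not overlap, which makes vectorization easier to prove. */
void vec_add(float *restrict dst, const float *restrict a,
             const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```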

❔ Questions

🔗 References