    #include <stdlib.h>
    void main() { float f = rand(); }

When compiled on POWER (Linux) with GCC 4.5.1, without any optimization flags and in the default -m32 mode, GCC generates the following assembly code:

    stwu 1,-48(1)
    mflr 0
    stw 0,52(1)
    stw 31,44(1)
    mr 31,1
    bl rand
    mr 9,3
    lis 0,0x4330
    lis 11,.LC0@ha
    lfd 0,.LC0@l(11)
    xoris 9,9,0x8000
    stw 9,28(31)
    stw 0,24(31)
    lfd 13,24(31)
    fsub 0,13,0
    frsp 0,0
    stfs 0,8(31)
    addi 11,31,48
    lwz 0,4(11)
    mtlr 0
    lwz 31,-4(11)
    mr 1,11
    blr

There are two floating-point operations: fsub and frsp. frsp is the Floating Round to Single-Precision instruction, which makes sense because the result is stored into a float.
The question is: where is the integer to floating-point conversion code? It seems the subtraction instruction fsub has something to do with it. What is GCC's logic in generating such code?
GIMPLE is pretty much like the C language and is platform-independent. To see it, compile the above code with the -fdump-tree-gimple option, and a file foo.c.004t.gimple will be created. Its content looks like this:

    main ()
    {
      int D.3135;
      volatile float f.0;
      volatile float f;

      D.3135 = rand ();
      f.0 = (volatile float) D.3135;
      f = f.0;
    }

RTL is platform-dependent; this stage performs code generation, which includes, for example, register allocation, instruction scheduling, and intra-procedural optimizations. So to answer our question, we need to look at the various RTL passes.
Compile the above code with the -S -dP -fdump-rtl-all options, and a bunch of files named foo.c.XXXr.YYY will be created. XXX is the number of the pass within the RTL stage, and YYY is the name of that pass.

    ...
    #(insn 19 18 11 1234.c:8 (set (reg:DF 32 0 [121])
    #        (minus:DF (reg:DF 45 13 [125])
    #            (reg:DF 32 0 [123]))) 258 {*subdf3_fpr} (nil))
            fsub 0,13,0     # 19    *subdf3_fpr     [length = 4]
    ...

The key word here is subdf3_fpr. Next, do

    grep subdf3_fpr foo.c.???r.*

The first RTL pass in which subdf3_fpr occurs is foo.c.188r.ira, and the pass immediately before it is foo.c.185r.asmcons. These two RTL dump files still do not show why a subtraction instruction is generated. However, in these two files one can also spot another key word: movdf_hardfloat32.
Do

    grep movdf_hardfloat32 foo.c.???r.*

The first RTL pass in which movdf_hardfloat32 occurs is foo.c.146r.vregs. Now this file provides a clue:

    (insn 10 9 11 3 1234.c:8 (parallel [
                (set (reg:DF 121)
                    (float:DF (reg:SI 119 [ D.3135 ])))
                (use (reg:SI 122))
                (use (reg:DF 123))
                (clobber (mem/c:DF (plus:SI (reg/f:SI 113 sfp)
                            (const_int 24 [0x18])) [0 S8 A64]))
                (clobber (reg:DF 125))
                (clobber (reg:SI 126))
            ]) 271 {*floatsidf2_internal} (expr_list:REG_EQUAL (float:DF (reg:SI 119 [ D.3135 ]))
            (nil)))

floatsidf2_internal sounds like the integer to floating-point conversion code we are seeking, and it probably holds the key to why a subtraction instruction is generated.
The platform-dependent RTL rules/patterns are in gcc/config/<arch> in the GCC source tree. In our case, <arch> is rs6000. Looking at the file gcc/config/rs6000/rs6000.md (md = machine description), we can find the following rewriting rules:

    (define_expand "floatsidf2"
      [(parallel [(set (match_operand:DF 0 "gpc_reg_operand" "")
                       (float:DF (match_operand:SI 1 "gpc_reg_operand" "")))
                  (use (match_dup 2))
                  (use (match_dup 3))
                  (clobber (match_dup 4))
                  (clobber (match_dup 5))
                  (clobber (match_dup 6))])]
      "TARGET_HARD_FLOAT && ((TARGET_FPRS && TARGET_DOUBLE_FLOAT) || TARGET_E500_DOUBLE)"
      "
    {
      if (TARGET_E500_DOUBLE)
        {
          emit_insn (gen_spe_floatsidf2 (operands[0], operands[1]));
          DONE;
        }
      if (TARGET_POWERPC64)
        {
          rtx x = convert_to_mode (DImode, operands[1], 0);
          emit_insn (gen_floatdidf2 (operands[0], x));
          DONE;
        }

      operands[2] = force_reg (SImode, GEN_INT (0x43300000));
      operands[3] = force_reg (DFmode, CONST_DOUBLE_ATOF (\"4503601774854144\", DFmode));
      operands[4] = assign_stack_temp (DFmode, GET_MODE_SIZE (DFmode), 0);
      operands[5] = gen_reg_rtx (DFmode);
      operands[6] = gen_reg_rtx (SImode);
    }")

    (define_insn_and_split "*floatsidf2_internal"
      [(set (match_operand:DF 0 "gpc_reg_operand" "=&f")
            (float:DF (match_operand:SI 1 "gpc_reg_operand" "r")))
       (use (match_operand:SI 2 "gpc_reg_operand" "r"))
       (use (match_operand:DF 3 "gpc_reg_operand" "f"))
       (clobber (match_operand:DF 4 "offsettable_mem_operand" "=o"))
       (clobber (match_operand:DF 5 "gpc_reg_operand" "=&f"))
       (clobber (match_operand:SI 6 "gpc_reg_operand" "=&r"))]
      "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT"
      "#"
      "&& (can_create_pseudo_p () || offsettable_nonstrict_memref_p (operands[4]))"
      [(pc)]
      "
    {
      rtx lowword, highword;
      gcc_assert (MEM_P (operands[4]));
      highword = adjust_address (operands[4], SImode, 0);
      lowword = adjust_address (operands[4], SImode, 4);
      if (! WORDS_BIG_ENDIAN)
        {
          rtx tmp;
          tmp = highword; highword = lowword; lowword = tmp;
        }

      emit_insn (gen_xorsi3 (operands[6], operands[1],
                             GEN_INT (~ (HOST_WIDE_INT) 0x7fffffff)));
      emit_move_insn (lowword, operands[6]);
      emit_move_insn (highword, operands[2]);
      emit_move_insn (operands[5], operands[4]);
      emit_insn (gen_subdf3 (operands[0], operands[5], operands[3]));
      DONE;
    }"
      [(set_attr "length" "24")])

floatsidf2_internal is really just an identifier (the pattern name).
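The [(set (match_operand:DF 0 ... "gpc_reg_operand" "=&r"))] part is the RTL matching template.
The "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT" string and the split condition that follows it are additional predicates. For example, ! TARGET_POWERPC64 means the pattern is not used when compiling for 64-bit PowerPC.
The { rtx lowword, highword; ... } block is the code generation: C code that emits the replacement instructions.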
It is now clear that the generated code and its RTL counterparts are:

    xoris 9,9,0x8000    emit_insn (gen_xorsi3 (operands[6], operands[1], GEN_INT (~ (HOST_WIDE_INT) 0x7fffffff)));
    stw 9,28(31)        emit_move_insn (lowword, operands[6]);
    stw 0,24(31)        emit_move_insn (highword, operands[2]);
    lfd 13,24(31)       emit_move_insn (operands[5], operands[4]);
    fsub 0,13,0         emit_insn (gen_subdf3 (operands[0], operands[5], operands[3]));

    bl rand
    mr 9,3              // result of rand() is in $r9
    lis 0,0x4330        // the high-order part of "magic double", i.e. 0x43300000
    lis 11,.LC0@ha
    lfd 0,.LC0@l(11)    // have magic double in $f0 now
    xoris 9,9,0x8000    // flip the sign bit of $r9 (Step 1)
    stw 9,28(31)        // put $r9 in lower part of a doubleword in memory (Step 1)
    stw 0,24(31)        // the high-order part of our doubleword is also 0x43300000 (Step 2)
    lfd 13,24(31)       // Step 3
    fsub 0,13,0         // Step 4

In PowerPC 64-bit mode, the code is much shorter, since the fcfid instruction can be used.