<oriansj>gio: if perfection is playing PC games; AMD64 is impossible to beat
<xentrac>I feel like division hardware is sort of marginally worthwhile really
<oriansj>xentrac: granted when transistor budgets were low but not when you are doing OoO
<xentrac>well, I mean multiplication has an enormous amount of potential parallelism, so you can get a big speedup from hardware multipliers
<xentrac>but division is, if not inherently serial, serial in practice, so hardware dividers don't give as big a speedup
<xentrac>as i understand the situation, anyway. I haven't dissected production division logic even on ancient processors, much less modern ones
<xentrac>I just know it's slow as shit and that classic Crays didn't even have a vector division instruction, just a reciprocal
***ng0_ is now known as ng0
<oriansj>xentrac: well even in classic designs it is possible to do 1 bit of division per clock cycle (so 128 clock cycles tops for 128-bit division, with a 64-bit quotient and a 64-bit remainder); which is still better than the 8 clocks per bit required for a software loop (assuming perfect pipelining and prediction)
<oriansj>More modern approaches can do 16 bits per clock cycle (9 cycles per idiv) or can pipeline idivs to reduce stalls in idiv-heavy blocks (39-cycle latencies)
<oriansj>The big argument against integer division is it consumes a lot of transistors and complicates pipeline logic.
<oriansj>But on modern OoO that isn't really a valid argument
<oriansj>janneke: it is appearing my experiment might bear some interesting fruit shortly (at a small regression cost)
<janneke>i'm still playing with integrating guile's module system
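
The one-bit-per-step division discussed above can be sketched as a restoring shift-and-subtract loop. This is only an illustration in Python, not the logic of any particular processor. The name `restoring_divide` and its `bits` parameter are made up for this example, and unlike the AMD64 `div` instruction the sketch does not require the quotient to fit in 64 bits.

```python
def restoring_divide(dividend, divisor, bits=128):
    """Unsigned restoring division producing one quotient bit per iteration.

    Hypothetical helper for illustration; returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if dividend < 0 or divisor < 0 or dividend >> bits:
        raise ValueError("operands must be unsigned and dividend must fit in `bits`")

    remainder = 0
    quotient = 0
    # Walk the dividend from the most significant bit down to bit 0.
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: this step needs the remainder produced by the
        # previous iteration, so the iterations cannot run in parallel.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder
```

For example, `restoring_divide(100, 7)` returns `(14, 2)`. Each pass through the loop depends on the remainder left by the pass before it, which is the "serial in practice" property mentioned earlier; a hardware divider doing one bit per clock is roughly one loop iteration per cycle, while a software version pays several instructions per bit.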