IRC channel logs

***Guest48746 is now known as jao

<mark_weaver>weinholt: based on your bug report, I worry that (fixnum-width) is now 63 on your system. that's based on your example (fxcopy-bit 0 (fixnum-width) 1) => 9223372036854775808

<mark_weaver>is that true? if so, something is broken.

<weinholt>nope, it's 62

<mark_weaver>and yet (fxcopy-bit 0 (fixnum-width) 1) returns 2^63 for you?

<weinholt>no, 2^62?

<weinholt>hm

<weinholt>that's odd

<weinholt>the output in the mail is pure copy-and-paste, not sure why it is different today

<mark_weaver>what does (fxcopy-bit 0 (fixnum-width) 1) return for you now?

<weinholt>4611686018427387904 with 2.0.9.40-824b-dirty rebuilt today

<mark_weaver>okay, good. I know it's not correct, but at least it's not surprising either :)

<mark_weaver>I'm torn on what to do about these fixnum procedures sometimes returning non-fixnums. Right now, the fixnum procedures are built on top of the generic numeric procedures.

<mark_weaver>Obviously, at some point we should make them into more optimized procedures that avoid creating bignums and flonums.

<mark_weaver>but right now, doing these checks would make them a lot slower.

<weinholt>yeah, it's a bit opposite to what you'd want

<mark_weaver>well, for now, I just pushed 93da406f331a1849f05e63387442b9aaf33f9540 to stable-2.0, which optimizes several of the R6RS bitwise procedures (upon which the fixnum bitwise procedures are based). as a side effect, some of the crashes you reported in bug 14917 (Missing range check in fxcopy-bit can give SIGABRT) are avoided.

<weinholt>cool

<mark_weaver>they are avoided because the new optimized fxcopy-bit (and bitwise-copy-bit) does some clever checks where if the bit you are going to set is already set to the right value, then it knows it doesn't have to do anything.

<weinholt>the fixnum library is designed so that you can elide lots of type checks, since you know that everything returns fixnums. but i guess guile isn't quite there yet

<weinholt>clever indeed

<mark_weaver>on the other hand, if you tried to set such a high bit to a value that it wasn't already set to, you'll still run out of memory. I'm not really sure how to fix that properly. In general, running out of memory is fatal.

<mark_weaver>at some point, it would be nice to have more robust handling of resource limits and such.

<mark_weaver>regarding the purpose of the fixnum and flonum libraries, I'm already thinking about how to ensure that we make proper use of them when we have native compilation.

<mark_weaver>but that's a longer term project obviously :)

<weinholt>when i'm out of memory, i _demand_ an &implementation-restriction be raised :)

<weinholt>now to figure out where to allocate the &implementation-restriction...

<mark_weaver>yes, that would obviously be desirable.

<weinholt>guile's fixnums are tagged with a non-zero tag, right? any hope of that changing?

<weinholt>(afaik a non-zero tag makes for slower code)

<mark_weaver>*nod*

<mark_weaver>yes, it's currently a non-zero tag.

<mark_weaver>I think there might be some requirements imposed on us by the Boehm GC garbage collector, but I'm not sure. wingo might know.

<wingo>i am not sure either

<mark_weaver>I guess civodul would know, but he's away for the next month.

<wingo>i seem to recall i tried for a nonzero tag in wip-retagging

<wingo>but i don't remember why that didn't work out; probably something unrelated

<wingo>s/nonzero/zero/

<mark_weaver>right now we're using the zero tag for non-immediate objects (i.e. pointers)

<wingo>anyway it's pretty low on the speed priority list :)

<mark_weaver>the thing is, any pointers from C will effectively have a zero tag, so Boehm GC has to be configured to treat zero tags as pointers anyway.

<weinholt>interesting

<mark_weaver>if we changed our internal pointers to use a non-zero tag, then Boehm would have to treat two different kinds of tags as pointers.

<mark_weaver>and furthermore it might confuse our fixnums for pointers, and thus cause garbage to sometimes be kept around. so it would make our GC less precise.

<weinholt>the scheme vm i wrote in go also had to use non-zero fixnums because of the GC

<mark_weaver>My gut feeling is that the non-zero tag for fixnums won't make a huge difference in speed. It's true that addition and subtraction requires masking out the tag from one of the operands first, but that's just one instruction which is probably lost in the noise compared with the overflow check.

<mark_weaver>(since conditional branches are expensive)

<weinholt>that may well be

<wingo>mark_weaver: any pointers from c will be wrapped in a SCM, which can have a nonzero tagging scheme

<wingo>dunno

<mark_weaver>as wingo reminds me periodically, the real way to make numerics faster is to unbox numbers when possible, in which case there will be no tag at all.

<weinholt>here's some related material, that is likely completely inapplicable to guile: http://weinholt.se/scheme/alignment-check.pdf http://weinholt.se/scheme/fixnums.txt

<mark_weaver>wingo: Hmm. I strongly suspect that there are places in the code where C has a bare pointer to some GC allocated block, and that in some of those cases it's important that the GC see that pointer.

<wingo>you are probably right

<wingo>weinholt: i enjoyed very much the alignment hack when you wrote about it some months ago :)

<weinholt>months? try a year or more :)

<mark_weaver>ah yes, I've seen your alignment hack document before as well. nice hack!

<wingo>weinholt: time passes :)

<weinholt>yeah, turned out i wasn't the first to find it, but still a nice hack

<mark_weaver>though I seem to remember coming to the conclusion that we couldn't use that hack for Guile.

<mark_weaver>I don't remember why I thought that. hmm.

<mark_weaver>weinholt: speaking of which, do you know how to efficiently and portably check for overflow of multiplication of signed fixnums?

<weinholt>portably? no. :)

<weinholt>the x86 has an overflow flag, no idea how to access it from C

<mark_weaver>currently, we do this: http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=libguile/numbers.c;h=07bcaad270e11314ed394afbda25b61f2e222d2a;hb=stable-2.0#l8041

<mark_weaver>which is painful to think about, but I can't think of a better way.

<weinholt>i'll just go and make a cup of tea while number.c loads...

<mark_weaver>I guess I should put in some 'asm' snippets for popular processors.

<mark_weaver>yeah, I know that processors tend to have an overflow flag. such a shame that it cannot be accessed from C.

<weinholt>fwiw, one problem with checking for overflow in C is that such code easily invokes undefined behavior

<mark_weaver>yes, I know that all too well :) http://stackoverflow.com/questions/14495636/strange-multiplication-behavior-in-guile-scheme-interpreter/14498437#14498437

<weinholt>there's a recommendation here: https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow

<mark_weaver>our older multiplication overflow check ran into that problem.

<weinholt>(which looks even slower)

<mark_weaver>fortunately, for addition and subtraction, we don't have to worry about that since we have a couple of extra bits to spare (fixnums have 2 fewer bits than the machine integer)

<mark_weaver>yeah, they're basically doing the same thing that we're doing now, except that they break out all the sign cases whereas I just do the test on the absolute values.

<mark_weaver>absolute value can be done without a conditional branch, though I don't think the code I currently have written will be compiled to avoid it.

<taylanub>Whoa, we accept '. to mean #{.}#, that's kind of weird.

<wingo>yay, a merge from stable-2.0

<mark_weaver>weinholt: fixnums.txt is interesting. I appreciate the snippets of x86 code. I confess that those nice short instruction sequences you have there make me want a zero tag for fixnums.

<wingo>tx mark_weaver

<mark_weaver>wingo: you're welcome :)

<mark_weaver>wingo: do you have any thoughts on how best to integrate R6RS's fixnum and flonum operations into the VM?

<wingo>mark_weaver: nope, haven't thought about it

<wingo>we could add typed opcodes

<wingo>racket does that and it seems to work very well for their jit

<mark_weaver>the fixnum ops could perhaps simply be a flag on the generic ops.

<mark_weaver>such that the flag could only be checked in the slow path.

<wingo>could be, yes; dunno

<mark_weaver>is it feasible to add new ops in 2.0, or only in 2.2?

<wingo>it's possible to add new ops in 2.0 but please do not do so ;-)

<mark_weaver>is there a free bit available anywhere in the basic arith ops in 2.0?

<wingo>i think we're close enough to getting the rtl vm working that anything on the stack vm is a waste of time

<wingo>mark_weaver: i don't think so; you'd need to make new ops

<mark_weaver>well, it'll be a while before 2.2 is deployed. it just hurts how slow these fixnum/flonums ops right now.

<mark_weaver>even if only a couple of the most commonly-used ones (+ - * /) could be integrated, it would be nice.

<mark_weaver>well, I'll let you decide :)

<mark_weaver>I wonder if 'jo' is handled specially by the branch prediction logic of modern x86 processors.

<mark_weaver>I often regret that there is no obvious way to modern x86 processors that a branch is highly unlikely (or likely), and that it should simply always predict one way, instead of using up the resources of the branch predictor.

<mark_weaver>*way to tell

<mark_weaver>If 'jo' were always predicted to not happen, it could perhaps be used as a clever way to avoid mispredicted branches.

<wingo>www.agner.org/optimize/microarchitecture.pdf‎

<mark_weaver>*nod* yes, that's where I learned most of what I know about modern branch prediction :)

<wingo>:)

<mark_weaver>and it seems that modern x86 processors have done away with the simpler forms of branch prediction entirely, such that I don't see an obvious way to specify that static branch prediction should be used for a particular branch instruction.

<mark_weaver>in the old days, typically a predictor would always predict that a backward branch would be taken, so you could cleverly arrange your code to make the most likely branch go backwards.

<mark_weaver>ah, the x86 actually has "branch hint" instruction prefixes.

<weinholt>those no longer work though, right?

<mark_weaver>I don't know, that's a good question.

<weinholt>iirc it only worked on the P4

<mark_weaver>okay

<mark_weaver>weinholt: would you be willing look at my proposed fixnum multiply x86 code, to see if you can see an obvious way to improve it? http://paste.lisp.org/display/138140

<mark_weaver>I don't have much experience writing x86 assembly.

<mark_weaver>it works anyway, it seems to make the MUL VM instruction faster by a factor of at least 3 (probably more).

<weinholt>do you really need the sub to remove the tag?

<weinholt>oh, nevermind

<mark_weaver>heh :)

<mark_weaver>it would probably be better to somehow combine those two tag tests into a single test.

<mark_weaver>more instructions but fewer conditional branches.

<weinholt>i'd suggest doing the tag tests in C, and maybe see if you can use contraints to let gcc pick all the registers

<mark_weaver>makes sense.

<mark_weaver>I copied wingo's asm code for ADD and SUB and just modified it for MUL. All of them could probably benefit from your suggestion.

<mark_weaver>one possible complication is that we need to make sure the word size of the chosen destination register matches the size of SCM, so that the overflow test will work properly.

<weinholt>yeah, asm constraints are tricky

<mark_weaver>actually, since the variables passed as "x" and "y" are SCM variables, I guess that's a safe assumption anyway.

<weinholt>you probably need "cc" in the clobbered list, btw, since you modify the flags

<mark_weaver>good point :)

<weinholt>you could add the tag to %rcx using LEA between the IMUL and the JO (since LEA doesn't modify the flags), but i don't know if it will be faster

<mark_weaver>Given that the LEA would need the result from the IMUL, I doubt it would be faster.

<weinholt>true enough

<mark_weaver>anyway, until we are generating native code, this kind of micro-optimization would probably be lost in the noise anyway.

<weinholt>but at least you can add the tag with C code, let gcc schedule it

<weinholt>yep

<mark_weaver>it's true, we should move as much as possible of this to C.

<wingo>i think that was civodul's asm code, fwiw

<mark_weaver>ah, sorry to misattribute...

<wingo>heh, np...

***sneek_ is now known as sneek

***dsmith_ is now known as dsmith

IRC channel logs

2013-07-21.log