IRC channel logs

2013-08-12.log

***Guest17480 is now known as jao
<shanecelis>I'm working with SWIG on guile (I want to try to add Directors, so you can extend a C++ class transparently.)
<shanecelis>And I'm just trying to get the tests to compile in the SWIG repo for guile.
<shanecelis>I get this error: Undefined symbols for architecture x86_64:
<shanecelis> "_scm_cells_allocated", referenced from:
<shanecelis> _scm_double_cell in example_wrap.o
<shanecelis>Does this mean anything to anyone? It's linked against libguile-2.0.
<shanecelis>(I'm doing this on Mac OS X, will try on Linux just to see if that's part of the issue.)
<davexunit>shanecelis: I heard that SWIG doesn't work so well with Guile these days.
<davexunit>bbl
<shanecelis>Hmm... all SWIG tests for guile run under Linux.
<mark_weaver>shanecelis: it sounds like some header files from 1.8 might be in your C include path.
<mark_weaver>shanecelis: in both 1.8.x and 2.0.x, 'scm_double_cell' is an inlined function. in guile 1.8 it's defined in inline.h, and that version references the 'scm_cells_allocated' global variable, which no longer exists in 2.0.x.
<mark_weaver>shanecelis: in 2.0.x, 'scm_double_cell' is defined in 'gc.h', and that version does *not* reference 'scm_cells_allocated'.
<mark_weaver>sneek: later tell shanecelis it sounds like some header files from 1.8 might be in your C include path. in both 1.8.x and 2.0.x, 'scm_double_cell' is an inlined function. in guile 1.8 it's defined in 'inline.h', and that version references the 'scm_cells_allocated' global variable, which no longer exists in 2.0.x. in 2.0.x, 'scm_double_cell' is defined in 'gc.h', and that version does *not* reference 'scm_cells_allocated'.
<sneek>Got it.
<nalaginrut>morning guilers!
<shanecelis>Here's the SWIG guile hero: https://github.com/swig/swig/pull/42
<sneek>Welcome back shanecelis, you have 1 message.
<sneek>shanecelis, mark_weaver says: it sounds like some header files from 1.8 might be in your C include path. in both 1.8.x and 2.0.x, 'scm_double_cell' is an inlined function. in guile 1.8 it's defined in 'inline.h', and that version references the 'scm_cells_allocated' global variable, which no longer exists in 2.0.x. in 2.0.x, 'scm_double_cell' is defined in 'gc.h', and that version does *not* reference 'scm_cells_allocated'.
<nalaginrut>shanecelis: cool~very nice
<shanecelis>thanks, mark_weaver.
<shanecelis>sneek: later tell mark_weaver Here's the reason SWIG guile has been updated:
<shanecelis> https://github.com/swig/swig/pull/42
<sneek>Will do.
<nalaginrut>posix.c:1489: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
<nalaginrut>hmm...seems not a bug
*nalaginrut is compiling Guile-2.0.9 on Hurd
<Chaos`Eternal>...
<Chaos`Eternal>not a bug..
<Chaos`Eternal>awesome
<mark_weaver>Guile itself doesn't use 'tmpnam', but it provides a binding for it.
<sneek>Welcome back mark_weaver, you have 1 message.
<sneek>mark_weaver, shanecelis says: Here's the reason SWIG guile has been updated:
<Chaos`Eternal>but we shall never use that
<Chaos`Eternal>cause it is really dangerous
<mark_weaver>there's probably some Guile code out there that uses it.
<mark_weaver>Guile should emit a warning when user code uses it.
<mark_weaver>however, this gcc warning should be ignored, imo
<Chaos`Eternal>agree
<youlysses>nalaginrut: Yah! Any problems so-far? :^)
<mark_weaver>nalaginrut: indeed, I'll be curious to hear how it goes!
<nalaginrut>youlysses: I think there's no problem with regular compiling
<nalaginrut>but I installed en.US.utf-8 which Hurd doesn't install
<nalaginrut>autogen.sh throws warning for that
*youlysses is excited for the inevitable Hurd + Guix Distro.
<nalaginrut>and there's warning about readline symbols missing, I'll try to compile with readline then
<nalaginrut>when all done, I'll try 'make check'
<nalaginrut>maybe the compiling errors on Hurd are just about a utf-8 issue (just guessing)
<nalaginrut>youlysses: it's OK for the compiling
*nalaginrut running make check on Hurd
<nalaginrut>UNRESOLVED: regexp.test: regexp-quote: regexp/extended: (string "aX" 252 #\\\\xfc "a\\xfc" "a\\xfc")
<nalaginrut>well, I wonder if there's on regexp lib on Hurd in default?
<nalaginrut>s/on/no
<nalaginrut>failures: 2
<nalaginrut>FAIL: time.test: internal-time-units-per-second: versus times and sleep
<nalaginrut>FAIL: version.test: version reporting works
<nalaginrut>are these two expected failures?
<nalaginrut>mark_weaver: should the 'make check' must be all passed for stable-2.0? or not?
<nalaginrut>hmm...so bad grammar
<nalaginrut>seems it's all passed for i386 debian/Linux, but 1 failure for debian/Hurd
<nalaginrut>for my Hurd, it's two failures
<ArneBab>I wrote a blog post yesterday on the elegance of let-recursion in scheme. It’s in German, but I think the code-samples should speak for themselves: http://draketo.de/licht/freie-software/let-rekursion#code-snippets
<ArneBab>it uses guile for the example - with wisp preprocessing.
*ArneBab is excited to see guix on Hurd, too!
<nalaginrut>fortunately I'm learning German, but it's not my current work for let-recursion ;-P
<ArneBab>;)
<ArneBab>I actually only realized two days ago how awesome let-recursion is… before that I did not like that it’s not consistent with define. Then I wrote some recursive code and saw how well let-recursion fits the usecase of requiring additional helper variables for recursion.
<ArneBab>and with that requirement, it does not actually feel inconsistent anymore - especially since it is completely consistent with let itself: why only use the variables of let, when we can use the body, too?
<wingo>moin
<sneek>Welcome back wingo, you have 1 message.
<sneek>wingo, dsmith says: So! v2.1.0-1093-g5270bb5 (master) passes make check no problems on 32bit! Yey
<wingo>nice
*wingo has the rtl compiler almost ready for submission
<nalaginrut>;-D
<nalaginrut>wingo: shoud 'make check' of stable-2.0 passed all tests? or it can't be promised
<nalaginrut>should
<wingo>nalaginrut: dunno, i think it should pass, yes
<wingo>if it doesn't, please file a bug
<nalaginrut>well, it does pass all on debian/Linux, but 1 or 2 failures for debian/Hurd
<nalaginrut>FAIL: internal-time-units-per-second: versus times and sleep
<wingo>in that case there's a bug somewheres
<nalaginrut>well, I found that (times) is always #(0 0 0 0 0) on Hurd
<nalaginrut>FAIL: version.test: version reporting works
<nalaginrut>hmm...it's UNKNOWN for current version
<nalaginrut>OK~interesting hints
<dsmith-work>wingo: Sadly, even more recent changes cause failures again.
<wingo>dsmith-work: that was probably me
<wingo>will try to fix later this evening
<wingo>is it a failure in the ecmascript compiler?
<dsmith-work>wingo: I'm not sure right now. Not at that machine.
<dsmith-work>I redirected to a build log (I don't usually do that), but only stdout.
<dsmith-work>So what I saw was a little confused.
<mark_weaver>wingo: I fixed the ecmascript problem (from the <prompt> changes), but a strange failure popped up on hydra from your recent changes: http://hydra.nixos.org/build/5699944/log "ERROR: rtl.test: cached-toplevel-set!: 1 - arguments: ((misc-error #f "~A ~S" (bad-instruction (return-values 0)) #f))"
<mark_weaver>I'm not sure what's going on there.
<mark_weaver>anyway, looking forward to the rtl compiler :)
<wingo>mark_weaver: you probably need to remove your module/system/vm/{dis,}assembler.go
<wingo>in a later patch i add a rule to the makefile to fix that
<mark_weaver>wingo: this happened on hydra, which presumably starts each build clean.
<mark_weaver>I haven't reproduced it on my system.
<wingo>ah
<wingo>yes it is a rebase artifact :/
<wingo>i renamed an instruction at some point I think
*wingo checks
<wingo>actually i'm not sure what that could be
<mark_weaver>yeah, I looked into it a bit, and am at a loss. I hoped maybe you might have an idea.
<mark_weaver>wingo: btw, I was thinking about how to initialize structs with immutable fields, and I wonder: why not make a special "make struct" instruction that does the complete initialization, by referring to as many registers as needed? so it would have a count in the instruction, and then read as many following words in the instruction stream as needed?
<wingo>well for a few reasons
<mark_weaver>sounds a bit complex for an instruction, but it should also perform better.
<wingo>one is that it would be the only "vararg" instruction
<wingo>i got rid of the rest
<wingo>and so there is an advantage to not having vararg instructions
<wingo>for the compiler and the vm
<mark_weaver>what advantage?
<wingo>another is that it's not necessarily the case that one instruction would be faster
<wingo>because that one instruction would be like a little interpreter
<wingo>mark_weaver: compiler complexity, vm size, assembler/disassembler complexity
<wingo>and anyway separating the instructions gives you a chance to "compile" the allocation
<mark_weaver>creating structures is a very common operation. it seems good to avoid the dispatching overheads on all those "set!" instructions.
<wingo>maybe
<wingo>but the runtime interpretation cost isn't free either
<mark_weaver>I can certainly agree with the desire to reduce the complexity of the compiler/vm/etc, but I don't see why this would add much complexity there.
<wingo>and when we compile natively, it's all a moot point
<wingo>did you see my follow-on patches that removed complexity?
*mark_weaver looks
<wingo>i guess it's not very much
<mark_weaver>in this particular case at least, it doesn't seem like it would add much complexity.
<mark_weaver>and it would solve some thorny problems, like (as you say) the issue of having a "set-immutable-field-yes-its-really-okay"
<mark_weaver>instruction.
<wingo>i guess we can revisit this later
<wingo>will be useful when we have a whole system to test
<mark_weaver>okay, just a thought.. I defer to your expertise in this area.
<wingo>thanks, and thanks for bringing this up
<wingo>i'm trying to simplify and gain perf and that's hard with an interpreter ;)
<mark_weaver>*nod* :)
<stis>evening guilers!
<mark_weaver>I've also been looking at adding VM OPs for the R6RS fixnum and flonum ops. I'm a bit on the fence of how many instructions to add.
<wingo>heya stis :)
<wingo>mark_weaver: currently in the rtl vm there are around 120 ops, fwiw
<wingo>so there is room for more if needed
<mark_weaver>I also wonder if it would make sense to widen the opcode a few bits (maybe to 10 bits or so), and reduce the register field sizes (to say 6 bits)
<wingo>mmm, dunno
<wingo>it sounds like a bad idea to me
<mark_weaver>keeping them on byte boundaries hardly matters on modern machines afaict.
<mark_weaver>whereas adding more instructions for simple ops that are common in some programs could help a lot.
<wingo>why would you say that?
<wingo>maybe i am wrong but i would think byte-sized accesses would be better in general
<wingo>dunno
<wingo>i haven't looked at the vm disassembly yet
<wingo>perhaps that is a problem ;)
<mark_weaver>well, if byte accesses can make the native instruction counts smaller, then maybe.
<mark_weaver>the thing is, I was looking at the fixnum/flonum ops, and was tempted to add on the order of 100 new instructions or so. but then the opcode space is getting tight. I dunno, maybe I need to restrain myself here :)
<mark_weaver>of course I realize that it's important to keep the VM within the instruction cache.
<wingo>100 is a bit nuts :)
<mark_weaver>but if you have a program that uses a lot of fixnum and flonum ops, then it could simply be another area of the VM that is kept in cache (that would normally be cold). dunno.
<wingo>though who knows...
<wingo>would be nice to be able to deal in unboxed values at some point, and i wouldn't want to box (ahem) ourselves out of that
<wingo>btw mark i think you are going to like the cps stuff
<mark_weaver>wingo: oh, one more (unrelated) thought while I've got you here: the other day, when we talked about NaN boxing, you mentioned that it would be nice to add more immediate tags to SCM pointers, e.g. to distinguish pairs and structs from other things without needing an extra word in the heap.
<mark_weaver>wingo: but when I thought about how to do that, it seemed problematic with the GC.
<stis>another option is to add a reference to a fkn via an argument and do a fast-apply instruction
<wingo>it's going to be relatively straightforward to do native code
<mark_weaver>wingo: I think we'd need to tell the GC to ignore some bits in each word when deciding whether it looks like a pointer.
<wingo>mark_weaver: yeah, with boehm gc you would need for those tags to be in the low 3 bits
<wingo>i think anyway
<mark_weaver>wingo: and that might cause problems with making it confuse non-pointers for pointers.. dunno.
<mark_weaver>maybe it wouldn't be a problem, dunno..
<mark_weaver>wingo: the high bits might work too, at least on 64-bit platforms. on any machine where NaN boxing is feasible anyway, the high bits of valid addresses are going to be all zeroes (or all ones on some platforms).
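Note on the tagging idea above: a minimal Python sketch of keeping a tag in the low 3 bits of a word, assuming heap objects are 8-byte aligned so those bits of a real address are always zero. The tag values and helper names are hypothetical and are not Guile's actual tag assignments.

```python
# Illustrative only: tag values are invented, not Guile's real ones.
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1   # 0b111
TAG_PAIR = 0b001                 # hypothetical "this is a pair" tag
TAG_STRUCT = 0b010               # hypothetical "this is a struct" tag


def tag_pointer(addr, tag):
    """Store a small tag in the unused low bits of an 8-byte-aligned address."""
    if addr & TAG_MASK:
        raise ValueError("address is not 8-byte aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 3 bits")
    return addr | tag


def untag_pointer(word):
    """Split a tagged word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK


word = tag_pointer(0x7F001238, TAG_PAIR)
addr, tag = untag_pointer(word)
print(hex(addr), tag)
```

A conservative collector that has to recognise such a word as a pointer would need to mask the tag bits the same way `untag_pointer` does, which is the concern raised in the conversation.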
<mark_weaver>well, I'll let you get back to work :)
<wingo>:)
*wingo working on firefox today, whee
<mark_weaver>cool! :)
<stis>mark_weaver: quite a lot of speed improvements can be accomplished if one can make C function application become faster
<stis>e.g. say that you have 4 or so instructions of the form
<stis>inst-1 fkn-index
<stis>inst-2 fkn-index arg1
<stis>inst-3 fkn-index arg1 arg2
<mark_weaver>stis: I suspect the main issue is how to get the address of the C function.
<mark_weaver>you can't encode it directly in the instruction stream, because of shared libraries needing to be relocatable.
<stis>fkn-index = 16 bit address in a 16K array of function pointers
<mark_weaver>so you need to load the address from somewhere.
<mark_weaver>okay, then you are limiting yourself to a fixed set of C functions to call.. so essentially it's just more VM instructions, encoded in a different way.
<mark_weaver>which is not necessarily a bad idea, but I wouldn't call it "making C function application faster"
<stis>Yes, I would call them instruction functions or something like that
<mark_weaver>but if it's just a way to increase the VM op space, then it might be better to simply widen the main opcode field, which would have the advantage that there'd be only one dispatch, not two.
<stis>Yes, but some code will not use these extra function and would be hurt by the extra overhead?
<mark_weaver>there's a relatively heavy cost involved in jumping to the opcode via a pointer table.
<stis>But it's much faster than using C wrapped functions
<mark_weaver>in your idea, you pay that cost twice for each of these special instructions.
<mark_weaver>if our primary opcode field was wide enough for everyone's needs (within reason), then we could avoid that.
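Note on the "pay that cost twice" point above: a minimal Python sketch contrasting an operation reached through a second function table with the same operation given its own primary opcode. All opcode numbers, handler names and the table contents are hypothetical; the second element of each result just counts table lookups.

```python
# Illustrative only: opcodes and handlers are invented for this sketch.
def op_add(a, b):
    return a + b, 1                  # one lookup: the primary dispatch


def op_max(a, b):
    return max(a, b), 1              # same operation as a primary opcode


def op_call_indexed(index, a, b):
    fn = FUNCTION_TABLE[index]       # second lookup, on top of the primary one
    return fn(a, b), 2


# Stand-in for the array of function pointers indexed by fkn-index.
FUNCTION_TABLE = [max]

OP_ADD, OP_CALL_INDEXED, OP_MAX = 0, 1, 2
PRIMARY_TABLE = [op_add, op_call_indexed, op_max]


def execute(instruction):
    """Run one instruction; return (result, number_of_table_lookups)."""
    opcode, *operands = instruction
    handler = PRIMARY_TABLE[opcode]  # primary dispatch, always paid
    return handler(*operands)


print(execute((OP_CALL_INDEXED, 0, 3, 9)))
print(execute((OP_MAX, 3, 9)))
```

Both calls compute the same value; the indexed form goes through two tables, the widened-opcode form through one.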
<mark_weaver>and it seems to me that 10 bits would be more than enough. whereas 8 bit register indexes seems like much more than we actually need.
<mark_weaver>(for most instructions, at least).
<stis>Yes, I actually do not know the impact of say 16 bit opcodes.
<stis>or 10
<stis>I just noted that our instruction format is a bit wordy compared to x86-64 assembler
<stis>also got scared when you explained the need to not blow instruction caches
<mark_weaver>it really comes down to this question: is there an advantage to using byte addressing to access those components, vs bit operations.
<stis>If we know the typical speeds of the functions we are trying to apply here one could calculate perhaps
<mark_weaver>x86-64 assembler requires a very complicated instruction decoder. in hardware you can get away with that by having a big circuit that does a lot of that work in parallel.
<stis>some numbers to indicate if the extra indirection is important or not
<mark_weaver>in a software VM, I suspect that there's a bit advantage to the RTL's fixed-width instructions.
<stis>true!
<mark_weaver>s/bit/big/
<stis>sed s/all my missspelling/correct spelling/g
<mark_weaver>well, I'll want to ensure that the R6RS fixnum and flonum ops are as fast as we can make them.
<stis>for this class I agree that we need to try to skip the indirection for many of the functions.
<stis>for logic programming ops i think an indirection is ok!
<stis>in 90% of the suspects
<mark_weaver>stis: the thing is: how many ops would you want to add for logic programming?
<stis>10-30
<stis>But it's hard to see if there are other fields that need speed boosts as well.
<mark_weaver>I guess I don't see the point in adding an extra level of indirection, when another bit or two in our primary opcode would make enough space for everyone.
<stis>Before answering this I would try to see how many bov sbcl has employed.
<davexunit>yay my library got posted to r/scheme. :)
<davexunit>got 3 new stars on my github repo thus far.
<stis>s/bov/vop/
<mark_weaver>davexunit: congrats! :)
<stis>yipee
<davexunit>no comments yet, but hey. :P
<add^_>Heh
<davexunit>r/scheme is low traffic so that's to be expected.
<stis>mark_weaver, you may be correct, but let me check sbcl
<mark_weaver>stis: okay, let me know what you find!
<stis>mark_weaver grep define vop *.lisp | wc
<stis>grep define-vop *.lisp | wc
<stis>460 1192 24043
<stis>Hmm that would give us 500 to use for specialized fields not covered by sbcl.
<stis>mark_weaver: you are probably right 10 bits is probably enough
<add^_>You should probably say probably a bit more probably.
*add^_ is just joking
<stis>:-), Drink your beers anr cheers, cause the time is a'funnieng
<add^_>:-)
<stis>mark_weaver, the last 4 instructions could be the indirection versions if we ever come to that
<stis>grep define-vop float.lisp | wc
<stis>60 151 2762
<stis>mark weaver: You beat them there, you had 100 didn't you?
<mark_weaver>actually, I was exaggerating a bit :)
<stis>:)
<mark_weaver>I drew up a quick list of fixnum ops to add, and it came out to about 40.
<stis>No I think that 10 bit fields should be ok, but shrinking the register size is that wise?
<mark_weaver>the number of desirable flonum ops would probably be about the same. so probably more like 80 instructions.
<mark_weaver>well, we can't increase the opcode without taking bits from the register indices.
<mark_weaver>but it would be exceedingly rare for a procedure to require more than 64 registers.
<mark_weaver>the move instructions could deal with a larger set of registers.
<mark_weaver>but the other operations don't need more than 64, imo.
<mark_weaver>it's a tradeoff, in exchange for having more operations available quick at hand.
<mark_weaver>if you really wanted, you could have 7 bit register indices... 128 registers.
<mark_weaver>I guess there's no harm in that.
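Note on the field widths discussed above: a minimal Python sketch that packs an opcode and three register indices into one 32-bit word, parameterised by field width. Placing the opcode in the low bits is an assumption made for the sketch, not a statement about the actual RTL VM encoding; the function and layout names are hypothetical.

```python
# Illustrative only: field order and names are assumptions of this sketch.
def make_layout(opcode_bits, register_bits, operands=3):
    """Describe a fixed-width instruction word; reject layouts over 32 bits."""
    if opcode_bits + operands * register_bits > 32:
        raise ValueError("layout does not fit in 32 bits")
    return (opcode_bits, register_bits, operands)


def encode(layout, opcode, registers):
    """Pack opcode (low bits) followed by the register indices."""
    opcode_bits, register_bits, operands = layout
    if len(registers) != operands:
        raise ValueError("wrong number of register operands")
    if not 0 <= opcode < (1 << opcode_bits):
        raise ValueError("opcode does not fit")
    word, shift = opcode, opcode_bits
    for reg in registers:
        if not 0 <= reg < (1 << register_bits):
            raise ValueError("register index does not fit")
        word |= reg << shift
        shift += register_bits
    return word


def decode(layout, word):
    """Inverse of encode: return (opcode, [registers])."""
    opcode_bits, register_bits, operands = layout
    opcode = word & ((1 << opcode_bits) - 1)
    registers, shift = [], opcode_bits
    for _ in range(operands):
        registers.append((word >> shift) & ((1 << register_bits) - 1))
        shift += register_bits
    return opcode, registers


BYTE_LAYOUT = make_layout(8, 8)    # 256 opcodes, 256 registers per operand
WIDE_LAYOUT = make_layout(10, 7)   # 1024 opcodes, 128 registers per operand

word = encode(WIDE_LAYOUT, 600, [127, 0, 64])
print(decode(WIDE_LAYOUT, word))
```

With 10 + 7 + 7 + 7 = 31 bits the wide layout still fits one word, and opcode 600 is representable there but not in the 8-bit layout.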
<stis>We also have the option of 10 11 11 = 32 and add the extra ops to the next 32 bits
<mark_weaver>you could, but I'm not sure it even makes sense. do we really need more than 1024 fast ops?
<mark_weaver>the ops needed to call an arbitrary C procedure are not that bad.
<mark_weaver>but I suppose it could be done, if deemed useful.
<stis>1. currently we wrap all calls to C function in another dispatching function
<stis>2. there is a branchy if args
<stis>the dispatching function being in scheme land
<mark_weaver>right, that's the error checking, which most of the time you'll want. but there's nothing in the design of the VM that prevents you from inlining the C function call and omitting the check.
<stis>That's true, if one is programming in glil things would be 2x faster sometimes
<stis>Hmm, we could add some low-level code that inlines functions without checks
<stis>and use macrology to check arguments if possible else dispatch to the safe variant
<mark_weaver>better yet, get peval to do it for you.
<mark_weaver>replace calls to <primitive> with a call to a constant lambda expression, that does the check and makes the low-level call. if peval can eliminate the check then it does. it may also inline the low-level call.
<stis>Do you think that this would be an interim solution to speed up your numops?
<mark_weaver>well, it wouldn't be to a constant lambda expression exactly, but rather a constant variable reference.
<mark_weaver>a reference to a variable that would be *assumed* to be constant.
<mark_weaver>which is essentially what we are doing now anyway, when we convert calls to '+' into a primitive op.
<stis>yep, these things below to the internals
<stis>s/below/belong/
*stis will now assume that most of the flops will be 2x faster the next guile release.
<stis>;-)
*davexunit also assumes that
<wingo>moo
<mark_weaver>hey wingo!
<wingo>heya :)
*mark_weaver is eagerly awaiting the rtl compiler :)
*wingo polishing off the commits :)
<wingo>sorry for working in the dark, just that it's easier to explain code if it's clear from the beginning
<wingo>so i fabricate a history based on rebasing :)
<mark_weaver>btw, I looked at the x86 code in vm.o, and indeed there is a slight advantage to 8 bit fields.
<wingo>in the false history everything is clear and well-thought from the beginning ;)
<mark_weaver>wingo: no need to apologize; I've got lots of Guile code that has not yet seen the light of day. I really ought to dust off some of this work :)
<mark_weaver>wingo: haha, indeed!
<mark_weaver>I think I can probably get by with 8-bit opcodes, because the fixnum ops will actually (I think) be the same as generic ops, except for one bit stashed outside of the 8-bit opcode.
<mark_weaver>basically, that extra bit will only be checked if the fast path fails. then, if it's a fixnum op, it raises an exception, but for the generic op it goes to the slow path.
<mark_weaver>so then only the flonum ops would need new instructions.
<mark_weaver>but one of the 8-bit register indices would have to go down to 7 bits.
<mark_weaver>(in the 3-op arithmetic instructions)
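Note on the shared-opcode idea above: a minimal Python sketch in which the fixnum-only and generic additions share one fast path, and the extra flag is consulted only after that fast path fails. The 62-bit fixnum range, the exception type and all names are assumptions made for the sketch.

```python
# Illustrative only: range, exception and names are assumptions.
FIXNUM_MIN = -(1 << 61)
FIXNUM_MAX = (1 << 61) - 1


class FixnumError(Exception):
    """Raised when a fixnum-only op gets non-fixnum input or overflows."""


def is_fixnum(x):
    return (isinstance(x, int) and not isinstance(x, bool)
            and FIXNUM_MIN <= x <= FIXNUM_MAX)


def generic_add(a, b):
    # Stand-in for the slow path (bignums, flonums, and so on).
    return a + b


def add(a, b, fixnum_only=False):
    # Fast path, identical for both variants of the instruction.
    if is_fixnum(a) and is_fixnum(b):
        result = a + b
        if is_fixnum(result):
            return result
    # Only now is the extra bit checked.
    if fixnum_only:
        raise FixnumError("operands or result are not fixnums")
    return generic_add(a, b)


print(add(1, 2, fixnum_only=True))
print(add(FIXNUM_MAX, 1))
```

The first call stays on the fast path for both variants; the second overflows the fixnum range, so the generic variant falls through to the slow path while the fixnum-only variant would raise.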
<wingo>that's fine
<mark_weaver>cool, I'll work on it :)
<wingo>nice :)
*wingo doesn't have any more rtl vm work in mind
<wingo>meaning, the calling convention seems sane, things seem to work more or less, though there are sure to be broken things still
<wingo>btw
<wingo>it used to be that I did the chez trick of having the MVRA be at a fixed offset from the RA
<mark_weaver>I'd be glad to help shake out whatever broken things may remain.
<wingo>but I thought ahead to native compilation and that would kill the return branch buffer
<wingo>in one case or the other
<mark_weaver>indeed
<mark_weaver>what's your current thought?
<wingo>so i thought that an inline comparison that the number of values was the expected number wouldn't be too bad, in the native case
<mark_weaver>yeah, that was my thought as well.
<wingo>and that in the single-value VM case we can add an op to do things that are needed when a call returns
<wingo>so that's the "receive" op I added
<wingo>there is also "receive-values"
*mark_weaver looks
<wingo>with the difference that "receive" actually copies the value down
<wingo>the intention is that for MV returns where the number of values is >1 that the compiler tries to leave the MV returns where they are
<wingo>which ends up that your frame has some dead values in the middle where the call frame was
<wingo>but in the future the call frame won't have a separate MVRA slot, so that will be better
<wingo>anyway back to the emacs, will check in a bit later
<mark_weaver>sounds good to me! happy hacking :)
<wingo>mark_weaver: do you have a record type naming convention that you're happy with? sometimes i think <foo> is too much
<wingo>almost makes me want a lisp-n ;)
<mark_weaver>hehe
<mark_weaver>I don't have any clear thoughts on the matter :)
<wingo>ok :)
<mark_weaver>naming things is hard
<wingo>mark_weaver: ok
<wingo>pushed wip-cps-bis
<mark_weaver>whee!! :)
<wingo>still needs tests and better documentation and an intro
<wingo>i will do all that later
<mark_weaver>much appreciated!
<mark_weaver>thanks for all the awesomeness, wingo! :)
<wingo>but i thought you might have fun with it :)
<wingo>np, thank you for yours :-)
<mark_weaver>:)
<wingo>btw before landing i want to convert to compiling directly from CPS to "vmcode", which will involve direct calls to the instruction emitter procedures instead of building up lists then parsing the lists
<wingo>that also opens the possibility of compiling to "native", and compile options could select the target
<wingo>dunno, just a thought
<wingo>but more static checking as to e.g. arities of the assembler instructions would be nice
<wingo>so it's still a WIP
<wingo>but hopefully the basics are there
<wingo>ok, zzz
<wingo>night :)
<mark_weaver>sounds great! sleep well :)