IRC channel logs

2013-11-18.log

<mark_weaver>tupi: interesting. that makes me suspect that the procedure passed to 'par-map' is sleeping.
<mark_weaver>tupi: what does (current-processor-count) report?
<mark_weaver>I don't know how the futures one behaves, but here's how this one behaves: there are N workers, where N is (current-processor-count).
<tupi>mark_weaver: it seems that anything using java starts to sleep indeed, but why?
<mark_weaver>each worker just does a loop: grab the next available item on the list, run proc on it, and then continue doing that until there are no more items.
<tupi>scheme@(guile-user)> (current-processor-count)
<tupi>$1 = 12
<mark_weaver>if 'proc' sleeps, then that worker will just sit and nothing more will be done on that processor.
<tupi>now i have 12 java processes sleeping
<mark_weaver>well, that's out of guile's control, obviously.
<tupi>yes, sure, and of my understanding as well :)
<tupi>these are triggered through the following command:
<tupi>all cores went back to 100% now, once the java proc ends i guess
<mark_weaver>does the java proc do some network activity?
<tupi>reading/writing an image yes, but not the computation
<mark_weaver>you can ask for a different number of workers by using 'n-par-map' instead. the first parameter is N, the number of workers to create.
<mark_weaver>you could ask for more workers, so that if they sleep some percentage of the time, in practice you'll make better use of the processor.
<mark_weaver>obviously, you want to keep N to some small multiple of the number of actual processors. the higher it is, the more context switching overhead there will be.
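A sketch of the oversubscription idea mark_weaver describes, using Guile's (ice-9 threads); the helper name and the ×2 multiplier are illustrative choices, not anything from the channel:

```scheme
(use-modules (ice-9 threads))

;; Run PROC over ITEMS with twice as many workers as processors, so
;; that workers sleeping (e.g. blocked on a java subprocess) leave the
;; CPUs less idle.  The multiplier 2 is an arbitrary small multiple;
;; raising it trades idle time for memory/context-switch overhead.
(define (par-map-oversubscribed proc items)
  (n-par-map (* 2 (current-processor-count)) proc items))
```
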
<tupi> i was looking at the code, here is an example of a java call:
<mark_weaver>(not an optimal solution, of course)
<mark_weaver>well, actually, I'm not sure there will be much more context switching overhead. it might be more about memory consumption.
<tupi>cd /usr/lpdi/projects/clojure/jars; java -Djava.awt.headless=false -cp clojure.jar:ij-core.jar clojure.main /usr/lpdi/projects/clojure/watershed/watershed.clj ~A ~A ~A
<mark_weaver>sorry, that means nothing to me.
<tupi>the args are the path, the name and the type of the image watershed must be calculated
<mark_weaver>I have to go afk for a while. good luck!
<tupi>it just launches java, with the options, and compiles/runs the clojure code
<tupi>ok thanks mark_weaver
<mark_weaver>welcome :)
<mark_weaver>tupi: one possibility is to print the command line when you start a java process, and then the same message when it finishes but with "END" after it, and then sort the resulting partial output. then you'll see which ones started but slept for a long time.
<mark_weaver>tupi: then you could run those troublesome commands outside of guile to see if you can reproduce the problem and debug further.
<mark_weaver>you'll need a mutex for the output port though. in stable-2.0, ports are not thread-safe. if you use the same port from multiple threads, you must protect them yourself.
<mark_weaver>just make a mutex with (make-mutex) and wrap (with-mutex <mutex> ...) around the print statements.
<mark_weaver>(sorry if that's obvious, but I didn't know if you know about those handy utilities)
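The start/"END" logging scheme mark_weaver suggests, with the port protected by a mutex as he describes, might look like this sketch (the helper names `log-line` and `run-logged` are invented for illustration):

```scheme
(use-modules (ice-9 threads))

;; One shared mutex guarding the output port, which is not
;; thread-safe in stable-2.0.
(define log-mutex (make-mutex))

;; Print one whole line while holding the mutex, so lines from
;; different workers cannot interleave.
(define (log-line . args)
  (with-mutex log-mutex
    (for-each display args)
    (newline)))

;; Log CMD when it starts, run THUNK, then log CMD again with "END".
;; Sorting the resulting output groups each start with its END, so
;; commands that slept a long time stand out.
(define (run-logged cmd thunk)
  (log-line cmd)
  (let ((result (thunk)))
    (log-line cmd " END")
    result))
```
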
<tupi>mark_weaver: i will try tx. then i will talk to the clojure people. when popen runs an octave script, cores are all at 100%
<tupi>there must be something wrong either with java or with the way i call it
<tupi>don't know
<tupi>i write clojure because it's lisp like, i don't know anything [almost anything] about java and less again about the java runtime ...
<tupi>why the java server enters sleep mode, i guess i will find out talking to clojure guys, i hope :)
<mark_weaver>well, on second thought, I guess that's silly. you can just look at 'ps'.
<tupi>top ?
<tupi>the command top i mean
<mark_weaver>hmm. I don't know that much about top, but as I recall top is meant to show the processes that are using a lot of resources, not the ones that are sleeping.
<tupi>it does
<mark_weaver>there's probably a nice 'ps' command that would sort by CPU time, and then piping that to grep to filter the commands of interest.
<tupi>but the real Q is why would these all sleep ?
<mark_weaver>I would try to find just one reproducible example of a command that sleeps, and then focus on debugging that one.
<mark_weaver>(by running it outside of guile)
<tupi>then it does not sleep
<tupi>let me try to be sure
<mark_weaver>are you creating bidirectional pipes? (where you send it its input and then read its output?)
<tupi>yes
<mark_weaver>ah, I think I know the problem then.
<tupi>that would be nice
<tupi>to know
<tupi>but i do that with octave too
<tupi>they all do something and write something
<mark_weaver>for many (most?) commands, you need to close the command's input pipe so that it gets EOF, *before* you close its output pipe.
<mark_weaver>but (ice-9 popen) doesn't currently provide a way to do that. however, there are patches floating around to add that. it's quite easy.
<tupi>how do i do that ?
<mark_weaver>another question: are you sending a lot of data over these pipes?
<tupi>no
<tupi>a command
<mark_weaver>okay, good. one less thing to worry about then :)
<mark_weaver>are you receiving a lot of data from them?
<tupi>no
<mark_weaver>okay, I'll add what you need.
<tupi>a few values csv
<mark_weaver>give me a few minutes.
<tupi>sure, thanks again!
<tupi>i'll make sure it does not sleep manually outside guile
<mark_weaver>np :)
<mark_weaver>I think the problem is that the commands are waiting for EOF, but never getting it.
<mark_weaver>well, the other possibility is that they don't need EOF, but some of what you sent is stuck in the buffer, and you just need to flush it out.
<tupi>mark_weaver: wait a sec
<mark_weaver>(with 'flush-output-port')
<tupi>i just did a test and it sleeps as well in manual mode outside guile
<mark_weaver>ah, okay. is it waiting for EOF?
<tupi>no, not waiting, i just hit enter and wait until its done
<tupi>but top shows _always_ the S mode, never goes above 13% of the cpu
<tupi>and then it returns
<mark_weaver>hmm
<tupi>i think java is the problem
<mark_weaver>well, if you can run the commands and get the output from them before sending EOF (ctrl-d), then I guess you don't need anything more from guile at the moment.
<mark_weaver>are you calling 'flush-output-port' ?
<tupi>no i am not calling
<davexunit>mark_weaver: *finally* getting back around to the alist->hash-table stuff. I am making an ice-9/hash-table module. The options were to either include this module from boot-9 or leave it standalone. what do you think is appropriate?
<tupi>let me paste an example
<mark_weaver>hi davexunit!
<davexunit>mark_weaver: given that all of the other hash tables are imported by default I thought that maybe it would be best to include it in boot-9
<mark_weaver>tupi: I don't know off-hand what buffering mode is used by default for these pipes, but if there's any buffering at all, then you should call 'flush-output-port' after writing the input, before waiting for the output.
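A sketch of the write-flush-read pattern mark_weaver describes for a bidirectional pipe (the procedure name is invented; note that `force-output` is Guile's native name for what R6RS calls `flush-output-port`):

```scheme
(use-modules (ice-9 popen)
             (ice-9 rdelim))

;; Send REQUEST to COMMAND over a bidirectional pipe and read one
;; line of reply.  The flush matters: with any buffering at all, the
;; request may otherwise sit in the buffer while we block on the
;; reply, and both sides wait forever.
(define (ask-subprocess command request)
  (let ((port (open-input-output-pipe command)))
    (display request port)
    (force-output port)           ; flush before waiting for output
    (let ((reply (read-line port)))
      (close-pipe port)           ; closing also lets the child see EOF
      reply)))
```
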
<mark_weaver>davexunit: yeah, I dunno. I suspect the procedures will be fairly small, so I think boot-9 would be fine.
<tupi>mark_weaver: here is a typical code http://paste.lisp.org/display/139990
<mark_weaver>davexunit: before you send another patch, one more thing I meant to suggest: in the tests, don't use the name of one of the hash table functions as the prefix for all the tests. instead, use a more generic prefix, and then give a name to each individual test. also, it would be good to test error handling.
<tupi>where should I flush ?
<mark_weaver>davexunit: btw, you should talk to unknown_lamer about his experiences with my preliminary nan-boxing patches.
<davexunit>mark_weaver: noted re: tests
<mark_weaver>tupi: oh, these are not bidirectional pipes. you're not sending any input to them at all.
<davexunit>mark_weaver: should the hash table functions be written in boot-9 or included from another file? I see two instances of include-from-path in boot-9.
<tupi>mark_weaver: ok, then it is definitely a java problem
<mark_weaver>davexunit: I dunno :) did ludovic make a suggestion?
<mark_weaver>tupi: okay
<tupi>mark_weaver: i have to go off the kb for a while, many tx again
<mark_weaver>tupi: okay, good luck!
<tupi>tx
<davexunit>mark_weaver: I'm pretty sure he would want it in a separate module. I'll just go with that.
<mark_weaver>yeah, it's probably best. really, we should be moving more stuff out into modules I think.
<mark_weaver>the core is getting a bit bloated.
<mark_weaver>s/getting // :)
<davexunit>just seems more approachable and intuitive
<mark_weaver>davexunit: so, fyi, here's what unknown_lamer said a couple days ago, after trying the nan boxing patch:
<mark_weaver><sneek> mark_weaver, unknown_lamer says: the patch applied with a bit of minor hacking, and yow! time spent in gc every 20 secs in my demo is down from 800+ms to about 15ms (only running at all every other period about). Still not much improvement with stutter, so I guess it's statprof time... still, NaN boxing looks like a great idea...
<sneek>Understood.
<mark_weaver>oops, what did I just ask sneek to do?
<davexunit>hahaha
<davexunit>we shall see
<mark_weaver>he said "gc is no longer an issue!", and "it's spending like 30ms per minute in gc now :)"
<mark_weaver>and "(and that gc time is constant, no matter how many particles I create!)"
<davexunit>that is extremely promising
<davexunit>I'm very happy to hear that.
<mark_weaver>so I will soon propose to add a compile-time option for nan-boxing, to tide us over until we have a better solution (unboxing floats based on type inference in the compiler)
<davexunit>the stutter that he is still experiencing is likely due to an effect known as "temporal aliasing" and not a guile problem.
<davexunit>that or the fps is dropping from too many particles or something.
<mark_weaver>you and he should definitely talk. maybe you have some insight that can help him track down the remaining issues.
<davexunit>yeah. that would be a good idea.
<mark_weaver>unknown_lamer: you around? :)
<davexunit>I'm in a coding slump at the moment, I'd like to be of help to someone
<mark_weaver>I'm done fixing tupi's bug, so I can spend some time on this in the next few days.
<davexunit>unknown_lamer: I'll be around to talk for a couple of hours (hopefully) if you see this.
<mark_weaver>(well, that makes it sound like it's tupi's fault, but actually it's just that the bug was causing him trouble)
<ArneBab>Wish (just to have it recorded somewhere): (rundoctests callable ... args) → expand the callable, extract the docstring for all functions which will get called during execution, check for doctests and run them. ⇒ when you hack on something you can always rerun all tests which actually get touched by your codepath (instead of running all) ⇒ shorter test-cycles ⇒ faster development.
<davexunit>mark_weaver: in the C version of alist->hash-table, I used hash-create-handle with SCM_UNDEFINED as the default value. I don't know of a way to do the same thing in Scheme.
<mark_weaver>davexunit: create a private cons cell in a lexical scope enclosing the procedure. use that instead.
<mark_weaver>e.g. (define alist->hash-table (let ((undef (list 'undef))) (lambda ...)))
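mark_weaver's sentinel skeleton, filled in as a hedged guess at a complete procedure (the duplicate-skipping body is illustrative, not davexunit's actual patch):

```scheme
;; A private object allocated once, in a scope only this procedure can
;; see.  No caller can ever produce an eq?-identical object, so it is a
;; safe "no such key" marker to pass as hash-ref's default -- the
;; Scheme-level stand-in for SCM_UNDEFINED.
(define alist->hash-table
  (let ((undef (list 'undef)))
    (lambda (alist)
      (let ((table (make-hash-table)))
        (for-each (lambda (pair)
                    ;; Only insert if the key is absent, so earlier
                    ;; entries shadow later ones, as with assoc.
                    (when (eq? undef (hash-ref table (car pair) undef))
                      (hash-set! table (car pair) (cdr pair))))
                  alist)
        table))))
```
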
<davexunit>mark_weaver: sounds good. thanks. wasn't sure if something like that was too hacky.
<mark_weaver>Heh, I guess it's a little hacky. I wonder what ludovic will think :)
<mark_weaver>he'd probably prefer a simpler, elegant procedure even if less efficient.
<mark_weaver>whereas I try to make the primitives as fast as I can.
<mark_weaver>I dunno, in this case it might not be worth optimizing. maybe it's best to keep it simple.
<mark_weaver>hmm, maybe it would be better to just reverse the list and do it in reverse order.
<mark_weaver>I dunno, use your judgement :)
<mark_weaver>actually, I think the reverse method is nice for a few reasons: you'll get cycle detection for free (from reverse), and the code will be simpler and maybe even faster than the other options.
<mark_weaver>more generally, reverse will take care of all your error checking and duplicate detection for free.
<mark_weaver>s/free/cheap/
<davexunit>mark_weaver: sure I can just use reverse
<davexunit>though I imagine that reverse is a bit expensive?
<davexunit>iterate n times to reverse the list and then iterate another n times to set the hash elements.
<mark_weaver>well, anything other than the 'hash-create-handle' method will have an added expense.
<mark_weaver>in the case of reverse, it's mostly the expense of allocating N more cons cells, where N is the length of the alist.
<mark_weaver>if you do the more straightforward duplicate check, then it will be N more hash table lookups.
<davexunit>yeah that's fair.
<davexunit>I'll just reverse. :)
<davexunit>that sounds the most elegant to me.
<mark_weaver>yeah
<mark_weaver>you might consider using (ice-9 match).
<mark_weaver>it generates efficient code and looks very elegant, much better than cars and cdrs and null?s
<mark_weaver>most of us have been using it more and more over time.
<mark_weaver>though in this case it might not matter much.
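Combining the reverse approach with (ice-9 match), a sketch of what such a procedure might look like (an assumption about the shape of the code being discussed, not the actual paste):

```scheme
(use-modules (ice-9 match))

;; Reverse first, so that for duplicate keys the earliest alist entry
;; is written last and therefore wins, matching assoc's shadowing.
;; 'reverse' also errors out on improper or circular lists, giving
;; error checking essentially for free (well, cheap).
(define (alist->hash-table alist)
  (let ((table (make-hash-table)))
    (for-each (match-lambda
                ((key . value) (hash-set! table key value)))
              (reverse alist))
    table))
```
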
<davexunit>mark_weaver: http://paste.lisp.org/display/139991
<davexunit>this of course doesn't account for the different hashing methods yet, but it's short and sweet. :)
<mark_weaver>beautiful! :)
<mark_weaver>perfect, I'd say.
<davexunit>I'll turn this into a macro to handle the other hashing methods
<davexunit>but it will remain pretty elegant I think.
<mark_weaver>sounds good.
<davexunit>although, now that I'm thinking about it, since I'm including the file in boot-9, it's going to be hard to hide the implementation detail: the macro that generates the 4 alist->hash-table procedures.
<mark_weaver>ah, I thought you decided on a separate module. but yeah, these are short enough that including four of them directly in boot-9 should be fine, imo.
<davexunit>I decided that they should be included in boot-9 because afaik every other hash table procedure is in the default module.
<davexunit>but you did bring up the point of making more modules.
<mark_weaver>hmm.
<davexunit>and then I agreed, but thought you meant something else.
<davexunit>:P
<mark_weaver>I actually think they should go in a separate module.
<mark_weaver>mostly because I think we should be encouraging the use of other hash table APIs like SRFI-69 and R6RS.
<davexunit>instead of native hash tables?
<mark_weaver>I think so. although right now they're not as fast, we can fix that.
<mark_weaver>I'd probably use the native ones in my own code where efficiency is important, but that's a different issue.
<davexunit>so less guile-specific APIs and more standard issue Scheme stuff like SRFIs?
<mark_weaver>I think so, yes.
<davexunit>in order to promote more portable code or other reasons?
<davexunit>making core guile smaller and more maintainable?
<mark_weaver>to promote more portable code. and also because I think hash tables should know what type of hash table they are.
<mark_weaver>I think it's a mistake that guile hash tables don't know.
<ZombieChicken>How large is the guile interpreter when compiled statically?
<davexunit>mark_weaver: yeah I hear you. I would only have to write 1 procedure if they worked that way.
<mark_weaver>I don't know off hand. and of course it depends on the platform.
<mark_weaver>ZombieChicken: ^^
<ZombieChicken>hrm
<ZombieChicken>I may just go with lua for my initramfs then
<ZombieChicken>Thanks for the inof
<ZombieChicken>info*
<mark_weaver>ZombieChicken: ah, I just realized I had one around from guix. altogether it's about 13 megs.
<mark_weaver>(including the .go files and such)
<ZombieChicken>hrm. Thank you. That is a tad larger than I thought
<ZombieChicken>something told me the couple hundred k seems a little light
<mark_weaver>you could probably trim a lot of those .go files.
<mark_weaver>there's a lot of stuff in there including the whole compiler.
<mark_weaver>the guile executable is about 3.4 megs.
<ZombieChicken>Thanks
<ZombieChicken>Somewhat off topic, but what would you suggest using in an initramfs for a scripting language?
<mark_weaver>depends on what you want to do, I suppose.
<mark_weaver>lua is definitely lightweight, I'll give it that, but IMO it's a poorly designed language.
<mark_weaver>nice implementation though.
<ZombieChicken>Some basic logic. I'd rather dodge having to do everything in shell
<ZombieChicken>I agree. Lua is nice but seems to lack in a few aspects
<mark_weaver>if you want a lightweight scheme, maybe check out chibi scheme. it's nicely done, modern, fast and light.
<ZombieChicken>chibi scheme?
<mark_weaver>yep
<dsmith>And chibi is designed to be embedded
<dsmith>IT's also r7rs
<dsmith>small
<davexunit>sometimes I wonder if I should have chosen chibi scheme when I set out to write a game.
<dsmith>About as modern as you can get
<davexunit>but I just like GNU and Guile so much.
<mark_weaver>well, I guess I should have left out "fast" from my list of attributes. It's probably not all that competitive in that department.
<ZombieChicken>This is an initramfs. If it is slow enough to really be a bother for that, then I can't imagine it is fast enough for many other purposes
<mark_weaver>davexunit: well, chibi is minimalist. it probably doesn't have a lot of what you need.
<mark_weaver>bindings, among other things. much smaller community no doubt.
<davexunit>yeah, it would only be for embedding. the engine would have to be written in C with a Scheme API exposed.
<davexunit>mark_weaver: about the hash table tests. you said to choose a common prefix and name each test. can I name a test via the pass-if form?
<mark_weaver>yes, the first argument can be a name (usually a string).
<davexunit>cool.
<davexunit>in my commit log, if I added a new module, need I list the procedures added or can I just say "New module"?
<davexunit>mark_weaver: posted a new patch to the list.
***haroldwu1 is now known as haroldwu
<nalaginrut>morning guilers~
<nalaginrut>wip-thread-safe-popen yeah~ ;-P
<davexunit>hey nalaginrut
<nalaginrut>davexunit: heay~
<nalaginrut>heya
<Chaos`Eternal>helo nalaginrut
<nalaginrut>halo
<nalaginrut>Chaos`Eternal: wip-thread-safe-popen is what you need
<mark_weaver>the wip one is no more. now it's just 'thread-safe-popen'.
<mark_weaver>davexunit: cool!
<Chaos`Eternal>cool
<mark_weaver>davexunit: you need to add the file to module/Makefile.am
<davexunit>mark_weaver: ah crap. I always miss something like that.
<mark_weaver>no worries :)
<mark_weaver>other than that, looks great to me!
*nalaginrut receiving mails, he has 800+ mails each day...
<davexunit>mark_weaver: yay! good to hear.
<davexunit>I'm off for the night. later guilers.
<nalaginrut>two hours later, I'm still receiving mails...
<nalaginrut>maybe there'd be some faster mail receiver...
<mark_weaver>nalaginrut: where were you trying to use 'include' with a relative pathname?
<nalaginrut>mark_weaver: I saw that mail just now, I'll reply it ;-P
<mark_weaver>okay
***sneek_ is now known as sneek
<civodul>Hello Guilers!
<nalaginrut>heya
<wingo>mark_weaver: morning / evening :)
<wingo>mark_weaver: i was wondering, does our evaluation of the 64k strategy change in the light of many small files, as is currently the case in Guile?
<wingo>mark_weaver: thinking out loud; I don't know of many alternatives (besides using the page size of the machine you're compiling on, or allowing that as an option)
<mark_weaver>wingo: well, we could do this: include a table of known maximum architecture page sizes for systems we recognize, and if we don't recognize the system, then use 64K.
<mark_weaver>though we need to think about what happens when we add a new system to our table.
<wingo>howdy :)
<wingo>i was wondering also about the ia32 and x64 cases
<wingo>i mean for x64 they say it can be up to 64 kB
<wingo>which means that's the minimum size for an object file with two segments
<wingo>seems excessive, but perhaps we have to pay for it
<wingo>that would push us towards compiling more than one .scm file into a single .go file, in the mid-term
<wingo>wouldn't necessarily be a bad thing :)
<mark_weaver>true!
<tupi>mark_weaver: hello. since we demonstrated that the core usage 'reduction' ['sleep' mode] is not a guile problem, i'd like to know if you think i should still use the par-map 'special' branch or not [will you integrate these changes in stable or not?]
<mark_weaver>tupi: I think you should go back to using the standard par-map for now, unless you see a problem with it.
<mark_weaver>wingo: in the short term, I have the problem that my attempts to make a more proper patch for the page size issue have all led to segfaults.
<wingo>mark_weaver: exciting!
<mark_weaver>but it's not worth bugging you about that now. I can debug it. I just had to put it down to work on the popen issue first.
<tupi>mark_weaver: ok, that is what i thought and did already, perfect, tx.
<wingo>cool; i would be fine with just pushing a patch bumping the page-aligned alignment to 64k and asserting that the runtime alignment of pages needing mprotect is aligned on the runtime page size
<mark_weaver>sounds good. we can always improve it later, before the final release. that's pretty much exactly what I tried to do, now I just have to figure out why it doesn't work :)
<mark_weaver>one nice thing about that approach is that we can always refine the page size of an architecture down to a smaller page size, and the existing 64K-aligned .go files should still continue to work.
<mark_weaver>wingo: on another topic: do you think my idea of cleaning up the semantics of multiple-value truncation is still reasonable to do, or has there been too much work on top of the current method? (to remind you, my idea was that the core semantics should not truncate, but that normal single-valued continuation should act like (lambda (val . _) ...)
<mark_weaver>)
<wingo>mark_weaver: i think your suggestion is the right way to go
<mark_weaver>cool :)
<wingo>there are some FIXMEs to that effect in the code :)
<mark_weaver>ah, good!
<wingo>btw did you see that i got function argument allocation to work better? there is much less shuffling now
<wingo>if you guild disassemble module/ice-9/eval.go you can see it, to an extent
<wingo>it's not perfect by any means tho
<wingo>fib only gets a 40% speed improvement currently
<mark_weaver>I did see something in the commit logs about that. sounds great! obviously I've been distracted by other things for a while, but I intend to spend more time in guile land soon.
<wingo>though there are some things to fix in fib's compilation
<mark_weaver>40%! only? :)
<wingo>hehe :) consider that primitive-eval is more than 3x faster than the stack vm :)
<wingo>and could be even faster with closure optimization
<mark_weaver>great stuff!
<civodul>primitive-eval faster than the 2.0 vm, woow
<mark_weaver>I really need to get fixing-letrec-reloaded in there. part of the reason I've been reluctant to post the patch is because I never added the error checking, and without it, it's easy to write code that works but shouldn't.
<wingo>civodul: sorry, to be precise: primitive-eval in master is around 3x as fast as primitive-eval in 2.0
<civodul>ah, that's already pretty good :-)
<mark_weaver>(without error checking, fixing-letrec-reloaded with rearrange the order of your letrec* clauses into the right order to make things work, automatically)
<mark_weaver>s/with/will/
<wingo>mark_weaver: yeah, error checking there would be swell
<wingo>the words "scope" and "automagic" don't go together :)
<mark_weaver>indeed :)
<mark_weaver>3x as fast, that's almost hard to believe that it can be so much faster. are you talking about the one in Scheme?
<wingo>mark_weaver: yes, the one in scheme
<mark_weaver>do you know why that one sees such significant performance improvements compared to other code?
<wingo>my test case is (primitive-eval '(let lp ((n 0)) (when (< n 1000000) (lp (1+ n)))))
<wingo>a lame test case, I know, but none of that gets inlined, so it does stress the evaluator
<mark_weaver>oh well, okay :)
<wingo>it tests calls, shallow lexical access, toplevel-ref, const, and conditionals
<mark_weaver>I don't think you can quite conclude from that example that "it's 3x faster", but it's still nice that it's 3x faster for that example :)
<wingo>so it's a fairly good spread
<dsmith-work>wingo: Can .go files have holes?
<dsmith-work>Doens't help much with packaged .go files I guess
<wingo>dsmith-work: they could, yes
<mark_weaver>heh, interesting idea :)
<wingo>:)
<wingo>mark_weaver: dunno, i think it is a reasonable test case for eval, covering most of eval's features
<wingo>not closure creation
<wingo>but most other things
<wingo>there were various tweaks to the evaluator's structure that helped get the 3x speed
<wingo>adding the capture-environment op, for one
<mark_weaver>well, it's easy enough to come up with other more real-world tests, such as: how long does it take to compile psyntax-pp.scm with everything compiled vs everything interpreted (except for compiled eval.scm)
<wingo>making memoized code just pairs instead of a structure that required a subr call
<wingo>mark_weaver: that would be different because the compiler is different
<mark_weaver>or, just try running a set of well-known benchmarks, or something :)
<mark_weaver>admittedly, all benchmarks are a crock, even the "good" ones. it really depends what you're doing.
<wingo>the two dimensional lookup helps too, but it didn't help in 2.0 because loops were compiled so poorly
<mark_weaver>oh, does the scheme primitive-eval do 2d lookups now?
<wingo>yep
<mark_weaver>nice!
<wingo>vectors linked through their first field
<mark_weaver>great stuff
<wingo>here's the lookup loop:
<wingo>L1:
<wingo> 10 (make-short-immediate 6 2) ;; 0
<wingo> 11 (br-if-= 5 6 #t 5) ;; -> L2
<wingo> 13 (add1 4 4)
<wingo> 14 (vector-ref 3 3 4)
<wingo> 15 (return 3)
<wingo>L2:
<wingo> 16 (vector-ref/immediate 3 3 0)
<wingo> 17 (sub1 5 5)
<wingo> 18 (br -8) ;; -> L1
<wingo>could be better in various ways but still is ok
<wingo>depth is in 5, width in 4
<wingo>(specifically: by hoisting the make-short-immediate out of the loop, or adding a br-if-inum=; rotating the loop so the br-if-= is at the bottom; making the loop body contiguous)
<mark_weaver>it must be because I just woke up; I'm having trouble groking this code. give me a moment :)
<mark_weaver>what is this code in scheme?
<wingo>(let lp ((e env) (d depth))
<wingo> (if (zero? d)
<wingo> (vector-ref e (1+ width))
<wingo> (lp (vector-ref e 0) (1- d))))
<wingo>eval.scm:480
<mark_weaver>ah, right, of course. sorry, just forgot about how the data structure worked.
<mark_weaver>(with the links in index 0)
<wingo>yep
<wingo>avoids a needless second allocation for the spine
<mark_weaver>yeah, the hoisting thing would be nice, but the current code is perfectly reasonable.
<mark_weaver>one of the things that's on my mind lately is the fact that some of our users are having problems with flonum performance, because of all the GC allocation.
<mark_weaver>unknown_lamer experimented with my nan-boxing patch, and it made a *huge* difference for him.
<mark_weaver>now, I don't think we should make nan-boxing the default.
<mark_weaver>however, I wonder if there's something else we could do medium-term.
<wingo>i was thinking that for tight floating-point loops we should unbox
<wingo>and maybe add flonum vm ops
<mark_weaver>flonum VM ops that work on unboxed floats, you mean?
<wingo>but another possibility, a very real possibility, is to write a cps -> assembly compiler for special cases
<wingo>mark_weaver: yes, with debugging support to mark those locals as "not SCM values, but unboxed values"
<wingo>we'd have to ensure alignment of the stack on 32-bit systems; shouldn't be too hard, the compiler can guarantee that
<mark_weaver>sounds good, although again I worry about opcode space.
<wingo>i was thinking that in davexunit's case you could emit assembly very easily for a subset of scheme
<wingo>yeah, any opcode fix is temporary; obsolete once we do native...
<wingo>it's becoming more clear in my mind. the only thing i don't have clear is how to call from native to libguile routines -- you'd need something like the plt i guess
<mark_weaver>I'm still not sure it makes sense to go 100% native, but I dunno, maybe I'm wrong.
<wingo>i think it makes sense, but we'll see
<mark_weaver>I worry about code compactness and cache effects.
<wingo>we can start with half-steps, like specialized compilers for things like davexunit's case
<wingo>i'd much rather have a system with just one stack! :)
<mark_weaver>all of the tag checking and fallback code makes the code bigger. hiding those things in the VM makes the code much smaller.
<wingo>than a mixed vm/native system
<wingo>yeah i don't know, many tradeoffs there
<mark_weaver>trivial benchmarks will of course always point toward 100% native.
<mark_weaver>in order to see the benefits of code compactness, you must consider the memory footprint of a larger system.
<wingo>sure.
<mark_weaver>well, I guess I'll start by implementing VM ops for the R6RS fixnum/flonum operations. and then, once they are primitives from the compiler's standpoint, giving a lot of information to a potential type inference pass, we'll be able to play around with unboxing and such.
<mark_weaver>what do you think?
<mark_weaver>(if you have another idea, I'm not set on this approach)
<wingo>dunno :) i also worry about opcode explosion, and i don't find the r6rs ops to be terribly helpful from the representation standpoint -- they correspond to a semantics and not a representation
<wingo>i.e. they don't operate on unboxed values
<wingo>it would be too expensive to encode some kind of runtime table that opcode would use to indicate "unboxed vs boxed"
<wingo>to do dispatch at runtime
<wingo>so to me unboxing and r6rs semantics are orthogonal
<wingo>if we need them i am fine with having them but i never paged all of that into memory :)
<wingo>so i'm happy to make up opinions but basically i don't know anything here and am happy to trust whatever you decide to do :)
<mark_weaver>I see what you mean, and am sympathetic to the issues you raise.
*wingo verbal diarrhea ;)
<mark_weaver>the thing is, R6RS fixnum/flonum operations are right now *much* slower than the generic ops, which discourages people from using them.
<mark_weaver>but using them would be useful, because they provide very useful information to the type inference pass.
<mark_weaver>namely, they are guaranteed to return a fixnum (or flonum, depending on the op)
<wingo>so you are saying they are useful as primitives but people don't use them because they are slow
<wingo>and so to get people to use them we should make opcodes so they are not so slow
<wingo>is that it?
<mark_weaver>well, yeah, that was my thought.
<wingo>cool
<mark_weaver>but maybe I'm not thinking long-term enough :)
<wingo>hehe, dunno :)
<mark_weaver>I suppose there are entirely different approaches we could take, such as explicit type annotations.
<wingo>yes, or things like (logand EXP #xffffffff) for 32-bit numbers
<wingo>being able to say "this is a 32-bit number with the usual overflow semantics" is most useful from a compiler pov
<wingo>(* x 0.0) for flonums, etc
<wingo>but i haven't thought about this at all, i have no idea what would work
<wingo>i was thinking of maybe a low-level cps language below our current cps, that includes representation information
<mark_weaver>ah, interesting idea.
<wingo>so you could reason about representations, and optimize based on representations -- like transforming the logand case to the right machine ops
<wingo>but at the IL level -- so you do your thinking and optimizing at that level, and when it comes time to emit code there is very little to decide
<mark_weaver>sounds good, but I think we need to support some way for the user to provide some type information at the higher level, or else the type inference system won't be able to do much.
<wingo>that would obviously be the level at which to express unboxing
<wingo>mark_weaver: sure, like declarations in CL i guess
<wingo>dunno
<mark_weaver>the thing is, R6RS fixnum/flonum ops are currently the only standardized system for doing that in Scheme that I know of. and there's a fair bit of code out there that uses them I think.
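For context, a minimal sketch of the R6RS fixnum operations being discussed, using Guile's `(rnrs arithmetic fixnums)` module. Unlike the generic `+` and `logand`, these operations are specified to accept and return fixnums only, which is exactly the guarantee a type-inference pass can exploit:

```scheme
;; R6RS fixnum ops: arguments and results are guaranteed fixnums,
;; so a compiler can infer concrete representations across them.
(use-modules (rnrs arithmetic fixnums))

(fx+ 2 3)     ;; => 5
(fxand 6 3)   ;; => 2
(fixnum? (expt 2 100))  ;; => #f -- bignums are not fixnums
```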
<mark_weaver>I'd like to find some way to support it well, if possible.
<wingo>sure
<mark_weaver>even if we add something else in addition (we probably should).
<wingo>sure
<mark_weaver>one thing that would help reduce the pressure toward opcode explosion is to have a faster way to call out to specialized C code... code whose calling convention is specialized to work well with our VM.
<mark_weaver>any thoughts on whether that makes sense, and if so, how best to do it?
<mark_weaver>right now we have a huge performance gap between opcodes and non-opcodes.
<mark_weaver>it would be nice to have a middle ground.
<wingo>what is the performance gap?
<mark_weaver>well, admittedly I'm making assumptions. I haven't measured it :)
<wingo>heh heh :)
<wingo>i do the same thing many times so i'm not the right one to say it, but measurement is important :)
<wingo>do measure on a recent build of master btw
<mark_weaver>true, but I have a limited amount of time, and the more time I spend measuring the less time I can spend on other hacking.
<wingo>fair enough
<wingo>we should add a kind of subr that takes its arguments in an array
<wingo>and then just invoke it with arguments from the stack
<wingo>maybe only for internal use; it's not a nice API to expose to users, without the ability to prevent out-of-bounds access
<mark_weaver>right, internal use only.
<wingo>that should be as fast as most opcodes, i would think
<mark_weaver>it would be nice to avoid calling out to a stub that involves several VM instruction dispatches before you even get to the C code.
<mark_weaver>I'd probably go even a step further, and let the C code look at the instruction stream and stack directly, and handle extracting the opcode fields.
<mark_weaver>basically, i'd like to see a way to call C code that does pretty much the same thing as the current VM instructions do.
*wingo grimaces
<mark_weaver>heh
<mark_weaver>well, maybe this is the wrong way to think about it. maybe a better way to think about it is this: they would be part of the VM, but part of a second tier of instructions.
<mark_weaver>they'd be segregated from the main VM, so wouldn't contribute to the footprint of the core, and they'd make use of a larger opcode space, but otherwise they'd be part of the VM.
<wingo>i am currently getting that subr calls make a tight loop go from 0.7 to 3 seconds, whereas vm ops are 0.7 -> 1s
<mark_weaver>I dunno, maybe it's just a bad idea. I'm thinking out loud I suppose.
<wingo>scheme@(guile-user)> ,time (let lp ((n 0) (x 0)) (when (< n 100000000) (lp (1+ n) (logand x n))))
<wingo>;; 1.019501s real time, 1.020164s run time. 0.003477s spent in GC.
<wingo>scheme@(guile-user)> ,time (let lp ((n 0) (x 0)) (when (< n 100000000) (lp (1+ n) x)))
<wingo>;; 0.700154s real time, 0.700493s run time. 0.000000s spent in GC.
<wingo>scheme@(guile-user)> ,time (let lp ((n 0) (x 0)) (when (< n 100000000) (lp (1+ n) (logbit? 0 n))))
<wingo>;; 2.976694s real time, 2.978131s run time. 0.000000s spent in GC.
<wingo>but the subr case is suboptimal
<wingo>because of some loop-related things (you'd want to hoist the toplevel-box out of the loop)
<mark_weaver>the test loops have to be unrolled I think, or else the loop itself will dominate the benchmark.
<wingo>mark_weaver: that was the second case
<wingo>the case of a bare loop
<wingo>0.7s
<mark_weaver>oh, I see.
<mark_weaver>okay, so .3 seconds vs 2.3 seconds.
<mark_weaver>i.e. a factor of about 7.67
<mark_weaver>hence, the big performance gap :)
<wingo>sure, that's an ok back-of-the-envelope answer
<mark_weaver>I have a lot of arithmetic operations that I'd like to make faster. too many to make opcodes for them all.
<wingo>faster calling conventions are good, but i'd rather avoid primitives knowing about a vm instruction stream
<mark_weaver>well, too many to fit into 8-bit opcodes comfortably, anyway. and of course, I don't want to blow up the cache footprint of the core VM either.
<wingo>that way leads to bloat of a much stickier kind
<wingo>ah, the logand case is special also
<wingo>because it has a rest arg
<wingo>er
<wingo>hum.
<wingo>i did logbit?, that's not that bad :)
<wingo>anyway, let me get back to this firefox thing...
<mark_weaver>okay, ttyl!
<wingo>ciao, happy hacking :)
<mark_weaver>:)
<micro`>join #geiser
<wingo>:)
<jemarch>hi
<civodul>howdy jemarch!
<jemarch>civodul: I need your help
<civodul>tell me everything :-)
<jemarch>Ok, i am doing something with scheme, which is not public yet :)
<civodul>ah ah!
<civodul>:-)
<jemarch>but
<jemarch>You know, I am such a dumb lisp2-er that this lisp1 is confusing me
<jemarch>and probably there is a better way to do this..
<jemarch>lets say you get three strings: "amethod" "arg1" and "arg2"
<jemarch>then you want to invoke the named goops method (generic function) and pass the arguments "arg1" and "arg2" to it
<jemarch>Something like (primitive-eval (list (string->symbol "amethod") "arg1" "arg2")) works, as I would expect
<civodul>so first of all, eval is evil
<jemarch>yes
<civodul>:-)
<jemarch>I noticed :D
<jemarch>what is the alternative?
<civodul>so you want to invoke a method whose name comes from elsewhere?
<civodul>like the names comes from a UI
<civodul>*name
<jemarch>Yes. This is what I came up with, which works: (define (invoke-method obj method-name args) (apply (primitive-eval (string->symbol method-name)) args))
<jemarch>but I find it too convoluted... so probably there is a better way
<civodul>it depends
<civodul>if method-name really comes from a UI or something like that, then there's no better way
<civodul>however, if it comes from code, then yes
<civodul>you would pass the generic function itself instead of its name
<jemarch>no, it comes from an external source
<civodul>ok
<civodul>you could have a hash table mapping names to procedures/generic functions
<civodul>(mapping names to applicable things in general)
<civodul>so you would do (apply (hash-ref %mapping method-name) args)
<jemarch>hm, yes, but then I would need to register the stuff every time a method is created
<civodul>right
<civodul>but it may be safer than evaluating arbitrary things in the context of the current module
<civodul>you could just have a define-method* macro that does define-method + hash-set!
<civodul>or, if you really want to, you can also use plain define-method, and then use (module-ref the-module method-name)
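A sketch of the `define-method*` idea civodul describes (the macro name and `%methods` table are hypothetical, not part of GOOPS): plain `define-method` plus registration in a name-to-generic table, so a method can later be invoked by a name that arrives as a string, without eval:

```scheme
;; Hypothetical define-method*: define-method + hash-set!,
;; so generics can be looked up and applied by string name.
(use-modules (oop goops))

(define %methods (make-hash-table))

(define-syntax-rule (define-method* (name . args) body ...)
  (begin
    (define-method (name . args) body ...)
    (hash-set! %methods (symbol->string 'name) name)))

(define (invoke-method method-name . args)
  (apply (hash-ref %methods method-name) args))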
<jemarch>the methods are also defined in terms of external stuff :)
<jemarch>even the classes
<civodul>oh!
<civodul>well evaluate all these external things in a module of their own
<jemarch>I am already doing that I think
<jemarch>I am also defining goops classes on the fly using primitive-eval
<jemarch>the same problem again: since define-class is not a procedure I can't directly (apply) it. I have to pass through (apply (primitive-eval 'define-class) ...)
<civodul>ah, right
<civodul>well, in that case, eval may be the way to go
<jemarch>so I am not doing something extremely stupid?
<civodul>doesn't seem like it :-)
<civodul>that sounds interesting
<civodul>intriguing ;-
<civodul>;-)
<jemarch>well, once I release you can rewrite all the scheme stuff :)
<jemarch>oh, another thing...
<civodul>heh
<jemarch>how can you test that something is a goops class?
<jemarch>I am using return (posh_guile_format_eval (&val, "<%s>", cname) == POSH_OK) for the moment but that sucks. It returns true for any defined <foo> symbol be it a class or not.
<jemarch>I guess I could eval (class-name <%s>) I bet that fails if <%s> is not a class..
<civodul>or class?
<jemarch>that is the first thing I tried
<civodul>or (is-a? obj <class>)
<civodul>something like that
<civodul>"posh" hmm :-)
<jemarch>damn! :D
<civodul>muahaha!
<civodul>lemme get in touch with Wikileaks...
<jemarch>anyway, class? does not exist
<civodul>so is-a?
<jemarch>I would need to instantiate an object for that
<jemarch>and the instantiation itself could be the predicate: (eval '(make <foo>))
<civodul>you could do (is-a? (eval '<foo>) <class>)
<jemarch>oh, that works?
<mark_weaver>is-a? is definitely what you want here, I think
<jemarch>it works
<jemarch>great :)
<jemarch>(/query civodul
<jemarch>fuck, too much lisp :D
<unknown_lamer>eval, what
<unknown_lamer>jemarch: much like in CL, there is a procedural interface for generating classes and whatnot at runtime...
<unknown_lamer>IIRC, you should be able to (define <newclass> (make <class> ...))
<unknown_lamer>and methods are just applicable instances (or "funcallable instances")
<unknown_lamer>err, well, generics are. And methods are attached to them. Same as CLOS. No need to involve eval for generating classes and methods at run time
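A sketch of the procedural route unknown_lamer is pointing at: in GOOPS, a class is itself an instance of `<class>`, so classes can be created at run time with plain `make`, no eval involved. (The `<point>` class here is a made-up example.)

```scheme
;; Creating a GOOPS class procedurally, as (make <class> ...).
(use-modules (oop goops))

(define <point>
  (make <class>
    #:dsupers (list <object>)
    #:slots '((x #:init-keyword #:x)
              (y #:init-keyword #:y))
    #:name '<point>))

(define p (make <point> #:x 1 #:y 2))
(slot-ref p 'x)   ;; => 1
```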
<jemarch>What about calling them by name? Can eval be avoided in that case?
<mark_weaver>unknown_lamer: I think the issue here is that he's starting with the names of things as strings.
<jemarch>yes, right
<mark_weaver>unknown_lamer: and to make matters worse, some of these "things" are actually macros.
<unknown_lamer>you can intern strings, and then module-ref
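The intern-then-`module-ref` route might look like this: turn the incoming string into a symbol, look the binding up in a module, and use `is-a?` to test whether it is a class. (`lookup` and `class-named?` are hypothetical helper names.)

```scheme
;; Look a binding up by string name without eval, then test
;; whether the value is a GOOPS class.
(use-modules (oop goops))

(define (lookup name)
  (module-ref (current-module) (string->symbol name)))

(define (class-named? name)
  (is-a? (lookup name) <class>))

(class-named? "<object>")   ;; => #t
```

Note that `module-ref` raises an error for an unbound name unless a default value is supplied as a third argument.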
<unknown_lamer>hrm, macros
<unknown_lamer>well, if you're doing that you're doing something wrong and should flog thineself five times
<jemarch>heh, nice
<mark_weaver>jemarch: If you'd like to share, I'm curious about the bigger picture of what you're doing here, in case one of us can think of a better way.
<unknown_lamer>also, what about read+compile
<unknown_lamer>CLers don't seem too annoyed when dynamically generated stuff creates expressions and then calls the compiler on them
<unknown_lamer>but eval has wacky behavior
*unknown_lamer used to hack on ucw + parenscript + contextl + lisp-on-lines...
<mark_weaver>where are these strings coming from?
<unknown_lamer>so, waaaay more weird "wait you don't need eval for this?" stuff than a mere mortal should ever have to care about
<unknown_lamer>all I'm good at is MOP trickery :(
<unknown_lamer>oh yay, freeglut was ported to OpenGL ES2 / OpenGL 3.3 core profile
<unknown_lamer>the teapot shall live on
<unknown_lamer>hrm, there isn't an existing guile library (or another scheme implementation, a pure scheme version would be nice and worth any porting effort) for loading images is there?
<unknown_lamer>my current inclination is to just bind two or three functions from imlib2
<unknown_lamer>the FFI makes life so much easier when you just want to do something like that