IRC channel logs

2021-07-05.log

back to list of logs

<vijaymarupudi>I'm having trouble with speed after using a hashtable with a million keys and values for data analysis. I was wondering if there was anything I could do to help the GC?
<rlb>Which guile version?
<rlb>There were some hashing issues in 2.2. iirc that could cause serious performance trouble.
<vijaymarupudi>guile3
<rlb>OK, then I think that issue should be fixed.
<rlb>I'm a little surprised that a million key hash table has trouble, depending on what kind of trouble you mean.
<dsmith>vijaymarupudi: Are you giving it an initial size?
<vijaymarupudi>I'll have to apologize for the artificial example, but it's reproducible
<vijaymarupudi> https://pastebin.com/raw/PzUArBFJ
<vijaymarupudi>Python version here: https://pastebin.com/raw/Dftd65yL
<vijaymarupudi>The guile version takes 2.5s on my machine
<vijaymarupudi>Python takes 0.4s
<vijaymarupudi>Racket seems to zoom through similar code
<vijaymarupudi>I suspect I'll have to take this to C
<vijaymarupudi>Looking at strace, it seems like it seems to "stop" at a null byte written to a pipe
<vijaymarupudi>The source code says that this is the sleep pipe
<vijaymarupudi>Not entirely sure what that means
<vijaymarupudi>dsmith: initial size seems to take my example down to 1.19 seconds, but I don't think I know what the size would be in my data analysis code
<dsmith>vijaymarupudi: How about something like a prime number larger than your number of keys?
<vijaymarupudi>dsmith: that does seem to improve things a bit
<vijaymarupudi>it still seems slower than a directly equivalent python program
<vijaymarupudi>guile seems faster at the numerics, but slower at the hashtable
<dsmith>vijaymarupudi: Another thing may be the hashing function.
<vijaymarupudi>hash(i) == i in python3, I assume guile does the same?
<rlb>I *won't* be surprised if (c)python is faster for some "normal" hash operations. I have the impression it's pretty fast there.
<vijaymarupudi>I just checked with pypy3, which does it 0.5s, which is slower than cpython
<vijaymarupudi>I'm pretty impressed with python here
<vijaymarupudi>Huh, I wonder if there's a way to optimize guile's hashtables?
<dsmith>Well, picking a large enough size would prevent it from resizing (and re-hashing) the table.
<dsmith>A fast hasher that doesn't have (m)any duplicates for your keys should be faster too.
<vijaymarupudi>That does help, but I was curious if it was possible to reach python perf here
<vijaymarupudi>Looking at https://github.com/python/cpython/blob/main/Objects/dictobject.c, it seems like they use an algorithm that doesn't have to recalculate hashes on resize
<vijaymarupudi>They put the objects (with cached hashes) in a separate vector and only refer to the index from the "main table"
<vijaymarupudi>Comparing with https://git.savannah.gnu.org/cgit/guile.git/tree/libguile/hashtab.c, it seems like guile rehashes on each resize
<vijaymarupudi>Which explains why passing the size helps so much
<vijaymarupudi>(I'll have to leave for a bit, will check the logs later, thanks for taking my questions!)
<rlb>guile also doesn't cache hash values for immutable objects/trees the way the jvm (and/or clojure/jvm) do.
<rlb>I have a foo.test file that works fine if I run it via ./check-guile foo.test, but it crashes if I run it via "make check" like this:
<rlb> ERROR: json.test: (tree-equal? x (call-with-input-string (call-with-output-string (lambda (port) (write-json x port))) read-json)) - arguments: ((misc-error "simple-format" "FORMAT: Unsupported format option ~~~A - use (ice-9 format) instead" (#\f) #f))
<rlb>
<rlb>The test module has no references to format.
<rlb>Oh, hmm, might have found it.
<rlb>Hit another weirdness with srfi-88. I just wanted to use keyword->string, but using it in a foo.test file crashes tests later on, i.e. creating a test-suite/tests/breakthings.test containing this causes subsequent tests to crash: (define-module (test-suite breakthings) #:use-module (srfi srfi-88))
<rlb>And just #:selecting keyword->string doesn't help.
<rlb>wingo: wonder if that might be reader related ^ Haven't tried that with anything but f449d2ebcb242b5824e167ae9cf73bec6218c683 yet.
<rlb>My previous issue with srfi-99 was that reader changes persist after the end of a module file -- be more useful if there were a way to say that "just this module uses 'prefix", but this is caused just by loading the module.
<rlb>"this new issue is caused"
<rlb>wingo: oh, I take it back. That issue was found with current master, not that sha.
<rlb>Oh, right, I forgot that just using srfi-88 calls read-set!. That's pretty surprising, but maybe that's what the srfi wants... I just need to avoid srfi-88 -- can always define keyword->tring and string->keyword myself. At least for now, there's no more direct function.
***apteryx_ is now known as apteryx
***cadmium.libera.chat sets mode: +o ChanServ
***chris2 is now known as Guest8193
<leoprikler>rlb perhaps we'd want to provide those in (guile), WDYT?
<ecraven>how do I tell the guile3 command line to load an r7rs library? how do I point it at the sls file?
<ecraven>also, which library exports `interaction-environment'?
<wingo>ecraven: guile --r7rs should work fwiw. to load an sls you use import in the file you load, as usual
<sarna>hey, I have a `let*` in which I just pass the previous variable to the next one. does guile have something like clojure's threading macros? ( https://clojure.org/guides/threading_macros )
<Guest8193>srfi-197
***Guest8193 is now known as chrislck
<chrislck>sarna: srfi-197
<sarna>thanks chrislck!
<sarna>oh, guile doesn't support it :^(
<vijaymarupudi>I was trying to make a c extension wrapper around GLib's hashtable to get around performance problems with Guile's hashtable (I really want to use guile), and it performs similarly to Guile's hashtable. This is odd, because in pure C, GLib's hashtable is really fast. Turns out the culprit is scm_gc_protect_object, which appears to acquire a mutex on each call, and uses Guile's hashtable, making the c wrapper kinda redundant. I was
<vijaymarupudi>wondering if there was a way to get around this?
<vijaymarupudi>Solutions I can see working here: An API to protect multiple objects at once (to prevent mutex locking in tight loops) or to convert the scm_protects hash_table to c? I'm not sure which one is responsible for the performance impact, is there a way for me to know?
<vijaymarupudi>To quantify performance impact, removing scm_gc_protect improves runtime for hashtable loops from 300ms to 50ms
<vijaymarupudi>(ignoring the segfault afterwards :))
<dsmith>heh
<leoprikler>ignoring segfaults reduces runtime to 0ms in the optimal case :)
<vijaymarupudi>:) in a less ridiculous approach, I have paired GLib's hashtable with a Guile list (which keeps the references) and performance is now 100ms with no segfault.
<vijaymarupudi>But with this approach, removing an item from the table doesn't make it available to the gc
<chrislck>sarna: I believe guile *could* import srfi-197 however I don't know how
<sarna>chrislck: I don't need it *that* much, I'll just use `let*` :D
<sarna>how do I make libraries visible to my program? I installed one with guix and when I run `guile myprogram.scm` it can't find the module
<civodul>sarna: hi! when you run, say, "guix install guile guile-lib", there's a hint displayed about setting environment variables
<civodul>if you follow that hint, it'll set GUILE_LOAD_PATH
<civodul>from then on, "guile" will find the Guile-Lib modules
<leoprikler>vijaymarupudi: what if you used the list pairs as keys though?
<leoprikler>in that case, for a deleted key, you would have to seek that pair (O(n), maybe one can do it faster) and then simply drop the car
<vijaymarupudi>I might do that, and then drop the cars during hash table resizes
<vijaymarupudi>thanks for the suggestion!
***ays1 is now known as ays
<sarna>civodul: I must have missed it. I'll set it up, thanks!
<rlb>sarna: If it helps, these may work fine in scheme too: https://git.sr.ht/~rlb/lokke/tree/main/item/mod/lokke/base/syntax.scm#L108 And the cond->/cond->> in there will likely work if you chagne the %scm-if to if, etc.
<rlb>(And of course, I'd be happy to contribute scheme versions of any of that to Guile proper if we were to decide it's desirable.)
***cadmium.libera.chat sets mode: +o ChanServ
<manumanumanu>sarna: this is a guile module I wrote a long time ago: https://hg.sr.ht/~bjoli/guile-threading-macros
<manumanumanu>but if anyone has a working implementation of srfi-197 you should use that
<dsmith>Hmm, The sample implementation https://github.com/ar-nelson/srfi-197/tree/draft-3 has Guile working with the r6rs lib