IRC channel logs

2022-01-12.log

<rlb>wingo: I don't know if I'll get back to the utf-8 bits I was working on soon, but if I do, I'll let you know.
<rlb>I had a good bit of it in place, but it's a lot of change, and there was likely a good bit left to do.
<rlb>Either way, I think the "proportional" indexing might be sensible (saw an article later suggesting swift(?) may have done something similar)...
*dsmith likes the way Rust handles all that.
<rlb>dsmith: which bit, the memory layout, the osstring vs string distinction, or something else? If the latter, then I definitely agree, I'd love for us to have something like that too (or equivalent), i.e. support for binary paths, usernames, groupnames, etc.
<robin>a couple random string things: should 'osstrings' be distinct from scheme strings or allow them in regular strings a la PEP-383 (related to markus kuhn's utf-8b idea), and also would it be reasonable to support emacsy extended-utf-8 (which, iirc, allows 5/6-byte sequences as a kind of a giant private use area for non-unicode charsets)
<robin>ah, emacs only uses 5-byte extended-utf-8 sequences, as well as an 'eight-bit-char' encoding: https://git.savannah.gnu.org/cgit/emacs.git/tree/src/character.h
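For reference, the PEP-383 approach robin mentions is what Python exposes as the `surrogateescape` error handler: bytes that are not valid UTF-8 are carried inside an ordinary string as lone surrogates (U+DC80..U+DCFF) and restored when encoding. A minimal sketch in Python rather than Guile; the sample file name is made up for illustration and says nothing about how Guile would implement this:

```python
# Hypothetical file name containing a byte (0xff) that is not valid UTF-8.
raw = b"report-\xff.txt"

# Decoding with surrogateescape maps each undecodable byte 0xNN to U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, so such a string cannot be
# written out as plain UTF-8 by accident.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The trade-off robin raises is visible here: binary paths fit in regular strings, but those strings are no longer guaranteed to be valid Unicode.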
<robin>i'm not sure whether there are actually still charsets that aren't effectively subsets of unicode
<robin>for example mule-conf.el says that jisx0213 "contains characters not in Unicode (3.2?)" and 3.2 was released...literally two decades ago
<robin>indeed, japanese wikipedia indicates that jisx0213 is a subset of unicode as of version 5.0 from 2006
<robin>wp:MULE: "MULE provides facilities to handle text written in many languages (at least 42 character sets, 53 coding sets, 128 input methods, and 58 languages[1]), and multilingual texts containing several languages in the same buffer. This goes beyond the simple facilities offered by Unicode to represent multilingual text."
<robin>[citation needed] (though it was probably true when MULE was originally implemented)
<robin>wingo, i'm very curious as to what the plans are for gc :)
***sneek_ is now known as sneek
<wingo>robin: aaaaaaaaaahhhh i don't have a concrete idea right now! on the high level, gc allocation speed is a bottleneck for us. if we could do bump-pointer allocation and avoid scanning the whole heap when we need to reclaim space that would be a large win. a whole pile of tradeoffs, but i would imagine this is something that could work: keep conservative scanning of the C stack; any heap object referenced by the c stack is pinned in place.
<wingo>compiled scheme allocates in a nursery. precisely scan the scheme stack. 2-generation heap, with compacting nursery and stop-the-world parallel mark/sweep old generation
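The bump-pointer allocation wingo describes can be sketched as follows. This is a toy model in Python, not Guile's allocator: the `Nursery` class, its size, and the 8-byte alignment are all assumptions chosen for illustration.

```python
class Nursery:
    """Toy bump-pointer allocator over a fixed-size byte buffer."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0  # next free offset; everything below it is allocated

    def alloc(self, nbytes, align=8):
        # Round the request up so every object starts on an aligned offset.
        nbytes = (nbytes + align - 1) & ~(align - 1)
        if self.top + nbytes > len(self.buf):
            return None  # nursery full: the caller must run a minor collection
        addr = self.top
        self.top += nbytes  # the "bump"
        return addr

    def reset(self):
        # Once survivors have been evacuated, the whole nursery is free again.
        self.top = 0


nursery = Nursery(64)
a = nursery.alloc(10)  # rounded up to 16 bytes
b = nursery.alloc(24)
c = nursery.alloc(32)  # does not fit in the remaining 24 bytes
nursery.reset()
d = nursery.alloc(32)  # fits again after the reset
```

Allocation is a bounds check plus an addition, and reclaiming the nursery does not require scanning the rest of the heap, which is the win being discussed.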
<robin>coincidentally, i tried to implement ephemerons today in pure scheme and ended up implementing a library that allows the user to create, ahem, *one* ephemeron (or, ephemerons that don't reference one another would be a more charitable evaluation); things like ephemerons and blobs seem quite tricky atm (i *think* ephemerons are doable at the c level...but rising c levels are bad for the environment, no?</terrible-pun>)
<wingo>yeah you really want primitive gc support for ephemerons and ephemeron tables...
<wingo>btw i don't know if you've read it but if you are interested, https://gchandbook.org/ is really quite definitive
<robin>i had it on a to-read list at one point but didn't know if it was particularly good or not; will definitely take a look, thanks for the recommendation!
<wingo>a couple good reads recently: https://arxiv.org/pdf/2112.07880.pdf https://arxiv.org/pdf/2004.11663.pdf
<wingo>in my scheme above -- consider a call stack C1 -> Scheme1 -> C2 -> Scheme2 -> C3
<wingo>when would a nursery get pinned? that's the issue iiuc. in Scheme1 it should be quite unlikely because Scheme1's call frame is newer than C1's, but possible if an object allocated by S1 was written into an object from the old generation
<wingo>the write barrier could detect this but we're still stuck with a pinned nursery
<wingo>but then... actually is it stuck? the reference to the nursery object would be from an on-heap data structure which guile should control
<wingo>so we could relocate the object as needed
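The point about old-to-young references can be sketched with a write barrier and a remembered set. This is a simplified Python model, not Guile code: `Obj`, `write_field`, `evacuate` and `minor_collect` are hypothetical names, and the sketch ignores transitive copying of a survivor's own fields.

```python
class Obj:
    def __init__(self, name, young=True):
        self.name = name
        self.young = young  # True while the object lives in the nursery
        self.fields = {}


remembered = set()  # (old object, field name) slots that point into the nursery


def write_field(obj, field, value):
    """Store with a write barrier that records old -> young edges."""
    obj.fields[field] = value
    if not obj.young and isinstance(value, Obj) and value.young:
        remembered.add((obj, field))


def evacuate(obj):
    """Stand-in for copying a nursery object into old space."""
    moved = Obj(obj.name, young=False)
    moved.fields = obj.fields
    return moved


def minor_collect():
    # Each remembered slot is a precise on-heap reference, so the collector
    # can move the target and rewrite the slot to point at the new copy.
    forwarding = {}
    for holder, field in remembered:
        target = holder.fields[field]
        if not (isinstance(target, Obj) and target.young):
            continue  # slot was overwritten since the barrier fired
        if target not in forwarding:
            forwarding[target] = evacuate(target)
        holder.fields[field] = forwarding[target]
    remembered.clear()


old = Obj("table", young=False)
young = Obj("pair")
write_field(old, "slot", young)
n_remembered = len(remembered)
minor_collect()
moved = old.fields["slot"]
```

Because the collector owns the slot, the young object is not pinned. That differs from a conservative reference on the C stack, which cannot safely be rewritten.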
***Guest4659 is now known as roptat
<wingo>so i guess that it's just that C1 can't conservatively reference an object from the nursery, if the nursery was created in S1
<wingo>however C2 could conservatively reference a young object allocated by S1. in that case the nursery is pinned and can only be freed later. but, if we make C always allocate into old space (nursery allocations are only from compiled scheme), then we don't need to compact the nursery if C2 triggers collection
<robin>(zixian cai et al. looks informative; ocaml paper promises to be a banger as there must be *huge* issues involved or they'd've gotten around to it many years ago, surely)
<robin>wingo, i think that makes sense, after scribbling some inchoate diagrams (roommate's probably going to think i'm getting into qabala and/or abstract algebra...)
<robin>nursery compacting sounds like it could be a big win, along with bump-pointer consing
<tohoyn>daviid: what's the status of virtual functions (in interfaces) in G-Golf? Call (make <function> #:info info) fails if info is a vfunc info.
<wingo>i was very interested to see that the low-pause gc's sometimes had higher latency than stop-the-world
<wingo>both because sometimes they had to pause, and because of what they had to do to slow down the mutator
<wingo>in the "empirical lower bound of gc overhead" paper
<dsmith>rlb: The string/osstring distinction really makes sense to me. But I like just about everything about Rust.
***daviid` is now known as daviid
<stis>Tja guilers!
<stis>Wingo: support of the gc for prolog systems would be great!
<wingo>civodul: i propose to merge wip-inline-digits, wdyt?
<wingo>civodul: also i hope i fixed the resolve-free-vars bug in git
<wingo>not wip-inline-digits tho
<wingo>on "main"
<civodul>wingo: re resolve-free-vars, awesome
<civodul>i managed to build Guix with wip-inline-digits BTW
<civodul>(i just added all the optional dependencies as a workaround)
<civodul>i haven't measured memory usage and build time tho
<civodul>i did get that GC warning once, which might suggest that memory usage got up
<civodul>but that's very unscientific
<wingo>hum, interesting
<civodul>i like the new integers.[ch], much more pleasant than the big numbers.c
<wingo>wip-inline-digits should cause less gc-managed allocation but more mallocation -- because it lets gmp use malloc/free when working on mpz types
<wingo>or rather when allocating memory for a mpz value
<wingo>there is the possibility that i forgot to mpz_clear something that i needed to clear
<wingo>hopefully not of course :P i will read over it again to check
<civodul>should we still allow for the use of libgc as GMP's allocator?
<civodul>because in a way the new implementation doesn't change this tradeoff
<civodul>woow, i hadn't realized scm_i_big2dbl and friends were public, fun :-)
<wingo>civodul: imo, no. we should stop mucking with gmp's allocator
<wingo>before, the tradeoff was "use libgc and run the risk of breaking other gmp users" or "use finalizers, which is very slow"
<wingo>now we can avoid both problems and also have bignums be pointerless (i.e. not just the digits but the header too)
<wingo>setting the gmp allocator wouldn't do anything for us functionally. for stack-allocated mpz_t, gc_malloc_pointerless may be faster but we can't assume all mpz_t will be marked
<civodul>put this way, i agree that's quite compelling
<civodul>i guess it all depends on how frequently GMP operations entail mallocing
<wingo>basically if it's a problem for guile then we switch to mpn
<wingo>so it's a problem that we can address, i think
<civodul>ok, that makes sense to me
<wingo>the other thing about malloc is that it's quite transient -- it's only for temporaries
<wingo>in gmp as used by guile anyway
<civodul>right
<dsmith-work>{appropriate time} Greetings, Guilers
<civodul>o/
<wingo>o/
<civodul>wingo: one thing i wanted to look at is environment lookup in psyntax
<civodul>currently it's an alist like in the good ol'days
<civodul>when interpreting gnu/packages/*.scm, we spend a lot of time traversing those alists
<civodul>(when compiling too)
<wingo>funny
<wingo>why is that? are those lists quite long?
<civodul>it'd be nice to use something like vhash or fash or something
<wingo>is that a property of the source program?
<civodul>i haven't gathered figures about the length of environments, but i guess so
<wingo>or do those lists grow and grow with file length
<civodul>(package ...) forms create one local variable per field, for example
<civodul>so in a typical package file, you end up doing lots of lookups in not-so-small environment alists
<civodul>that's my interpretation at least
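The cost civodul is describing can be sketched by counting comparisons. This is Python, not psyntax: the 50-entry environment and the `assq` helper below are assumptions for illustration, and no measurement of real Guix package files is implied.

```python
def assq(key, alist):
    """Linear search through (key, value) pairs; also returns steps taken."""
    steps = 0
    for k, v in alist:
        steps += 1
        if k == key:
            return v, steps
    return None, steps


# Hypothetical environment: one local binding per package field.
fields = [f"field-{i}" for i in range(50)]
env_alist = [(name, i) for i, name in enumerate(fields)]
env_hash = dict(env_alist)  # stand-in for a vhash-like structure

value, steps = assq("field-49", env_alist)  # worst case: last entry
hashed = env_hash["field-49"]               # one hashed lookup
```

A real replacement would also have to preserve shadowing, since newer bindings at the front of an alist hide older ones with the same name.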
<wingo>it sounds odd to me that assq would be the bottleneck. not impossible but my instincts are saying that might not be it
<wingo>anyway, just fwiw
<civodul>yeah i should provide more accurate data, i'll see if i can focus on this soon
<wingo>there is an odd list-ref in search-list-rib, dunno how often that case is run... would make things quadratic
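Why a `list-ref` inside a list walk turns quadratic can be sketched with cons cells. This is a worst-case model in Python, not the actual `search-list-rib` code, which may only reach its `list-ref` under some condition; all names below are hypothetical.

```python
class Cons:
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr


def make_list(items):
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def list_ref(lst, k, counter):
    # Walk k links from the head, as list-ref does on a cons list.
    for _ in range(k):
        lst = lst.cdr
        counter[0] += 1
    return lst.car


def lookup_indexed(names, labels, wanted, counter):
    # Re-walks the parallel list from its head for every name examined.
    i = 0
    while names is not None:
        label = list_ref(labels, i, counter)
        if names.car == wanted:
            return label
        names, i = names.cdr, i + 1
    return None


def lookup_lockstep(names, labels, wanted, counter):
    # Advances both lists together, one link per name examined.
    while names is not None:
        if names.car == wanted:
            return labels.car
        names, labels = names.cdr, labels.cdr
        counter[0] += 1
    return None


n = 100
names = make_list([f"x{i}" for i in range(n)])
labels = make_list(list(range(n)))
slow, fast = [0], [0]
r_slow = lookup_indexed(names, labels, "x99", slow)
r_fast = lookup_lockstep(names, labels, "x99", fast)
```

For the last of 100 entries the indexed version follows 4950 links and the lockstep version follows 99, which is the n(n-1)/2 versus n-1 gap that "quadratic" refers to.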
<wingo>anyway i guess a statprof run would be good
<civodul>what comes pretty high is ice-9/psyntax.scm:735:10:search
<civodul>full profile: https://web.fdn.fr/~lcourtes/pastebin/computing-guix-derivation-profile.html
<civodul>this is the code that runs the infamous "Computing Guix derivation" phase in 'guix pull'
<civodul>it ends up interpreting a large subset of gnu/packages/*.scm
<wingo>civodul: https://wingolog.org/priv/guile-maybe-psyntax.patch ?
<wingo>er
<wingo>civodul: https://wingolog.org/priv/maybe-psyntax-patch.patch ?
<wingo>dunno, just an idea to remove the quadratic part
*wingo has a plan for how to improve guile performance by 2-3x for non-gc-bound code
<wingo>that would put us on par with chez
<wingo>as often faster as slower
<lloda>:-O
<morenonatural>sup, y'all
<morenonatural>I wondered if there was a performance gain of preferring call/cc instead of passing lists … I'm processing list after list, starting with `file-system-fold`
<civodul>wingo: that looks like a nice improvement
<civodul>though the list-ref bit doesn't show up in the profile above
<rlb>wingo: you may have seen it, but in case not, this is the lokke build failure with main (which as mentioned might just be a lokke problem, but I think started when the inlining was enabled): https://paste.debian.net/hidden/ca3f7b22/
***roptat_ is now known as Guest7387
***daviwil_ is now known as daviwil`
***daviwil` is now known as daviwil
***theruran_ is now known as theruran
***duncanm_ is now known as duncanm
***distopico_ is now known as distopico
***Guest7387 is now known as roptat
***ecraven- is now known as ecraven
***Ekho- is now known as Ekho
***aweinsto1k is now known as aweinstock