IRC channel logs

2021-03-14.log

<spk121>does anyone know a way to test if the JIT is working correctly? I'm trying to figure out if my MinGW JIT hack is running right
<civodul>spk121: you can set GUILE_JIT_THRESHOLD=0 to force jitting everything
<spk121>civodul: so if I do that and run ./check-guile and it passes, does that mean that JIT is working?
<civodul>yup, i think so!
*civodul -> zZz
<civodul>happy hacking! :-)
<spk121>huh. I did not expect to be able to get that working
<civodul>well done
<rlb>Is the 'h' in scm_struct_init documented anywhere? (Maybe I missed it.)
***apteryx_ is now known as apteryx
<fnstudio>hi, suppose i want to glue a few strings together into a url, eg "<schema>://<host>:<port>/<path>/"
<fnstudio>is there a guile-y way of doing this?
<fnstudio>i'd like to avoid string-append since a few bits are sort of scaffolding that'd make the use of string-append very verbose
<rekado>fnstudio: (web uri) has build-uri to build URIs
<fnstudio>rekado: oh great, i'll look at that then, thanks!
<fnstudio>hi, i'm struggling with a http-post request: https://paste.debian.net/1189371/
<fnstudio>there's a header that i'm supposed to add to the request that doesn't seem to be recognised/accepted
<fnstudio>and the request generates a "Bad value for header" error
<mdevos>fnstudio: maybe replace the symbol test with the string "test"?
<mdevos>I dunno
<fnstudio>mdevos: hey thanks, i'd actually tried that but it didn't help, but thanks for suggesting it
<mdevos>I found something relevant, let me post a link ...
<mdevos> https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n665
<fnstudio>let me see
<mdevos>--> declare-header!
<fnstudio>mdevos: hm true, it must be that
<fnstudio>although i've just tried with "declare-opaque-header!" and that didn't work, i'm going to try with "declare-header!" but that means i need to dig a little bit deeper and get a grasp of what the parser/validator/writer should be
<fnstudio>thanks a lot, that's definitely put me on a good track
<iv-so>what is the current state of elisp in guile?
<fnstudio>mdevos: this seems to work now https://paste.debian.net/1189380/
<fnstudio>i can see the request to be generated and sent to the server
<fnstudio>so i'm now past that error
<fnstudio>although i got into another one immediately downstream of that
<fnstudio>the server doesn't seem to like the request and the connection is closed straightaway
<fnstudio>with a 401
<fnstudio>it works with curl
<fnstudio>this is no longer about guile though, it might be that i'm missing a header or something
<ArneBab>fnstudio: can you watch the network with wireshark?
<fnstudio>ArneBab: yes, that's what i did, i was able to compare the two requests, curl vs guile
<fnstudio>that gave me a hint
<fnstudio>as the curl one contained an authorization header
<fnstudio>i thought that guile would transform the userinfo argument into an auth header (by calculating the hash of username + password)
<fnstudio>(or whatever... i might have misphrased the above, but you get what i mean)
<fnstudio>but i was wrong
<fnstudio>and i can actually read: "But since passwords do not belong in URIs, the RFC does not want to condone this practice"
<fnstudio> (https://www.gnu.org/software/guile/manual/html_node/URIs.html)
<fnstudio>so the userinfo is actually just the username
<fnstudio>so i added the auth header "manually" and that... did the trick!
<fnstudio>so i now have a working request, filed under small wins :)
<fnstudio>i actually copied the authorization header over from the curl tcpdump
<fnstudio>as a test
<fnstudio>so i now need to actually build it programmatically in guile
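On building that header in code rather than copying it from a capture: if the server expects HTTP Basic authentication (an assumption; the log does not name the scheme), the header value is the word Basic followed by the base64 encoding of user:password, not a hash. A minimal sketch in C follows. The helper names base64_encode and basic_auth_value are hypothetical, and in Guile the encoding step would come from a base64 module such as the one in guile-gcrypt mentioned later in the log.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Base64-encode LEN bytes from SRC (RFC 4648 alphabet, '=' padding).
   Returns a malloc'd, null-terminated string, or NULL if allocation fails.
   Hypothetical helper, written out only to keep the sketch self-contained. */
char *
base64_encode (const unsigned char *src, size_t len)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  char *out;
  size_t i, o = 0;

  assert (src != NULL || len == 0);
  out = malloc (4 * ((len + 2) / 3) + 1);
  if (out == NULL)
    return NULL;

  /* Full 3-byte groups become 4 output characters each. */
  for (i = 0; i + 2 < len; i += 3)
    {
      unsigned long v = ((unsigned long) src[i] << 16)
                        | ((unsigned long) src[i + 1] << 8)
                        | (unsigned long) src[i + 2];
      out[o++] = alphabet[(v >> 18) & 63];
      out[o++] = alphabet[(v >> 12) & 63];
      out[o++] = alphabet[(v >> 6) & 63];
      out[o++] = alphabet[v & 63];
    }

  /* One or two leftover bytes are padded with '='. */
  if (len - i == 1)
    {
      unsigned long v = (unsigned long) src[i] << 16;
      out[o++] = alphabet[(v >> 18) & 63];
      out[o++] = alphabet[(v >> 12) & 63];
      out[o++] = '=';
      out[o++] = '=';
    }
  else if (len - i == 2)
    {
      unsigned long v = ((unsigned long) src[i] << 16)
                        | ((unsigned long) src[i + 1] << 8);
      out[o++] = alphabet[(v >> 18) & 63];
      out[o++] = alphabet[(v >> 12) & 63];
      out[o++] = alphabet[(v >> 6) & 63];
      out[o++] = '=';
    }

  out[o] = '\0';
  return out;
}

/* Build the value of an Authorization header for HTTP Basic auth:
   "Basic " followed by base64 ("user:password").
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *
basic_auth_value (const char *user, const char *password)
{
  size_t ulen = strlen (user);
  size_t plen = strlen (password);
  size_t n = ulen + 1 + plen;
  char *cred = malloc (n + 1);
  char *b64, *value;

  if (cred == NULL)
    return NULL;
  memcpy (cred, user, ulen);
  cred[ulen] = ':';
  memcpy (cred + ulen + 1, password, plen + 1);

  b64 = base64_encode ((const unsigned char *) cred, n);
  free (cred);
  if (b64 == NULL)
    return NULL;

  value = malloc (strlen ("Basic ") + strlen (b64) + 1);
  if (value != NULL)
    sprintf (value, "Basic %s", b64);
  free (b64);
  return value;
}
```

With the example credentials from RFC 7617, basic_auth_value ("Aladdin", "open sesame") gives "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==". Base64 is an encoding, not encryption, so this header is only safe over TLS.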
<wingo>good evening
<wingo>spk121: nice work getting jit enabled on mingw!!!!
<wingo>works with GUILE_JIT_THRESHOLD=0 then ?
<ArneBab>wingo: do you know offhand how the Lilypond-folks could switch between optimization levels? On the lilypond list there are now benchmarks of Guile 1.8 vs. 2.2 vs. 3.0.6 that do not look all bad — 3.0.6 is still slower for them than 1.8, but less so than 2.2
<wingo>ArneBab: these benchmarks, i guess they are about time to load a file from source and then read and local-eval some expressions, is that right?
<ArneBab> https://lists.gnu.org/archive/html/lilypond-devel/2021-03/msg00054.html
<wingo>i assume that compile time is not part of the benchmarks
<ArneBab> https://lists.gnu.org/archive/html/lilypond-devel/2021-03/msg00049.html
<ArneBab>I’m not sure — it could well be, because Lilypond has lots of code embedded in ly-files.
<wingo>right. so what is happening here is a few things at once. it would not appear that the change to the reader is significant. the biggest difference between 1.8 and 2.x/3.x is that we run psyntax eagerly on the input, then eval, instead of just starting eval directly. and in 1.8 eval is in C, and 2.x/3.x it is in scheme
<mdevos>wingo: have you received my mail on a (fixed) wait-until-port-readable/writable patch for guile-fibers? <https://lists.gnu.org/archive/html/guile-user/2021-03/msg00035.html> (I don't need a response to it yet, just checking :-) as some people have inboxes with >4K unread mails).
<ArneBab>But 3.x is much faster than 2.2 again
<ArneBab>It’s just not yet at the level of 1.8 again
<wingo>depending on whether eval is called from c or scheme will affect lilypond's experience; calls from scheme faster than calls from c
<rlb>Hmm, I'm likely misunderstanding something, but i18n.c's SCM_STRING_TO_U32_BUF uses scm_i_string_wide_chars which strings.h says doesn't guarantee null termination, but it then calls u32_strcoll, which requires null termination?
<rlb>s/requires/depends on/
<wingo>ArneBab: anyway i guess good that 3.0 on the same order as 1.8, and good to know reader isn't big issue. might make sense to think about time-to-eval in future but no significant perf regressions in 3.0.6 afaiu for lilypond
<wingo>though i guess they don't test 3.0.5, so i can also assume that our general direction towards better perf will be fine for them, given that they don't track it on the day-to-day
<ArneBab>it’s still not back at the performance of 1.8, but I think we’re getting closer to enabling them to switch.
<wingo>hard to beat the latency of a pure-C interpreter with lazy macros
<wingo>lazy macros. what a bonkers thing
<ArneBab>3.0.5.116-85433 "eating a little more memory than with guile-1.8.8 but far less than guile-2.2.6"
<fnstudio>(any obvious module for base64-encoding/decoding?)
<wingo>fnstudio: there is one in guile-lib
<ArneBab>fnstudio: I only have some base32 encoding
<wingo>~/src/guile-1.8$ ./pre-inst-guile
<wingo>guile> ((if 42 if list) #f 10 'hey)
<wingo>hey
*wingo shakes head
<fnstudio>wingo: excellent, thanks, looking that up in guile-lib now
<civodul>wingo: wat?!
<wingo>civodul: :)
<mdevos>wingo: likewise?!
<wingo>the good old days weren't so good!
<civodul>what if, in fact, this captured the essence of programming?...
<civodul>fnstudio: guile-gcrypt has base32 and base64
<ArneBab>civodul: I didn’t know …
<fnstudio>civodul: super, i'll give that a try immediately, thanks
<wingo>incidentally john shutt of the kernel language died recently. i always thought fexprs were bonkers but enjoyed reading him in the golden days of lambda-the-ultimate
<civodul>oh
<wingo>yeah. fellow traveller of weird languages; things are more boring without him
<wingo>mdevos: no attachment on that mail?
<rlb>wrt i18n.c and u32_strcoll, fundamentally I'm wondering if we might have a potential buffer overrun there.
<wingo>rlb: humm i think we probably do
<wingo>i agree with your reasoning
<rlb>OK, and I don't see any way to fix it (trivially) with libunistring -- don't think they have non-null-terminated versions.
<wingo>really!
<wingo>that's weird
<wingo>anyway, ok.
<wingo>i guess we mangle SCM_STRING_TO_U32_BUF then
<wingo>are you on it? :)
<rlb>(As an aside, I ended up fully converting strings.c to utf-8 -- in a very blunt manner (would need clean up), and saw that when I started working on the other code that calls into strings.h.)
<wingo>!!!
<rlb>I'm not, but I could be -- obviously a very expensive approach would be to just duplicate the wide strings and null terminate them.
<wingo>yeah but it's reasonable imo
<rlb>Another would be to put a secret null at the end of all wide strings? Could we get away with that?
<wingo>i.e. it fixes the problem. for latin1 strings it's already like that
<rlb>i.e. just stop not null-terminating strings.
<wingo>can't put a secret null in, doesn't work for shared substrings
<rlb>ooooooooooooooooooooooooooh
<rlb>:)
<rlb>of course -- I should know better.
<rlb>been staring at that code off and on for days, of course.
<wingo>:)
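The "duplicate and null terminate" option rlb mentions can be sketched as below. This only illustrates the idea and is not the change that went into Guile. Both function names are hypothetical, and u32_terminated_length stands in for any libunistring "str" routine that trusts the terminator instead of taking a length.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly malloc'd copy of the LEN code points at CHARS with a
   terminating zero appended, or NULL if allocation fails.  CHARS may point
   into the middle of a larger buffer, as a shared substring would. */
uint32_t *
u32_dup_terminated (const uint32_t *chars, size_t len)
{
  uint32_t *copy;

  assert (chars != NULL || len == 0);
  if (len >= SIZE_MAX / sizeof (uint32_t))
    return NULL;                /* (len + 1) * 4 would overflow */

  copy = malloc ((len + 1) * sizeof (uint32_t));
  if (copy == NULL)
    return NULL;
  if (len > 0)
    memcpy (copy, chars, len * sizeof (uint32_t));
  copy[len] = 0;                /* the terminator the "str" routines rely on */
  return copy;
}

/* Stand-in for a terminator-trusting routine: it never sees a length, so
   running it on an unterminated buffer would read past the end. */
size_t
u32_terminated_length (const uint32_t *s)
{
  size_t n = 0;
  while (s[n] != 0)
    n++;
  return n;
}
```

The copy costs an allocation per call, which is the expense rlb points out. A string with an embedded zero code point would still look truncated to the callee, so the copy fixes the overrun but not that case.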
<wingo>how do you see the latin1 / utf-32 / utf-8 tradeoffs?
<rlb>Oh, and not sure if this is OK either (and could be changed), but I got rid of SH_STRINGs, i.e. strings just point into their buffers, and always know their fully computed offset.
<wingo>i.e. is utf-8 a clear winner for you?
<rlb>Well, I think it's a good bit easier to deal with in some ways, since I changed it to have ascii and non-ascii strings, the former are fixed-width, but *all* bytes are utf8.
<rlb>i.e. we can pass the string bytes from either string to the u8_... functions.
<rlb>The biggest cost I've seen so far is (unsurprisingly) any use of set_x
<rlb>or ref
<rlb>say in a loop.
<wingo>yeah
<rlb>Code like that will need to change (where possible) to use traversals/transformers/folders of various sorts.
<rlb>and sorry, didn't get rid of shared strings, just changed them to hold a pointer to the original *string* not the string's buffer since non-ascii strings now have byte-start and byte-length members.
<fnstudio>(thanks civodul, gcrypt did it!)
<rlb>That's a trade off I made for perf in the common cases, i.e. non-ascii strings keep the computed byte offsets too.
<rlb>(into the buffers)
<rlb>And of course, imagine it's all buggy -- can't run any of it yet, and need to review for remember_upto_here use (once I remember the rules), etc.
<wingo>sounds tricky from a threadsafety POV wrt string-set!
<rlb>yeah, I'm not even sure I understand what our expectations are there yet, and/or what we promised before.
<rlb>unless it's an ascii string and an ascii char, then you have to just copy/rewrite it.
<wingo>races are possible, crashes are not, is the basic thing. means that if you need to change the "shape" of a shared mutable string, you need to be able to allocate a new one and atomically swap it in; from that POV, any byte offset would be invalidated
<rlb>wingo: I think maybe any of the libunistring functions with "str" in the name require null termination, so this could find other places we might want to review: git grep -E 'u[138][^_]+_.*str'
<wingo>logic is in scm_i_string_ensure_mutable_x
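The "allocate a new one and atomically swap it in" rule can be illustrated with C11 atomics. This is a sketch of the general idea only, not the code in scm_i_string_ensure_mutable_x. The types and function names are hypothetical, and it assumes a garbage collector reclaims replaced buffers, so the old one is never freed here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* A buffer is never modified once published. */
struct buf
{
  size_t len;
  unsigned char bytes[];
};

/* A mutable string is an atomically replaceable pointer to a buffer. */
struct mstring
{
  _Atomic (struct buf *) buf;
};

/* Allocate a buffer holding a copy of LEN bytes from BYTES. */
struct buf *
buf_new (const unsigned char *bytes, size_t len)
{
  struct buf *b = malloc (sizeof *b + len);
  if (b == NULL)
    return NULL;
  b->len = len;
  if (len > 0)
    memcpy (b->bytes, bytes, len);
  return b;
}

/* Replace the OLD_LEN bytes at byte offset START with the NEW_LEN bytes at
   REPL.  The byte length may change, so a fresh buffer is built and swapped
   in with compare-and-exchange.  A reader that loaded the old pointer keeps
   seeing a complete, consistent old string.  Returns 0 on success, -1 if
   allocation fails. */
int
mstring_splice (struct mstring *s, size_t start, size_t old_len,
                const unsigned char *repl, size_t new_len)
{
  struct buf *old = atomic_load (&s->buf);

  for (;;)
    {
      struct buf *fresh;
      size_t tail;

      assert (start <= old->len && old_len <= old->len - start);
      tail = old->len - start - old_len;

      fresh = malloc (sizeof *fresh + start + new_len + tail);
      if (fresh == NULL)
        return -1;
      fresh->len = start + new_len + tail;
      memcpy (fresh->bytes, old->bytes, start);
      memcpy (fresh->bytes + start, repl, new_len);
      memcpy (fresh->bytes + start + new_len,
              old->bytes + start + old_len, tail);

      /* Publish only if nobody replaced the buffer in the meantime. */
      if (atomic_compare_exchange_strong (&s->buf, &old, fresh))
        return 0;               /* OLD is left for the (assumed) GC */

      /* Lost the race: OLD now holds the current buffer, so rebuild. */
      free (fresh);
    }
}
```

Any byte offset computed against the old buffer is meaningless against the new one, which is the invalidation wingo describes. A plain atomic store instead of the compare-and-exchange loop would also avoid crashes, at the cost of possibly losing a concurrent update.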
<rlb>Hmm, I'll have to think about what that might mean for sharing -- i.e. we can't update all that atomically the way I have it when there's sharing? Not sure. I'll think about it.
<wingo>yeah something to mull over
<rlb>Maybe we can't do that without an immutable "shared offset" table or something?
<rlb>that we can swap in atomically, i.e. one pointer would have to cover both the stringbuf and all the shared offsets
<rlb>or we'll have to "do something else"
<wingo>generally speaking character offsets can persist; byte offsets are tricky
<rlb>shared strings and multibyte...
<wingo>for utf8
<rlb>right I meant the byte offsets
<wingo>so if you can detect that the underlying object changed, you can invalidate a byte offset
<rlb>but if we don't cache those, it's going to be much uglier perf-wise.
<rlb>hmm, yeah, maybe that's better -- come up with a lazy approach
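One way to read the lazy approach: keep the character offset as the stable fact and treat the byte offset as a cache that is only valid for the buffer it was computed against. The sketch below assumes buffers are immutable and replaced wholesale whenever a character's byte size changes, so pointer identity is enough to detect a change. struct view and both functions are hypothetical names, not Guile internals.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of character CHAR_INDEX in the valid UTF-8 sequence BYTES of
   LEN bytes.  Returns LEN if CHAR_INDEX is at or past the end. */
size_t
utf8_byte_offset (const unsigned char *bytes, size_t len, size_t char_index)
{
  size_t i = 0;

  assert (bytes != NULL || len == 0);
  while (i < len && char_index > 0)
    {
      i++;                                        /* lead byte */
      while (i < len && (bytes[i] & 0xC0) == 0x80)
        i++;                                      /* continuation bytes */
      char_index--;
    }
  return i;
}

/* A substring view: the character offset persists across buffer changes,
   the byte offset is a cache tied to one particular buffer. */
struct view
{
  size_t char_start;            /* stable */
  const unsigned char *seen;    /* buffer the cache was computed against */
  size_t byte_start;            /* valid only while SEEN is current */
};

/* Return the byte offset of the view's start in the current buffer,
   recomputing it only when the buffer has been replaced. */
size_t
view_byte_start (struct view *v, const unsigned char *bytes, size_t len)
{
  if (v->seen != bytes)
    {
      v->byte_start = utf8_byte_offset (bytes, len, v->char_start);
      v->seen = bytes;
    }
  return v->byte_start;
}
```

For an all-ASCII buffer the scan is unnecessary because the byte offset equals the character offset, which matches rlb's split into ascii and non-ascii strings. The two-field cache update is not atomic, so concurrent use would need the kind of care discussed in the following lines.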
<wingo>and if you need to make an atomic change that might alter the byte size of a character, either you realloc or serialize through a mutex
<rlb>but may still have some tricky data races if we're not careful...
<rlb>mutex would make it much easier (of course)
<wingo>but tricky without string buffers as an indirection
<wingo>nb, i am not arguing for string buffers
<civodul>fnstudio: actually weinholt is the original author of the base64 module and the one to thank :-)
<rlb>oh, I kept string buffers.
<wingo>just pointing out an aspect of the current arrangement
<wingo>rlb: mutation-sharing substrings are not an important perf case -- imo anyway
<wingo>so simple and correct >>> complicated and faster
<wingo>in that regard
<rlb>suppose we could just use more of a sledgehammer for the first version then, and optimize later if it turns out to be worth it.
<rlb>i.e. think that's roughly what you said.
<wingo>sure. as long as we can avoid a mutex for most instances of string-set!
<fnstudio>right, thank you civodul for the tip and weinholt for a library that's just been extremely useful (gcrypt)
<wingo>or maybe that's not even a requirement, dunno
<rlb>(...looks like most of the cases that might involve null termination questions are in vasnprintf.c and i18n.c)
<rlb>well in any case, ascii strings don't have that problem, so that's another chunk of the time we won't be affected either way.
<wingo>well, goldurnit, should probably fix that before 3.0.6
<wingo>but is my only current blocker in that regard fwiw
<wingo>the null-termination issue i mean
<rlb>well, it's been that way for a *long* time.
<rlb>I'd guess.
<wingo>yeah sure but now that i know it, i hate it :P
<rlb>Heh - I might be able to help, but depends on the timetable - I might or might not have time in the much shorter term.
<rlb>Anyway, and wrt the utf8 stuff it's all so far a *hack* - I just clobbered code freely (deleting, rearranging, reformatting), in ways that might be pretty eyebrow-raising for an upstream submission, so aside from actually getting it working, I might also have to do a *lot* of clean up, deprecations, properly documented removals, etc.
<rlb>(So still might not go anywhere...)
<wingo>yeah regarding utf8, is definitely in the 3.2 category
<wingo>not a 3.0 thing
<rlb>Oh, *no doubt* :)
<wingo>:)
<rlb>btw, I'd started using utf8_t* for the bytes (as does libunistring), but of course a lot of apis have char* -- in terms of the public strings.h api, would we stick with char* and coercions, or...?
<wingo>good question :P
<wingo>publicly, no change IMO
<wingo>there is very little good that would come from change to the string C API :P
<wingo>(initial reaction, obvs)
<wingo>internally -- hoo dunno. initial reaction is uint8_t* is sufficient, not char*, but not sure about utf8_t*; though note that FOO_t is reserved by the c standard
<wingo>so would hesitate to define one locally
*wingo zzz
<rlb>we already get one via libunistring.