IRC channel logs
2025-02-26.log
<rlb> ...I guess for any not already familiar with the mess, I should have said in there "arbitrary bytes (bytes that may not be encodable in the process or thread's current locale)".
<mwette> rlb: I'm not sure what this implies about the set of string functions like string-join, string-ref etc.
<rlb> mwette: not sure what you mean?
<rlb> They'll work just as they always would with a latin-1 string.
<rlb> In practice, for things like paths, we only care about specifically mentioning ascii chars, and the interesting data is ascii supersets, I hope?
<rlb> i.e. "/" for fs and ":" for PATHs
<rlb> Though I'd need to think through any implications for multibyte like utf-16 -- *if* we care about that.
<rlb> But that won't matter for linux/*bsd, etc.
<rlb> s/multibyte/non-byte-granularity/ I guess.
<rlb> We can still easily round-trip that with this proposal, but '/' might not come through right without extra help.
<rlb> ...I guess those just don't work since they'd have embedded nulls -- so if we care about that, we might be back to just bytevectors.
<rlb> My default inclination is still first-class bytevector support, but I'm not sure that's entirely popular, so I was trying to consider alternatives.
<rlb> (and it *is* more work on the application side)
<mwette> I think for string functions to work you need to know what in the string is a character. If it's undecodable what is a char?
<rlb> doesn't matter - just take the original bytes and store them in a latin-1 string. Which works just fine for ascii supersets that don't include null.
<mwette> s/in the string/in the byte sequence/
<rlb> Oh, and the current suggestion from civodul is to just use latin-1 across the board.
<rlb> i.e. *you* have to know to set a fluid to latin-1 in your app for every syscall "that matters".
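A minimal sketch of the latin-1 round-trip rlb describes above, assuming Guile's (ice-9 iconv) module. The sample bytes are made up for illustration: "/tmp/" followed by two bytes that are not valid UTF-8.

```scheme
(use-modules (ice-9 iconv)       ; bytevector->string, string->bytevector
             (rnrs bytevectors)) ; u8-list->bytevector

;; Hypothetical path bytes: "/tmp/" followed by #xff #xfe (invalid as UTF-8).
(define raw (u8-list->bytevector '(47 116 109 112 47 255 254)))

;; Every byte maps to exactly one latin-1 character, so decoding cannot fail.
(define as-string (bytevector->string raw "ISO-8859-1"))

;; Ordinary string procedures still work on the ASCII separator.
(define parts (string-split as-string #\/))

;; Encoding with the same charset gives back the original bytes.
(define round-tripped (string->bytevector as-string "ISO-8859-1"))
```

The caller still has to remember that `as-string` holds latin-1-decoded bytes and encode it the same way before handing it back to the OS, which is the bookkeeping burden raised in the discussion.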
<rlb> ACTION suspects that won't happen and apps will just crash on undecodable bytes
<rlb> (hypothetical fluid -- iirc we don't have that yet)
<rlb> Especially since the edge cases are likely rare, but critical for system tools (like say tar) to support correctly.
<mwette> Is undecodable really latin-1 or does it include more?
<rlb> latin-1 has the property that any 8-bit byte is a valid char
<mwette> Can you string-append a utf-8 and a latin-1?
<rlb> i.e. it can round-trip arbitrary bytes "cleanly"
<rlb> So I suppose you could produce a mess...
<mwette> That's what I meant. So string-append could work with ascii and utf-8 or latin-1 but utf-8 and latin-1 is an exception.
<rlb> But whatever we do that's not bytevectors, you have to just know what the destination encoding is, etc. Maybe that just leads me back, and bytevectors should be first class, and apps that care have to use them instead, but that's also difficult.
<mwette> It all seems complicated to me. I'm sorry the unix system API has not been cleaned up, if that is the case.
<rlb> I suppose python's smuggling approach just keeps the regions separate.
<rlb> It hasn't -- if you want to write tar/cp/etc. you *must* handle arbitrary bytes correctly. For paths the only special bytes on linux are ascii '/' and null. Anything else is valid. For user/group names, anything except null is.
<rlb> So unless we're going to just say that guile isn't intended for apps like that, we have to "do something".
<rlb> and it can't just be strings
<rlb> (naively be strings)
<rlb> python fixed it via a "unicode smuggling trick", but that requires broader commitment in the implementation.
<rlb> Happy to do that for us if we want that, but I have the impression we don't.
<rlb> The "just know what you're doing and switch to latin-1" approach has the difficulty that it's up to you to recall which strings are latin-1 so you can always process them that way, and you have to know to send them back to the os with latin-1 as the locale.
But of course you don't want latin-1 globally unless you really are in a latin-1 locale.
<rlb> But it has the implementation advantage of "just" requiring a new "platform data conversion" fluid or something.
<rlb> ACTION shrugs -- very much want it fixed, but eventually have to know which way to jump.
<rlb> You have me back to thinking it'd be better to just provide bytevector support in all the right places somehow, and maybe enhance the bytevector apis for some common cases e.g. bv-split/join (for say ascii bytes like '/', ':', etc.).
<rlb> mwette: if we choose the "encoding fluid" approach, i.e. when set, it determines the locale for relevant arguments and return values for relevant functions, then I'd wondered if we could have a special value that requests bytevectors instead of string conversions. Then you can ask for the values to be handled as say utf-8, or ascii, or uninterpreted bytevectors.
<wingo> rlb: i am getting the weirdest errors building a version of guile in guix, the "check" phase ends up dying with:
<wingo> make[5]: *** No rule to make target 'tests/version.test', needed by 'tests/version.log'. Stop.
<wingo> make[5]: *** Waiting for unfinished jobs....
<wingo> like, it runs a bunch of other tests. but not version.test.
<wingo> does that say anything to you?
<wingo> fwiw with help from #guix i realized that i was building a newer guile with an out-of-date guix, which used to actually remove `version.test`
<wingo> that was a funny way of disabling the test but it didn't work with the parallel tests setup (even when run serially)
<civodul> old: something looks wrong with the ‘environ’ warning: ‘echo '(environ)' | guile >/dev/null’ triggers it
<civodul> we’re not supposed to have multiple threads at that point, except for the finalizer thread
<civodul> it’s always unclear to me what ‘scm_all_threads’ returns
<old> yes it is weird. I've had this issue back then
<old> Have you seen the proposed change I made?
Only emit a warning when mutating the environ in a multi-thread environment
<old> guile -c '(environ)' >/dev/null
<old> so something is spawning a thread somewhere but not all the time
<old> echo '(pk (length ((@ (ice-9 threads) all-threads))))' | guile
<old> guile -c '(pk (length ((@ (ice-9 threads) all-threads))))'
<civodul> i think those multi-thread warnings were always wrong because of ‘all-threads’ including the finalizer thread
<old> I think that you could have both the patch to fix this
<old> see the rationale of my patch above
<old> what's the finalizer thread by the way?
<civodul> ah yes, this patch makes perfect sense to me
<civodul> the finalizer thread is the thread that runs finalizers :-)
<civodul> if you have a guardian or a SMOB finalizer, it runs in a separate thread
<civodul> or a pointer finalizer from (system foreign)
<civodul> re ‘all-threads’, i assume this is a fine change, though there’s always the possibility that code out there expected the current behavior (e.g., by subtracting 1 from its length)
<old> instead of changing all-threads and risking breaking some code
<old> whenever a user creates a thread, store atomically true into a bool
<old> or maybe just a counter
<old> that would also speed things up instead of applying length onto the list of threads
<old> I'll reply with this on the patch
<civodul> although it’s also problematic that ‘all-threads’ returns internal threads
<old> I mean, it is too late for that I suppose
<old> I don't even have a use case for calling all-threads to be honest
<old> If I manage threads, I keep my own list
<cow_2001> also add an --expose flag for the guile-bytevector-peg directory
<cow_2001> libgit2 sha256 support is still experimental in v1.9.0
<cow_2001> and i guess guix would not use anything experimental, so no guile-git sha256 support in guix any time soon :|
<rlb> wingo: yeah, that did ring a bell -- I hit some issues like that with parallel builds a while ago and fixed those.
Related, I think: 1c96e4ab6dde18c69f1493a8e1560e80a347cd21 and *maybe* 6bd70136d96e73542e6725bc490199e17f56ee92
<civodul> cow_2001: great that you investigated this anyway! it’ll prove useful
<cow_2001> i am now trying to build the libgit2-1.9.0 package i am drafting in my channel with -DEXPERIMENTAL_SHA256=ON and failing
<lechner> Hi, is there a shorter way to write (with-output-to-string (lambda _ (write exp))) but without using 'format'?
<cow_2001> lechner: (define-syntax write-as-string (syntax-rules () ((_ x) (with-output-to-string (lambda _ (write x)))))) now you have (write-as-string exp)
<cow_2001> lechner: seriously, that is how i would do it
<cow_2001> or just write a procedure that does it, maybe?
<taylan> procedure would work well indeed, no need for macro. don't think there's a built-in thing. I could swear I've seen `->string' somewhere but I may have dreamt it.
<lechner> cow_2001 / taylan / thanks! it's all good. just checking because I'm such a newbie
<lechner> Hi, this example allows only 'key' or '(key alias)' in a syntax pattern. What is the general strategy to allow both, please? https://bpa.st/AOYQ
<taylan> lechner: I get an HTTP 502 on that URL
<taylan> lechner: I'm not sure I understand the question. what else should it accept?
<lechner> taylan / right now it takes (key1 key2 key3) or ((key1 alias1) (key2 alias2) (key3 alias3)) but no mix like (key1 key2 (key3 alias3))
<taylan> ah I see. hmm yeah, syntax-rules is annoying with stuff like that.
<taylan> I would probably try to write it in a recursive fashion, so every time it handles one element, which may be either, and then calls itself to handle the rest of the bindings, but then you probably want a wrapper to ensure alist-expr is only evaluated once.
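A rough sketch of the recursive approach taylan outlines. The pasted code was not reachable, so this assumes the macro binds variables to values looked up in an alist with symbol keys. The names `let-keys` and `let-keys-helper` are hypothetical and not taken from lechner's code.

```scheme
;; Helper: handles one binding per expansion step, then recurses on the rest.
(define-syntax let-keys-helper
  (syntax-rules ()
    ;; No bindings left: evaluate the body.
    ((_ alist () body ...)
     (let () body ...))
    ;; (key alias) form: look up key, bind the value to alias.
    ;; This clause must come before the bare-key clause, which matches anything.
    ((_ alist ((key alias) rest ...) body ...)
     (let ((alias (assq-ref alist 'key)))
       (let-keys-helper alist (rest ...) body ...)))
    ;; Bare key form: look up key, bind the value to the same name.
    ((_ alist (key rest ...) body ...)
     (let ((key (assq-ref alist 'key)))
       (let-keys-helper alist (rest ...) body ...)))))

;; Wrapper: evaluates alist-expr exactly once before the recursion starts.
(define-syntax let-keys
  (syntax-rules ()
    ((_ alist-expr (binding ...) body ...)
     (let ((alist alist-expr))
       (let-keys-helper alist (binding ...) body ...)))))

;; Mixed bare keys and (key alias) pairs in one binding list.
(define example
  (let-keys '((a . 1) (b . 2) (c . 3))
            (a b (c third))
    (list a b third)))
```

Because each step consumes a single element, the two forms can be mixed freely in one binding list, and the wrapper keeps `alist-expr` from being re-evaluated for every key.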