IRC channel logs
2024-02-23.log
<mwette> Anyone else get this? My own built 3.0.9 libguile-3.0.so on ubuntu has no scm_c_take_typed_bytevector. But if I rebuild with --disable-lto it does have it. To check: nm $prefix/libguile-3.0.so | grep take_typed_bytevector
<old> Does LTO optimize symbol not used?
<old> not sure what kind of optimization the linker do with LTO
<mwette> it can do stuff like beta reduction between compilation units, IIRC
<old> but public symbol like scm_c_take_typed_bytevector (I assume it has default visibility) should not be optimized away right?
<mwette> I see now. it's internal. See SCM_INTERNAL explanation in scm.h
<mwette> I'm going to have to use C->scheme->C route.
<old> mwette: for me it is extern, so basically does nothing
<chsasank> hey folks, is it possible for me to call guile functions from another language like python?
<mwnaylor> The C module calls the guile interface, wrapped in function that exposes the interface to be called from Python.
<mwnaylor> Hopefully your implementation will go as smoothly as the examples show.
<mwnaylor> What are in guile do you expect to call from python?
<mwnaylor> I am discussing this from the theoretical point of view, I haven't attempted what you are. You might get some good feedback on the channel #python.
<rlb> civodul: regarding the discussion about apis like getenv/setenv that require the ability to handle arbitrary binary, do we already have some efficient way to change the locale to latin-1 for just the current dynamic extent (i.e. per-thread, etc.), or would that be the next notable thing we'd need if that's at least the medium term plan?
<rlb> I looked around a bit, but didn't see one (may have missed it).
<rlb> I ask in part because I just had to write some trivial C wrappers for some functions (including those) to support a tool I'm toying with.
<rlb> (or I thought I did)
<civodul> rlb: hi! at the C level, there’s scm_{to,from}_latin1_string
<civodul> otherwise there’s the ‘%default-port-encoding’ fluid, but changing it is a bit more expensive
<rlb> Hmm, so how do I get a unicode incompatible value back from (getenv "foo") thread-safely?
<rlb> i.e. I need to set the locale to latin-1 briefly, but just for that thread/region, I think?
<ArneBab> old: it might be interesting to focus the fuzzing on limit values — getting close to the boundaries of a range for example.
<rlb> civodul: ...and if we don't have something like that yet, then seems like we might need to add it for this approach.
<ArneBab> old: ⇒ the most valuable part of the fuzzing might be to navigate the search space efficiently.
<civodul> rlb: i think scm_{to,from}_latin1_string is the way, but fluids are thread-safe too
<rlb> I'm not sure I follow -- how do I use that to help with (getenv "foo") from an arbitrary scheme thread, if say the value of foo can't be encoded in the current locale?
<rlb> i.e. I need to thread-safely change the locale to latin-1, make then call, then change back, I think?
<rlb> (...and more broadly, of course unless you know "foo", there's no way to know whether or not you have to do that)
<rlb> (but that's a separate higher-level concern)
<rlb> civodul: put another way, I was trying to figure out how I can now, or how we might want to make it possible for me to safely call getenv.
<civodul> if scm_getenv does “return scm_from_latin1_string” instead of “return scm_from_locale_string”, then it’ll do what you want, no?
<rlb> I didn't realize we were entertaining the idea of not respecting the current locale as the default for the relevant syscalls. i.e. are you saying getenv would never return anything but latin-1?
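
A minimal sketch of why latin-1 keeps coming up in the exchange above: ISO-8859-1 assigns one code point to every byte value from 0x00 to 0xFF, so decoding arbitrary bytes as latin-1 cannot fail and re-encoding returns the same bytes. This is Python rather than Guile, chosen only to illustrate the encoding property; the helper names `bytes_to_latin1` and `latin1_to_bytes` are hypothetical and do not correspond to any Guile API.

```python
def bytes_to_latin1(raw: bytes) -> str:
    """Decode arbitrary bytes; latin-1 maps every byte 0x00-0xFF to a code point."""
    return raw.decode("latin-1")


def latin1_to_bytes(text: str) -> bytes:
    """Recover the original bytes from a string produced by bytes_to_latin1."""
    return text.encode("latin-1")


# A made-up environment value that is not valid UTF-8.
raw_value = b"caf\xe9 \xff\xfe"

# Strict UTF-8 decoding rejects it...
try:
    raw_value.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while the latin-1 round trip is lossless.
as_text = bytes_to_latin1(raw_value)
round_tripped = latin1_to_bytes(as_text)
```

The resulting string is a byte carrier, not meaningful text: the caller still has to know that the value is latin-1 "wrapped" bytes and convert it back before handing it to the operating system.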
<rlb> ACTION may be substantially misunderstanding
<old> ArneBab: my goal is to mimick a drunk user
<old> Just doing the worst thing you could think of with the API
<old> typically this will test invalid inputs, but also test the internal state of the library
<mwette> old: This reminds me of a story my grad advisor told me. He was shown a new CAE tool someone had developed. At the command line he typed in gibberish hit the return key and the program crashed.
<old> mwette: typically what I want to avoid ^^
<mwette> ACTION has demo of creating binary guile w/ embedded modules (from .go files): github dot com slash mwette slash guile-saapp
<civodul> i’d love to have a way to embed bytecode in libguile, so you can have a statically-linked Guile that can do minimal stuff without accessing the file system
<dthompson> would be nice to have that before native compilation is a thing
<rlb> civodul: so just to double-check, were you suggesting that (getenv ...), etc. might eventually only produce results encoded via latin-1, and not the current locale?
<civodul> no no, i’m just saying how this could be achieved :-)
<rlb> civodul: with the current code? If so, that's what I was asking -- is there a thread-safe way to briefly change the locale for the current thread only?
<civodul> no sorry, i guess i need to page that back in
<rlb> If not, then I was thinking *that's* what we'd need for a latin-1 strategy.
<civodul> what i had in mind in Brussels was to split %default-port-encoding into two fluids
<civodul> or at least have a new fluid for the encoding of “OS strings”
<rlb> OK, right -- I think we're talking about the same thing. To follow a latin-1 strategy, we'd have to have all the relevant functions respect a locale fluid.
<civodul> i could imagine %default-file-name-encoding, which would default to locale encoding
<civodul> now, extending that to getenv, getpw, etc. etc. is tricky
<civodul> well, there could be one fluid for everything that goes through the name service switch
<rlb> I suppose we could have a new "to/from" string function pair that respects another "override" fluid, and scatter those in all the right places.
<rlb> i.e. scm_getenv would call scm_to/from_maybe_bytes() and then maybe_bytes would respect a fluid override or, whatever...
<rlb> i.e. general idea, not specific details.
<rlb> Then you could say (with-something-something (getenv "foo")) and get back latin-1 :)
<civodul> maybe one fluid for file names, one for “NSS names”, one for “process-related things”?
<civodul> or just a single fluid for “OS strings”?
<rlb> Offhand, I'd think it should just be for any function that's returning or receiving values that are actually just bytes in the end.
<civodul> yes, but maybe that’s too vague? how would you know what’s affected?
<rlb> The other option *might* be to consider python-style byte-smuggling, but I'd *really* want to think about that harder first.
<civodul> also, one might want to special-case file names
<rlb> wrt know what's affected -- well, in the limiting case, you just need to know what the underlying call specifies, but you really need to know that anyway to write correct code, error-wise already?
<rlb> i.e. getenv might just crash you right now, with no (thread-safe) recourse...
<rlb> but of course, ideally, we'd also document in the info pages (and/or docstrings) which calls have arguments that might not actually be strings.
<rlb> (underneath, and so might require using this latin-1 facility)
<rlb> (if you need to handle arbitrary results)
<rlb> i.e. if you're writing tar, or cp, or...
<rlb> It's basically much of posix (at least as implemented in linux, *bsd, etc.), really?
<rlb> i.e. paths, user names, group names, xattrs, etc.
<rlb> they're all just null terminated bytes.
<rlb> The python-style approach does have an advantage in that it's finer-grain, i.e. the locale-changing approach means all args have to be the same on this front when the function takes multiple args...
<rlb> And neither approach is without cost with the utf-8 conversion because they're both multi-byte, i.e. non-ascii latin-1 won't be single-byte anymore.
<rlb> Of course the other approach we discussed is to just allow #u8() or strings to all the relevant functions, which I suppose could be done via scattering a scm_to/from_os_string()ish in all the right places.
<rlb> But there, you'd still have the issue of picking return value types.
<rlb> Anyway -- just started wondering about it because I hit it again, and had to write more trivial C wrappers.
<rlb> I can keep doing that, and it's fairly easy for me, but it's of course not ideal for people in general.
<rlb> (Oh, and I suppose not high priority unless we can figure out what we want, in which case, I might hack on it.)
<rlb> Higher might be the thread fix.
<civodul> yes, i’ve had to work around it in Guix too (non-locale-encoded file names specifically), not great
<civodul> it’s late for today but i should really schedule time for it
<graywolf> Out of curiosity, do people still use 2.0 version of guile?
<Arsen> ACTION has a dependency on 0.9
<graywolf> aaaah, so it seems that guile-2.0 actually requires the .scm extension while loading files. (load-from-path "foo") does *not* load "foo.scm".
<graywolf> But based on documentation that sounds like a bug
<graywolf> Well whatever, time to make some symlinks
<civodul> ACTION would assume that 2.0 has practically disappeared
<dthompson> the real question is: when we gettin 3.0.10? 😈
<graywolf> I hope not before the copy-on-write copy-file is merged
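
For reference, a small sketch of the "python-style byte-smuggling" rlb mentions in the encoding discussion, using Python's own `surrogateescape` error handler. It shows the finer-grained behaviour (valid UTF-8 decodes normally, and only undecodable bytes are smuggled through as lone surrogates) and the size cost of pushing non-ASCII latin-1 text through UTF-8. The sample byte strings are made up for illustration, and nothing here reflects how Guile would implement either strategy.

```python
# Bytes that are not valid UTF-8 (0xEF starts a sequence that never completes).
raw = b"na\xefve-\xff"

# Byte-smuggling: undecodable bytes become lone surrogates U+DC80..U+DCFF.
smuggled = raw.decode("utf-8", errors="surrogateescape")
restored = smuggled.encode("utf-8", errors="surrogateescape")

# Valid UTF-8 is unaffected, so each argument is handled on its own merits.
valid = b"caf\xc3\xa9".decode("utf-8", errors="surrogateescape")

# Latin-1 strategy: every byte becomes one code point, but storing that
# string as UTF-8 takes two bytes for each code point above 0x7F.
latin1_text = raw.decode("latin-1")
utf8_size = len(latin1_text.encode("utf-8"))
raw_size = len(raw)
```

With these inputs the smuggled string round-trips to the original bytes, while the latin-1 string grows when stored as UTF-8, which matches the point that neither approach is free.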