IRC channel logs



<mwette>Anyone else get this? My own build of 3.0.9 on ubuntu has no scm_c_take_typed_bytevector. But if I rebuild with --disable-lto it does have it. To check: nm $prefix/ | grep take_typed_bytevector
<old>Does LTO optimize away unused symbols?
<old>not sure what kind of optimization the linker does with LTO
<mwette>it can do stuff like beta reduction between compilation units, IIRC
<old>but a public symbol like scm_c_take_typed_bytevector (I assume it has default visibility) should not be optimized away, right?
<mwette>I see now. it's internal. See SCM_INTERNAL explanation in scm.h
<mwette>I'm going to have to use C->scheme->C route.
<old>mwette: for me it is extern, so basically does nothing
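[Editor's note] The `nm` check mentioned above reads the symbol table from the shell. A rough Python sketch of the same question, asking the dynamic loader whether a shared library exports a given name, might look like the following. The helper name `exports_symbol` is made up for illustration, and the path to pass for libguile depends on the install; passing `None` inspects the running process instead.

```python
import ctypes


def exports_symbol(library, name):
    """Return True if the dynamic loader can resolve `name` in `library`.

    `library` is a path to a shared object (for example a libguile .so),
    or None to search the symbols already loaded into this process.
    """
    handle = ctypes.CDLL(library)
    # ctypes raises AttributeError for names the loader cannot resolve,
    # which hasattr() turns into False.
    return hasattr(handle, name)
```

A symbol hidden by internal visibility would be expected to come back as False here, just as it would be missing from the dynamic symbol table.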
<chsasank>hey folks, is it possible for me to call guile functions from another language like python?
<mwnaylor>I don't know about directly, but a C module can be written as a bridge.
<mwnaylor>The C module calls the guile interface, wrapped in a function that exposes the interface to be called from Python.
<chsasank>is this a good example of this?
<mwnaylor>Yes, that would be the first half.
<mwnaylor>Something like this would be the next half.
<mwnaylor>Hopefully your implementation will go as smoothly as the examples show.
<mwnaylor>What in guile do you expect to call from python?
<mwnaylor>I am discussing this from the theoretical point of view, I haven't attempted what you are. You might get some good feedback on the channel #python.
<rlb>civodul: regarding the discussion about apis like getenv/setenv that require the ability to handle arbitrary binary, do we already have some efficient way to change the locale to latin-1 for just the current dynamic extent (i.e. per-thread, etc.), or would that be the next notable thing we'd need if that's at least the medium term plan?
<rlb>I looked around a bit, but didn't see one (may have missed it).
<rlb>I ask in part because I just had to write some trivial C wrappers for some functions (including those) to support a tool I'm toying with.
<rlb>(or I thought I did)
<civodul>rlb: hi! at the C level, there’s scm_{to,from}_latin1_string
<civodul>otherwise there’s the ‘%default-port-encoding’ fluid, but changing it is a bit more expensive
<rlb>Hmm, so how do I get a unicode incompatible value back from (getenv "foo") thread-safely?
<rlb>i.e. I need to set the locale to latin-1 briefly, but just for that thread/region, I think?
<ArneBab>old: it might be interesting to focus the fuzzing on limit values — getting close to the boundaries of a range for example.
<rlb>civodul: ...and if we don't have something like that yet, then seems like we might need to add it for this approach.
<ArneBab>old: ⇒ the most valuable part of the fuzzing might be to navigate the search space efficiently.
<civodul>rlb: i think scm_{to,from}_latin1_string is the way, but fluids are thread-safe too
<rlb>I'm not sure I follow -- how do I use that to help with (getenv "foo") from an arbitrary scheme thread, if say the value of foo can't be encoded in the current locale?
<rlb>i.e. I need to thread-safely change the locale to latin-1, make the call, then change back, I think?
<rlb>(...and more broadly, of course unless you know "foo", there's no way to know whether or not you have to do that)
<rlb>(but that's a separate higher-level concern)
<rlb>civodul: put another way, I was trying to figure out how I can now, or how we might want to make it possible for me to safely call getenv.
<civodul>if scm_getenv does “return scm_from_latin1_string” instead of “return scm_from_locale_string”, then it’ll do what you want, no?
<rlb>I didn't realize we were entertaining the idea of not respecting the current locale as the default for the relevant syscalls. i.e. are you saying getenv would never return anything but latin-1?
<rlb>ACTION may be substantially misunderstanding
<old>ArneBab: my goal is to mimic a drunk user
<old>Just doing the worst thing you could think of with the API
<old>typically this will test invalid inputs, but also test the internal state of the library
<mwette>old: This reminds me of a story my grad advisor told me. He was shown a new CAE tool someone had developed. At the command line he typed in gibberish, hit the return key, and the program crashed.
<old>mwette: typically what I want to avoid ^^
<mwette>ACTION has demo of creating binary guile w/ embedded modules (from .go files): github dot com slash mwette slash guile-saapp
<ieure>Why obfuscate the URL?
<ieure>What a strange habit.
<civodul>mwette: nice!
<civodul>i’d love to have a way to embed bytecode in libguile, so you can have a statically-linked Guile that can do minimal stuff without accessing the file system
<civodul>(such as in an initrd)
<dthompson>civodul: that would be cool
<dthompson>would be nice to have that before native compilation is a thing
<ArneBab>mwette: very cool!
<rlb>civodul: so just to double-check, were you suggesting that (getenv ...), etc. might eventually only produce results encoded via latin-1, and not the current locale?
<civodul>no no, i’m just saying how this could be achieved :-)
<civodul>but hmm
<rlb>civodul: with the current code? If so, that's what I was asking -- is there a thread-safe way to briefly change the locale for the current thread only?
<civodul>no sorry, i guess i need to page that back in
<rlb>If not, then I was thinking *that's* what we'd need for a latin-1 strategy.
<civodul>what i had in mind in Brussels was to split %default-port-encoding into two fluids
<civodul>or at least have a new fluid for the encoding of “OS strings”
<rlb>OK, right -- I think we're talking about the same thing. To follow a latin-1 strategy, we'd have to have all the relevant functions respect a locale fluid.
<rlb>or similar
<civodul>i could imagine %default-file-name-encoding, which would default to locale encoding
<civodul>now, extending that to getenv, getpw, etc. etc. is tricky
<civodul>well, there could be one fluid for everything that goes through the name service switch
<rlb>I suppose we could have a new "to/from" string function pair that respects another "override" fluid, and scatter those in all the right places.
<rlb>i.e. scm_getenv would call scm_to/from_maybe_bytes() and then maybe_bytes would respect a fluid override or, whatever...
<rlb>i.e. general idea, not specific details.
<rlb>Then you could say (with-something-something (getenv "foo")) and get back latin-1 :)
<civodul>maybe one fluid for file names, one for “NSS names”, one for “process-related things”?
<civodul>or just a single fluid for “OS strings”?
<civodul>this is tricky
<rlb>Offhand, I'd think it should just be for any function that's returning or receiving values that are actually just bytes in the end.
<civodul>yes, but maybe that’s too vague? how would you know what’s affected?
<rlb>The other option *might* be to consider python-style byte-smuggling, but I'd *really* want to think about that harder first.
<civodul>also, one might want to special-case file names
<civodul>right :-)
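[Editor's note] For readers unfamiliar with the "python-style byte-smuggling" mentioned above: Python's `surrogateescape` error handler maps each undecodable byte to a lone surrogate code point so the original bytes can be recovered later. A minimal sketch of the round trip, using a made-up sample value:

```python
raw = b"caf\xe9"  # latin-1 bytes for "café"; not valid UTF-8

# A strict UTF-8 decode rejects the value outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape turns the bad byte 0xE9 into the lone surrogate U+DCE9.
smuggled = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = smuggled.encode("utf-8", errors="surrogateescape")
```

The override is per value rather than per call, which is the finer granularity this approach offers.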
<rlb>wrt knowing what's affected -- well, in the limiting case, you just need to know what the underlying call specifies, but you really need to know that anyway to write correct code, error-wise already?
<rlb>i.e. getenv might just crash you right now, with no (thread-safe) recourse...
<rlb>but of course, ideally, we'd also document in the info pages (and/or docstrings) which calls have arguments that might not actually be strings.
<rlb>(underneath, and so might require using this latin-1 facility)
<rlb>(if you need to handle arbitrary results)
<rlb>i.e. if you're writing tar, or cp, or...
<civodul>ACTION nods
<rlb>It's basically much of posix (at least as implemented in linux, *bsd, etc.), really?
<rlb>i.e. paths, user names, group names, xattrs, etc.
<rlb>they're all just null terminated bytes.
<civodul>yes, the OS interface in general
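[Editor's note] The reason latin-1 keeps coming up as the escape hatch is that it maps every byte value to exactly one code point, so any NUL-terminated byte string survives a decode/encode round trip. A small Python sketch of that property (the helper `is_valid_utf8` is named here only for illustration):

```python
# Every byte value a NUL-terminated C string can contain.
every_byte = bytes(range(1, 256))

# latin-1 decodes each byte to the code point with the same number...
as_text = every_byte.decode("latin-1")
# ...so encoding back yields the identical byte sequence.
round_tripped = as_text.encode("latin-1")


def is_valid_utf8(data):
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The same bytes are not valid UTF-8, which is why a UTF-8 locale cannot represent them directly.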
<rlb>The python-style approach does have an advantage in that it's finer-grain, i.e. the locale-changing approach means all args have to be the same on this front when the function takes multiple args...
<rlb>And neither approach is without cost with the utf-8 conversion because they're both multi-byte, i.e. non-ascii latin-1 won't be single-byte anymore.
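[Editor's note] A quick illustration of the size point above, assuming the resulting strings end up stored or emitted as UTF-8. Python's `surrogateescape` is used only as a stand-in for the smuggling approach; Guile's eventual representation may differ.

```python
byte = b"\xe9"  # one non-ASCII byte

# latin-1 strategy: the byte becomes U+00E9.
as_latin1_text = byte.decode("latin-1")
# smuggling strategy: the byte becomes the lone surrogate U+DCE9.
as_smuggled = byte.decode("utf-8", errors="surrogateescape")

# Size of each code point once written out in UTF-8 form.
latin1_utf8_len = len(as_latin1_text.encode("utf-8"))
smuggled_utf8_len = len(as_smuggled.encode("utf-8", errors="surrogatepass"))
```

Here the single input byte takes two bytes under the latin-1 strategy and three under the smuggling one, so neither keeps a one-to-one byte layout.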
<rlb>Of course the other approach we discussed is to just allow #u8() or strings to all the relevant functions, which I suppose could be done via scattering a scm_to/from_os_string()ish in all the right places.
<rlb>But there, you'd still have the issue of picking return value types.
<rlb>Anyway -- just started wondering about it because I hit it again, and had to write more trivial C wrappers.
<rlb>I can keep doing that, and it's fairly easy for me, but it's of course not ideal for people in general.
<rlb>(Oh, and I suppose not high priority unless we can figure out what we want, in which case, I might hack on it.)
<rlb>Higher might be the thread fix.
<civodul>yes, i’ve had to work around it in Guix too (non-locale-encoded file names specifically), not great
<civodul>oh yes, the thread fix!
<civodul>you had a patch for ‘join-thread’?
<rlb>Yep - can try it, and the parallel tests there too if you like. The proposed deadlock fix is the last two commits there (the parallel test changes are before that.)
<civodul>it’s late for today but i should really schedule time for it
<rlb>sounds good
<graywolf>Out of curiosity, do people still use 2.0 version of guile?
<Arsen>ACTION has a dependency on 0.9
<Arsen>I think
<graywolf>aaaah, so it seems that guile-2.0 actually requires the .scm extension while loading files. (load-from-path "foo") does *not* load "foo.scm".
<graywolf>But based on documentation that sounds like a bug
<graywolf>Well whatever, time to make some symlinks
<mwette>redhat 8 provides guile 2.0
<civodul>ACTION would assume that 2.0 has practically disappeared
<dthompson>the real question is: when we gettin 3.0.10? 😈
<graywolf>I hope not before the copy-on-write copy-file is merged