IRC channel logs

2026-04-02.log


<loquatdev>Hello, everyone. I have a quick question. I have a block of memory in C that holds a list of structs. What is the best way to pass this pointer to a Scheme function and write values to the structs from the Scheme side?
<loquatdev>I know I can use scm_from_pointer on my pointer in C then pass that to a Scheme function. I'm not really sure where to go from there.
<ekaitz>loquatdev: you can declare the types
<loquatdev>As in, using make-c-struct?
<ekaitz>from guile, once they are declared everything is simpler, if I remember properly
<ekaitz>yes
<loquatdev>Doesn't that allocate a new bit of memory for the values each time it's used?
<loquatdev>I want to write the values directly into the memory block in order if possible.
<ekaitz>let me open the manual
<ekaitz>(it doesn't load aaaaaa)
<loquatdev>I've been reading through the "Foreign Structs" section and it says this:
<loquatdev>Create a foreign pointer to a C struct containing VALS with types ‘types’.
<loquatdev>Which to me implies that it's allocating new memory.
<ekaitz>yes, in the guile side
<loquatdev>I already have a block of memory allocated on the C side that I want Scheme to write to directly.
<ekaitz>you can also use pointer->bytevector but doing that you wouldn't know the fields
<lloda>memory for the pointer, not for the contents, i presume
<loquatdev>Yeah, I was thinking I could use pointer->bytevector but then I'd have to calculate the padding manually, right?
<loquatdev>I suppose I could just make my struct use the same type for every value. That would certainly simplify things.
<ekaitz>yes
<ekaitz>that's why you would use parse-c-struct or so
<ekaitz>loquatdev: you could also write all that functionality in C and call it from guile
<loquatdev>I suppose I could. I'm mainly trying to do it on the Guile side because I'm "compiling" a nested list of records to a flat array that I need to use from the C side. While I could do that in C, it seems like it would be a headache. Also, I need to improve my Scheme skills :)
<loquatdev>Thanks for the help! I'll get to work.
<ekaitz>loquatdev: are you ok with using dependencies? there are some libs that help with this I think
<ekaitz>loquatdev: https://dthompson.us/projects/guile-bstructs.html
<loquatdev>I'll make a note of it. Thank you.
<ekaitz>np
<JohnCowan>How are the platform padding rules determined?
<mwette>loquatdev: the ffi-helper in nyacc can help (with proper alignment and padding): https://github.com/mwette/nyacc/wiki/FFI-Helper-User-Manual
<mwette>I'm not sure bstructs can do arrays of structs with the right alignment. IIRC, it will give 9 for sizeof(struct { double d; char c; }) whereas nyacc gives 16.
<old>that's packed representation
<mwette>nyacc will give 9 for __packed__ struct { double d; char c; };
<mwette>But if you want to make arrays and it's not packed on the C side then you want 16.
<old>this is the way
<JohnCowan>It's compiler and arch dependent. Ftypes makes you figure out the padding yourself, so I was interested in how bstructs figures it out.
<JohnCowan>dthompson: ^^
<old>AFAIK, it's not compiler dependent
<old>otherwise, C could not be used as a universal ABI
<old>It's not even arch dependent, except for the size of the native types
<old>and endianness ofc
<mwette>The layout is arch dependent. But not compiler dependent. On a 68k the above struct would have size 10, because alignment for double is 2.
<old>68k aligns double on 2? wtf
<old>what's the size of double on that arch
<JohnCowan>It has to be arch dependent, because some archs will fault if doubles are not aligned on 8, whereas others are okay with it.
<old>right
<loquatdev>How do I pass uvec_type from C? It's unclear to me how I can get the value as an SCM from C. I'm trying to call scm_pointer_to_bytevector.
<old>uvec_type ?
<loquatdev>It's the name of a parameter for that function. I'm not sure what type it's supposed to be.
<jcowan>WP says: "A long long (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with the -malign-double compile time option)" on x86 archs, so it depends on more than the arch.
<old>That's the OS convention
<old>a heresy if you ask me
<loquatdev>In regards to my question, apparently it's just a symbol.
<jcowan>This is even more disturbing: "A long double (ten bytes with C++Builder and DMC, eight bytes with Visual C++, twelve bytes with GCC) will be 8-byte aligned with C++Builder, 2-byte aligned with DMC, 8-byte aligned with Visual C++, and 4-byte aligned with GCC"
<loquatdev>Yikes...
<old>ya better not use long double in an ABI
<old>probably better to stick with common types
<old>it's probably because this type needs to be emulated on some archs, so compilers each have their own way of doing it
<old>this is why I tend to write C APIs with fixed-width types. you are pretty sure what they will yield
<old>although, if you aim to port to 32-bit, that can be a problem sometimes
<jcowan>Apparently all that is only true for 32 bits. In 64 bits it's all 8-byte alignment except for long double, which is 16-byte aligned (except on MSVC where it is the same as double)
<dthompson>jcowan: bstructs uses guile's sizeof/alignof which does the appropriate thing
<jcowan>Ah, thanks. Is that figured out by ./configure?
<dthompson>when compiling for a target that is the same as the host, sizeof/alignof are eval'd at expansion time so runtime is fast. cross-compilation takes the slow path, as a result.
<dthompson>jcowan: using sizeof as an example, that's implemented in C, scm_sizeof. so it just uses C's sizeof.
<jcowan>I would think alignof would use C's _Alignof
<mwette>_Alignof is from C11. Not sure Guile has been updated to that yet.
<mwette>and apparently deprecated in C23 (to alignof)
<dthompson>the answer for guile is "it's complicated"
<dthompson> https://codeberg.org/guile/guile/src/branch/main/lib/alignof.h
<old>GCC has had __alignof__ forever AFAIK
<old>ahead of its time as always
<dthompson>but it seems that __alignof__ is what's used in nearly all cases
<old>Let's not forget that if your object is put in an ELF section by the linker, its alignment can change
<jcowan>Ugh, really?
<old>e.g., if you are putting an array of structures in a .section which you want to iterate over, the most portable way of doing so is to have a shadow section with pointers to the structures in it, since the pointers will always be natively aligned by the linker, but the structs may not be
<old>so you have: .my-data and .my-data-pointer
<jcowan>Anyway, adjusting to different spellings of "align of" shouldn't be that hard.
<old>you iterate only over .my-data-pointer directly
<old> https://lkml.iu.edu/hypermail/linux/kernel/0706.2/2552.html
<old>been there, what a mess to debug when you don't know what's happening here
<mooseball>hey, are there any haunt cats around? i started building a basic site with it, using data exported from a blogspot site. i'm having trouble getting haunt to parse the very basic HTML that the posts contain. i get parse-errors on html entities, and also some i can't even figure out like Throw to key `parser-error' with args `(#<input: posts/filename.html 11> "XMLNS [4] for '" #\space "'")'.
<mooseball>is there a way i can make my HTML more acceptable?
<dthompson>mooseball: I think this is something I just need to fix in haunt. it should use htmlprag to parse html, not sxml
<dthompson>short-term workaround is to just use your own html reader that uses the (htmlprag) module from guile-lib
<dthompson>this is like a 10+ year old mistake because no one really uses plain html with haunt so it goes unnoticed
<mooseball>hehe, sorry to be that person
<mooseball>yeah i figured it's unusual, but html was the best i could do with this blog export data.
<dthompson>mooseball: here's some code you should just be able to use as-is https://codeberg.org/spritely/spritely.institute/src/branch/main/html-reader.scm
<dthompson>I need to upstream it as the new html reader
<mooseball>ah great, yeah i had seen something else by another haunt user, was looking for it now but lost the page.
<mooseball>thanks, i'll give it a try with my ugly html
<dthompson>happy hacking
<mooseball>thanks a lot for responding
<old>Hey yall: Guile BoF next Monday April 6th at 13:00 UTC — come discuss Guile and priorities for the next release! https://meet.jit.si/guile-bof
<civodul>wo0t! great initiative, old!
<jcowan>I was researching what inexact numbers are in various Schemes, and as a byproduct I realized that there is now only one 32-bit arch that is still alive, namely armv7
<jcowan>there are some 32-bit microcontrollers that are still productized, but none of them have float support
<old>civodul: :)
<mooseball>i got something working with the spritely htmlprag reader, but it also seems unable to handle entities like &nbsp;. it renders each of them as "<*ENTITY*>additionalnbsp". any idea of how i might deal with that? the text contains poetry that uses such spaces for occasional indentation
<old>jcowan: armv7 is dead, nobody uses it anymore or develops for it in any meaningful way AFAIK
<old>I'm sure someone does, but the world is moving toward 64-bit ARM. The mess that 32-bit ARM is ..
<rlb>interesting (linux oriented) summary from a bit back: https://lwn.net/Articles/1045363/
<old>I thought Linux dropped alpha support
<ieure>old, Sounds like new maintainer. https://www.phoronix.com/news/Linux-DEC-Alpha-2025-Maintainer
<rlb>debian does still have an unofficial port, fwiw: https://buildd.debian.org/guile-3.0
<old>could not just let this weird arch die uh
<old>I swear, the memory ordering on Alpha is just insane
<old>Dependent loads can be reordered
<old>only get that on Alpha
<old>and maybe MIPS idk
<rlb>At least we don't have to deal with s390 anymore (effectively 31-bit arch --- had to spend some time figuring out guile/emacs pointer-manipulation related issues there in the past).
<rlb>debian supported that up until a few years ago (ish)
<old>lol joke's on you ..
<old>I had a bug on s390x last month on our tool
<old>we are still supporting it
<rlb>Oh, right, debian still support s390*x*, but that's "just" 64-bit.
<old>I have no idea who uses that other than IBM mainframes?
<old>right right
<old>s390 itself is not used anymore I think
<hwpplayer1>Do you contribute to any part of Debian ?
<rlb>(It's been less surprising, mostly just of the big-endian variety.)
<hwpplayer1>I wanted to learn
<hwpplayer1>old ^
<hwpplayer1>sorry rlb ^
<rlb>hwpplayer1: oh, no worries.
<hwpplayer1>Do you have a Debian machine
<hwpplayer1>I have an AMD machine and it is Ubuntu 24.04 LTS
<hwpplayer1>switched from Debian
<hwpplayer1>How can I contribute here with my own
<rlb>hwpplayer1: oh, I misread things --- you *were* asking me. So yes, I maintain guile and emacs for debian.
<rlb>among other bits
<hwpplayer1>Thanks , and how can I contribute to guile then ?
<rlb>guile or debian, or both?
<hwpplayer1>guile first
<hwpplayer1>I can write guile code
<hwpplayer1>this is an application
<hwpplayer1>or script
<hwpplayer1>but the "core"
<ieure>hwpplayer1, It's a normal Free Software project, make a Codeberg account, hack something up, pull request. This is the repo. https://codeberg.org/guile/guile
<hwpplayer1>I want to excel at computer science and programming language theory and related
<hwpplayer1>I have a Codeberg account hwpplayer1
<rlb>I'd typically start with whatever I'm interested in; never hurts to have a motivating issue. Guile itself has some notable variety (scheme vs c, end-use side vs compiler tower, etc.).
<ieure>hwpplayer1, Then you're set, take a look at the issues, find something that you want to work on, do that.
<hwpplayer1> https://codeberg.org/procyberian/guile
<hwpplayer1>this is our project organization
<hwpplayer1>okay I see
<hwpplayer1>Thanks
<hwpplayer1>I'll read issues also
<hwpplayer1>Time will tell
<rlb>If you're someone who likes reading docs, perhaps look at the reference manual, also good to learn scheme well.
<hwpplayer1>Yes
<hwpplayer1>Thanks
<ieure>Yeah, if you don't already know C and Scheme, you should start there.
<rlb>Some possibilities here, for example: https://docs.scheme.org/
<hwpplayer1>I know some C
<hwpplayer1>I know some Emacs Lisp
<hwpplayer1>okay
<rlb>That first book is "a lot", but also exceptional if it suits.
<hwpplayer1>for scheme.org
<hwpplayer1>how do you code with Guile ?
<hwpplayer1>for what reason ?
<rlb>Also, reference manual is here https://www.gnu.org/software/guile/manual/
<hwpplayer1>Okay
<old>hwpplayer1: there is also the mailing list guile-devel you should subscribe too
<rlb>I currently don't actually use guile a lot, though I have in the past. I mostly work *on* it at the moment. In part that's because for some of the things I'd recently have wanted to use it for, it still needs some additions/enhancements. It's also because I was working mostly, and heavily in clojure (on the lisp-ish front) for a good while.
<old>and there is the debbugs that need some help with triage
<hwpplayer1>thanks old
<hwpplayer1>brb
<JohnCowan>rlb: What are some enhancements that you need?
<rlb>Heh, I'm sure many here have already heard more than their fair share about it --- I need solid support for arbitrary system data, i.e. paths, users, groups can't (just) be unicode strings. Otherwise, you just can't (comfortably) write something like tar, cp, rsync, etc. in the language.
<old>in other words, raw strings handling from the OS :-)
<old>like C can do
<rlb>For example, there was a time when I was at one of my peaks of "disappointment" with python where I was looking at guile with respect to a possible bup rewrite, but just couldn't.
<rlb>ACTION has had any number of those peaks, but it's a bit better now that they finally gave in and handled the same issue.
<rlb>(still not a language I'd prefer to be working in)
<rlb>And there are many ways to handle it, and *it's complicated*.
<rlb>(with many tradeoffs)
<rlb>(I imagine we'll eventually figure something out, and I may well help implement it once we come to an agreement.)
<rlb>(I'd likely have already been working on it if we'd known which way we wanted to jump.)
<rlb>old: fwiw, the rebase -i --exec make check testing across the utf8 branch revealed a couple of bugs that I'm fixing. Fortunately nothing too serious yet.
<rlb>(bugs in the branch I mean)
<hwpplayer1>I'll be back
<hwpplayer1>I'm checking my infra
<rlb>JohnCowan: ...and my current inclination is to think that whatever else we might do, something like this, for example, should probably "just work", for all paths, without anyone having to know anything about the broader mess: https://paste.debian.net/hidden/9330c8ef
<old>If only utf-8 was invented before everything else
<old>including unix
<rlb>(Right now, you'd have to run guile with a latin-1 locale for that to just work.)
<hwpplayer1>I'm back
<rlb>old: I believe one of the bugs was of the (topical wrt the other day) variety where it only showed up when the string was long enough and we took the "less happy" path for (I think?) string->list. If I verify that's right, I'll see about some additional test coverage.
<rlb>(wrt the utf8 bugs I found)
<rlb>JohnCowan: oh, and in case it wasn't already obvious to you, one way to provoke the issue (assuming you have, say, a utf-8 locale) is "date > $'foo-\xb5'; ./cp $'foo-\xb5' tmp-dest"
<rlb>(and you're on ext4/ufs/xfs/..., but not, I believe, apfs or maybe zfs(?))
<rlb>whatever --- macos' fs.
<ieure>I despise Apple's continued insistence on case-insensitive filesystems. At one point, I reinstalled a machine just to format the disk with case-sensitivity, but all Mac software is riddled with filename-sensitivity bugs which the default FS masks, so tons of stuff broke horribly, including first-party software.
<ieure>Awful system.
<rlb>"case" is a whole other mess :)
<jcowan>Sure. If something doesn't matter to the system, then it rests solely on human willingness to accept conventions, and humans are not good at consistency.
<jcowan>"A foolish consistency is the hobgoblin of little minds." --R. W. Emerson
<jcowan>rlb: I have a proposal for how to deal with dirty string data (specifically for pathnames, but it's quite general) at https://codeberg.org/scheme/r7rs/wiki/Noncharacter-error-handling
<jcowan>I think it's a lot better than dealing with everything as strings-or-bytevectors as Python does.
<rlb>right, that, or a variation on it is what I've been tending to think of as our leading candidate (even if we *also* want broader bytevector/API support someday).
<rlb>(e.g. https://codeberg.org/scheme/r7rs/issues/51#issuecomment-3536639 )
<rlb>And python now does roughly the same thing.
<rlb>(in addition to parallel bytevector support all over the place --- originally quite haphazard, now less so)
<rlb>But now in python, things mostly "just work" on that front via the surrogateescape approach that's (somewhat) similar to noncharacters.
<rlb>I think...
<old>jcowan: you have a general solution for possible smuggling of payload with the surrogate tricks?
<rlb>Basically, I currently favor an approach like noncharacters (if we can address all the concerns), whatever else we do, so that most people never have to care.
<old>Say, you have a gate filtering procedure for accessing some files, and somehow a user smuggles a noncharacter in a way that bypasses the filter and gets access to the file
<rlb>And right, old, I'm not sure what I think about that bit yet, though I'm also not sure it's intractable, or even what the material concerns are.
<rlb>I'm also not positive yet (haven't had time to think it through), but perhaps related --- python forbids smuggling the ascii range.
<rlb>i.e. surrogateescape forbids escaping bytes below 128
<rlb>iirc
<old>I'm not even sure if it's possible tbh
<old>but security is not my speciality
<old>I can think of potential security hole, but not make an actual one
<rlb>What's possible, the smuggling? I think it is.
<old>ya
<old>I don't have a practical use case in mind really, but I guess it is indeed plausible in some scenario
<rlb>*in theory* (perhaps to your point)
<rlb>i.e. if a current scanner's looking for "rm -rf /", and those bytes are noncharacter escaped, they'll show back up "on the way out", I think?
<rlb>but the scanner won't see them
<jcowan>By the same token, if the codec is an EBCDIC variant, it won't see them there either.
<jcowan>A covert channel that requires modifying both ends is inherently undetectable, because that just amounts to encrypting your data (with a very weak encryption).
<jcowan>Being able to encrypt data in flight is usually considered a Good Thing.
<rlb>My impression was that were we to get more serious about all this, someone (perhaps at least me) would first need to spend more time trying to clarify the threat, to better know whether we think it matters, and/or whether there were sensible ways to handle it if so.
<rlb>jcowan: I think the main concern here is that you could have existing code that's completely correct/secure, and then suddenly it's not, perhaps down in some sub-dependency you don't even know about. I'm not saying that would be the case, just that that's one of the questions/concerns.
<rlb>Before there was no way that data could "go anywhere".
<mwette>There is still a lot of development for the 8-bit avr processors for which gcc and libc have sizeof(double)=4.
<rlb>(if that's right)
<rlb>(Suppose it might also affect whether various bits are "opt-in", or whatever.)
<rlb>dunno -- (for me) would just require more thought.
<rlb>(I mostly just set aside thinking about it for now because I think we'll probably want to deal with utf-8 first, and that's already "a lot". Then we'll have to understand everything well enough, and finally come to some consensus.)
<jcowan>For me the central concern isn't security, it's being able to represent 100% of cases while optimizing for the 99.9999...% of cases.
<jcowan>What should readdir return in an R6RS environment where files named using random code points may exist?
<old>What's the intended goal here? To me, we want Guile to be a system language. For this, we need to be able to handle 100% of the cases. But we also need the security
<old>Optimization is good also wherever we can ofc
<jcowan>Saying "Files are named using bytevectors, period" is secure and covers 100% of cases at the expense of usability.
<old>right
<jcowan>So in effect it pessimizes for the 99.9999%
<jcowan>OTOH (as I said in my link above), strings as sequences of Unicode scalar values is not very interesting theoretically; its utility is practical.
<rlb>jcowan: *if* there is a plausible security question, then I wonder if it might also be something that could be handled by "the standard" via some flavor(s) of opt-in. You wouldn't want that if you were starting from scratch, but there's a lot of scheme code out there, so perhaps it could turn out to be a reasonable security/convenience compromise.
<jcowan>Well, if we could prove that some design was secure, we wouldn't need security reviews.
<rlb>old: and yeah, I think that a key question we/guile will have to answer, is whether or not a program like that one I pasted "should work", i.e. should a normal scheme program work without you having to know anything about this mess with totally valid linux paths.
<rlb>Because the failure mode is very confusing for those who don't know and don't want to know about the mess.
<probie>using bytevectors also allows invalid filenames, and makes it harder to write portable code (e.g. If I have some path `foo/bar/baz` and want to turn it into `foo/bar/baz.tar`, it's not trivial any more)
<probie>I don't even know how many bytes `.tar` is going to be (is it 4 or 8?)
<jcowan>Right. Scheme procedures can be polymorphic in their arguments, but not in their results, so you end up doubling up on all APIs.
<jcowan>or at least all that can return filenames. But as I also said, filenames are just a special case of the more general problem: dirty text files are if anything more common than dirty filenames.
<rlb>envt vars, user names, group names, extended attributes (i.e. selinux, etc.)...
<jcowan>(Technically they can be polymorphic in results, but that means the caller has to wrap the call in a typecase.)
<jcowan>Python's surrogateescape convention works for some value of "works", but it violates Unicode by assuming that unpaired surrogates are valid.
<rlb>I perhaps, incorrectly, think of file content a bit differently. I suspect there you more often really do want/need to know if the encoding is bad if you're actually manipulating the data, and otherwise you could perhaps just use "binary"...
<probie>A "dirty text file" is just a binary file. Filenames are harder because they have constraints; not every sequence of bytes is a valid filename, even on Linux, which is very permissive (they can't contain the byte 0)
<rlb>Where for "system data" I suspect in many cases, all you care about is bytewise-equality and/or a round-trip.
<jcowan>Or the slash byte.
<rlb>for programs where the programmer doesn't already need to know about the mess anyway.
<rlb>Oh sure, paths have that one rule (no / or \0)
<jcowan>You also care about matching. Environment variables are looked up by name, just as filenames are, and are returned by (get-environment-variables).
<rlb>and *ascii* '/' is what matters
<jcowan>What does that mean?
<rlb>i.e. it's the byte 0x2f, not anythign else.
<jcowan>Ah, yes.
<rlb>"anything"
<rlb>Wherever you end up, I suspect you'll often be assuming "ascii compatible" these days.
<rlb>(for most decisions about what to do)
<jcowan>probie: But a dirty text file is probably almost entirely a text file, it just has a few instances of badly encoded characters.
<rlb>email ;)
<probie>I don't think you can assume "ascii compatible" if you want portability; a windows filename isn't
<jcowan>Treating all text files as binary just-in-case is another example of pessimizing.
<jcowan>(As an analogy, the fact that rlb writes "anythign" does not mean that his messages are not in English.)
<rlb>I mean that you *have* to in various cases where, say, you need to remove *ascii* control character bytes for safety, etc.
<rlb>Or, as mentioned that 0x2f is the directory separator, not some other encoding of unicode '/'.
<rlb>at least for linux/*bsd/others.
<jcowan>On ITS the path separator was space.
<rlb>vax semicolon?
<jcowan>Within []s, yes.
<rlb>Of course you could go for "full abstraction", cf. common-lisp.
<jcowan>I put together a library design that used the CL framework to handle not just pathnames but also URIs.
<jcowan>bah
<jcowan>I just noticed that you can't make ls(1) cough up the st_dev field. I was using -i to uniquely identify files, but that's not workable when you have multiple partitions (or mounted cd's, or whatever)
<rlb>does it need to be *portable*?
<rlb>..stat(1) and find(1) can do it, but not posixly.
<jcowan>I didn't know about stat(1). But I think I'll just rewrite my script in Python.
<jcowan>I don't see how to do it with find.
<rlb>printf I think?
<rlb>i.e. perhaps find -printf "%D"?
<rlb>at least wrt coreutils
<rlb>guile can of course do it too, but as we said, not in a general purpose way (for arbitrary paths) without latin-1.
<jcowan>bsd supports that too, although it's not Posix.
<rlb>I'm sure there's also a perl one liner of some sort.
<rlb>old: I'm not positive yet, but as a quick hack, I think you might be able to cross a .go format change boundary without a full re-bootstrap in some cases via 'find stage2 -name "*.go" -delete' and then a rebuild.
<rlb>(plus or minus rm -r ./.cache and/or .../readline.go depending on what changed)
<rlb>there are some test-suite .go files in .cache/
<rlb>(of course I suppose your tree is then inconsistent, i.e. wrt stage0/1, but may not matter for some purposes)
<rlb>nvm --- I thought that worked last night, but maybe I misunderstood. Isn't working now.
<rlb>(and meant ./cache)
<mwette>ls
<ekaitz>mwette: LOL
<rlb>old: you asked about utf-8 docs earlier, and I completely forgot about https://codeberg.org/rlb/guile/src/commit/82d05cacffcabcb6c879f6a2432fde6fd75ca1d1/README-utf8-conversion which I see has a typo or two I'll fix, and which also discusses the potential stringbuf optimization i mentioned earlier, but couldn't remember clearly.