IRC channel logs

2026-04-21.log

<rlb>old: one other thing I vaguely wondered about was whether we might want some unusual naming convention for "intangible" modules, i.e. those with no backing file. Might make it harder to accidentally shadow one with a file later (e.g. the new (ice-9 read internal)).
<rlb>...I also noticed that (ice-9 ports) already "hides" a module by defining (ice-9 ports internal) at the end of module/ice-9/ports.scm.
<rlb>And if that actually could be in a file, then I'd be inclined to break it out.
<rlb>(but the question would remain in some other cases)
<dsmith>rlb, How about a name with a \0 char in the start ? *That* can't be a file.
<dsmith>Not an original idea. Linux (posix?) uses that for some special naming. Can't remember right now. Named pipe or shared mem or something.
<rlb>unix(7)
<rlb>And yeah, that'd be plausible if all the relevant module/path related code would handle the "empty" string correctly in any cases where it didn't already treat the string as strictly determined by length. Interesting, though I'd also wonder if it might produce confusing errors, if any of the related code (incorrectly) respected the null, e.g. when printing a path on failure.
<dsmith>Yeah, that's it. I thought since Guile uses counted strings internally, it could work.
<dsmith>"The abstract socket namespace is a nonportable Linux extension."
<rlb>It *should* work, indeed ;)
<rlb>Maybe I'll try it later, see what happens...
<rlb>Separate question of whether we want to explicitly use symbols like that.
<dsmith>Chez has some kind of generated symbol. No read syntax for it IIRC. Used in macros
<mwette>Hey. A new implementation of strings would presumably be shared w/ symbols. It would be useful to add a hash code to the string implementation, to make lookups potentially fast. Comments?
<rlb>Currently, symbols still have an indirection (now to a string proper, previously, in 3.0 to a stringbuf): https://codeberg.org/rlb/guile/src/commit/b3b7a75f81c6cfe245e2bb60d974b6f51b231575/libguile/strings.c#L1523-L1531
<rlb>Are you talking about string having a hash too?
<rlb>And/or maybe you mean to make it so that a symbol is really just a string with a different type tag?
<rlb>(after adding hashes to strings?)
<rlb>Can't easily do that, though, I think without some other restructuring and/or indirection because strings aren't conceptually immutable.
<rlb>i.e. *some* strings are, but not all
<rlb>whereas symbols are absolutely immutable, of course
<mwette>just symbols makes sense (to me)
<mwette>could it be unused for use as a string?
<rlb>Hmm, I think I probably don't understand what you're suggesting completely yet.
<mwette>one bit is required to indicate if hash code is there or not
<rlb>Sure, but if it's only symbols, then why not just keep the hash in the symbol as it is above (cell 0)?
<rlb>"only for symbols"
<rlb>We rarely ever need to look at the string, I'd guess, for most programs.
<mwette>if it fits, great
<mwette>in symbol datum
<rlb>right, it does --- hashes are currently a "word" so a cell (i.e. a long).
<rlb>Have been wondering about adding a "completely inline" read-only string variant: https://codeberg.org/rlb/guile/src/branch/utf8/README-utf8-conversion#L76 (So truly read-only strings wouldn't have an indirection to a stringbuf, since they would actually *be* what we use for a stringbuf then).
<rlb>Benefit of course is, more compact, and very cache friendly whenever you have read-only strings.
<rlb>(and not having to have separate "stringbuf" operations)
<rlb>but that's optional --- can always pursue it if/when we feel like it
<dsmith>rlb, Like a Rust smartstring ?
<dsmith> https://crates.io/crates/smartstring
<rlb>Hmm, assuming I understand that, not exactly? The main difference would be that instead of a string being a few cells (length, offset, ...), one of which is a pointer to a (possibly shared) stringbuf with the contents, these read-only variants would *be* the stringbufs that other strings might point to, and so would "just be the content", meaning an in-line char-count, byte-count, utf-8 bytes, and if long enough, the sparse index at
<rlb>the end.
<rlb>So cpu prefetching will immediately get the whole string, instead of just the pointer to the contents, etc.
<rlb>(if the string is small enough)
<rlb>And there's of course less scattering across ram, et.
<rlb>etc.
<rlb>I did see something that suggested our sparse index might be similar to what swift has done, but not sure.
<rlb>I just did a "likely" thing.
<dsmith>rlb, That does sound a lot like that smart string. A rust string has a pointer to heap, a capacity, and a length. A smart string uses those for the actual string bytes. No heap allocation.
<rlb>Oh, OK, I was thinking more about the heap vs stack and "limited to the size of the top level item" aspects, which don't apply, but I think I see what you mean.
<rlb>Not a high priority, but it'd be nice if our test suite eventually supported something like pytest --durations=N ... Which reports the run times of the slowest N tests at the end.
<old>re: nul byte in private internal module
<old>how would you enter that module in the REPL with ,m then?
<old>We want to be able to debug these modules like others :-)
<rlb>heh, nice
<old>Could we share a file with internal things? I've never actually tried something along these lines:
<old> https://paste.sr.ht/~old/9b5038464d42f8717572dea1772f73ba174cb9d1
<old>I think not, because of how module names are searched wrt load-path and the file-system
<rlb>Hopefully most of the time we can just have a file on disk --- I actually wonder if (ice-9 rnrs ports) could be split...
<rlb>I suppose with the (ice-9 read internal) thing we just did, it's maybe a tradeoff --- either the intangible module, or we could stash read-bytestring-content in a %guile-i- for later use. But the current full encapsulation has its appeal.
<rlb>i.e. no extra bindings in (guile)
<rlb>Do we have some kind of (add-after-module-load-hook! ...)? If so, we could use that to insert the binding: (after-module-load '(ice-9 read) (lambda () (module-define! read-bytevector-contents value-from-the-read-scm-include))), and then we're all set.
<rlb>...or something similar (e.g. after-create-hook or something). But that's more complex in some ways and spookier action at a distance than what we're doing now.
<dsmith>Hmm...
<dsmith>scheme@(guile-user)> "\0foo"
<dsmith>$1 = "\x00foo"
<dsmith>scheme@(guile-user)> (string->symbol "\0foo")
<dsmith>$2 = #{\x0;foo}#
<dsmith>What's that ; in there?
<rlb>I think that's probably the escaping syntax (iirc).
<dsmith>ok
<rlb>i.e. so you know where the hex ends.
<rlb>but I don't recall for sure --- one of the syntaxes works that way because I was messing with related string/symbol code, but I forget where.
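As a minimal sketch of the counted-string point above, assuming a stock Guile 3.x: the NUL is an ordinary character in both the string and the symbol, so nothing is truncated at it. The names `nul-name` and `nul-symbol` are only illustrative.

```scheme
;; Illustrative only: `nul-name` and `nul-symbol` are made-up names.
(define nul-name "\0foo")

;; Guile strings are counted, so the NUL is just one more character.
(string-length nul-name)                  ; 4 characters: NUL, f, o, o
(char->integer (string-ref nul-name 0))   ; code point 0

;; The symbol keeps the NUL and converts back to an equal string.
(define nul-symbol (string->symbol nul-name))
(string=? (symbol->string nul-symbol) nul-name)
```

Whether the module and load-path code treats such a name sensibly is a separate question that this does not test.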
<rlb>Hmm, given the load_extension approach that ports.scm and binary-ports.scm use (for example), I suppose that's another way we can share internal (C) functions across multiple modules without needing anything like %guile-i-**.
<rlb>...and looks like (ice-9 ports) (ice-9 ports internal) also relies on @@ --- from a quick glance, there's other complexity there, guessing if it could have been a separate file, it would have been.
<adanska>is it possible to spawn a fiber within another fiber?
<ArneBab>Would adding an efficient u4vector be easy (with values from 0 to 7)?
<ArneBab>actually from 0 to 15.
<ArneBab>I could save 25% of memory with that in wispwot :-)
<ArneBab>(article that prompted my question: 4 bit floating point numbers: https://www.johndcook.com/blog/2026/04/17/fp4/ )
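A rough sketch of what a packed 4-bit vector could look like on top of an ordinary bytevector, two values per byte. The names `make-u4vector`, `u4vector-ref`, and `u4vector-set!` are hypothetical, not an existing Guile API, and this says nothing about how an efficient built-in version would be implemented.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helpers: store N values in the range 0..15, two per byte.
(define (make-u4vector n)
  (make-bytevector (quotient (+ n 1) 2) 0))

(define (u4vector-ref bv i)
  (let ((byte (bytevector-u8-ref bv (quotient i 2))))
    (if (even? i)
        (logand byte #x0f)     ; even index: low nibble
        (ash byte -4))))       ; odd index: high nibble

(define (u4vector-set! bv i value)
  (let* ((k (quotient i 2))
         (byte (bytevector-u8-ref bv k))
         (v (logand value #x0f)))   ; silently keeps only the low 4 bits
    (bytevector-u8-set!
     bv k
     (if (even? i)
         (logior (logand byte #xf0) v)
         (logior (logand byte #x0f) (ash v 4))))))
```

Each access costs a shift or mask on top of the byte read, so the memory saving is traded against a little extra work per element.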
<spk121>rlb: with the latest change, my CI goes from pass to fail. In make distcheck, stage0 compilation of eval.scm fails with
<spk121>guile: uncaught exception:
<spk121>Module named (srfi srfi-207 internal) does not exist
<spk121>bug in the am/bootstrap.m4 file, probably
<rlb>nice, and thanks --- sounds right. Will fix.
<old>rlb: wrt to private bindings like %guile-i
<old>why was using not-exported symbols not doable?
<old>using @@
<rlb>I thought we were supposed to avoid those in general, because inlining might lose the reference or similar?
<old>well if the problem is inlining
<rlb>If not, then that'd work fine too. And I just saw that we *are* doing that in at least one place.
<old>then we want a compiler hint
<old>for preventing inlining on these bindings
<rlb>e.g. something was using @@ in modules I think.
<old>but that way, we don't expose any internal symbols in the public interface
<rlb>If we want to (and think we can) make that the practice instead, sounds good to me, and I can adjust the docs.
<old>I think that would be the best
<old>IIRC, ludo says that doing (set! foo foo) is enough for now
<rlb>Oh, I think I'd heard and forgot about that.
<rlb>I should plan to do a survey of our modules/ @@ use (or "someone" :) ) along those lines.
<old>we could abstract this in some way like: (declare (not-inline foo))
<old>following `declare' from common lisp
<rlb>Oh, right, that'd be better.
<rlb>i.e. calling it out so it's easy to change
<rlb>Oh, and do you know of any existing practice wrt providing part of a srfi before we have the whole srfi, i.e. I might want to add vector->string and string->vector (srfi-152) before I finish implementing the whole thing. (We could of course just add an incomplete (srfi srfi-152) and document that it's partial.)
<rlb>I might actually finish it first, so it might not matter, but I only actually needed those two functions (and think they're more generally useful "sooner" in the utf-8 world).
<old>hmmm I would avoid merging public modules that are incomplete
<old>but nothing prevents you from pulling these functions from the SRFI, use them in the files you need them
<rlb>The alternative would be some ice-9/guile module to house those functions first, but that's not ideal if we really are going to eventually have a complete srfi-152
<old>and later we refactor
<rlb>i.e. we'll be stuck with the temp module "forever".
<rlb>api-wise
<old>the temp module?
<rlb>Say we want to provide vector->string and string->vector in the utf-8 release, but we're not finished with srfi-152. What exports it?
<old>I would just pull the definition in boot-9 (not top level) and remove them later with proper module referencing
<old>ahh okay you want to provide them to the users
<rlb>right
<old>I thought only used internally by us
<old>hmm the proper thing would be to have full srfi-152 ready for the release I guess
<rlb>You want those because now you might want to work with vector of char instead of string for quick/dirty conversions to avoid string-set!.
<old>until then, we can incrementally build srfi-152
<rlb>(for transient string building)
<old>bbl
<rlb>Hah, ok, I see what you did there.
<rlb>(I was likely to finish it anyway :P )
<rlb>I suppose a disadvantage to the @@ approach is that it could make it a touch harder to find/see all the relevant interdependencies, though at least when finding "by inspection", if we always bind those just after the define-module, it's not too bad.
<old>rlb: we can always define something like: module-internal-ref which is just @@
<old>so it's easier to grep
<old>or something similar
<mwette>Are you assuming all @@ refs in a module are explicit (i.e., exist as text in the .scm file)?
<mwette>try this: ,use (srfi srfi-9) THEN ,expand (define-record-type <foo> (make-foo) foo?)
<rlb>mwette: perhaps you're suggesting that @@ is already "required to work"?
<rlb>(If so, then not surprised, and great --- won't have to mess with lokke either.)
<mwette>I'm not 100% sure of the objective of your effort here. (I admit I'm following on the back-burner.) I just wanted to make sure you are aware that lots of @@ refs come from expanding macros (at compile time).
<mwette>Is this about stale upstream references? -- foo uses bar and bar changes but foo may not automatically be recompiled
<rlb>main builds should be fixed now
<rlb>The concern (I thought) was about the optimizer somehow potentially removing the "internal only" binding entirely.
<rlb>i.e. if it inlined it everywhere, and didn't need it anymore, or something?
<mwette>Ah. OK.
<rlb>But if we're already relying on @@ everywhere, then we've implicitly promised not to do that anyway ;)
<rlb>so it's "safe"
<mwette>In which case explicit refs to @@ are at risk.
<mwette>It depends on how smart the compiler is. I don't think the compiler can remove internal bindings that are used in public macros.
<rlb>Well, also as Olivier said, acc to Ludovic, (set! foo foo) is a sufficient guard if we need it.
<mwette>In modules where internal bindings should not be removed? Clever.
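A small sketch of the (set! foo foo) guard mentioned earlier in the discussion, assuming Guile 3.x and its declarative-module rules as described in the manual: a top-level binding that is assigned with set! is no longer treated as declarative, so the compiler should keep it as a real variable rather than inlining it away. The module names `(example internal)` and `(example client)` and the procedure names are made up for illustration.

```scheme
;; Hypothetical module with one unexported helper.
(define-module (example internal)
  #:export (public-entry))

(define (helper x) (* 2 x))

;; Guard: assigning the binding to itself marks it as mutable, which is
;; assumed here to keep the optimizer from eliminating the variable.
(set! helper helper)

(define (public-entry x) (helper x))

;; Hypothetical consumer reaching the unexported binding with @@.
(define-module (example client))

(define helper-ref (@@ (example internal) helper))
(helper-ref 21)   ; => 42
```

A dedicated form such as the `(declare (not-inline foo))` idea above would state the intent more directly than the self-assignment.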
<mwette>I was looking in cps/* to see where things are removed. It's a bit above my head right now (even But wingo is good about commenting his code).
<mwette>s/But/though/
<mwette>Hmm. I think I'm wrong for your case. For macros I guess the compiler can inline local references in the syntax transformer.
<rlb>jcowan: know of anything like string-concatenate-reverse that operates on vectors of char instead of strings? Turns out that's exactly what's needed to allow fairly trivial conversions of existing "buffering" string-set! based code. If not, I might hack up a (possibly internal) string-concatenate-vectors-reverse. I already had a "vector-fragments->string", but this would be better (all in one).
<rlb>(i.e. instead of building a list of pending string fragments (with a possibly partial last fragment), you build a list of pending vector fragments, and then one call produces the final string with one allocation/copy, as efficiently as it can.)
<rlb>cf. sxml next-token, etc.
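To make the intended behaviour concrete, here is a sketch of what such a procedure might do, with the argument order modelled on string-concatenate-reverse (fragment list, optional final fragment, optional end within it). The name `string-concatenate-vectors-reverse` comes from the message above; the signature and this implementation are assumptions. It only illustrates the result and goes through intermediate lists, so it is not the single allocation/copy version described.

```scheme
(use-modules (srfi srfi-1))   ; append-map, take

;; FRAGMENTS is a list of vectors of chars, most recently filled first.
;; FINAL is an optional, possibly partially filled, last vector, of which
;; only the first END chars are used.
(define* (string-concatenate-vectors-reverse fragments
                                              #:optional (final #()) end)
  (let* ((end (or end (vector-length final)))
         (tail (take (vector->list final) end)))
    (list->string
     (append (append-map vector->list (reverse fragments)) tail))))

(string-concatenate-vectors-reverse
 (list #(#\c #\d) #(#\a #\b))
 #(#\e #\f #\x #\x) 2)   ; => "abcdef"
```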
<rlb>(...perhaps we should also try to come up with a utf-8 migration guide.)