IRC channel logs
2025-09-26.log
<dsmith>rlb, What's the status of that utf-8 conversion?
<rlb>dsmith: it might be mostly done, or not, depending on people's opinions of the eleventy hundred decisions I had to make by fiat to get it where it is. My expectation is that it's waiting on attention from presumably Andy or Ludovic, and Andy's said that he thought maybe it'd go with whippet into 4.0, but I suppose we'll see.
<rlb>It's a lot of change and was a lot of work, so I've set it mostly aside, other than rebasing, until it seems like there's any further interest.
<rlb>It passes all the tests?
<rlb>(And if I recall correctly, there may be a bit more enhancement that's dependent on a few questions I thought I might want help answering.)
<dsmith>Ok. Was just curious as I remember hearing a lot about it and then... not much.
<rlb>But it's been quite a while since I had all of it in my head.
<rlb>(looks like about 2 years)
<ArneBab>sneek later tell dthompson: when using your version, I get compile errors (at apply min (iota 1000000)). I’m not sure whether my version gets inlined; did not go that far into checking performance but was mainly surprised at the massive speed gain I got from simply using case-lambda and let-recursion.
<euouae>Oh yeah, because Guile uses UTF-32
<euouae>Guile wants to transition to UTF-8?
<rlb>Currently guile uses either latin-1 or utf-32 depending on the string. The current utf8 branch uses ascii or utf-8, and the latter has a "sparse" index (scaled by the length) after the encoded bytes so that you can jump to a given character in roughly constant time rather than linear time.
<rlb>And right, my understanding is that there's a desire to switch to utf-8.
<euouae>but what is Guile using for UTF-8? libicu?
<euouae>I think Emacs is using some homebrew approach to encodings
<rlb>It's for the internal representation for strings, though having that in utf-8 could make some external uses easier.
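[ed.: the "sparse" index rlb describes can be sketched roughly as follows. This is a hypothetical illustration of the idea, not the code on the utf8 branch: record the byte offset of every Nth character after the encoded bytes, so an index lookup scans at most N - 1 characters from the nearest checkpoint.]

```scheme
;; Hypothetical sketch of a sparse character index over utf-8 bytes --
;; NOT the actual stringbuf code on rlb's branch.  Every GROUP
;; characters we record that character's byte offset; finding character
;; K then scans at most GROUP - 1 characters from a checkpoint, giving
;; roughly constant-time indexing instead of a linear scan from byte 0.

(use-modules (rnrs bytevectors))

(define group 16)  ; index granularity (an assumed value)

(define (utf8-char-length byte)
  ;; Number of bytes in the utf-8 sequence whose first byte is BYTE.
  (cond ((< byte #x80) 1)
        ((< byte #xE0) 2)
        ((< byte #xF0) 3)
        (else 4)))

(define (build-index bv)
  ;; Vector of byte offsets of characters 0, GROUP, 2*GROUP, ...
  (let loop ((byte 0) (char 0) (acc '()))
    (if (>= byte (bytevector-length bv))
        (list->vector (reverse acc))
        (loop (+ byte (utf8-char-length (bytevector-u8-ref bv byte)))
              (+ char 1)
              (if (zero? (modulo char group))
                  (cons byte acc)
                  acc)))))

(define (char-index->byte-index bv index k)
  ;; Jump to the checkpoint at or before character K, scan forward.
  (let loop ((byte (vector-ref index (quotient k group)))
             (remaining (remainder k group)))
    (if (zero? remaining)
        byte
        (loop (+ byte (utf8-char-length (bytevector-u8-ref bv byte)))
              (- remaining 1)))))
```

The index costs one offset per GROUP characters (hence "scaled by the length"), trading a small, tunable amount of space for near-constant string-ref.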
<rlb>And yes, my understanding is that emacs' strings are much more complex (unsurprisingly, I suppose).
<ArneBab>rlb: if I understand it correctly, having the internal representation as utf-8 could reduce the memory requirement by almost a factor of 4 for languages that are mostly ascii (anything based on the latin alphabet), is that correct?
<ArneBab>rlb: are Asian scripts still included in two-byte UTF-8 or do they need three or four bytes?
<ArneBab>(I’m not sure from reading Wikipedia)
<ArneBab>so utf-8 should make all non-latin-1 text far more compact.
<ArneBab>rlb: is there a disadvantage I’m missing?
<identity>iirc emacs uses an extended utf-alike internally so it could deal with characters that were not mapped into unicode yet, but it does not have any of those atm
<identity>as in, right now it does not have any (explicit?) support for characters that are not in unicode yet, from what i know
<sneek>dthompson, you have 1 message!
<sneek>dthompson, ArneBab says: when using your version, I get compile errors (at apply min (iota 1000000)). I’m not sure whether my version gets inlined; did not go that far into checking performance but was mainly surprised at the massive speed gain I got from simply using case-lambda and let-recursion.
<identity>ArneBab: <https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings> § Eight-bit environments has a bytes-per-code-point comparison for unicode encodings under different code ranges. the only (from my knowledge) advantage that UTF-32 has is being fixed-width, so byte offsets have a trivial correspondence to code point offsets, which can be (and is, on rlb's branch) mitigated with some kind of table mapping character indices to
<dthompson>ArneBab: yeah so that's why I said the macro just handles the cases I needed it for. it clearly doesn't handle the 'apply' case.
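[ed.: the case-lambda + let-recursion shape ArneBab describes might look roughly like this. It is a reconstruction from the discussion, not his actual code; the name my-min and the arity split are assumptions.]

```scheme
;; Sketch of a case-lambda based `min' with an inner named let for the
;; variadic case -- reconstructed from the discussion, not ArneBab's
;; actual code.  Dedicated clauses for low arities let common calls
;; avoid consing a rest list at all.
(define my-min
  (case-lambda
    ((a) a)
    ((a b) (if (< a b) a b))
    ((a b c) (my-min (my-min a b) c))
    ((a b . rest)
     ;; let-recursion with car/cdr/null? instead of a recursive
     ;; (apply min ...), which was reported to be much slower.
     (let loop ((best (if (< a b) a b)) (lst rest))
       (if (null? lst)
           best
           (loop (let ((x (car lst)))
                   (if (< x best) x best))
                 (cdr lst)))))))
```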
<identity>mutating almost-O(1)-utf-8 strings is messy, because index mappings have to be recomputed, and might end up being slower than making a fresh string
<ArneBab>dthompson: yes -- I’ll need to benchmark that. My version has three arguments in case-lambda for the simple reason that the function I needed it for has three arguments ☺
<ArneBab>I initially had this apply min rest, too, but that had much worse performance.
<dthompson>then yours will likely perform faster. both have the same problem in that they aren't going to be inlined
<dthompson>the proper solution here is a macro that generates an 'if' tree when arity is known at compile time, and identifier syntax for when you're using 'apply' or otherwise passing a reference around
<ArneBab>static-ref means it’s not inlined, right?
<dthompson>what does this produce?: ,optimize (max 1 2)
<ArneBab>I’m trying a merge between your syntax and the fun
<ArneBab>where do you use %min in the syntax-case?
<ArneBab>should the call to min inside the final let be %min?
<ArneBab>line 20 in minmax.scm: #'(let ((x* x)
<dthompson>no you don't want %min there, you want to recursively expand to min syntaxes
<ArneBab>I now unintentionally benchmarked %min and my min. The min from hoot is already a factor of 4 slower than my implementation at 6 elements, and at 100 elements it’s 40x slower.
<dthompson>that's because you are handling 3 args at a time
<ArneBab>it’s because of the let recursion instead of the recursive apply.
<dthompson>but you're benchmarking 'apply' not the fast path anyway
<ArneBab>I’ll benchmark with only two cases (don’t need the three if I have the macro)
<dthompson>I didn't realize you were benchmarking the slow path this whole time
<ArneBab>that’s why I meant that I unintentionally benchmarked %min -- sorry that that wasn’t clear enough.
<ArneBab>Basically I benchmarked that the implementation in hoot is too slow.
<ArneBab>that’s just slightly slower than my direct let-recursion, but only by 2% (0.121s vs. 0.118s)
<ArneBab>your code uses match inside, mine uses only car and cdr and null?
<dthompson>I don't know what that has to do with recursion
<ArneBab>it doesn’t, the difference is just in the body of the recursion
<ArneBab>I guess because car is really fast and this avoids assignment
<ArneBab>(I took that lesson from your optimizing article ☺)
<dthompson>my guess would be that you eke out a bit more performance by peeling off one extra loop cycle
<ArneBab>that shouldn’t cause a 2% difference at one million elements
<dthompson>how many times did you run the benchmark and did you average the results?
<dthompson>2% is pretty close and could be explained by other things happening on the machine that are out of your direct control
<dthompson>maybe one hit an unlucky gc cycle the other didn't, maybe there was slightly higher cpu load in another process, etc.
<ArneBab>I’ll test that with correct tooling, then I can give a robust answer.
<ArneBab>first the expected result after dropping your syntax-rules into my large code: at 3 elements your syntax-rules is *significantly* faster than case-lambda (even embedded in my code, it’s roughly 4 standard deviations faster)
<ArneBab>dthompson: you’re right: the difference between the slow paths is in the noise.
<ArneBab>so it would be cool if your version (including the syntax-case) would become the default in guile :-)
<dthompson>the first step for that would be for me to put the code in hoot since I can do that unilaterally and if it's broken there will be very few upset people lol
<dthompson>ArneBab: thanks for the fun game of code golf this morning
<ArneBab>dthompson: thank you, too! Also your code is now pretty awesome!
<ArneBab>s/pooting/putting/ ← damn brain mixups
<ArneBab>dthompson: will you write a followup email with your final code, noting that it starts out in hoot?
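[ed.: the approach dthompson outlines — expand known arities into an 'if' tree at compile time, and fall back to a real procedure when the name is used as a value (e.g. via apply) — could be sketched like this. Hypothetical reconstruction; the names min* and %min are illustrative, and this is not the hoot implementation.]

```scheme
;; Hypothetical sketch of the macro dthompson describes, NOT the hoot
;; code.  Calls with known arity expand into nested `if's; using the
;; name as a bare identifier (e.g. in `apply') falls back to the
;; procedure %min.

(define (%min x . rest)
  ;; Slow path: plain loop over a rest list.
  (let loop ((best x) (lst rest))
    (if (null? lst)
        best
        (loop (if (< (car lst) best) (car lst) best)
              (cdr lst)))))

(define-syntax min*
  (lambda (stx)
    (syntax-case stx ()
      ((_ x) #'x)
      ;; Bind operands once so they are evaluated only one time each.
      ((_ x y) #'(let ((a x) (b y)) (if (< a b) a b)))
      ;; 3+ args: recursively expand to nested two-argument min* forms.
      ((_ x y z ...) #'(min* (min* x y) z ...))
      ;; Bare reference, e.g. (apply min* lst): expand to the procedure.
      (id (identifier? #'id) #'%min))))
```

With this shape, (min* a b c) compiles to straight-line comparisons and can be optimized in place, while (apply min* (iota 1000000)) still works through %min — which matches the observed split between the fast and slow paths in the benchmarks above.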
<ArneBab>(if you prefer, I can also write that email)
<dthompson>e3bc54b2: we mostly use guix but maybe we've got a nix user or two around
<e3bc54b2>Long shot, but worth asking.. I was giving another attempt at SICP, and it looks like ares/arei are the new/better guile development env, but neither is in NixOS. So I was wondering if anybody can guide me to packaging it for NixOS. I've never packaged any guile package before, and reading other packages told me nothing so...
<e3bc54b2>dthompson: yeah.. I figured anybody that likes guile would ultimately converge on guix too, but I have zero scheme background and 4.5k lines of NixOS config across 4 machines so it's not an easy jump for me
<dthompson>e3bc54b2: I recommend using geiser for getting started.
<e3bc54b2>dthompson: sure, geiser is the best option for me then :)
<dthompson>maybe eventually ares/arei can supplant geiser
<e3bc54b2>yes, I have it installed already, but the carrot of an async interpreter is enticing. Anything that leaves Emacs unblocked is good
<identity>most of the time geiser is going to deal with evaluating code without blocking for too long just fine, i do not recall SICP having stuff that would have to run for long enough for you to notice emacs freezing
<ieure>Wouldn't you be switching to the REPL with C-c C-z to eval long-running stuff there anyway?
<ieure>That's generally how I REPL with CL and Clojure.
<e3bc54b2>My repl experience has been almost exclusively limited to IELM :_
<e3bc54b2>oh and minimal experience with the fennel REPL
<e3bc54b2>identity: oh okay, I guess it'll be fine then. I wouldn't have minded packaging a couple of things in nixpkgs, it's always a nice toe-dip
<identity>the primary point is asynchronicity, for example if you evaluate (read) you can actually type at the prompt that pops up, and being able to evaluate long-running code without locking up emacs
<ieure>They wrote a Guile nREPL server in Rust? huh
<ieure>Maybe it's not Rust? Weird to have the -rs suffix in that case.
<dsmith>Asynchronous Reliable Extensible
<dsmith>Everyone knows all the cool new code is -rs
<old>want AI to think your code is rust? That's how you get that
<old>Asking lumo (Proton mail AI): What is guile-ares-rs
<old>> I’m not certain what “guile‑ares‑rs” refers to. It sounds like a Rust crate that provides bindings between Guile (the Scheme interpreter) and the c‑ares asynchronous DNS library, but I’d need up‑to‑date information to confirm.
<old>naming, code examples, and comments are becoming more important now with this stupid AI stuff
<old>with web-search: guile‑ares‑rs is an open‑source project that provides an asynchronous, reliable, extensible RPC server for Guile Scheme.
<old>so okay, not that bad
<ieure>old, so-called "AI" is horrible, I'm gonna start naming my stuff in intentionally confusing ways.
<ieure>"What's your project called?" "python-frobinacator-rs." "What is it?" "A CMS written in TypeScript"
<old>I'm just suggesting to be careful in the future with the advent of this new technology
<old>If you provide bad examples in your project, that's what AI will generate for some of your users
<ieure>old, Hard disagree. The technology is bad, I ain't changing what I do to work around it being horrible.
<ieure>I don't want my users to use so-called "AI."
<old>it's bad, but that does not prevent people from using it
<old>I guess you have no customers
<e3bc54b2>so... the AI puts out false summaries.. better prepare more crap that's not needed so it *might* put out something right..
<e3bc54b2>feels awfully like how brands have to buy ads for their own name so google doesn't show their competitors on top instead
<e3bc54b2>but that's a strong word.. honestly though, humans having to work more to accommodate inherent limitations of modern AI sounds like something I don't want to live through
<old>So writing good examples is not worth it?
<e3bc54b2>the package README is already good enough for most humans..
<old>so no documentation is needed, I guess
<rlb>ArneBab: think y'all mostly covered it -- anything covered by latin-1 is currently one byte per char; that'll change to only be for ascii in the utf8 proposal. We'll also need to convert (and maybe allocate/deallocate) less often when calling external libs, etc.
<e3bc54b2>it even lists a command to get up and running. There is a good description, a nice demo from EmacsConf is linked in... there's a FAQ..
<old>I'm off this discussion. I don't like AI either btw, but you should not play the ostrich and think other people won't use it even if it is crap and bad for the environment
<old>I'm not talking about ares-rs specifically
<rlb>And yes, mutation is difficult, and likely best avoided, though I did make some optimizations -- one of the things left to decide is whether we should preserve in-place mutation of mutable ascii stringbufs (when the change is also ascii); I *think* maybe the current branch does in places, but I chatted a bit about dropping that to save the complexity, and I think I may have been leaning that way.
<rlb>Oh and wrt emacs, my understanding is that it may have metadata in its strings along with the chars, i.e. key modifiers, etc., but not sure.
<rlb>Finally, I think a person or two has performance-tested the branch a bit, and I tested the rnrs benchmarks, and I believe in at least that testing, utf8 was generally somewhat better, but I don't recall the details at this point.
<e3bc54b2>identity: for what it's worth, literally the first chapter of SICP has a code snippet that makes Emacs freeze :)
<e3bc54b2>Exercise 1.5: the Ben Bitdiddle test for determining whether an interpreter is applicative-order evaluation or normal-order evaluation
<identity>well, you would want to interrupt that anyway :)
<e3bc54b2>haha, this is fun! I know a bit of programming and a bit of Lisp already, but my god this book is information dense!
<ArneBab>ieure: I’d just write what humans can use well. No use making life harder for actual people to maybe fool AI.
<ArneBab>rlb: Emacs even has properties on strings and overlays in buffers, but I do not know how those are stored.
<identity>e3bc54b2: it was made for a whole programming course at MIT
<e3bc54b2>yeah.. the book was clearly targeted at smart people with a very strong inclination to sit down and get shit done
<e3bc54b2>no wonder I bounced off on my last attempt lol
<ieure>e3bc54b2, It's a textbook intended to be used for a college CS class.
<ArneBab>dthompson: can I directly use your min-macro under LGPL?
<dthompson>it's fine to incorporate apache 2 licensed stuff in a project that is otherwise lgpl
<dthompson>it's not much code so I don't care either way
<identity>ACTION nests an ‘and’ inside an ‘or’ inside an ‘and’ inside an ‘or’
<ArneBab>dthompson: Apache works for me, but it would mean that I have to add one more license file to the repo -- which currently only has the script file :-)
<ArneBab>dthompson: aside: min and max are used in several of the r7rs benchmarks, so your optimization might make a difference to their results.
<ArneBab>And seeing how much was gained by the optimization, I’d assume that there are quite a few more relatively easy wins once we start implementing more math in Scheme.
<ArneBab>The new version doesn’t even cause more GC pressure than the C one.
<ArneBab>So the numbers.scm file from hoot may turn out to contain a trove of speedups for Guile itself :-)
<ArneBab>(but the benchmarking will be quite some effort)
<mwette>Any particular planned deadline for version 4 inputs? Could updated libffi support be a candidate for v4? I'm thinking of looking into the latest libffi interface to incorporate updates. For example, some support for variadic args, and it looks like there is wasm32 and wasm64 support also.
<rlb>As far as I know 4 is still just "planned", and given that the current intention is to include whippet, I suspect it's a good way off, because even once that's more or less ready, I imagine there will be a good bit of work settling things down. But just a guess.
<rlb>mwette: sure -- and unless you think you'd really have to break backward compat, then presumably changes could potentially go in a non-major release anyway.
<rlb>(I suppose our versioning is a little unusual in that Z releases can and often do include new features, and Y releases are "major" as far as semantic versioning goes.)
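[ed.: for reference, the Exercise 1.5 snippet e3bc54b2 mentioned earlier is SICP's Ben Bitdiddle test. Under Guile's applicative-order evaluation the argument (p) is evaluated before test is applied, so the last expression loops forever — hence the frozen Emacs.]

```scheme
;; SICP Exercise 1.5 (Ben Bitdiddle's test).  An applicative-order
;; evaluator (like Guile) evaluates the argument (p) first and recurses
;; forever; a normal-order evaluator never evaluates y when x is 0 and
;; would return 0.
(define (p) (p))

(define (test x y)
  (if (= x 0) 0 y))

(test 0 (p))  ; applicative order: infinite loop -- interrupt it
```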