IRC channel logs
2026-04-18.log
<rlb>I just ran the current r7rs benchmarks on main vs utf8, and unless I did something wrong, I'm a touch surprised --- utf8 is now substantially faster. e.g. frequently 20-30% (it looks like, roughly), and say "cat" finishes in ~24s rather than 60+; "tail"'s over twice as fast.
<rlb>iirc it looks notably better than last time (I suppose I have converted a lot more functions since then)
<dsmith>better *and* faster. What more could you want?
<rlb>...an absence of the bugs I've undoubtedly introduced? ;)
<rlb>Finished another rebase -x '... make -j11 check' across all 190+ commits, and things still look good. Updated the branch.
<rlb>Oh, right --- we still have in-place mutability for string-set! for ascii, but old and I have been discussing starting off without it, and see how it goes. I'll probably benchmark again without that.
<JohnCowan>dsmith: One of the Ada requirements was that only space, A-Z, 0-9, and %&'()*+,-./:;<=>?_ are allowed
<jcowan>Or more precisely that Ada implementations may be expressed using more characters, but there must be a down-translation into the above highly portable set defined.
<jcowan>Fortran, Cobol, Basic, and PL/I all use () for array indexing. I think Algol 60 was the first to require [], and even then a down-translation ("hardware language") was specified.
<rlb>looks like our string-unfold is incomplete --- it only accepts/assembles chars, when it's supposed to handle strings too (i.e. from the mapper).
<benl>hello, I'm a bit stuck on something wrt Tree-IL. I am trying to generate an if/else-if/else conditional from cond but due to the pattern matching on <conditional> I can't have more than one alternate.
<ekaitz>benl: you mean (if a b (if c d e)) ?
<ekaitz>so, what's the issue here? cond doesn't produce if-else?
<ekaitz>(macroexpand '(cond (1 2) (2 3) (5 6))) => #<tree-il (if (const 1) (const 2) (if (const 2) (const 3) (if (const 5) (const 6) (void))))>
<benl>the expansion to tree-il makes sense to me.
I'm just doing something wrong with how I'm handling it, because when I pattern match like (($ <conditional> _ test consequent alternate) (...)) and I try to process the alternate branch I just wind up with #<unspecified>
<ekaitz>i think I'm missing some context here
<benl>I defined a language with `define-language` and I am transforming scheme->tree-il->my custom language. In the define-language form there is a #:decompilers field that handles how tree-il is transformed into my desired output.
<benl>that handler matches on the various records defined in (language tree-il), but I'm out on a limb here because I am just reading source code trying to understand how this all works.
<benl>ah, wait, I think I figured it out. There are these generated getters like conditional-alternate that let me manipulate the tree-il before I process it further.
<ekaitz>sometimes just saying it out loud helps
<ekaitz>ACTION is happy to be a rubber duck
<dsmith>jcowan, Didn't know that about other langs and () for array indexing.
<dsmith>Those are all mainframey langs. I worked on those machines, but didn't write code for them.
<dsmith>Well, I would write hand loops in machine code to test out I/O. Still remember 0x47 is branch on condition! Ugh!
<rlb>Ahh, srfi-154 and srfi-13 describe string-unfold slightly differently, so perhaps ours is fine according to srfi-13. srfi-154 explicitly allows make-base to return a string *or* char, but srfi-13 doesn't mention the type, and ours doesn't allow a char.
<rlb>(I was using it to compare utf8 vs main, etc.)
<jcowan>rlb: I assume you mean SRFI 153?
<rlb>Do we have any precedent for adding a partial srfi implementation? Say I wanted to add string->vector and vector->string right now, but wasn't up for adding all of srfi-152 yet. I could just put that in a new (srfi srfi-152) if we allow incomplete (srfi *) modules. Otherwise, I'd have to find an initial home "elsewhere".
<jcowan>or no, that doesn't make sense either.
Damned SRFI numbers are anti-mnemonic.
<rlb>jcowan: I was looking at 152 actually --- not sure I at first even noticed I'd hit 152 rather than 13 net (docs-wise).
<rlb>Or maybe I just assumed that 13 and 152 string-unfold were the same last night. Either way, that's why I was confused.
<rlb>(I may want string->vector and vector->string to help with replacing string-set! in modules/. So far I'd only dealt with it in libguile/.)
<rlb>As mentioned, they differ in that srfi-152 explicitly supports char or string as a return value in places where srfi-13 only mentions char, and our implementation does enforce char atm.
<rlb>i.e. string-unfold's make-final
<rlb>What I really want is some integrated (likely two-pass, bulk) way to take a pile of string fragments and make a final string, where the string fragments come from a mutable "buffer" type, i.e. #vu32() or vector of characters, or...
<rlb>e.g. (string-append-fragments '(f1 f2 f3 ...)) where each fragment could be a vector of chars, for example, and then the implementation could traverse them all once to compute the final (utf8) size and then allocate once, and build the result.
<jcowan>I was trying to track down where allowing make-final to return a flexible value comes from.
<rlb>Use case is say (read-string port) so it can read the input in fragments, then stitch them together. Alternative could be a traditional realloc/enlarge loop, but that's potentially more expensive.
<jcowan>You're right that it's not in SRFI 13 but 13 does say that (lambda (x) "") is the default value, which means it must accept at least strings.
<rlb>Say for example, for read-string :)
<rlb>(read-string port count) I mean.
<rlb>So when you hit the count in the stop? fn, you can just return the final char.
<jcowan>If your srfi-13 insists that make-final can only return a character, it doesn't conform to SRFI 13
<rlb>OK, well the actual prose of srfi-13 doesn't clarify (I think?), where srfi-152 does.
<rlb>i.e.
13 just says 'Make-final is applied to the terminal seed value (on which p returns true) to produce the final/rightmost portion of the constructed string. It defaults to (lambda (x) "").'
<rlb>srfi 152 adds "It is an error for make-final to return anything other than a character or string."
<jcowan>The fact that the default value returns a string implies that the function may return a string.
<rlb>And 13's "mapper" only says char, where 152's says "char or string".
<rlb>I'm guessing that's what the implementor of our string-unfold saw and added the char check.
<rlb>Anyway, we can easily change it if that's what srfi-13 meant.
<jcowan>The order of development is 13 -> 130 (cursors) -> 135 (immutable texts) -> 152.
<jcowan>Allowing mapper to return a string was introduced in 135; allowing make-final to return a char or string was introduced in 135.
<jcowan>So allowing mapper to return a string was definitely a change (introduced by Clinger, the author of 135).
<jcowan>But I think allowing make-final to return a string was implicitly present in 13 already
<rlb>Hopefully they're all compatible, and so we can just implement one, and it'll just be fancier than some earlier srfis specified, when you import the less-fancy srfi.
<rlb>i.e. hopefully we only need one.
<rlb>Seems likely from 13 and 152
<jcowan>Hopefully nobody uses 13's make-final
<rlb>So do you know of a "preferred" way to efficiently handle the typical "incrementally reading pieces to build a final string situation" across the srfis? A "vectors->string" could do it (or a growing reallocation of a vector and then vector->string or...), but maybe a srfi had something else in mind?
<rlb>(We can of course just use a list of chars and then reverse-list->string, but that's going to be notably more space, and (likely) scattered in RAM.)
<rlb>Want a call that internally can traverse all the fragments once to compute the final allocation size, then allocate, then populate the (in this case utf8) string.
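The char-or-string behaviour that rlb and jcowan attribute to SRFI 152 above can be sketched in a few lines. This is only an illustration under stated assumptions: `string-unfold*` and `->fragment` are hypothetical names, not Guile's built-in string-unfold, and the code favours clarity over efficiency.

```scheme
;; Hypothetical helper: coerce a char or string to a string fragment.
(define (->fragment x)
  (cond ((char? x) (string x))
        ((string? x) x)
        (else (error "expected a char or string:" x))))

;; Hypothetical string-unfold* (not Guile's string-unfold): the mapper
;; and make-final may each return either a char or a string.
(define* (string-unfold* stop? mapper successor seed
                         #:optional
                         (base "")
                         (make-final (lambda (x) "")))
  (let loop ((seed seed)
             (pieces (list (->fragment base))))
    (if (stop? seed)
        ;; Append the final/rightmost portion, then join everything.
        (string-concatenate
         (reverse (cons (->fragment (make-final seed)) pieces)))
        (loop (successor seed)
              (cons (->fragment (mapper seed)) pieces)))))

;; Mapper yields a mix of chars and strings; make-final yields a char.
(string-unfold* null? car cdr '(#\a "bc" #\d) "<" (lambda (seed) #\>))
;; => "<abcd>"
```

With the default make-final, `(lambda (x) "")`, nothing extra is appended, which matches jcowan's point that SRFI 13's default already returns a string.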
<jcowan>I would say construct a pair-based tree of strings and then concatenate them in two passes at the end (count, allocate the result, copy into it).
<rlb>Can't quite do it with a (slightly odd) string-tabulate call because it can't assume that the (proc i) chars would be the same across two passes (of course).
<rlb>I need somewhere to put each read-char that doesn't require an allocation, and string-set! now does.
<rlb>So I need a vector or...
<jcowan>A u32 vector would be better, probably
<rlb>i.e. we can't use strings as buffers anymore (without O(n) cost)
<rlb>Right, but I can't efficiently convert that in the end.
<rlb>Hence wanting (vectors->string ...)
<rlb>For us a plain, general purpose vector would be fine since a char fits into a cons cell.
<jcowan>but it says nothing about implementation
<rlb>We have a gap buffer, but its current implementation has the same problem --- uses string-set! :)
<rlb>i.e. we need to fix that too.
<jcowan>SRFI 118 provides an API for variable-length strings
<rlb>Basically, I need some primitive function to do the bulk build at the end, from some flavor of mutable char "collection" (general vector, homogeneous vector, etc.).
<rlb>Ahh, OK, I'll take a look.
<rlb>And for us general vector is likely the most efficient since read-char gives you a char, and a char fits into a general vector's "cell" without any allocation/conversion.
<jcowan>That's true. Indeed, that's why I added string<->vector to R7RS.
<rlb>Right, and now we just need "one level up" :)
<rlb>Worst case, I can just add something like that for internal use until/unless there's a srfi, or we want to make something of our own public.
<jcowan>Is that going to take a list of vectors, then? If so, it might as well be generalized to a tree
<rlb>For now, all I'd need is a list, but doesn't preclude fancier.
<rlb>Though it might also likely be on the C side for now, so might just start with list.
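The two-pass idea discussed above (count, allocate once, copy) might look roughly like this in Scheme. It is a sketch only: `vectors->string`, `utf8-length` and `utf8-encode!` are hypothetical names, the input is assumed to be a list of general vectors of chars, and a real version would presumably live on the C side as rlb suggests.

```scheme
(use-modules (rnrs bytevectors))

;; Number of UTF-8 bytes needed to encode one character.
(define (utf8-length ch)
  (let ((cp (char->integer ch)))
    (cond ((< cp #x80) 1)
          ((< cp #x800) 2)
          ((< cp #x10000) 3)
          (else 4))))

;; Encode CH into BV starting at POS; return the position after it.
(define (utf8-encode! bv pos ch)
  (let ((cp (char->integer ch)))
    (define (put! offset byte)
      (bytevector-u8-set! bv (+ pos offset) byte))
    (define (cont shift)                ; continuation byte, 10xxxxxx
      (logior #x80 (logand (ash cp (- shift)) #x3f)))
    (case (utf8-length ch)
      ((1) (put! 0 cp))
      ((2) (put! 0 (logior #xc0 (ash cp -6)))
           (put! 1 (cont 0)))
      ((3) (put! 0 (logior #xe0 (ash cp -12)))
           (put! 1 (cont 6))
           (put! 2 (cont 0)))
      (else (put! 0 (logior #xf0 (ash cp -18)))
            (put! 1 (cont 12))
            (put! 2 (cont 6))
            (put! 3 (cont 0))))
    (+ pos (utf8-length ch))))

;; Hypothetical vectors->string: FRAGMENTS is a list of vectors of chars.
(define (vectors->string fragments)
  (define (for-each-char proc)
    (for-each (lambda (v)
                (do ((i 0 (+ i 1)))
                    ((= i (vector-length v)))
                  (proc (vector-ref v i))))
              fragments))
  (let ((size 0))
    ;; Pass 1: compute the total UTF-8 size.
    (for-each-char (lambda (ch) (set! size (+ size (utf8-length ch)))))
    ;; Allocate the destination exactly once.
    (let ((bv (make-bytevector size))
          (pos 0))
      ;; Pass 2: copy every character into place.
      (for-each-char (lambda (ch) (set! pos (utf8-encode! bv pos ch))))
      (utf8->string bv))))

(vectors->string (list (vector #\a #\b) (vector) (vector #\c)))
;; => "abc"
```

Generalizing from a list to a tree, as jcowan suggests, would only change how `for-each-char` walks the fragments; the count/allocate/copy structure stays the same.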
<rlb>Thanks for mentioning srfi-118 --- that might also be plausible for this case, and would mean taking the "realloc loop" path, which I think may have less peak memory use, but potentially higher cost (for resize copying).
<jcowan>One possibility is to use a fill pointer (a CL concept whereby a string/vector can have a "current size" as well as a "max size").
<jcowan>The copying isn't that expensive as long as the new size is twice the old size (up to some maximum)
<jcowan>obvs if you have a 2 GB string and it's full you can't just double it
<rlb>...and the (vectors->string ...) approach requires transient 2x space (and then one copy of everything to the destination string).
<rlb>old: I just discovered we may already have a "solution" to internal only bindings from SCM_DEFINE. See the steal-bindings! at the end of boot-9.scm. I think that means you can just SCM_DEFINE whatever you like, and then yank it into another module (removing it from (guile)) right there, during startup.
<rlb>At least that's what it looks like atm.
<rlb>I wonder if that means we didn't need (and maybe can remove) %boot-9-shared-internal-state.
<avigatori>ekaitz: I am not sure what Java is like. I just mean separating definition from implementation
<janus>avigatori: i am not a very good schemer, but doesn't your premise assume typing?
<ekaitz>we don't do that kind of thing much, honestly
<janus>but clojure probably has a bunch of infrastructure for it
<janus>maybe that would provide some inspiration
<janus>(it's not scheme of course, but you know that)
<avigatori>janus: maybe. Maybe I am using the wrong word. I mostly mean how to coordinate different "shapes" implementations have to adhere to. Maybe that's accepting records with field x and y or something
<janus>avigatori: there is something called 'contracts' which provides a way to check pre- and post-conditions.
since you can ask whether something is a particular record or not, you can 'check' that in a pre-condition
<ekaitz>avigatori: you could also use GOOPS but if you are using records... i don't think there's anything in scheme by default for that
<ekaitz>you can match the type and decide what you do in a record
<avigatori>well, maybe I should ask, how does scheme solve the problems where interfaces are usually used? For example a program that can be extended in some way. How do schemers ensure that both sides speak the same language?
<ekaitz>how do you do that in any other dynamic language?
<rlb>There's no enforcement, but as mentioned, pattern matching and/or goops and/or documentation (or some system built on top).
<rlb>(match ...) style pattern matching, etc.
<janus>avigatori: maybe you would like Typed Racket as Racket is almost Scheme...
<rlb>There's nothing like clj's interfaces built in.
<rlb>I don't know whether there might be relevant srfis offhand.
<avigatori>matching on a record seems closest. Tho I guess alists are closest to the kinds of records I had in mind originally
<janus>avigatori: are you going to use srfi-9 records with ice-9 match then?
<avigatori>janus: I am not sure yet. I am mostly investigating how modular software is done in scheme/different concerns are kept separate
<janus>well other scheme impl's don't have ice-9 i think?
so it's not canon
<Arsen>ekaitz: the "half indentation" isn't half indentation, it's consistent indentation
<Arsen>not indenting {} under an if is inconsistent with the rest of the schemes that do that
<Arsen>it's also, again, how lisp does it, which is usually the explanation
<ekaitz>i mean, i don't need a justification for it, I just use it and don't complain much
<ekaitz>in fact I got used to it and kind of like it
<ekaitz>I even started to use it in my non-gnu projects
<ekaitz>i guess becoming a gnu maintainer does that to your brain
<mwette>The goops package in guile has generic functions, where you implement a method of the same name for the type signature. See the goops section in the manual.
<mwette>So you could define a generic function "area" and have different implementation methods of "area" for <circle>, <square>, ...
<rlb><integer> <string> too
<rlb>(i.e. all the types become represented)
<Arsen>ekaitz: heh I went through the same process eventually too
<mwette>The danger is that you do something like redefine "list" as a generic, sort of a hidden side effect of using goops.
<old>rlb: right I saw that steal thing the other day wrt to some psyntax stuff
<old>I wonder tho if tree-il optimizer can still do optimization for stolen bindings
<rlb>old: hmm, dunno but for at least &message, &irritants, &bytestring-error and bytestring-error, I'd guess that might not matter? And we can actually move those to (ice-9 exceptions internal) if we want to --- all tests pass.
<rlb>(where the internal module is created/populated during boot-9)
<rlb>If we can do something similar for read-bytestring-content, we can drop the %boot-8-shared-internal-content hack. All else equal, that seems preferable.
<rlb>(Unless there's some optimization issue that matters.)
<rlb>And I guess that paste comment's no longer relevant --- I was "redirecting", until I realized that no one needs those bindings to ever exist in (guile).
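mwette's GOOPS suggestion above (a generic "area" with one method per type) can be sketched as follows. The class layouts, slot names and the `describe` generic are illustrative assumptions, not taken from any existing code.

```scheme
(use-modules (oop goops))

;; Two unrelated classes; the slot and getter names are illustrative.
(define-class <circle> ()
  (radius #:init-keyword #:radius #:getter radius))

(define-class <square> ()
  (side #:init-keyword #:side #:getter side))

;; One generic function, one method per "shape" of argument.
(define-generic area)

(define-method (area (c <circle>))
  (* 4 (atan 1) (radius c) (radius c)))   ; pi * r^2

(define-method (area (s <square>))
  (* (side s) (side s)))

;; Built-in types such as <integer> and <string> can be dispatched on too.
(define-generic describe)
(define-method (describe (x <integer>)) "an integer")
(define-method (describe (x <string>)) "a string")

(area (make <square> #:side 3))
;; => 9
```

Nothing here enforces that a new type implements `area`; calling it on an unsupported argument only fails at run time, in line with rlb's remark that there is no enforcement.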
<rlb>old: wondered if we might want any special naming convention for "intangible" modules like that (i.e. with no backing file). Suppose we could put them all under (guile internal ...) or always name them (... internal) or whatever. Or maybe that's not a big concern.
<rlb>Oh, maybe they should just go under boot-9, i.e. (ice-9 boot-9 {exceptions,read,...}). Since I'd imagine someone might well want (srfi srfi-N internal) or other (... internal) as a real module at some point.
<rlb>That makes their relationship to the boot-9 boot process clear.