IRC channel logs

2026-05-09.log

<rlb>I have been surprised in a number of cases where we didn't have any tests...
<rlb>Less surprised when we didn't have non-ascii and/or crazier unicode tests...
<rlb>(Have started introducing more of those via https://codeberg.org/rlb/guile/src/branch/utf8/test-suite/test-suite/data.scm )
<mwette>Apparently, pcre is recommended.
<rlb>where did you mean?
<mwette>it was an ai response to web search for "does linux regex library work on utf-8 strings"
<rlb>Ahh, right --- pcre *could* work. (For now I've used that for clj's regex support in lokke.)
<rlb>ACTION looks to remember what the current state is in utf8...
<rlb>mwette: oh, right, I think that it's probably just "whatever regex(3) does" as usual -- i.e. I think we just convert our string (used to be latin1/utf-32, now always utf-8) to a "locale string" and then hand that to regexec(3), etc.
<mwette>OK. So not a big issue.
<cow_2001>why would you check if something's "pair?" and then that it's not "null?" if it's a "pair?" it cannot be a "null?", right?
<cow_2001> https://codeberg.org/guile/guile/pulls/117/files#diff-9310b651841e9094160358f22da8ed6d4faaf31c
<rlb>sounds right to me offhand, i.e. I'd imagine that not null is no longer needed.
<rlb>Without knowing the context there, I'm guessing it's dropping list? for performance.
<mwette>a string is neither
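[Editor's sketch] The point in the exchange above can be checked in plain Scheme: a pair is never the empty list, so a `null?` test after a successful `pair?` test is redundant, while a string satisfies neither predicate. The helper name `classify` is hypothetical and only serves the illustration; it is not taken from the linked pull request.

```scheme
;; Hypothetical helper: report which of the two predicates holds for x.
(define (classify x)
  (cond ((pair? x) 'pair)      ; a non-empty list, or any cons cell
        ((null? x) 'null)      ; only the empty list
        (else 'neither)))      ; strings, numbers, symbols, ...

(classify '(1 2))   ; a non-empty list is a pair
(classify '())      ; the empty list is null, not a pair
(classify "abc")    ; a string is neither
```

Because the first two branches are mutually exclusive, `(and (pair? x) (not (null? x)))` gives the same result as `(pair? x)` alone.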
<rlb>Looks like ice-9 match doesn't support multiple values right now?
<cow_2001>turns out that if you have whitespace+ in non-terminals, they add up with the whitespace+ in the terminals ~;~
<cow_2001>i sprinkled whitespace* all over the place and changed some to whitespace+ and stuff stopped working
<cow_2001>it is at times like these i wish i had a proper desk to which i could smash my face into
<ArneBab>old: now there
<ArneBab>old: I don’t know explicit flonum benchmarks. Maybe you could adapt the nbody benchmark from the benchmarksgame: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/nbody-racket-1.html https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/nbody-racket-2.html
<old>merged the bump of gnulib version
<old>one will need to call autogen
<mwette>ty old
<rlb>...open-input-string doesn't need to copy the contents if the source string is read-only.
<JohnCowan>Float benchmarks would tend to measure only hardware performance and type inference
<old>I'm mostly wondering if adding flonum? as a primcall in tree-il would help these benchmarks
<old>Anyway. It seems to help in some cases. Now flonum? and fixnum? are both primitives understood by tree-il
<old>However, I keep wondering if this is the correct type of optimization we want.
<old>Instead of marking these rnrs functions as primitives, it would be interesting if the compiler at the CPS level could infer the type and predicate check automatically
<old>For example, flonum? now gets compiled to heap-object? + flonum?
<old>However, if one writes flonum? by hand: (define (my-flonum? x) (and (real? x) (inexact? x)))
<old>then no primcall is emitted and CPS fails to see that the check is equivalent to flonum?
<old>instead the compiler emits: fixnum? + heap-object? + heap-number? + compnum? + flonum?
<old>Not sure what kind of optimization pass this would be called, I'm no compiler expert yet
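[Editor's sketch] To make the comparison concrete, the hand-written predicate can be placed next to the library one. This assumes Guile's `(rnrs arithmetic flonums)` module as the source of `flonum?`; `my-flonum?` is the definition quoted above, and `samples` and `agree?` are hypothetical names for the illustration. It only checks that the two predicates return the same answers; it says nothing about what the compiler emits for either.

```scheme
;; Assumption: flonum? comes from Guile's R6RS flonum library.
(use-modules (rnrs arithmetic flonums))

;; The hand-written version quoted in the log.
(define (my-flonum? x)
  (and (real? x) (inexact? x)))

;; Illustrative inputs: a flonum, a fixnum, an exact rational,
;; and two non-numbers.
(define samples (list 1.5 1 1/2 "1.5" 'a))

;; True when both predicates give the same boolean for x.
(define (agree? x)
  (eq? (flonum? x) (my-flonum? x)))

(map agree? samples)
```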
<rlb>Is there something like (char->number c) matching (string->number c)? Say you have a char, and you want to convert it to a number (as string->number would) without having to allocate a string.
<rlb>ACTION suspects not
<old>rlb: you mean a digit to number ?
<old>Or the ordinal value of a character
<rlb>right #\4 -> 4
<old>like 0 -> 48 in ASCII
<rlb>as (string->number "4") would
<old>ah okay
<old>hm
<rlb>Though I just realized that in this case it's probably not important.
<old>well in C we have the c - '0' hack :-)
<rlb>i.e. the case in question only cares about ascii
<old>but in Scheme Idk
<rlb>string->number iirc may be locale-aware
<rlb>(which is why I was wondering)
<rlb>(Arabic digits, etc.)
<old>I'm wondering if the c - '0' hack works in all encodings
<old>in that, is the distance between the 0 character and some other digit always that digit ?
<old>for all encodings
<old>hm
<old>i suspect not
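[Editor's sketch] For the ASCII-only case discussed here, a digit can be converted without allocating a string. The name `ascii-digit->number` is hypothetical, not an existing Guile procedure. The sketch relies on `char->integer` returning the Unicode code point, where `#\0` through `#\9` are contiguous, and it deliberately rejects non-ASCII digits instead of using `char-numeric?`. As for the C subtraction trick mentioned above (conventionally written `c - '0'`), the C standard requires the decimal digits to be contiguous in the execution character set, so it holds for `'0'` through `'9'` in any conforming implementation; the same is not guaranteed for letters.

```scheme
;; Hypothetical helper: map #\0..#\9 to 0..9, anything else to #f.
(define (ascii-digit->number c)
  (and (char>=? c #\0)
       (char<=? c #\9)
       ;; Distance from #\0 in code points is the digit's value.
       (- (char->integer c) (char->integer #\0))))

(ascii-digit->number #\4)   ; same value as (string->number "4")
(ascii-digit->number #\x)   ; not an ASCII digit, so #f
```

Returning `#f` for non-digits mirrors how `string->number` signals an unparsable string, which keeps the two easy to swap.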
<rlb>I *think* srfi-207 means just 0-9a-fA-F, even though its reference implementation uses string->integer in at least one place.
<rlb>ACTION was just attempting to remove some more repeated string-refs.