IRC channel logs

2026-04-23.log


<rlb>Does anyone know why we uc_tolower (uc_toupper (x)) in say string-compare-ci instead of just uc_tolower (c)? I'm wondering if I knew and have forgotten, e.g. https://codeberg.org/guile/guile/src/commit/5c9a3e931c00d7a0582529a2b047038c5cc79c6c/libguile/srfi-13.c#L1104-L1105
<jcowan>You shouldn't, because there are many lower-case characters without upper-case equivalents.
<jcowan>The Right Thing is to use the Unicode fold-case algo, which also deals with the special case of Cherokee.
<rlb>So maybe that's a bug? (I was worried about Chesterton's fence a bit...).
<rlb>Hmm, I wonder if for srfi-13, converting to lowercase before searching via u8_tolower (string ...) might be wrong since it explicitly specifies "simply based upon comparing the encoding values used for the characters".
<rlb>i.e. it might require per char uc_tolower() conversion.
<rlb>(still don't know about the toupper/lower round trip in the equality functions)
<rlb>I guess wrt the srfi it depends on what uc_tolower does, i.e. srfi-13 specifies UnicodeData.html conversion, so if that's what uc_tolower does, then great. I'd assume it's not what u8_tolower does since it's intended to consider the character's context.
<jcowan>SRFI 13 doesn't do ci comparisons.
<rlb>(the latter being the function that operates on an entire string --- all from libunistring)
<rlb>It does.
<rlb>string-contains-ci
<rlb>string-suffix-length-ci, etc.
<rlb>string-ci=
<rlb>unless I misunderstand what you mean
<jcowan>You're right: brain fart
<rlb>But it seems to explicitly specify "non contextual" ci
<rlb>For now I think I'll assume that's what uc_tolower does, and we can worry about it later.
<rlb>i.e. that's a separable issue, and what we're already doing.
<jcowan>IMO case folding of strings should be full folding (contra SRFI 13 and R6RS)
<rlb>ouch --- basically no string-contains-ci tests in the test-suite...
<rlb>will improve...
<rlb>a bunch of indirect string-contains tests because other tests use it, but very few, if any, explicit string-contains tests either.
<rlb>...emacs scheme mode here seems to indent while wrong (at least compared to our docs). i.e. indents the body like a function rather than a let.
<JohnCowan>rlb: while is not RnRS standard syntax, so scheme-mode defaults to assuming it's a procedure call
<rlb>ahh, didn't recall it wasn't standard, thanks.
<rlb>(should have guessed, though :) )
<rlb>...I converted (ice-9 gap-buffer) to avoid string-ref/string-set!, but after seeing that it has no tests, I'm considering leaving it alone for now, rather than try to invent a test suite right this minute.
<rlb>(i.e. dropping the optimization)
<rlb>Though I suppose it's particularly not suitable to O(n) string-set!.
<rlb>(without the conversion to vector of char)
<jcowan>It definitely should be converted to vectors
<rlb>Sure, but without tests, no idea if we'll be releasing it broken. I *think* it's correct, but..
<rlb>So I think either someone will need to write some tests, or we'll need to just plan to fix that later.
<rlb>(it might be me, but my pile of things to do, even just for utf8, is still "high")
<rlb>Just finished adding the tests for string-contains(-ci) that didn't exist. I'll likely push those back in the series, and then add them to main, since we want them there too, sooner.
<spk121>rlb: I saw in the chat log your question about uc_tolower (uc_toupper (c)).
<sneek>Welcome back spk121, you have 1 message!
<sneek>spk121, rlb says: thanks --- mostly (aside from curiosity) just wondered if ice-9/test.scm wasn't actually used/tested by anything, whether we wanted to keep it (in addition to r4rs.test), but not a big deal either way.
<spk121>Here's the original discussion: https://lists.gnu.org/archive/html/guile-devel/2011-03/msg00111.html
<rlb>spk121: *nice*, and thanks --- and I think that matches my current interpretation, though it didn't immediately dawn on me that wide/latin was one of the things constraining us here before. Sounds like if we don't do it in the first pass, we could consider eventually reviewing/improving the case handling now that we can use u8_toupper (string....).
<rlb>In any case I'll keep that in mind from now on as I hack on it.
<rlb>Also amused to see that discussion --- I hadn't before, I don't think (and I probably should have before embarking on this adventure...).
<spk121>It is weird looking at stuff I wrote 15 years ago. I was a different person then.
<identity>is there some fancy pattern for something like (match value ('not-what-you-want (values)) (what-you-want #;(use what-you-want)))? if you replace 'not-what-you-want with #f, you can do (cond (value => (λ (what-you-want) …))), but you can not do (cond ((symbol=? value 'not-what-you-want) => (λ (what-you-want) #;(what-you-want is just #t here))))
<avigatori>o/
<mwette>identity: can you define the box you want? sounds like (lambda (v) (if (= v 'not-wanted) (values) something-else))
<mwette>Instead of the algorithm, what is the set of input-output pairs?
<identity>mwette: i was just wondering if there is a more concise way to do something if the value is not a specific symbol, and do nothing otherwise
<rlb>"exciting", TIL (unintentionally) that SCM_UNBNDP arguments are not *undefined*, and they are equal? to other things...when run via ./check-guile, but not from the repl or via "guile some-file".
<rlb>(I'd accidentally let one escape via a bug in string-contains, and fortunately eventually noticed that tests were passing when they shouldn't have been.)
<rlb>No idea why, and of course "don't do that then", but still seems wrong.
<rlb>Easy to reproduce by adding this, and then printing out (equal? x (guile-unbound)) from a ./check-guile foo.test. It's #t here on main... https://paste.debian.net/hidden/50276beb
<rlb>old: any idea why that might be? Wondered if check-guile handles evaluation a bit differently or something. I'm not going to pursue it right now, but I'll make a note to worry about it later, i.e. at least file an issue or something.
<old>hmm
<rlb>(I'd originally vaguely assumed that unbound arguments would be *undefined*, but their (object-address)es are different, at least from check-guile.)
<rlb>fwiw
<old>how do you get guile-unbound ?
<rlb>It does risk false positives in the tests, so probably worth addressing at some point.
<rlb>See the paste.
<mwette>identity: in other words, do nothing if the value is a specific symbol, and process otherwise. Why not (cond ((= v specific-symbol) (values)) (else ...))
<old>ahh okay
<rlb>i.e. optional argument not provided
<rlb>"bad behavior" to return that of course
<rlb>but still, probably want pass-if-equal to notice, for example.
<rlb>ACTION did
<mwette>(where I was sloppy with, e.g., `=')
<old>rlb: see (system base types internal)
<old>106 (undefined undefined? #b111111111111 #b100100000100)
<identity>mwette: i would have to do (let ((v (expression))) (unless (= v 'symbol) (use v))) which seems pretty long-winded compared to (cond ((expression) => use))
<old>114 (unspecified #f #b111111111111 #b100000000100)
<rlb>From ./check-guile?
<rlb>It only happens when run via ./check-guile foo.test
<rlb>(as yet)
<rlb>Not via repl, and not via "guile foo.scm"
<rlb>EIDA
<rlb>EIDEA
<rlb>Here's a trivial foo.test that should do it:
<old>hm
<mwette>OK.
<old>are you sure it's not because check-guile is using another version of Guile?
<rlb> (format (current-error-port) "huh: ~s\n" (equal? 'x (guile-unbound)))
<rlb>It's supposed to use meta/guile, and I haven't seen anything to suggest it's not.
<rlb>And I do see it responding to my changes in libguile/ etc.
<old>what's equal? returning?
<rlb>Anyway, it's trivial to reproduce here with that paste/patch.
<rlb>#t
<old>I guess that's because you are doing: (equal? 'x)
<old>effectively
<rlb>i.e. if you apply that patch to main, build, and then run that foo.test, you may see the same thing.
<rlb>If not, that'd be interesting too :)
<old>but it's normal that it returns #t in that case
<old>that's not unexpected. Are you saying this differs when using check-guile ?
<rlb>why? It doesn't from the repl.
<rlb>i.e. unbound shouldn't be equal? to 'x?
<old>when you do: (equal? 'x *unbound*)
<rlb>oooooooooooooooooooooh.
<old>the C version of equal will probably (I haven't checked), check if the second argument is unbound
<rlb>wow, and thanks
<old>now with check-guile, you are probably using the *compiled* variant of equal? which acts a bit differently
<rlb>It's just *aliasing*.
<old>yes
<rlb>Right, well glad I asked, and sorry for the distraction.
<old>so: (equal? 'x (guile-unbound)), is effectively the same thing as calling (equal? 'x), on the C side
<mwette>identity: It looks like you are trying to bind expression `value'. So => is the way to do that, I guess.
<old>np :-)
<rlb>unbound is not "out of band" :)
<rlb>I (obviously) just didn't think about the fact that I ended up just manually constructing an "optional variable" that way. Drawback to allowing the "unbound" out into the wild :)
<rlb>Heh, suppose we could have pass-if-equal assert that none of the arguments are unbound, but that actually would require leaking the unbound value to the scheme side somehow (if it's not already).
<rlb>fwiw, I created this mess because the srfi-13 arg helper MY_VALIDATE_SUBSTRING_SPEC leaves the scheme-side args unbound; it only sets the default values for the c vars, and so a "return start1" when the string-contains "needle" is empty doesn't return 0, it returns unbound, while cstart1 *has* been defaulted to 0 by the macro (obvious, of course, in hindsight).
<old>rlb: you can leak unbound as a number if you want
<old>use object-address on it
<old>now you have a number that is the unbound object
<rlb>oh, hah, ouch :)
<rlb>Not sure whether we want that check --- it's trivial then, I suppose.
<rlb>But maybe I'm the only one who's likely to have made this particular mess.
<old>well I would avoid messing with object-address and internal C bits if possible :p
<rlb>...maybe I'll try it locally and see if any tests fail, just out of curiosity.
<mwette>I wonder if this is the line removing private bindings: https://codeberg.org/guile/guile/src/branch/main/module/language/cps/closure-conversion.scm#L414
<dsmith>sneek, botsnack
<sneek>:)
<dsmith>!uptime
<sneek>I've been faithfully serving for 13 days
<sneek>This system has been up 1 week, 6 days, 9 hours, 43 minutes
<rlb>what's the more efficient "bind multiple values" approach these days, or are they all similar?
<rlb>nvm, module's already using let-values
<rlb>...we have no tests for (scheme base) string-for-each