IRC channel logs
2026-04-23.log
<jcowan>You shouldn't, because there are many lower-case characters without upper-case equivalents.
<jcowan>The Right Thing is to use the Unicode fold-case algo, which also deals with the special case of Cherokee.
<rlb>So maybe that's a bug? (I was worried about Chesterton's fence a bit...).
<rlb>Hmm, I wonder if for srfi-13, converting to lowercase before searching via u8_tolower (string ...) might be wrong since it explicitly specifies "simply based upon comparing the encoding values used for the characters".
<rlb>i.e. it might require per char uc_tolower() conversion.
<rlb>(still don't know about the toupper/lower round trip in the equality functions)
<rlb>I guess wrt the srfi it depends on what uc_tolower does, i.e. srfi-13 specifies UnicodeData.html conversion, so if that's what uc_tolower does, then great. I'd assume it's not what u8_tolower does since it's intended to consider the character's context.
<jcowan>SRFI 13 doesn't do ci comparisons.
<rlb>(the latter being the function that operates on an entire string --- all from libunistring)
<rlb>string-suffix-length-ci, etc.
<rlb>unless I misunderstand what you mean
<rlb>But it seems to explicitly specify "non contextual" ci
<rlb>For now I think I'll assume that's what uc_tolower does, and we can worry about it later.
<rlb>i.e. that's a separable issue, and what we're already doing.
<jcowan>IMO case folding of strings should be full folding (contra SRFI 13 and R6RS)
<rlb>ouch --- basically no string-contains-ci tests in the test-suite...
<rlb>a bunch of indirect string-contains tests because other tests use it, but very few, if any, explicit string-contains tests either.
<rlb>...emacs scheme mode here seems to indent while wrong (at least compared to our docs). i.e. indents the body like a function rather than a let.
<JohnCowan>rlb: while is not RnRS standard syntax, so scheme-mode defaults to assuming it's a procedure call
<rlb>ahh, didn't recall it wasn't standard, thanks.
<rlb>(should have guessed, though :) )
<rlb>...I converted (ice-9 gap-buffer) to avoid string-ref/string-set!, but after seeing that it has no tests, I'm considering leaving it alone for now, rather than try to invent a test suite right this minute.
<rlb>(i.e. dropping the optimization)
<rlb>Though I suppose it's particularly not suitable to O(n) string-set!.
<rlb>(without the conversion to vector of char)
<jcowan>It definitely should be converted to vectors
<rlb>Sure, but without tests, no idea if we'll be releasing it broken. I *think* it's correct, but..
<rlb>So I think either someone will need to write some tests, or we'll need to just plan to fix that later.
<rlb>(it might be me, but my pile of things to do, even just for utf8, is still "high")
<rlb>Just finished adding the tests for string-contains(-ci) that didn't exist. I'll likely push those back in the series, and then add them to main, since we want them there too, sooner.
<spk121>rlb: I saw in the chat log your question about uc_tolower (uc_toupper (c)).
<sneek>Welcome back spk121, you have 1 message!
<sneek>spk121, rlb says: thanks --- mostly (aside from curiosity) just wondered if ice-9/test.scm wasn't actually used/tested by anything, whether we wanted to keep it (in addition to r4rs.test), but not a big deal either way.
<rlb>spk121: *nice*, and thanks --- and I think that matches my current interpretation, though it didn't immediately dawn on me that wide/latin was one of the things constraining us here before. Sounds like if we don't do it in the first pass, we could consider eventually reviewing/improving the case handling now that we can use u8_toupper (string....).
<rlb>In any case I'll keep that in mind from now on as I hack on it.
<rlb>Also amused to see that discussion --- I hadn't before, I don't think (and I probably should have before embarking on this adventure...).
<spk121>It is weird looking at stuff I wrote 15 years ago. I was a different person then.
<identity>is there some fancy pattern for something like (match value ('not-what-you-want (values)) (what-you-want #;(use what-you-want)))? if you replace 'not-what-you-want with #f, you can do (cond (value => (λ (what-you-want) …))), but you can not do (cond ((symbol=? value 'not-what-you-want) => (λ (what-you-want) #;(what-you-want is just #t here))))
<mwette>identity: can you define the box you want? sounds like (lambda (v) (if (= v 'not-wanted) (values) something-else))
<mwette>Instead of the algorithm, what is the set of input-output pairs?
<identity>mwette: i was just wondering if there is a more concise way to do something if the value is not a specific symbol, and do nothing otherwise
<rlb>"exciting", TIL (unintentionally) that SCM_UNBNDP arguments are not *undefined*, and they are equal? to other things...when run via ./check-guile, but not from the repl or via "guile some-file".
<rlb>(I'd accidentally let one escape via a bug in string-contains, and fortunately eventually noticed that tests were passing when they shouldn't have been.)
<rlb>No idea why, and of course "don't do that then", but still seems wrong.
<rlb>old: any idea why that might be? Wondered if check-guile handles evaluation a bit differently or something. I'm not going to pursue it right now, but I'll make a note to worry about it later, i.e. at least file an issue or something.
<rlb>(I'd originally vaguely assumed that unbound arguments would be *undefined*, but their (object-address)es are different, at least from check-guile.)
<old>how do you get guile-unbound ?
<rlb>It does risk false positives in the tests, so probably worth addressing at some point.
<mwette>identity: in other words, do nothing if the value is a specific symbol, and process otherwise. Why not (cond ((= v specific-symbol) (values)) (else ...))
<rlb>i.e. optional argument not provided
<rlb>"bad behavior" to return that of course
<rlb>but still, probably want pass-if-equal to notice, for example.
<mwette>(where I was sloppy with, e.g., `=')
<old>rlb: see (system base types internal)
<old>106 (undefined undefined? #b111111111111 #b100100000100)
<identity>mwette: i would have to do (let ((v (expression))) (unless (= v 'symbol) (use v))) which seems pretty long-winded compared to (cond ((expression) => use))
<old>114 (unspecified #f #b111111111111 #b100000000100)
<rlb>It only happens when run via ./check-guile foo.test
<rlb>Not via repl, and not via "guile foo.scm"
<rlb>Here's a trivial foo.test that should do it:
<old>are you sure it's not because check-guile is using another version of Guile?
<rlb> (format (current-error-port) "huh: ~s\n" (equal? 'x (guile-unbound)))
<rlb>It's supposed to use meta/guile, and I haven't seen anything to suggest it's not.
<rlb>And I do see it responding to my changes in libguile/ etc.
<old>what's equal? returning?
<rlb>Anyway, it's trivial to reproduce here with that paste/patch.
<old>I guess that's because you are doing: (equal? 'x)
<rlb>i.e. if you apply that patch to main, build, and then run that foo.test, you may see the same thing.
<rlb>If not, that'd be interesting too :)
<old>but it's normal that it returns #t in that case
<old>that's not unexpected. Are you saying this differs when using check-guile ?
<rlb>why? It doesn't from the repl.
<rlb>i.e. unbound shouldn't be equal? to 'x?
<old>when you do: (equal? 'x *unbound*)
<rlb>oooooooooooooooooooooh.
<old>the C version of equal will probably (I haven't checked), check if the second argument is unbound
<old>now with check-guile, you are probably using the *compiled* variant of equal? which acts a bit differently
<rlb>It's just *aliasing*.
<rlb>Right, well glad I asked, and sorry for the distraction.
<old>so: (equal? 'x (guile-unbound)), is effectively the same thing as calling (equal? 'x), on the C side
<mwette>identity: It looks like you are trying to bind expression `value'. So => is the way to do that, I guess.
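The aliasing old describes above can be sketched in a few lines of C. This is only an illustration under stated assumptions: value_t, UNBOUND, UNBOUND_P and equal_p are hypothetical stand-ins, not libguile's real types or functions, and the sketch assumes (as old guessed without checking) that the C predicate treats an unbound second argument as "argument not supplied".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not libguile's SCM machinery. */
typedef const void *value_t;

static const char unbound_marker = 0;
#define UNBOUND ((value_t) &unbound_marker)   /* means "argument not supplied" */
#define UNBOUND_P(v) ((v) == UNBOUND)

/* Models an equality predicate whose arguments are optional: with fewer
   than two supplied arguments there is nothing to compare, so it answers
   true. */
bool equal_p(value_t x, value_t y)
{
    if (UNBOUND_P(x) || UNBOUND_P(y))
        return true;
    return x == y;   /* identity stands in for structural equality here */
}

/* A leaked UNBOUND value is indistinguishable from a missing argument. */
void demo_aliasing(void)
{
    static const char sym_x = 'x', sym_y = 'y';
    assert(!equal_p(&sym_x, &sym_y));   /* two distinct values differ */
    assert(equal_p(&sym_x, UNBOUND));   /* behaves like a one-argument call */
}
```

Because the sentinel lives in the same value space as ordinary arguments, a test comparing an expected value against a leaked unbound result would pass in this model, which matches the false positives rlb noticed.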
<rlb>unbound is not "out of band" :)
<rlb>I (obviously) just didn't think about the fact that I ended up just manually constructing an "optional variable" that way. Drawback to allowing the "unbound" out into the wild :)
<rlb>Heh, suppose we could have pass-if-equal assert that none of the arguments are unbound, but that actually would require leaking the unbound value to the scheme side somehow (if it's not already).
<rlb>fwiw, I created this mess because the srfi-13 arg helper MY_VALIDATE_SUBSTRING_SPEC leaves the scheme-side args unbound; it only sets the default values for the c vars, and so a "return start1" when the string-contains "needle" is empty doesn't return 0, it returns unbound, while cstart1 *has* been defaulted to 0 by the macro (obvious, of course, in hindsight).
<old>rlb: you can leak unbound as a number if you want
<old>use object-address on it
<old>now you have a number that is the unbound object
<rlb>Not sure whether we want that check --- it's trivial then, I suppose.
<rlb>But maybe I'm the only one who's likely to have made this particular mess.
<old>well I would avoid messing with object-address and internal C bits if possible :p
<rlb>...maybe I'll try it locally and see if any tests fail, just out of curiosity.
<sneek>I've been faithfully serving for 13 days
<sneek>This system has been up 1 week, 6 days, 9 hours, 43 minutes
<rlb>what's the more efficient "bind multiple values" approach these days, or are they all similar?
<rlb>nvm, module's already using let-values
<rlb>...we have no tests for (scheme base) string-for-each
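The MY_VALIDATE_SUBSTRING_SPEC mix-up rlb describes above can be modeled with a small C sketch. Everything here is hypothetical and only mirrors the behaviour as described in the log: value_t, UNBOUND, from_index, VALIDATE_START and both functions are invented for illustration and are not the real libguile macro or the real string-contains code. The "fixed" variant shows one possible repair, not necessarily the one rlb applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boxed value: either "unbound" or a non-negative index. */
typedef struct { int bound; size_t index; } value_t;

static const value_t UNBOUND = { 0, 0 };

static value_t from_index(size_t i)
{
    value_t v = { 1, i };
    return v;
}

/* Modeled on the behaviour described in the log: only the C variable is
   given a default; the boxed argument itself is left untouched. */
#define VALIDATE_START(start, cstart) \
    do { (cstart) = (start).bound ? (start).index : 0; } while (0)

/* Buggy early return: hands back the caller's possibly unbound argument. */
value_t contains_empty_needle_buggy(value_t start1)
{
    size_t cstart1;
    VALIDATE_START(start1, cstart1);
    (void) cstart1;        /* defaulted to 0 here, but never used */
    return start1;
}

/* One possible fix: box the validated C value instead. */
value_t contains_empty_needle_fixed(value_t start1)
{
    size_t cstart1;
    VALIDATE_START(start1, cstart1);
    return from_index(cstart1);
}

void demo_defaulting(void)
{
    value_t r = contains_empty_needle_fixed(UNBOUND);
    assert(!contains_empty_needle_buggy(UNBOUND).bound);  /* unbound leaks */
    assert(r.bound && r.index == 0);                      /* default of 0 */
}
```

When the caller does supply a start index, both variants return the same value, so the difference only shows up on the defaulted path. That is consistent with the bug going unnoticed until a test exercised an empty needle with no explicit start.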