IRC channel logs

<old>so apparently hashing a bytevector just yields the pointer to that value rather than hashing the content, eh

<rlb>there are some other hash "questions" I've noticed, and I think made notes to return to...

<rlb>srfi-152 tests found some problems in the utf8 branch

<rlb>(but not pushed)

<rlb>ACTION thanks jcowan

<rlb>jcowan: one minor thing I hit (and may have been intentional) is that I had the test env be just scheme base (not (guile)), and it turned out that one of the make-string invocations in there expects rnrs make-string, not srfi-152 make-string (i.e. provides only one arg).

<rlb>I just changed it to (make-string x #\0) for now.

<rlb>old: and iirc the utf8 branch currently changes the string hash in a way that may need deliberation.

<old>logically, you should just hash the underlying bytes in the buffer + the encoding type?

<rlb>I'm not sure we ever included the type.

<rlb>fwiw, the current utf8 flavor: https://codeberg.org/rlb/guile/src/commit/fea4d06abec358aed684f0d360f454de0c3f930e/libguile/hash.c#L124-L159

<old>I guess we should. Hashing byte content that has different semantics ought to yield different things, no?

<rlb>Sure, perhaps, there's very likely still aliasing potential though, I'd guess.

<rlb>can also include the size if we like
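
A minimal sketch of the idea discussed above: hash the buffer contents together with an encoding tag and the length, so the same bytes with different semantics hash differently. It uses 32-bit FNV-1a purely for illustration; this is not the function in libguile/hash.c, and `hash_tagged_bytes` and the `encoding_tag` values are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for illustration; not names from libguile. */
enum encoding_tag { TAG_BYTEVECTOR = 1, TAG_UTF8_STRING = 2 };

/* One FNV-1a step: XOR in a byte, then multiply by the 32-bit FNV prime. */
static uint32_t fnv1a_step(uint32_t h, uint8_t byte)
{
    return (h ^ byte) * 16777619u;
}

/* Hash the tag, then the length (as 8 little-endian bytes), then the content. */
uint32_t hash_tagged_bytes(enum encoding_tag tag, const uint8_t *buf, size_t len)
{
    uint32_t h = 2166136261u;  /* FNV-1a 32-bit offset basis */
    h = fnv1a_step(h, (uint8_t)tag);
    for (size_t i = 0; i < 8; i++)
        h = fnv1a_step(h, (uint8_t)((uint64_t)len >> (8 * i)));
    for (size_t i = 0; i < len; i++)
        h = fnv1a_step(h, buf[i]);
    return h;
}
```

Because each FNV-1a step is a bijection on the running state for a fixed input byte, two inputs that differ only in the tag are guaranteed to produce different hashes here. Inputs that differ in content or length can still collide, which is the aliasing potential mentioned above.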

<jcowan>What hash function are you using?

<rlb> https://codeberg.org/guile/guile/src/commit/5c9a3e931c00d7a0582529a2b047038c5cc79c6c/libguile/hash.c#L55

<jcowan>That's kind of old. You might want to switch to xxHash (2023), or if you think that's too recent, FarmHash (2014).

<graywolf>Hi! :) Is it possible to get the value of a binding during expansion? For datums, there is syntax->datum, but I need the actual bound value of an identifier.

<mwette>did you try module-reflection? (module-ref (current-module) (syntax->datum sym))

<graywolf>that will not work if the symbol is bound via (let), e.g. in top level (let ((for-type <foo>)) (define-xx for-type))

<mwette>Then what you are after sounds a bit hokey, because it may not be "live". You are in expand mode.

<mwette>Usually, if you want bindings available for macro expansion you need them inside an eval-when form.

<old>jcowan: How about MurmurHash3?

<old>that's usually my go-to

<jcowan>old: That's fine too

<graywolf>mwette: hm; that seems to imply that it is not really possible to write a syntax form that would, in my case, define additional bindings based on the fields of records. I wanted to use record-type-fields to get the list, but now that does not seem possible

<JohnCowan>old: Unfortunately, if you want pure Scheme and reasonable efficiency, you need to stick to 32-bit algos only

<mwette>graywolf: If you need to know the (record type) value of a variable in a let binding outside the context of the macro, seems undoable to me (right now).

<rlb>one thing we'll want to consider with the utf8 changes is sharing. We don't make promises there, but in the current docs we do indicate that operations often return shared strings. Though I think I've probably been leaning toward not sharing in a lot of the new algorithms.

<jcowan>If you are going to support it at all, I'd look to SRFI 13's guidelines; they are well thought out.

<rlb>And I wonder if for say (string-split s ...) you typically want every returned part to hold on to the entire original source. I guess as long as you know, you can break the connection via subsequent substring/copy, but...

<old>JohnCowan: I would keep the hashing in C

<old>my testing shows that even with very aggressive optimization and JIT, I get about 30 MiB/s top for Scheme sha256sum

<old>vs 500 for C version

<old>probably due to SIMD

<old>wonder if Common Lisp is doing well in that respect

<rlb>I hadn't really been thinking about it, and I can change the algorithms, with some effort if/when we know what we want in each case.

<rlb>...I'm now starting to lean toward thinking utf8 should be sharing more often.

<jcowan>Java originally always held on to the entire string, but it became a memory leak: people would read in whole files and then pick out a particular piece to work on.

<rlb>yep --- and with utf8, for the C-implemented functions, we're always dealing with valid, utf8 pointers, so the copying, etc. is fast (memcpy). Then it's mostly "just" the allocation cost, memory pressure/use, etc.

<rlb>This came up because right now I was implementing string-split and needed to pick.

<rlb>i.e. (string-split giant ":" #f 1)

<rlb>(where the first colon comes early)

<rlb>and you only care about the first bit

<rlb>But current string-split shares, and our general string docs suggest you should expect that.

<rlb> https://www.gnu.org/software/guile/manual/html_node/Strings.html

<rlb>jcowan: am I right to read the docs as (string-split "" "" 'prefix/suffix) -> '()

<jcowan>I don't know. SRFI 152 is silent about splitting on an empty delimiter. But I think you are probably right.

<rlb>Right, seems like it clearly specifies for 'infix and 'strict-infix i.e. when the string is empty, just doesn't say anything about empty string and prefix/suffix.

<rlb>(I assumed that the empty string prescription obviated the delimiter.)

<rlb>(for the two specified cases)

<rlb>For larger delimiters for (current) string-split substring/sharing becomes more expensive because we use memmem, and that of course doesn't track character offsets, so we have to scan for that separately after a match.

<rlb>no, wait, I have to do that anyway

<rlb>sharing or not (same as for string-contains)

<rlb>nvm
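
A rough sketch of the point above: a byte-level search only gives a byte offset, so the character offset has to be recovered with a separate scan over the bytes before the match. This assumes NUL-terminated, valid UTF-8 input and uses standard `strstr` as a stand-in for `memmem`; the function names are hypothetical and this is not Guile's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in the first nbytes of valid UTF-8: every byte that is
   not a continuation byte (10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s, size_t nbytes)
{
    size_t chars = 0;
    for (size_t i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/* Returns 1 and fills both offsets if needle occurs in haystack, else 0. */
int find_with_char_offset(const char *haystack, const char *needle,
                          size_t *byte_off, size_t *char_off)
{
    const char *hit = strstr(haystack, needle);  /* byte-level search */
    if (hit == NULL)
        return 0;
    *byte_off = (size_t)(hit - haystack);
    /* Separate pass: translate the byte offset into a character offset. */
    *char_off = utf8_char_count(haystack, *byte_off);
    return 1;
}
```

The second pass is needed whether or not the result shares storage with the source, which matches the correction above.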

<rlb>jcowan: how about (string-split "::" ":" 'prefix). Does that produce '("" "")? i.e. do 'prefix and 'suffix only suppress one leading or trailing empty string?

<rlb>ACTION also plans to codify all this in our tests...

<jcowan>I'd expect so. The idea of 'prefix is that when you split /usr/bin/bash on "/", you get ("usr" "bin" "bash") and not ("" "usr" "bin" "bash")

<rlb>OK, thanks, and just remembered that I could/should have also checked the reference implementation...

<rlb>I think we now have (utf8) srfi-152, plus or minus some more tests and whatever fixes that requires.

<jcowan>so the answer to your second question is yes, (string-split "::" ":" 'prefix/suffix) => ("" "")
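
To make the 'prefix behaviour agreed on above concrete, here is a small sketch that splits on a single-byte delimiter and suppresses only one leading empty part. It is an illustration under stated assumptions, not the SRFI 152 reference implementation or Guile's code: `split_spans` and `struct span` are hypothetical names, only 'infix and 'prefix are modelled, and an empty input is taken to yield no parts, following the reading in the discussion.

```c
#include <stddef.h>
#include <string.h>

struct span { size_t start, len; };  /* one part: byte offset and length */

/* Split NUL-terminated s on delim. If prefix is nonzero, drop a single
   leading empty part ('prefix); otherwise keep every part ('infix).
   Records at most max parts in out and returns how many were recorded. */
size_t split_spans(const char *s, char delim, int prefix,
                   struct span *out, size_t max)
{
    size_t n = strlen(s), count = 0, start = 0;
    int first = 1;

    if (n == 0)
        return 0;  /* assumption: empty input yields an empty list */

    for (size_t i = 0; i <= n; i++) {
        if (i < n && s[i] != delim)
            continue;
        /* s[start..i) is a complete part; skip it only if it is the
           first part, it is empty, and we are in 'prefix mode. */
        int skip = prefix && first && i == start;
        first = 0;
        if (!skip && count < max) {
            out[count].start = start;
            out[count].len = i - start;
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

With this sketch, splitting "::" on ':' in 'prefix mode records two empty parts, and splitting "/usr/bin/bash" on '/' records "usr", "bin", and "bash" without a leading empty string.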

2026-04-25.log