IRC channel logs
2026-04-05.log
<rlb>Our current utf8_string_hash (of course) decodes as it hashes, but once all strings are utf8, I'm wondering if it's fine to just hash the bytes directly via JENKINS_LOOKUP3_HASHWORD2 (u8, u8_bn, ret). This does produce a difference since the current code seeds the hash with the char count, but JENKINS_LOOKUP3_HASHWORD2 will seed it with the byte count.
<rlb>Hmm, never mind, that's not quite right.
<rlb>Or rather, more significantly, the thing I need to figure out is what the constraints are for scm_i_utf8_string_hash --- e.g. must its hash of a utf8-encoded SOMETHING match the hash of a UTF-32-encoded SOMETHING, or (as it currently appears) is it only used with reference to utf-8 values.
<rlb>Since the only consumer left in the utf8 branch is from_utf8_symboln, I suspect it's the latter, and so as long as it's compatible with scm_i_ascii_string_hash, maybe we *can* just change it to the raw bytes hash (seeded with the byte length).
<rlb>Though that might also introduce a "byte code transition", i.e. if we embed the literal hash values in the .go files, which I vaguely recall we may.
<rlb>(which is fine for the utf8 branch --- already two of those in the series)
<Arsen>seems that guile 3.0.11 tests fail on ZFS
<Arsen>FAIL: ports.test: SEEK_DATA while in hole - arguments: (expected-value 4096 actual-value 10)
<Arsen>FAIL: ports.test: SEEK_HOLE while in hole - arguments: (expected-value 10 actual-value 4100)
<rlb>Arsen: yeah; I have some likely fixes that I should get back to --- they avoid the test on filesystems that don't have the expected behavior.
<rlb>For now, you can of course just skip that test on zfs (comment it out, or whatever).
<rlb>Sparse-file-related behaviors "vary".
<jcowan>Are there even any guarantees about sparse files? My impression has always been that they are an under-the-table optimization that may or may not exist.
<Arsen>I just reran the tests on tmpfs
<rlb>(For guile I've ported the path-fs helper mentioned there to scheme as part of the fix I was considering.)
<jcowan>I vaguely remember that odbm depended on them, but that's pretty dead; I see that HFS+ never supported them.
<rlb>I think they're often assumed, and iirc zfs does "support" sparseness, and so does btrfs; it's just that their semantics are different (and that "info delay" is part of it for zfs).
<rlb>well, by assumed I just meant that a lot of image-related tooling probably expects the fs to support sparseness and so doesn't worry about creating giant empty images...
<rlb>(granularities may also vary wildly iirc)
<jcowan>Fair point. You certainly don't want a 100GB virtual disk with only a few GB of data in it to take up 100 GB of actual disk.
<jcowan>Although if I were designing such a thing I'd do my own block remapping rather than relying on sparseness.
<jcowan>(add a few zeros to that remark nowadays)
<rlb>Sure, you'd typically use say qcow, or whatever, depending on the tool, but images are LCD...
<rlb>sorry, Lowest Common Denominator.
<jcowan>We now live in a world where 128TB disks exist, although there is no MSRP for them
<jcowan>I guess you have to negotiate each deal separately
<rlb>"if you have to ask"
<jcowan>I have a few times eaten in restaurants where the menus had no prices, although that was just swank -- no actual bargaining was involved
<jcowan>(Apparently ndbm expects sparseness too. Not that these ancient libraries wouldn't *work* on non-sparse filesystems, just that they would gobble up a lot of space.)
<rlb>right -- create a sparse file, mmap it rw and go.
<jcowan>That's what LMDB does in a more sophisticated way. Its main drawback is that on 32-bit systems it severely limits the size of your file.
<jcowan>In my career there have been many flipflops between whether primary or secondary storage is larger.
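[Editor's note: a minimal sketch of the "create a sparse file, mmap it rw and go" pattern rlb describes, which ndbm- and LMDB-style libraries rely on. Purely illustrative Python: truncate the file to its full logical size, map it read/write, and let a sparse-aware filesystem back only the pages actually touched.]

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    size = 1 << 20
    os.ftruncate(fd, size)    # logical size 1 MiB; no data blocks written
    m = mmap.mmap(fd, size)   # map read/write (the default access mode)
    m[:5] = b"hello"          # touching one page allocates just that page
    m.flush()
    st = os.fstat(fd)
    # st_size is the full 1 MiB; on a sparse-aware fs, st_blocks * 512
    # stays far below that, since only the touched pages are backed.
    m.close()
finally:
    os.close(fd)
    os.unlink(path)
```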
<jcowan>It would not be easy to build a machine with 2^48 bytes of memory, whether RAM or disk.
<sneek>I've been faithfully serving for 21 days
<sneek>This system has been up 3 weeks, 22 hours, 13 minutes