IRC channel logs

2025-09-12.log


<rlb>Yes, the behavior of SEEK_DATA/SEEK_HOLE varies across platforms and filesystems.
<rlb>Not that I learned that the hard way or anything.
<rlb>(If it ends up being important, I can poke around and try to recall some details -- suspect some of it's in the bup "git log", etc.)
<rlb>I seem to recall, for example, sensitivity to data/hole sizes, which isn't all that surprising.
<dsmith>Yeah, related to page size and block size I suspect.
<dsmith>fbsd does not have abstract sockets, so skipping those is appropriate
<dsmith>Should repeated make check runs append to existing *.{log,trs} files?
<dsmith>Nevermind. Even a single run seems to have the same results repeated 4 times. Of course I might be missing something...
<rlb>hmm, not sure - I thought the trs/log files were clobbered, but note that it puts them in a different place for make check vs ./check-guile
<rlb>i.e. I think ./check-guile writes to ./check-guile.log
<dsmith>rlb, If/when you get a chance, check 00-socket.trs Looks like 4 copies of identical results to me.
<rlb>dsmith: did you happen to notice if it was all .trs, or just that one? (I'll take a look later.)
<rlb>It looks like each iteration contains additional lines, i.e. each is a prefix of the next.
<rlb>./check-guile 00-socket.test only has one set of results
<rlb>Also, I have 5 groups, not 4.
<rlb>(oh, nvm regarding "only has one set", was looking at ./check-guile.log, not trs)
<dsmith>rlb, Seems to be only 00-socket.test that is doing it. It's throwing off the final summary I'm pretty sure.
<dsmith>Oh, and that "final summary" thing that started working? It's back to no final summary on an error. What must have happened is I reverted some changes to the script, but it was still using the .go file with the changes...
<rlb>I can isolate it via "./check-guile --trs-file 00-socket.trs 00-socket.test", so suspect it'll be reasonably easy to figure out.
<rlb>...
<dsmith>So with that.. if a .test prevents the final summary, a make -i check keeps going and does produce a summary.
<rlb>I'll have to think harder about that issue more generally, and/or the overall semantics.
<dsmith>So running ./check-guile --.... has a log file with 44 PASS and summary with 44 passing. The trs file summary also says 44 passing, but PASS appears 180 times.
<dsmith>The final make summary is doing a grep for PASS over the .trs files.
<dsmith>(Actually for '^:test-result: PASS' )
<dsmith>Very odd
<dsmith>Not critical. Doesn't hurt anything really. But... odd.
<rlb>Be nice if the harness eventually listed the failed tests at the end.
<rlb>"might be nice"
<rlb>wonder why the "connect abstract" test does a (display "connect abstract")
<rlb>dsmith: git grep primitive-fork-if-available 00-socket.test :)
<rlb>No idea what's going on there yet (just noticed it), but guessing it's involved.
<civodul>rlb: re “fluids are GC’d”, since GC is asynchronous, i think the test should check that fluids are *eventually* GC’d
<civodul>where “eventually” means that it could take some time
<dsmith>rlb, Hah! Was wondering if the thing was somehow running the test results in forks instead of the main process
<dsmith>rlb, https://bpa.st/Y7VA
<dsmith>rlb, Certainly not the right fix, but does stop the duplication
<dsmith>It might have to do with buffering.. Maybe a flush before the forks?
<dsmith>rlb, This is better methinks: https://bpa.st/JDTA
<dsmith>rlb, The (display "connect abstract") was added in 01b686b701
<dsmith>I suspect the force-output's before every fork was an attempt to fix. Or more likely actually did the job before the .trs support was added...
<dsmith>rlb, And in fact, replacing the force-output's with flush-all-ports as in https://bpa.st/Y4IA also fixes the issue
<dsmith>sneek, seen civodul
<sneek>I think I remember civodul in #guile 3 hours and 36 minutes ago, saying: where “eventually” means that it could take some time.
<dsmith>sneek, guile-software?
<sneek>Someone once said guile-software is http://sph.mn/foreign/guile-software.html Send an email to tantalum, <sph at posteo dot eu> to make changes or post a new project.
<dsmith>*Next* Friday is an important date. All bot output will be piped through a filter. Just for fun.
<civodul>:-)
<dsmith>civodul, In your reply to rlb about "eventually" being GC'd, are you suggesting that a delay/sleep is appropriate?
<civodul>dsmith: i’m suggesting that the test should retry instead of assuming that the thing is GC’d right away, if that’s what it does
<dsmith>ok
<rlb>dsmith: oh, hah, of course -- the buffers. Very glad you already figured that out; I wonder how long it would have taken for that to dawn on me :)
<rlb>dsmith: not that it's important right now, but I also suspect we might be better off without the "00-" hack there to try to ensure that the test runs before any other test that might create threads (don't we create threads at startup regardless now?).
<rlb>...with the parallel test harness it's no longer an issue for "make check", and I suspect we'd be better off otherwise just saying that some test files must be run in a separate process.
<rlb>dsmith: wrt that fluid gc test, interesting that adding a (sleep 0) fixes it, but a (usleep 1000000) does not.
<rlb>If we want those two to be interchangeable, they're not atm.
<rlb>(on that front)
<rlb>Hmm, looking at the code, they really should be the same I think. Curious.
<dsmith>rlb, How about two gc's? (gc)(gc)
<rlb>Yeah, I don't understand -- I added an fprintf to both scm_std_sleep and scm_std_usleep (which is effectively all sleep and usleep do), and they both call scm_select with the same args for a 0 sleep, but (sleep 0) causes the test to pass, and (usleep 0) doesn't :/
<rlb>Unless I misread the C code (I guess).
<dsmith>Hmm,
<rlb>I'm assuming the compiler doesn't have differing custom bits for either at higher levels?
<dsmith>Maybe the extra math adds a little bit of something? (I'm assuming there's a bit of calculation...)
<dsmith>About the 00-*.test. Yes, probably not needed. Note that each .test is a different process. That might have been different before.
<rlb>Oh, well even if I set the sleep to 1 second for each, I still get the same result(!?).
<rlb>This is with the test adjusted along the lines civodul suggested, to loop until the (g) returns a fluid (or 100 iterations).
<rlb>With a one or zero second sleep, it's fine, and succeeds on the first iteration, but with a one or zero second usleep, it never succeeds, even after 100 iterations.
<dsmith>Ok, that is *very* curious
<rlb>yeah, makes no sense at all -- if you're bored, wouldn't mind a double-check (if you didn't already) wrt scmsigs.c sleep/usleep being roughly identical.
<rlb>Looked that way to me.
<dsmith>Yes, looking at it now. Only real difference is the / and %
<rlb>which is in C...
<rlb>And I fprintf'ed the actual tv struct values, and they were both 0 for the 0 case for both sleep and usleep...
<rlb>Of course I can "fix" the test by just using (sleep 0), but that's pretty unsatisfying -- something seems off.
<rlb>Of course if it also only happens on s390x, it's more "esoteric".
<dsmith>Ok. Swiss army chainsaw time. What happens when you run either under strace. With the arg that show how long the syscall took.
<dsmith>Which is the -T option
<dsmith>strace -eselect -T -f -o outfile [the-test]
<dsmith>But this only happens on ibm big iron, right? Maybe something funky in the size of the struct fields? And BE too?
<dsmith>Hmm. But the args are unsigned.. so sign extension funkiness shouldn't be an issue.
<rlb>sure, but still both cases fprintf tv.tv_sec and tv.tv_usec as 0 with %ld (which fprintf complaints said they were), so didn't think it likely they differed, whatever they were.
<rlb>(between sleep 0 and usleep 0)
<dsmith>Well, *something* is different *somewhere*..
<dsmith>There is also a pipe involved.
<rlb>:)
<rlb>I tried the select, but it's odd (or I'm just missing something) -- I don't think I'm seeing any output for the test selects.
<rlb>i.e. like it's not "following"
<rlb>for "select ... ./check-guile fluids.test"
<dsmith>The -f is follow forks
<rlb>right
<rlb>I see some selects before it gets to the test, but not during the test (I think).
<rlb>I added a (sleep 0) (sleep 1) (sleep 2) to the test and I never see anything from strace after it pauses for the first time.
<rlb>(I can tell because of the lower-level fprintfs in scm_std_*sleep.)
<rlb>possible there are multiple "selects"?
<rlb>(I haven't used -e a lot)
<rlb>i.e. select_64 or whatever...
<rlb>(internal)
<dsmith>Right. Just thought it would be nice to limit the stuff to wade through
<rlb>sure
<rlb>Hmm, can I disassemble the code to see what at least the byte-compiler's ending up with...
<rlb>(for each case)
<dsmith>Maybe -enet will do
<rlb>It's pselect6 fwiw.
<dsmith>Thanks.
<dsmith>Yeah, on my amd64 Trixie
<dsmith>[pid 3719662] pselect6(4, [3], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 0 (Timeout) <1.001322>
<dsmith>[pid 3719662] pselect6(4, [3], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 0 (Timeout) <1.001451>
<rlb>OK, switching it to pselect6 does help narrow it down when leaving off the -o so the fprintfs will be interleaved, making it easier to see what happens where (plus or minus any buffering), and I can see that the select args are identical.
<dsmith>Which is what I would expect
<rlb>for (sleep 0) vs (usleep 0) so the difference is probably "somewhere else"...
<rlb>Though as yet, I have little clue as to what.
<rlb>Hmm, to your point, I guess they might generate slightly different garbage, though.
<rlb>I'll make them the same on that front :)
<dsmith>For the (sleep 0)(usleep 0) case, you said it was hanging?
<rlb>Adding a (sleep 0) works, a (usleep 0) doesn't.
<rlb>i.e. the fluid never appears in the guardian for the latter.
<rlb>Not surprising, but I just changed scm_std_sleep to call scm_std_usleep(x * 1000000), and that doesn't change anything.
<rlb>but reinforces the assumption that it must be "somewhere else"
<dsmith>But it does sleep for the same amount of time?
<dsmith>Not sure I understand. changing sleep to call usleep behaves the same as sleep or as usleep?
<rlb>if the loop has a (sleep 0), then the fluid will show up immediately in the guardian, but never shows up for a (usleep 0) or even a (usleep 1000000)
<dsmith>ok
<dsmith>Are there any other weird things about this arch? Unexplained bugs?
<rlb>not that I know of offhand
<rlb>I may try to see what the compiled code looks like...
<dsmith>Thread local storage?
<dsmith>How about using (yield) instead of *sleep ?
<dsmith>Or just go ahead and use (sleep 0) and call it a day....
<rlb>dsmith: I'd tried yield, and it didn't work.
<rlb>oh.
<rlb>I wonder if it could be just that the final code, jitted or not (hmm, have to check if we jit on s390x) differs in the two cases enough to put the fluid somewhere that the conservative gc mistakes for live somehow...
<rlb>(wild guess)
<rlb>And yes, while this is surprising, if (sleep 0) works for the test "everywhere", then the sleep/usleep issue is separable.
<rlb>disassembled code is also identical (plus or minus the guile-user object pointer)
<rlb>and disabling the jit has no effect
<rlb>dsmith: I have a strong suspicion now.
<rlb>I tried changing scmsigs.c usleep to do exactly the same thing as sleep i.e. scm_from_uint, etc. instead of scm_from_ulong (since 0 works either way), and that fixed the problem. My guess is that it's what I wondered before, i.e. the long C value (same size as a pointer on s390x) is nudging the behavior of the gc somehow.
<rlb>i.e. though I've no idea why that would nudge it wrt the *fluid*'s liveness.
<rlb>In any case, that's weird enough that I'm going to probably set the question aside for now and just use (sleep).
<dsmith>Wow. That's amazing.
<dsmith>Spooky action at a distance indeed.
<rlb>I suppose it could be some other knock-on effect, say gcc optimizes more broadly, differently wrt stacks/registers when that's an int, but just guessing...
<rlb>I'm going to go with (sleep 0), and maybe a (make-string 1000) instead of the make-fluid, for "pressure", and add a comment about the weirdness on s390x, and then call it a day...
<rlb>Thanks again for the help.
<rlb>Oh, and I also tried putting the fluid in the 9000th slot in a 10k vector that itself was dropped (code structure wise) after the fluid was handed to the guardian (i.e. only bound in a nested ephemeral let), but that didn't help either. Of course I imagine that might be due to optimizations...
<rlb>(We don't appear to need any pressure at all anyway -- the (sleep 0) alone does the job atm.)