IRC channel logs

2024-04-19.log


<rlb>wingo: if I were to poke at some flavor of "stricter loading", any feeling about how much we must consider the current behavior wrt backward compatibility, i.e. on the spectrum from "fine to just be stricter" to opt-in only, until the next X release, etc.? And no worries if you don't have any strong inclinations yet.
<abcdw>sneek: later tell dthompson can you elaborate on the module introspection API, please? The one I'm aware of won't show modules that are not yet loaded (even though they are still present on the load path).
<sneek>Okay.
<abcdw>sneek: botsnack
<sneek>:)
<rlb>abcdw: guile doesn't know anything about not yet loaded modules -- it only knows about a module after it looks around, finds, and successfully loads a file containing one.
<rlb>abcdw: I imagine dthompson was just talking about looking around at what's already been loaded via the introspection functions.
<rlb>ACTION guesses
<wingo>rlb: i think there are not too many back-compat worries. i think it's reasonable to go direct to interpreter if autocompilation fails
<tonyg>hm, trouble building guile on osx sonoma
<tonyg>it complains about something pthreadsish
<tonyg>scmsigs.c:305:10: error: expected expression
<tonyg> once = SCM_I_PTHREAD_ONCE_INIT;
<tonyg> ^
<tonyg>does anyone here run guile on osx regularly?
<tonyg>i wonder if i might have found some issue around `atomic-box-swap!` (but not CAS)
<tonyg>... on osx M3 hardware, and the issue doesn't manifest on e.g. AMD x86_64 linux
<tonyg>would anyone be able to have a go at running t5.scm from https://gist.github.com/tonyg/da4419aa1f0b9c5e48902e7218aed097 ?
<tonyg>it should just keep on trucking on e.g. x86_64
<tonyg>but I see it fail after anywhere from around half a billion to a few billion increments of the counter on osx sonoma M3
<tonyg>this is guile 3.0.9
<tonyg>(I haven't been able to get guile to build from git for me yet so can't compare there -- this is guile from homebrew)
<tonyg>the maybe-equivalent C program doesn't fail in the same way. ... GC?? :-(
<tonyg>i emailed the bug list. hope that's the right thing to do for things like this
<tonyg>hmm doesn't seem to manifest on M1 running 14.1.1
<tonyg>oh no wait PHEW yes it does
<tonyg>so that's M1 on 14.1.1 and M3 Pro on 14.4.1
<tonyg>ok and a self-built guile also fails on the M3 Pro machine, as does guile-next from aconchillo's homebrew tap
<wingo>tonyg: so. try running with GUILE_JIT_THRESHOLD=-1 in the environment
<wingo>does it still fail?
<tonyg>wingo: hi. cool ok will do
<wingo>that will turn off the jit
<tonyg>haha well it's slower
<wingo>i guess that is good ;)
<tonyg>so far no explosions...
<wingo>would be bad if it were not the case
<tonyg>hehehe
<wingo>i have heard that at some point apple changed the instructions that should be emitted for CAS on their aarch64
<tonyg>notably, using CAS doesn't provoke the issue
<tonyg>the C code that uses atomic_exchange uses SWPAL
<tonyg>still no explosions with the JIT switched off
<wingo>so, on your platform, can you disassemble scm_atomic_box_compare_and_swap_x and identify the subsection that does the CAS?
<tonyg>the CAS? not the plain swap?
<wingo>which one was the issue? CAS or atomic swap?
<tonyg>CAS works fine -- atomic swap, not
<wingo>ah
<wingo>interesting, i have looked into this a couple times focussing on CAS and couldn't figure it out
<wingo>but maybe i was looking in the wrong place
<wingo>in that case disassemble scm_atomic_box_swap_x
<wingo>and compare to libguile/lightening/lightening/aarch64-cpu.c:swap_atomic
<tonyg>ok. (are you able to see the fault reproduce with the little test case I posted?)
<wingo>i don't have an aarch64 machine, sadly
<tonyg>ah ok
<wingo>well. not sadly
<wingo>i just meant to say i don't have one ;)
<tonyg>haha
<Arsen>there's an M1 Darwin box on the cfarm.. no doubt you'd be accepted as the guile maintainer
<tonyg>hm objdump -d on guile doesn't have those symbols... i'll go find the .o files
<wingo>tx but i have enough children for the time being ;)
<tonyg>so the jit looks to be using (I'm shaky on aarch64 instructions) a LDAXR-STLXR pair for plain swap. the C stdatomic.h version uses SWPAL
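For readers following along, the two operations being compared can be sketched with C11 `<stdatomic.h>`. This is not the program from tonyg's gist, and the names `box_t`, `box_swap` and `box_cas` are hypothetical. On aarch64, a compiler targeting the LSE extension will typically lower these to SWPAL and CASAL, while older targets get an exclusive load/store loop (LDAXR/STLXR), which is the shape the JIT is described as emitting.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an atomic box holding one machine word. */
typedef struct {
    _Atomic uintptr_t value;
} box_t;

/* Plain swap: store `desired` unconditionally and return the old value.
   This is the operation reported as failing under the JIT. */
uintptr_t box_swap(box_t *box, uintptr_t desired)
{
    return atomic_exchange_explicit(&box->value, desired,
                                    memory_order_seq_cst);
}

/* Compare-and-swap: store `desired` only if the box holds `expected`.
   Returns true when the store happened. */
bool box_cas(box_t *box, uintptr_t expected, uintptr_t desired)
{
    return atomic_compare_exchange_strong_explicit(
        &box->value, &expected, desired,
        memory_order_seq_cst, memory_order_seq_cst);
}
```

Compiling this with `-S` for the target in question is one way to see which instructions a given compiler actually picks.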
<wingo>fwiw that test does not fail on x86-64
<tonyg>yes the test keeps running forever on x86_64 afaict
<wingo>tonyg: could be that we should use SWPAL as well
<wingo>do you know what the difference is?
<tonyg>heh not yet
<tonyg>i'm just checking to see if i can disassemble what guile is using in C, one sec
<wingo>i think i stole the ldaxr/stlxr thing from an old presentation of jfbastien, plus disassembling the code that gcc produced
<wingo>but that wasn't for an m1; could be there is some ABI consideration there
<wingo> https://github.com/zephyrproject-rtos/zephyr/issues/32133
<tonyg>i think the c code uses the CASAL instruction too instead of whatever lightening is doing
<tonyg>wingo: that link looks relevant. i wonder if sticking a DMB in there would do the trick
<tonyg>if moving outright to CASAL / SWPAL isn't a starter
<tonyg>... let's see.
<tonyg>harrumph. can't disassemble llvm bitcode apparently because llvm-dis not present in the xcode toolchain i suppose? still relearning the ropes of osx, got a mac just a couple of weeks ago after several years of not having one
<tonyg>where the heck is write_Rd_bitfield coming from
<wingo>sympathies
<wingo>though they are very nice machines, i understand
<tonyg>oh they're amazing
<tonyg>a pale vision of how computing could be
<tonyg>(I say this as a stalwart linux user)
<wingo>:)
<tonyg>really though grep is giving nothing at all for write_Rd_bitfield. maybe it's some dependent library, hmm
<tonyg>OH
<tonyg>Oh ok. i wasn't thinking like a lisp programmer
<tonyg>macro-generating macros lol
<mwette>I have M2 mac w/ macport (gcc, clang, etc) installed. Can I help?
<tonyg>mwette: yay! yes please -- can you try https://gist.github.com/tonyg/da4419aa1f0b9c5e48902e7218aed097#file-t5-scm ?
<tonyg>it should eventually stop producing numbers and yield an error, if the guile jit is enabled
<mwette>running ...
<mwette>up to 7e9
<tonyg>mwette: and then an error? yay, ok so M2 also fails in the same way as M1 and M3
<tonyg>wingo: so I crudely made the JIT emit SWPAL instead of the load/store pair for the plain swap, and it seems so far to be working...
<mwette>oops it has an old guile. wait
<tonyg>mwette: how old?
<tonyg>wingo: I'll try CASAL too
<mwette>2.2.7 no jit - I thought macports was newer
<tonyg>ah ok, weird that it should give an error at all in that case!
<tonyg>oh maybe i misunderstood
<tonyg>were you saying "up to 7e9" *so far* and still going, perhaps?
<mwette>I bet macport didn't dist guile3 because it fails. Let me build
<mwette>yes, so far. no jit. I need to build 3.0.9 on that laptop. (On a x86_64 here).
<tonyg>cool
<wingo>would be happy to patch these :) https://github.com/wingo/fibers/issues/83
<mwette>it does not build: fails on integers.c: SSIZE_MAX not defined
<mwette>Did you build w/ clang?
<tonyg>mwette: I have the xcode command-line tools and homebrew. it's a fairly new machine so not much homebrew installed yet. so i presume it's clang/llvm
<tonyg>actually i know it's llvm, sorry yes, because it is producing bitcode
<mwette>configure failed w/ clang from macports; trying /usr/bin/clang
<mwette>The macport clang configure error was: cannot create executables
<tonyg>so wingo -- insert dog-has-no-idea-what-its-doing meme -- but here's a patch that uses SWPAL and seems to get at least this little test case running https://gist.github.com/tonyg/da4419aa1f0b9c5e48902e7218aed097#file-possible-patch-for-swpal-patch
<tonyg>I tried something similar for CASAL `#define A64_CASAL 0xc8e0fc00` but I messed up the surrounding context management I think
<tonyg>so it didn't work at all, failed in direct, new and exciting ways
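As a rough illustration of what such a `#define` represents: a base opcode with the register fields zeroed, to be OR-ed with register numbers. The sketch below is not lightening's API. The helper `a64_atomic_insn` is hypothetical, the `A64_CASAL` value is the one quoted in the log, and the `A64_SWPAL` value and the field positions (Rs in bits 20:16, Rn in bits 9:5, Rt in bits 4:0) are assumptions from the A64 encoding tables that should be checked against the Arm manual.

```c
#include <stdint.h>

/* Base opcode quoted in the log: 64-bit CASAL with register fields zeroed. */
#define A64_CASAL 0xc8e0fc00u
/* Assumption, not from the log: 64-bit SWPAL with register fields zeroed. */
#define A64_SWPAL 0xf8e08000u

/* Hypothetical helper: fill the three register fields of a base opcode.
   CASAL Xs, Xt, [Xn]: Xs is the expected value and receives the old one,
                       Xt is the new value.
   SWPAL Xs, Xt, [Xn]: Xs is the value to store, Xt receives the old one. */
uint32_t a64_atomic_insn(uint32_t base, unsigned rs, unsigned rn, unsigned rt)
{
    return base
         | ((uint32_t)(rs & 31u) << 16)  /* Rs, bits 20:16 */
         | ((uint32_t)(rn & 31u) << 5)   /* Rn, bits 9:5   */
         | (uint32_t)(rt & 31u);         /* Rt, bits 4:0   */
}
```

For example, `a64_atomic_insn(A64_SWPAL, 1, 2, 3)` would be the word for `swpal x1, x3, [x2]` under these assumptions.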
<tonyg>OK I am out of time for the afternoon. ... not sure what next steps need to be? I'll stick around in here and ofc there's the email thread i started on the bugs list
<mwette>Mine is still building. I'm trying to generate atomic.s from atomic.c
<mwette>I generated atomic.s from atomic.c for {gcc,clang}x{lto,nolto}. I can post them to debbugs once I see a report for this.
<mwette>and t5.scm crashed at 2e7
<tonyg>mwette: great!
<tonyg>ok so here's a suggestion that might form the basis of a fix (cc wingo): https://gist.github.com/tonyg/da4419aa1f0b9c5e48902e7218aed097#file-possible-patch-for-swpal-and-casal-patch
<tonyg>i kinda guessed how it is supposed to work so someone who knows more about lightening than me should probably take over from here
<mwette>tonyg: yw, the odd part is compiler assembly output is all .byte ops: gone are the days of ld,st,mov i guess
<ArneBab_>rlb: regarding loading: in my opinion a clear no-go is anything that would break lilypond.
<ArneBab_>rlb: if a change could break lilypond, the approach needs to be: first get a lilypond released that won’t break after the change, then do the change. Though if the change breaks lilypond, chances are that it will also break other tools that are not public.
<ArneBab_>rlb: as a cautionary tale: the declarative modules by default cost me hours of debugging until I finally found out why my chickadee-REPL didn’t work. You cannot replace bindings in your game when it’s in a declarative module. And it also broke multi-version support for dryads-wake, because the old Guile I had on a laptop broke with #:declarative #f while in the new Guile I couldn’t experiment efficiently on the REPL without that.
<mwette>wingo: 3.0.10, maybe `tmpnam' disabled by default? (currently deprecated, since ~3.0.2, IIRC)
<dthompson>I think that would have to be saved for 3.1
<civodul>agreed
<mwette>got it