IRC channel logs

2023-09-30.log

<damo22>wleslie: ^
<damo22>If the NT flag is set and the processor is in IA-32e mode, the IRET instruction causes a general protection exception.
<damo22> IF EIP is not within CS limit THEN #GP(0); FI;
<damo22>so should we be calling cli before iret?
<youpi>we're not supposed to have IF set before we get to iret
<youpi>calling cli would just be papering over the actual bug
<damo22>what about interrupt.S
<damo22>should that be calling cli on every code path?
<damo22>before ret
<damo22>why do we call eoi before the handler?
<youpi>we should be calling cli only when we're at a point where nested interrupts are handled properly
<youpi>we have to call the eoi before calling the pmap update, otherwise if there are two of them quickly, we may miss the second one, and thus miss the second update
<damo22>but it looks like standard isa/pci interrupts are being handled also after eoi
<youpi>you mean for the non-ipi case?
<damo22>yes
<youpi>I'm talking only about the pmap_update case
<youpi>other ones may have other constraints
<damo22>i know about pmap update case
<damo22>its supposed to be inverted
<damo22>to catch the second update
<youpi>what where?
<damo22>interrupt.S
<youpi>in the pmap_update case?
<damo22>no
<damo22>i mean, it is correct currently in the code for pmap_update
<damo22>but im suspicious of non-ipi case
<youpi>see e.g. 362c84a08a1b8f1eb7f9c1c37c6ed7cece348ee4
<youpi>it's all about level-trigger vs edge-trigger and all that
<youpi>it's tricky indeed
<damo22>yes
<damo22>i have evidence that iret is being called when IF is set
<damo22>but not always
<damo22>it could be a level/edge thing
<damo22>i think i fixed the fault but now i lost a level interrupt
<damo22>i think the hanging is lost interrupts
<damo22>what happens if you get an interrupt, ack it, then handle it and call cli + iret at the end?
<damo22>i seem to be losing a level interrupt when i do that
<damo22>i think i do this, and then the irq_ack function calls the eoi again
<damo22>how do you stop a level triggered interrupt from continuously interrupting?
<damo22>seems like you have to mask it right away and then ack it, then turn it on
<damo22>ok i wrapped the ioapic access with a simple lock and now its getting stuck! how??
<damo22>lock_data = 1
<damo22>and then it calls simple_lock
<damo22>how do you make a lock that is interrupt safe?
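One common answer to the question above, sketched in user-space C: turn interrupts off on the local CPU before taking the spinlock, and restore the previous state only after releasing it, so an interrupt handler on the same CPU can never spin on a lock its own CPU already holds. Everything named `fake_*` and `irq_lock_*` is hypothetical; the interrupt flag is modelled by a plain variable, and this is not gnumach's `simple_lock` code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the CPU interrupt flag (IF). A real kernel would
 * save EFLAGS and execute cli/sti; this is only a model. */
static bool fake_if = true;

static bool fake_irq_save(void)
{
    bool old = fake_if;
    fake_if = false;            /* "cli" */
    return old;
}

static void fake_irq_restore(bool old)
{
    fake_if = old;              /* restore the previous state, not blindly "sti" */
}

bool fake_interrupts_enabled(void)
{
    return fake_if;
}

struct irq_lock {
    atomic_flag held;
};

/* Disable interrupts first, then spin for the lock. */
bool irq_lock_acquire(struct irq_lock *l)
{
    bool old = fake_irq_save();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* another CPU holds it: spin */
    return old;
}

/* Release the lock, then restore the saved interrupt state. */
void irq_lock_release(struct irq_lock *l, bool old)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    fake_irq_restore(old);
}
```

Returning the saved state to the caller keeps the pair nestable: an inner release does not re-enable interrupts that an outer section had turned off.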
<damo22>ok found something
<damo22> 743 demo 20 0 151524 424 0 R 100.3 0.0 0:28.16 stress
<damo22> 744 demo 20 0 151524 424 0 R 100.0 0.0 0:28.23 stress
<damo22> 745 demo 20 0 151524 424 0 R 98.0 0.0 0:27.70 stress
<damo22>-smp 3
<damo22>thanks for quick review youpi!
<damo22>Starting OpenBSD Secure Shell server: sshd.
<damo22>Kernel General protection trap, eip 0xc100a999, code 0, cr2 c10a90a0
<damo22>Kernel Page fault trap, eip 0xc10a8ee7, code 2, cr2 a0, cr2 c10a90a0
<damo22>is that a double fault?
<solid_black>hello!
<AwesomeAdam54321>hello
<solid_black>I was able to make my (GNU/Linux) system boot again, as you can see
<solid_black>but it took some efforts, and some head-scratching and hair-pulling :)
<solid_black>I don't think I'll ever be booting Windows again on this machine
<AwesomeAdam54321>If that's the case, shouldn't you just wipe it?
<solid_black>well, maybe the Fedora people would update their shim to support nx, then I would be able to boot Windows again without breaking Linux
<solid_black>so 'ever' might have been too strong of a word
<solid_black>anyway, the call wasn't a complete and utter disaster, was it?
<AwesomeAdam54321>no
<solid_black>but it also was a lot less technical than what I expected
<AwesomeAdam54321>Do you think a linux-libre framework shim would be useful for the HURD?
<solid_black>if you mean an upstream-support port of Linux's programming environment into userland, then, as said during the call, I think that would be *great* for the Hurd
<AwesomeAdam54321>yeah that's what I meant, but I'm less clear about what upstream-support refers to
<AwesomeAdam54321>do you mean supported by Linux upstream?
<solid_black>Kent wanted to convince upstream Linux to break out parts of their codebase into libraries that could be used in userland
<solid_black>both run other Linux code -- notably file system implementations -- in userland (seems like they're coming round to the microkernel ideas!), and just because he says some of it, like the hash table and work queues, are very high-quality implementations that could be reused by other non-kernel-related software
<solid_black>yes, it'd be supported by Linux upstream, since Linux itself would also be using these same libraries
<solid_black>s/both run/both to run/
<gnucode>hey friends!
<gnucode>how's everybody's weekend going?
<solid_black>hey gnucode!
<gnucode>solid_black: what's going on brother?
<solid_black>I fixed my system
<gnucode>hahaha!
<gnucode>you used a usb rescue cd? Or you just used a binary editor?
<solid_black>it was actually an UEFI firmware update that Windows decided to install
<gnucode>:)
<solid_black>that apparently breaks the older version of the linux shim that Fedora is using
<gnucode>sounds about right.
<solid_black>and the only way to roll back UEFI firmware seems to be by using fwupd
<gnucode>that's cool.
<solid_black>as in, I couldn't do that from Windows
<gnucode>I just sent you an email by the way. I am trying to summarize our meeting with Kent, and I believe that I lack the technical language to explain it.
<gnucode>oh man. that sounds like a tricky procedure...
<solid_black>right, I was in process of reading youpi's reply and drafting my reply to that
<solid_black>I agree that posting to Phoronix is premature though
<gnucode>solid_black: youpi I will submit to your leadership. If you do not want me to post to phoronix, then I won't. :( But I do think telling people about a potentially cool idea is worth talking about. But I will go ahead and read Samuel's comments, and see if I can put myself in Samuel's shoes. I'm sure he has good reasons for why we should not yet post to phoronix.
<solid_black>why has literally nobody ever told me that we want epoll upstream?
<youpi>well, the debian packages build failures speak for themselves?
<youpi>I usually assume that whatever one hacks, probably deserves upstreaming
<youpi>it's really rare that something is really useful only for its author
<solid_black>speaking of Debian package failures, could you please look into the gtk4 build failing?
<solid_black>Debian GNU/Linux is already on 4.12, and GNU/Hurd is stuck on like 4.8 I think
<youpi>I just don't have time
<youpi>people have to understand that
<youpi>I.can't.be.the.one.who.fixes.everything
<solid_black>I understand :|
<solid_black>well then, which ones of my other 5 billion Hurd-related projects you want upstream? :)
<youpi>well, all?
<youpi>but by order of priority, of course
<solid_black>GHurdFileMonitor and epoll would be the most important ones then?
<youpi>for the time being, we can apparently afford not having the filemonitor yet
<youpi>but pipewire is become a concern for building packages, yes
<youpi>becoming*
<youpi>sometimes I can disable a build-dep to skip the issue, but that's not sustainable
<solid_black>do mutter and the like hard-require pipewire these days?
<solid_black>or what is it that needs it?
<youpi>I don't remember which package I recently had to disable the build-dep
<youpi>but I see it pop up more and more
<youpi>(to my delight, actually, considering how pulseaudio is really not a good stack)
<youpi>openal-soft actually
<solid_black>ACTION writes an on-list reply
<youpi>which is a build-dep for various stuff and eventually for things like ffmpeg
<youpi>and from there, basically the whole world
<damo22>i think there is a subtle bug with smp, such that we are losing level triggered interrupts sometimes
<damo22>i am pretty sure that is causing the hangs
<damo22>not IPI related
<damo22>and i think the problem occurs when we get an interrupt from within an interrupt
<damo22>if i only call ioapic_irq_eoi() during irq_ack, i get a stuck level interrupt:
<damo22>ISR 58(level)
<damo22>IRR 48 58(level)
<damo22>i think the problem is, 58(level) triggers, and then i get a clock interrupt and 58(level) somehow triggers again
<damo22>but for the same event
<damo22>do you have to mask before handling?
<youpi>usually you want to mask, yes
<youpi>so that if the interrupt raises again, you don't nest when re-entering
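The mask, ack, handle, unmask ordering discussed here can be illustrated with a toy model of a single level-triggered line. This is only a sketch under simplifying assumptions: the struct and every `fake_*` / `service_*` name are hypothetical, and a real I/O APIC has more state (IRR, ISR, remote IRR) than this model.

```c
#include <stdbool.h>

/* Toy model of one level-triggered interrupt line. */
struct fake_line {
    bool asserted;    /* device keeps the line high until it is serviced */
    bool masked;      /* masked at the interrupt controller */
    int deliveries;   /* how many times the CPU has been interrupted */
};

/* After an EOI the controller re-samples the line: if it is still
 * asserted and not masked, the interrupt is delivered again. */
static void fake_eoi(struct fake_line *l)
{
    if (l->asserted && !l->masked)
        l->deliveries++;
}

static void fake_unmask(struct fake_line *l)
{
    l->masked = false;
    if (l->asserted)
        l->deliveries++;
}

/* Mask first, ack, let the handler quiet the device, then unmask. */
int service_masked(struct fake_line *l)
{
    l->masked = true;
    fake_eoi(l);
    l->asserted = false;   /* the handler services the device */
    fake_unmask(l);
    return l->deliveries;
}

/* Ack before the device has been quieted, without masking. */
int service_unmasked(struct fake_line *l)
{
    fake_eoi(l);
    l->asserted = false;
    return l->deliveries;
}
```

Starting from one delivery on an asserted line, `service_masked` ends with a count of 1, while `service_unmasked` ends with 2: the same event interrupts twice, which is the nesting that masking is meant to avoid.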
<damo22>how do you call a static inline function from asm?
<youpi>you don't
<damo22>the assembler gets confused if you include it from a header
<youpi>asm can't grok C
<damo22>bah
<youpi>that's why you have #ifdef __ASSEMBLER__ around C code in headers
<youpi>that's also why in various pieces of software we use extern inlines, and compile an extern instance
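A minimal sketch of the header pattern youpi describes. GCC defines `__ASSEMBLER__` while preprocessing `.S` files, so the guard hides C declarations from the assembler while plain macros stay visible. The names `FAKE_IRQ_BASE`, `fake_irq_vector` and `fake_irq_vector_outofline` are hypothetical, and the out-of-line wrapper is a simpler stand-in for the extern-inline technique he mentions.

```c
/* Hypothetical header fragment shared by C and .S files. */
#define FAKE_IRQ_BASE 0x20   /* plain macros are safe for the assembler */

#ifndef __ASSEMBLER__
/* C-only part: skipped when the preprocessor runs on a .S file. */
static inline unsigned int fake_irq_vector(unsigned int irq)
{
    return FAKE_IRQ_BASE + irq;
}
#endif

/* In exactly one .c file: an out-of-line function with a real symbol.
 * Assembly cannot use the inline version, but it can `call` this one. */
unsigned int fake_irq_vector_outofline(unsigned int irq)
{
    return fake_irq_vector(irq);
}
```

From a `.S` file, only `FAKE_IRQ_BASE` and the symbol `fake_irq_vector_outofline` are usable; the inline function exists only inside C translation units.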
<damo22>im getting closer, but no cigar
<damo22>$ nproc
<damo22>6
<damo22>EIP c100a613(iret) EFLAGS 00010286
<damo22>interesting, i got an actual stack overflow
<damo22>(gdb) l *0xc100a619
<damo22>0xc100a619 is at ../i386/i386/locore.S:761.
<damo22>760 stack_overflowed:
<damo22>761 ud2
<damo22>Kernel General protection trap, eip 0xc100a619, code 0, cr2 c10aae70
<damo22>no i didnt
<damo22>source out of sync
<damo22>:(
<damo22>youpi: is there an asm instruction i can use that will break into gdb?
<damo22>the problem is i get a kernel trap and it tries to recover and ruins the original backtrace
<youpi>damo22: just thinking of it: the percpu rework will be useful anyway, so patches are welcome
<youpi>one almost never wants a [NCPUS] array, that's horrible for cache coherency between CPUs :)
<youpi>(exceptions include read-only arrays)
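The cache-coherency point can be sketched as follows: with a bare `counter[NCPUS]` array, neighbouring CPUs' slots share a cache line, so every write by one CPU invalidates the line for the others. Grouping each CPU's data into its own cache-line-aligned struct avoids that. The 64-byte line size, the field names and `struct fake_percpu` are assumptions for illustration, not the layout of the actual percpu rework.

```c
#include <stdalign.h>
#include <stddef.h>

#define FAKE_CACHE_LINE 64   /* assumed cache line size */
#define FAKE_NCPUS 8         /* assumed CPU count */

/* Problematic layout: slots of different CPUs sit in the same cache line. */
unsigned long fake_counter_shared[FAKE_NCPUS];

/* Per-CPU area: all of one CPU's hot data together, aligned so that
 * no two CPUs' areas ever share a cache line. */
struct fake_percpu {
    alignas(FAKE_CACHE_LINE) unsigned long counter;
    int in_interrupt;
};

struct fake_percpu fake_percpu_area[FAKE_NCPUS];

/* Byte distance between the areas of two consecutive CPUs. */
size_t fake_percpu_stride(void)
{
    return (size_t)((char *)&fake_percpu_area[1] - (char *)&fake_percpu_area[0]);
}
```

Read-only arrays are the exception youpi notes: lines that are never written can be shared by all CPUs without bouncing.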
<janneke>grub-install fails on guix, like so:
<janneke>/gnu/store/b0ani8jjgp21qkgr514880081hizyap5-grub-minimal-2.06/sbin/grub-install --no-floppy --target=i386-pc --boot-directory //boot /dev/hd0' exited with status 1; output follows:
<janneke> Installing for i386-pc platform.
<janneke> /gnu/store/b0ani8jjgp21qkgr514880081hizyap5-grub-minimal-2.06/sbin/grub-install: error: cannot find a GRUB drive for part:1:device:hd0. Check your device.map.
<janneke>what is a device.map, where does it live?
<janneke>ideas?
<youpi>there is a section about it in the grub documentation
<youpi>it's in /boot/grub like other grub configuration files
<youpi>normally grub doesn't need it because it can use uuid etc.
<youpi>so it's a bit worrying that it'd need it
<youpi>but you can indeed try to add a line for your mapping between grub and hurd
<janneke>it's not used/needed on debian?
<janneke>hmm, doesn't look like it
<janneke>hmm, running grub-probe on debian also gives that error, whether I use /dev/hd0 or /dev/wd0
<janneke>maybe it's a qemu thing
<janneke>hmm, on debian, fdisk -l says
<janneke>Disk identifier: 0x00704677
<janneke>but on guix. it says: Disk identifier: 0x00000000
<janneke>on debian:
<janneke>cat /boot/grub/device.map
<janneke>(hd0,1) /dev/wd0s1
<janneke># grub-probe /dev/wd0s1
<janneke>grub-probe: warning: the device.map entry `hd0,1' is invalid. Ignoring it.
<janneke>that's all so weird
<janneke>hmm, device.map seems to want
<janneke>(hd0),2 /dev/wd0s2
<janneke>that's so unlike the documentation...
<youpi>definitely sounds like a bug
<janneke>possibly the line is ignored without warning, on debian it also works without device.map
<youpi>as I mentioned above, normally grub doesn't need it
<janneke>yes thanks -- i'll go and try with "noide" on guix too
<damo22>youpi: i followed the suggestion in my last email, but i am still getting EIP c100a613 EFLAGS 00010286
<damo22>where that is general protection fault on iret
<damo22>how is it possible that IF is set when iret is called?
<damo22>clock and ipi are both raised when this is happening
<damo22>hmm, i changed something and now i have booted 6 times with only 2 hangs and no faults
<solid_black>yay for more 64-bit fixes!
<damo22>boom
<solid_black>hi :)
<damo22>hello
<damo22>smp is almost working
<damo22>it boots but hangs randomly
<solid_black>did you ever figure out how to use qemu's record/replay functionality?
<damo22>never needed it
<solid_black>I've been using rr in userland a lot lately, it's super useful
<solid_black>and I've been thinking we could implement the same for the Hurd (userland) too, if I ever get to rewriting rpctrace
<damo22>you have a lot of projects
<solid_black>I do :|
<damo22>maybe try to focus on one and complete it
<damo22>:D
<damo22>does "int3" break into gdb?
<solid_black>kind of
<solid_black>at least in Unix userland, it results in SIGTRAP being sent to your process
<solid_black>which gdb treats as a breakpoint
<damo22>i want an asm instruction that i can put in the code that will drop me into gdb when its running
<damo22>so i can get a clean backtrace
<solid_black>`Debugger()`?
<damo22>i dont want it to try to recover from a trap
<solid_black>oh, that one drops you into kdb
<solid_black>I mean, int3 would be the first thing I would try
<solid_black>whether or not it works, idk
<damo22>ok
<solid_black>the second thing I would try is just making an infinite loop and then Ctrl-C'ing it once it hangs there
<damo22>yeah
<solid_black>or, can't you set a breakpoint from the gdb side?
<damo22>its difficult to do that
<damo22>because the segmentation is different after boot
<damo22>so you cant set a breakpoint early
<solid_black>??
<solid_black>ah
<damo22>it cant locate the address
<solid_black>for that, we've been doing an infinite loop on a volatile flag
<damo22>?
<solid_black>that you Ctrl-C, set to true from the debugger, then continue
<solid_black>volatile int resume_me = 0; while (!resume_me);
<damo22>haha
<solid_black>insert this somewhere during boot, perhaps just before jumping to userland
<damo22>nice idea
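The trick solid_black describes, written out as a small function. `resume_me` is the name from the chat; `wait_for_debugger` is a hypothetical wrapper. The `volatile` qualifier matters: it forces a fresh load on every iteration, so a store made from the debugger is actually observed.

```c
/* Flag that the debugger flips to release the spinning code. */
volatile int resume_me = 0;

/* Spin until someone (normally gdb) sets resume_me to non-zero. */
void wait_for_debugger(void)
{
    while (!resume_me)
        ;   /* busy-wait; volatile prevents the loop from being optimised away */
}
```

Once execution hangs in the loop, interrupt it from gdb with Ctrl-C, run `set var resume_me = 1`, set any breakpoints needed, then `continue`.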
<damo22>i need to figure out why it hangs
<damo22>it could be a lost interrupt
<damo22>like, its waiting for an interrupt to be handled so it can continue, but the interrupt gets lost so it never happens
<damo22>and it just sits idle
<damo22>a kernel that randomly hangs is pretty useless
<damo22>:D
<damo22>thing is, the machine is totally idle
<damo22>its blocking on something and waiting for something patiently
<damo22>but unresponsive
<damo22>i will recompile with kdb
<damo22>../i386/intel/pmap.c: In function ‘pmap_whatis’:
<damo22>../i386/intel/pmap.c:2635:70: error: ‘l4i’ undeclared (first use in this function); did you mean ‘l3i’?
<damo22> 2635 | db_printf("PDE %d %d for pmap %p\n", l4i, l3i, p);
<damo22> | ^~~
<youpi>that's where we need a CI
<youpi>fixed
<damo22>thanks
<solid_black>fwiw, there was an allegedly compatible reimplementation of gitlab-runner in Python
<youpi>was it maintained?
<solid_black>last commit was 3 weeks ago
<solid_black>but I have no idea how well it works, if at all
<solid_black> https://gitlab.com/cunity/gitlab-emulator
<damo22>hmm, you could run a python gitlabrunner process in hurd?
<solid_black>that's the idea, yes
<solid_black>either yes, or getting the official Go one running
<solid_black>s/yes/that/
<damo22>does the official one compile on hurd?
<solid_black> https://gitlab.com/cunity/gitlab-emulator/-/blob/main/emulator/gitlabemu/cirunner/runner.py doesn't look complicated at all
<damo22>will hurd run comfortably on 1GB memory?
<youpi>that's how I test it
<damo22>i can extend zammit.org to 2 cores and run a qemu ci there
<damo22>but i dont want to pay for more memory
<damo22>it will have 1GB
<damo22>does a gitlab runner need to run on a public facing server?
<damo22>or can you run it just with outbound access?
<youpi>I don't remember
<solid_black>you can run it with only outbound access
<damo22>well, if that is the case, i have a spare low powered machine i could install hurd onto
<damo22>it could sit there all day doing ci
<solid_black>so would you be able to offer your runner to glib/gtk people?
<solid_black>this means a few build jobs a day
<damo22>anyone who wants to ci on hurd
<damo22>i can attach another disk so that wont be a problem if it boots
<damo22>s/that/space
<damo22>so long as the net traffic is low
<solid_black>gitlab-runner is even in Debian, but of course apt complains about git/git-man again
<solid_black>ah, that's because apt doesn't know I have git installed
<solid_black>that's broken ext2fs again
<damo22>bbl
<solid_black>it totally just works!
<solid_black>I have installed gitlab-runner from Debian, registered it with my GitLab instance, configured a CI job to run on it, and it just worked!
<adamnr>hi hurd