IRC channel logs
2023-09-30.log
<damo22>If the NT flag is set and the processor is in IA-32e mode, the IRET instruction causes a general protection exception.
<damo22> IF EIP is not within CS limit THEN #GP(0); FI;
<damo22>so should we be calling cli before iret?
<youpi>we're not supposed to have IF set before we get to iret
<youpi>calling cli would just be papering over the actual bug
<damo22>should that be calling cli on every code path?
<damo22>why do we call eoi before the handler?
<youpi>we should be calling cli only when we're at a point where nested interrupts are handled properly
<youpi>we have to call the eoi before calling the pmap update, otherwise if there are two of them quickly, we may miss the second one, and thus miss the second update
<damo22>but it looks like standard isa/pci interrupts are being handled also after eoi
<youpi>you mean for the non-ipi case?
<youpi>I'm talking only about the pmap_update case
<youpi>other ones may have other constraints
<damo22>i mean, it is correct currently in the code for pmap_update
<damo22>but im suspicious of non-ipi case
<youpi>see e.g. 362c84a08a1b8f1eb7f9c1c37c6ed7cece348ee4
<youpi>it's all about level-trigger vs edge-trigger and all that
<damo22>i have evidence that iret is being called when IF is set
<damo22>i think i fixed the fault but now i lost a level interrupt
<damo22>i think the hanging is lost interrupts
<damo22>what happens if you get an interrupt, ack it, then handle it and call cli + iret at the end?
<damo22>i seem to be losing a level interrupt when i do that
<damo22>i think i do this, and then the irq_ack function calls the eoi again
<damo22>how do you stop a level triggered interrupt from continuously interrupting?
<damo22>seems like you have to mask it right away and then ack it, then turn it on
<damo22>ok i wrapped the ioapic access with a simple lock and now its getting stuck! how??
<damo22>how do you make a lock that is interrupt safe?
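A minimal sketch of one common answer to the last question above, not the GNU Mach implementation. A plain spin lock can get stuck if an interrupt handler on the same CPU tries to take a lock that the interrupted code already holds, so an interrupt-safe lock usually disables interrupts before spinning and restores the previous state on unlock. Everything named here (`sim_interrupts_enabled`, `sim_irq_save`, `irq_lock_t`, `irq_lock`, `irq_unlock`) is hypothetical; the interrupt flag is simulated with a variable so the example runs in userland, where a real kernel would save the flags and execute cli.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt flag (EFLAGS.IF on x86).
 * A real kernel would save the flags register and execute cli instead. */
static bool sim_interrupts_enabled = true;

/* Disable (simulated) interrupts and return the previous state. */
static bool sim_irq_save(void)
{
    bool was_enabled = sim_interrupts_enabled;
    sim_interrupts_enabled = false;
    return was_enabled;
}

/* Restore the (simulated) interrupt state saved by sim_irq_save(). */
static void sim_irq_restore(bool was_enabled)
{
    sim_interrupts_enabled = was_enabled;
}

typedef struct {
    atomic_flag held;
} irq_lock_t;

#define IRQ_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Interrupts go off first, so a handler on this CPU cannot run while the
 * lock is held and then spin forever on the same lock. */
static bool irq_lock(irq_lock_t *l)
{
    bool was_enabled = sim_irq_save();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* another CPU holds the lock; spin until it is released */
    return was_enabled;
}

/* Release the lock, then put the interrupt state back as it was. */
static void irq_unlock(irq_lock_t *l, bool was_enabled)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    sim_irq_restore(was_enabled);
}
```

Returning the saved state from `irq_lock` instead of unconditionally re-enabling interrupts in `irq_unlock` keeps the lock usable from code that already runs with interrupts off, such as an interrupt handler.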
<damo22> 743 demo 20 0 151524 424 0 R 100.3 0.0 0:28.16 stress
<damo22> 744 demo 20 0 151524 424 0 R 100.0 0.0 0:28.23 stress
<damo22> 745 demo 20 0 151524 424 0 R 98.0 0.0 0:27.70 stress
<damo22>Starting OpenBSD Secure Shell server: sshd.
<damo22>Kernel General protection trap, eip 0xc100a999, code 0, cr2 c10a90a0
<damo22>Kernel Page fault trap, eip 0xc10a8ee7, code 2, cr2 a0, cr2 c10a90a0
<solid_black>I was able to make my (GNU/Linux) system boot again, as you can see
<solid_black>but it took some effort, and some head-scratching and hair-pulling :)
<solid_black>I don't think I'll ever be booting Windows again on this machine
<solid_black>well, maybe the Fedora people would update their shim to support nx, then I would be able to boot Windows again without breaking Linux
<solid_black>anyway, the call wasn't a complete and utter disaster, was it?
<solid_black>but it also was a lot less technical than what I expected
<solid_black>if you mean an upstream-support port of Linux's programming environment into userland, then, as said during the call, I think that would be *great* for the Hurd
<AwesomeAdam54321>yeah that's what I meant, but I'm less clear about what upstream-support refers to
<solid_black>Kent wanted to convince upstream Linux to break out parts of their codebase into libraries that could be used in userland
<solid_black>both run other Linux code -- notably file system implementations -- in userland (seems like they're coming round to the microkernel ideas!), and just because he says some of it, like the hash table and work queues, are very high-quality implementations that could be reused by other non-kernel-related software
<solid_black>yes, it'd be supported by Linux upstream, since Linux itself would also be using these same libraries
<gnucode>solid_black: what's going on brother?
<gnucode>you used a usb rescue cd? Or you just used a binary editor?
<solid_black>it was actually a UEFI firmware update that Windows decided to install
<solid_black>that apparently breaks the older version of the linux shim that Fedora is using
<solid_black>and the only way to roll back UEFI firmware seems to be by using fwupd
<gnucode>I just sent you an email by the way. I am trying to summarize our meeting with Kent, and I believe that I lack the technical language to explain it.
<gnucode>oh man. that sounds like a tricky procedure...
<solid_black>right, I was in the process of reading youpi's reply and drafting my reply to that
<solid_black>I agree that posting to Phoronix is premature though
<gnucode>solid_black: youpi I will submit to your leadership. If you do not want me to post to phoronix, then I won't. :( But I do think telling people about a potentially cool idea is worth talking about. But I will go ahead and read Samuel's comments, and see if I can put myself in Samuel's shoes. I'm sure he has good reasons for why we should not yet post to phoronix.
<solid_black>why has literally nobody ever told me that we want epoll upstream?
<youpi>well, the debian packages build failures speak for themselves?
<youpi>I usually assume that whatever one hacks, probably deserves upstreaming
<youpi>it's really rare that something is really useful only for its author
<solid_black>speaking of Debian package failures, could you please look into the gtk4 build failing?
<solid_black>Debian GNU/Linux is already on 4.12, and GNU/Hurd is stuck on like 4.8 I think
<youpi>people have to understand that
<youpi>I.can't.be.the.one.who.fixes.everything
<solid_black>well then, which ones of my other 5 billion Hurd-related projects you want upstream? :)
<youpi>but by order of priority, of course
<solid_black>GHurdFileMonitor and epoll would be the most important ones then?
<youpi>for the time being, we can apparently afford not having the filemonitor yet
<youpi>but pipewire is becoming a concern for building packages, yes
<youpi>sometimes I can disable a build-dep to skip the issue, but that's not sustainable
<solid_black>do mutter and the like hard-require pipewire these days?
<youpi>I don't remember which package I recently had to disable the build-dep
<youpi>but I see it pop up more and more
<youpi>(to my delight, actually, considering how pulseaudio is really not a good stack)
<youpi>which is a build-dep for various stuff and eventually for things like ffmpeg
<youpi>and from there, basically the whole world
<damo22>i think there is a subtle bug with smp, such that we are losing level triggered interrupts sometimes
<damo22>i am pretty sure that is causing the hangs
<damo22>and i think the problem occurs when we get an interrupt from within an interrupt
<damo22>if i only call ioapic_irq_eoi() during irq_ack, i get a stuck level interrupt:
<damo22>i think the problem is, 58(level) triggers, and then i get a clock interrupt and 58(level) somehow triggers again
<damo22>do you have to mask before handling?
<youpi>usually you want to mask, yes
<youpi>so that if the interrupt raises again, you don't nest when re-entering
<damo22>how do you call a static inline function from asm?
<damo22>the assembler gets confused if you include it from a header
<youpi>that's why you have #ifdef __ASSEMBLER__ around C code in headers
<youpi>that's also why in various pieces of software we use extern inlines, and compile an extern instance
<damo22>EIP c100a613(iret) EFLAGS 00010286
<damo22>interesting, i got an actual stack overflow
<damo22>0xc100a619 is at ../i386/i386/locore.S:761.
<damo22>Kernel General protection trap, eip 0xc100a619, code 0, cr2 c10aae70
<damo22>youpi: is there an asm instruction i can use that will break into gdb?
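A hedged sketch of the header pattern youpi describes above for sharing a header between C and assembly. GCC predefines `__ASSEMBLER__` when it preprocesses a `.S` file, so C-only declarations are guarded (written here as `#ifndef __ASSEMBLER__`), and a single out-of-line function with external linkage gives assembly something it can actually `call`. The file names and the `example_irq_*` identifiers are made up for illustration and do not come from GNU Mach; both "files" are shown in one block so the example compiles on its own.

```c
/* example_irq.h: hypothetical header included from both .c and .S files */
#ifndef EXAMPLE_IRQ_H
#define EXAMPLE_IRQ_H

/* Plain constants are understood by both the C compiler and the assembler. */
#define EXAMPLE_IRQ_BASE 0x20

#ifndef __ASSEMBLER__
/* C-only part: skipped when the header is preprocessed for a .S file. */
static inline int example_irq_to_vector(int irq)
{
    return EXAMPLE_IRQ_BASE + irq;
}

/* Out-of-line entry point with external linkage, callable from assembly. */
int example_irq_to_vector_entry(int irq);
#endif /* !__ASSEMBLER__ */

#endif /* EXAMPLE_IRQ_H */

/* example_irq.c: the single compiled instance that assembly code links to */
int example_irq_to_vector_entry(int irq)
{
    return example_irq_to_vector(irq);
}
```

A static inline function has no symbol of its own, which is why assembly needs the wrapper: C callers keep the inlined version, and `.S` code calls `example_irq_to_vector_entry` following the usual calling convention.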
<damo22>the problem is i get a kernel trap and it tries to recover and ruins the original backtrace
<youpi>damo22: just thinking of it: the percpu rework will be useful anyway, so patches are welcome
<youpi>one almost never wants a [NCPUS] array, that's horrible for cache coherency between CPUs :)
<youpi>(exceptions include read-only arrays)
<janneke>grub-install fails on guix, like so:
<janneke>/gnu/store/b0ani8jjgp21qkgr514880081hizyap5-grub-minimal-2.06/sbin/grub-install --no-floppy --target=i386-pc --boot-directory //boot /dev/hd0' exited with status 1; output follows:
<janneke> /gnu/store/b0ani8jjgp21qkgr514880081hizyap5-grub-minimal-2.06/sbin/grub-install: error: cannot find a GRUB drive for part:1:device:hd0. Check your device.map.
<janneke>what is a device.map, where does it live?
<youpi>there is a section about it in the grub documentation
<youpi>it's in /boot/grub like other grub configuration files
<youpi>normally grub doesn't need it because it can use uuid etc.
<youpi>so it's a bit worrying that it'd need it
<youpi>but you can indeed try to add a line for your mapping between grub and hurd
<janneke>hmm, running grub-probe on debian also gives that error, whether I use /dev/hd0 or /dev/wd0
<janneke>but on guix, it says: Disk identifier: 0x00000000
<janneke>grub-probe: warning: the device.map entry `hd0,1' is invalid. Ignoring it.
<janneke>that's so unlike the documentation...
<youpi>definitely sounds like a bug
<janneke>possibly the line is ignored without warning, on debian it also works without device.map
<youpi>as I mentioned above, normally grub doesn't need it
<janneke>yes thanks -- i'll go and try with "noide" on guix too
<damo22>youpi: i followed the suggestion in my last email, but i am still getting EIP c100a613 EFLAGS 00010286
<damo22>where that is general protection fault on iret
<damo22>how is it possible that IF is set when iret is called?
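A small illustration of youpi's earlier remark about `[NCPUS]` arrays, offered as a sketch and not as the layout the percpu rework uses. With separate arrays such as `counter[NCPUS]`, the entries of neighbouring CPUs share a cache line, so every write by one CPU invalidates that line for the others. Grouping each CPU's writable data into one structure aligned to the cache line size avoids that sharing. `NCPUS = 8`, a 64-byte cache line, and all `percpu_*` names are assumptions made for the example.

```c
#include <stddef.h>

#define NCPUS 8            /* assumed CPU count for this sketch */
#define CACHE_LINE_SIZE 64 /* assumed cache line size in bytes */

/* All writable data of one CPU lives together; aligning the first member
 * aligns the whole structure and rounds its size up to a full line. */
struct percpu {
    _Alignas(CACHE_LINE_SIZE) unsigned long irq_count;
    unsigned long ipi_count;
};

_Static_assert(sizeof(struct percpu) % CACHE_LINE_SIZE == 0,
               "each CPU's slot must cover whole cache lines");

static struct percpu percpu_area[NCPUS];

/* Return the private slot of the given CPU. */
static struct percpu *percpu_get(int cpu)
{
    return &percpu_area[cpu];
}

/* Index of the cache line (relative to the area start) where a slot begins. */
static size_t percpu_cache_line(int cpu)
{
    size_t offset = (size_t)((char *)percpu_get(cpu) - (char *)percpu_area);
    return offset / CACHE_LINE_SIZE;
}
```

Read-only tables are the exception youpi mentions: lines that are never written can stay shared in every CPU's cache without any invalidation traffic.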
<damo22>clock and ipi are both raised when this is happening
<damo22>hmm, i changed something and now i have booted 6 times with only 2 hangs and no faults
<solid_black>did you ever figure out how to use qemu's record/replay functionality?
<solid_black>I've been using rr in userland a lot lately, it's super useful
<solid_black>and I've been thinking we could implement the same for the Hurd (userland) too, if I ever get to rewriting rpctrace
<damo22>maybe try to focus on one and complete it
<solid_black>at least in Unix userland, it results in SIGTRAP being sent to your process
<damo22>i want an asm instruction that i can put in the code that will drop me into gdb when its running
<damo22>i dont want it to try to recover from a trap
<solid_black>the second thing I would try is just making an infinite loop and then Ctrl-C'ing it once it hangs there
<damo22>because the segmentation is different after boot
<damo22>so you cant set a breakpoint early
<solid_black>for that, we've been doing an infinite loop on a volatile flag
<solid_black>that you Ctrl-C, set to true from the debugger, then continue
<solid_black>insert this somewhere during boot, perhaps just before jumping to userland
<damo22>i need to figure out why it hangs
<damo22>like, its waiting for an interrupt to be handled so it can continue, but the interrupt gets lost so it never happens
<damo22>a kernel that randomly hangs is pretty useless
<damo22>thing is, the machine is totally idle
<damo22>its blocking on something and waiting for something patiently
<damo22>../i386/intel/pmap.c: In function ‘pmap_whatis’:
<damo22>../i386/intel/pmap.c:2635:70: error: ‘l4i’ undeclared (first use in this function); did you mean ‘l3i’?
<damo22> 2635 | db_printf("PDE %d %d for pmap %p\n", l4i, l3i, p);
<solid_black>fwiw, there was an allegedly compatible reimplementation of gitlab-runner in Python
<damo22>hmm, you could run a python gitlabrunner process in hurd?
<damo22>does the official one compile on hurd?
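The "infinite loop on a volatile flag" trick solid_black describes above can be sketched as follows. This is an illustration under assumptions, not code from the Hurd or GNU Mach tree; the names `debugger_may_continue` and `wait_for_debugger` are made up.

```c
#include <stdbool.h>

/* volatile forces the compiler to re-read the flag on every iteration, so a
 * value written from the debugger is actually observed by the loop. */
static volatile bool debugger_may_continue = false;

/* Call this at the point of interest, e.g. somewhere during boot.
 * The CPU spins here until the flag is changed from outside. */
static void wait_for_debugger(void)
{
    while (!debugger_may_continue)
        ; /* spin: interrupt with Ctrl-C in gdb, set the flag, then continue */
}
```

Once execution hangs in `wait_for_debugger`, pressing Ctrl-C in gdb stops it inside the loop; `set var debugger_may_continue = 1` followed by `continue` lets it run on, with breakpoints now set at addresses that are valid for the current segmentation.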
<damo22>will hurd run comfortably on 1GB memory?
<damo22>i can extend zammit.org to 2 cores and run a qemu ci there
<damo22>but i dont want to pay for more memory
<damo22>does a gitlab runner need to run on a public facing server?
<damo22>or can you run it just with outbound access?
<damo22>well, if that is the case, i have a spare low powered machine i could install hurd onto
<damo22>it could sit there all day doing ci
<solid_black>so would you be able to offer your runner to glib/gtk people?
<damo22>i can attach another disk so that wont be a problem if it boots
<damo22>so long as the net traffic is low
<solid_black>gitlab-runner is even in Debian, but of course apt complains about git/git-man again
<solid_black>ah, that's because apt doesn't know I have git installed
<solid_black>I have installed gitlab-runner from Debian, registered it with my GitLab instance, configured a CI job to run on it, and it just worked!