IRC channel logs

<damo22>wleslie: ^

<damo22>If the NT flag is set and the processor is in IA-32e mode, the IRET instruction causes a general protection exception.

<damo22> IF EIP is not within CS limit THEN #GP(0); FI;

<damo22>so should we be calling cli before iret?

<youpi>we're not supposed to have IF set before we get to iret

<youpi>calling cli would just be papering over the actual bug

<damo22>what about interrupt.S

<damo22>should that be calling cli on every code path?

<damo22>before ret

<damo22>why do we call eoi before the handler?

<youpi>we should be calling cli only when we're at a point where nested interrupts are handled properly

<youpi>we have to call the eoi before calling the pmap update, otherwise if there are two of them quickly, we may miss the second one, and thus miss the second update

<damo22>but it looks like standard isa/pci interrupts are being handled also after eoi

<youpi>you mean for the non-ipi case?

<damo22>yes

<youpi>I'm talking only about the pmap_update case

<youpi>other ones may have other constraints

<damo22>i know about pmap update case

<damo22>its supposed to be inverted

<damo22>to catch the second update

<youpi>what where?

<damo22>interrupt.S

<youpi>in the pmap_update case?

<damo22>no

<damo22>i mean, it is correct currently in the code for pmap_update

<damo22>but im suspicious of non-ipi case

<youpi>see e.g. 362c84a08a1b8f1eb7f9c1c37c6ed7cece348ee4

<youpi>it's all about level-trigger vs edge-trigger and all that

<youpi>it's tricky indeed

<damo22>yes

<damo22>i have evidence that iret is being called when IF is set

<damo22>but not always

<damo22>it could be a level/edge thing

<damo22>i think i fixed the fault but now i lost a level interrupt

<damo22>i think the hanging is lost interrupts

<damo22>what happens if you get an interrupt, ack it, then handle it and call cli + iret at the end?

<damo22>i seem to be losing a level interrupt when i do that

<damo22>i think i do this, and then the irq_ack function calls the eoi again

<damo22>how do you stop a level triggered interrupt from continuously interrupting?

<damo22>seems like you have to mask it right away and then ack it, then turn it on

<damo22>ok i wrapped the ioapic access with a simple lock and now its getting stuck! how??

<damo22>lock_data = 1

<damo22>and then it calls simple_lock

<damo22>how do you make a lock that is interrupt safe?
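
One common answer to the question above is to disable interrupts on the local CPU before taking the spinlock, and to restore the previous interrupt state after releasing it. The sketch below only models that ordering in userland: `sim_irq_enabled` is a hypothetical stand-in for the CPU's IF flag, and every `sim_*` name is invented for illustration, not taken from gnumach.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag (IF).
   A real kernel would use cli/sti or an spl-style primitive instead. */
static bool sim_irq_enabled = true;

/* Disable "interrupts" and return the previous state. */
static bool sim_irq_save(void)
{
    bool was_enabled = sim_irq_enabled;
    sim_irq_enabled = false;
    return was_enabled;
}

/* Restore the state returned by sim_irq_save(). */
static void sim_irq_restore(bool was_enabled)
{
    sim_irq_enabled = was_enabled;
}

typedef struct {
    atomic_flag held;
} sim_lock_t;

#define SIM_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Take the lock with interrupts off, so that an interrupt handler on this
   CPU cannot run and spin on a lock that its own CPU already holds. */
static bool sim_lock_irqsave(sim_lock_t *l)
{
    bool was_enabled = sim_irq_save();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin: another CPU holds the lock */
    return was_enabled;
}

/* Release the lock first, then restore the saved interrupt state. */
static void sim_unlock_irqrestore(sim_lock_t *l, bool was_enabled)
{
    assert(!sim_irq_enabled); /* the lock must be held with interrupts off */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    sim_irq_restore(was_enabled);
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps the pair safe to use from code that already runs with interrupts disabled.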

<damo22>ok found something

<damo22> 743 demo 20 0 151524 424 0 R 100.3 0.0 0:28.16 stress

<damo22> 744 demo 20 0 151524 424 0 R 100.0 0.0 0:28.23 stress

<damo22> 745 demo 20 0 151524 424 0 R 98.0 0.0 0:27.70 stress

<damo22>-smp 3

<damo22>thanks for quick review youpi!

<damo22>Starting OpenBSD Secure Shell server: sshd.

<damo22>Kernel General protection trap, eip 0xc100a999, code 0, cr2 c10a90a0

<damo22>Kernel Page fault trap, eip 0xc10a8ee7, code 2, cr2 a0, cr2 c10a90a0

<damo22>is that a double fault?

<solid_black>hello!

<AwesomeAdam54321>hello

<solid_black>I was able to make my (GNU/Linux) system boot again, as you can see

<solid_black>but it took some efforts, and some head-scratching and hair-pulling :)

<solid_black>I don't think I'll ever be booting Windows again on this machine

<AwesomeAdam54321>If that's the case, shouldn't you just wipe it?

<solid_black>well, maybe the Fedora people would update their shim to support nx, then I would be able to boot Windows again without breaking Linux

<solid_black>so 'ever' might have been too strong of a word

<solid_black>anyway, the call wasn't a complete and utter disaster, was it?

<AwesomeAdam54321>no

<solid_black>but it also was a lot less technical than what I expected

<AwesomeAdam54321>Do you think a linux-libre framework shim would be useful for the HURD?

<solid_black>if you mean an upstream-support port of Linux's programming environment into userland, then, as said during the call, I think that would be *great* for the Hurd

<AwesomeAdam54321>yeah that's what I meant, but I'm less clear about what upstream-support refers to

<AwesomeAdam54321>do you mean supported by Linux upstream?

<solid_black>Kent wanted to convince upstream Linux to break out parts of their codebase into libraries that could be used in userland

<solid_black>both run other Linux code -- notably file system implementations -- in userland (seems like they're coming round to the microkernel ideas!), and just because he says some of it, like the hash table and work queues, are very high-quality implementations that could be reused by other non-kernel-related software

<solid_black>yes, it'd be supported by Linux upstream, since Linux itself would also be using these same libraries

<solid_black>s/both run/both to run/

<gnucode>hey friends!

<gnucode>how's everybody's weekend going?

<solid_black>hey gnucode!

<gnucode>solid_black: what's going on brother?

<solid_black>I fixed my system

<gnucode>hahaha!

<gnucode>you used a usb rescue cd? Or you just used a binary editor?

<solid_black>it was actually an UEFI firmware update that Windows decided to install

<gnucode>:)

<solid_black>that apparently breaks the older version of the linux shim that Fedora is using

<gnucode>sounds about right.

<solid_black>and the only way to roll back UEFI firmware seems to be by using fwupd

<gnucode>that's cool.

<solid_black>as in, I couldn't do that from Windows

<gnucode>I just sent you an email by the way. I am trying to summarize our meeting with Kent, and I believe that I lack the technical language to explain it.

<gnucode>oh man. that sounds like a tricky procedure...

<solid_black>right, I was in process of reading youpi's reply and drafting my reply to that

<solid_black>I agree that posting to Phoronix is premature though

<gnucode>solid_black: youpi I will submit to your leadership. If you do not want me to post to phoronix, then I won't. :( But I do think telling people about a potentially cool idea is worth talking about. But I will go ahead and read Samuel's comments, and see if I can put myself in Samuel's shoes. I'm sure he has good reasons for why we should not yet post to phoronix.

<solid_black>why has literally nobody ever told me that we want epoll upstream?

<youpi>well, the debian packages build failures speak for themselves?

<youpi>I usually assume that whatever one hacks, probably deserves upstreaming

<youpi>it's really rare that something is really useful only for its author

<solid_black>speaking of Debian package failures, could you please look into the gtk4 build failing?

<solid_black>Debian GNU/Linux is already on 4.12, and GNU/Hurd is stuck on like 4.8 I think

<youpi>I just don't have time

<youpi>people have to understand that

<youpi>I.can't.be.the.one.who.fixes.everything

<solid_black>I understand :|

<solid_black>well then, which ones of my other 5 billion Hurd-related projects you want upstream? :)

<youpi>well, all?

<youpi>but by order of priority, of course

<solid_black>GHurdFileMonitor and epoll would be the most important ones then?

<youpi>for the time being, we can apparently afford not having the filemonitor yet

<youpi>but pipewire is become a concern for building packages, yes

<youpi>becoming*

<youpi>sometimes I can disable a build-dep to skip the issue, but that's not sustainable

<solid_black>do mutter and the like hard-require pipewire these days?

<solid_black>or what is it that needs it?

<youpi>I don't remember which package I recently had to disable the build-dep

<youpi>but I see it pop up more and more

<youpi>(to my delight, actually, considering how pulseaudio is really not a good stack)

<youpi>openal-soft actually

<solid_black>ACTION writes an on-list reply

<youpi>which is a build-dep for various stuff and eventually for things like ffmpeg

<youpi>and from there, basically the whole world

<damo22>i think there is a subtle bug with smp, such that we are losing level triggered interrupts sometimes

<damo22>i am pretty sure that is causing the hangs

<damo22>not IPI related

<damo22>and i think the problem occurs when we get an interrupt from within an interrupt

<damo22>if i only call ioapic_irq_eoi() during irq_ack, i get a stuck level interrupt:

<damo22>ISR 58(level)

<damo22>IRR 48 58(level)

<damo22>i think the problem is, 58(level) triggers, and then i get a clock interrupt and 58(level) somehow triggers again

<damo22>but for the same event

<damo22>do you have to mask before handling?

<youpi>usually you want to mask, yes

<youpi>so that if the interrupt raises again, you don't nest when re-entering
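
The mask-before-handling order discussed here can be illustrated with a toy model of a single level-triggered line. This is only a sketch under stated assumptions: the `sim_*` names and the simplified controller behaviour are hypothetical, and nothing here is gnumach or real IOAPIC code. It shows that a line still held active by the device is delivered a second time right after EOI unless it was masked first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered interrupt line. */
struct sim_line {
    bool asserted;   /* the device is holding the line active */
    bool masked;     /* the line is masked at the interrupt controller */
    bool in_service; /* the controller is waiting for an EOI */
    int  deliveries; /* how many times the CPU was interrupted */
};

/* The controller delivers whenever the line is active, unmasked,
   and no earlier delivery is still waiting for its EOI. */
static bool sim_try_deliver(struct sim_line *l)
{
    if (l->asserted && !l->masked && !l->in_service) {
        l->in_service = true;
        l->deliveries++;
        return true;
    }
    return false;
}

static void sim_eoi(struct sim_line *l)
{
    assert(l->in_service); /* EOI without a pending delivery is a bug */
    l->in_service = false;
}

/* Servicing the device is what makes it drop the line. */
static void sim_device_service(struct sim_line *l)
{
    l->asserted = false;
}

/* Handle one delivery, optionally masking the line before the EOI. */
static void sim_handle_level_irq(struct sim_line *l, bool mask_first)
{
    if (mask_first)
        l->masked = true;  /* the still-active line cannot re-trigger */
    sim_eoi(l);            /* other interrupts may be delivered again */
    sim_try_deliver(l);    /* the controller re-samples the line after EOI */
    sim_device_service(l); /* only now does the device deassert */
    l->masked = false;     /* later assertions are seen normally */
}
```

With `mask_first` set, the line is delivered once per device event; without it, the same event is delivered twice, which matches the nesting that masking is meant to avoid.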

<damo22>how do you call a static inline function from asm?

<youpi>you don't

<damo22>the assembler gets confused if you include it from a header

<youpi>asm can't grok C

<damo22>bah

<youpi>that's why you have #ifndef __ASSEMBLER__ around C code in headers

<youpi>that's also why in various pieces of software we use extern inlines, and compile an extern instance
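
A minimal sketch of the header arrangement described above, with hypothetical names (`SIM_EOI_REG`, `sim_reg_write`, `sim_eoi_entry`). Constants stay visible to both C and assembly, the C-only part is hidden from the assembler, and a small out-of-line function gives assembly a real symbol to `call`. The wrapper shown here is a simpler variant of the "extern inline plus one extern instance" approach mentioned in the log, not that exact technique.

```c
#include <assert.h>

/* What a shared header might contain (all names hypothetical). */
#define SIM_EOI_REG 0xb0  /* plain constants are usable from C and asm */

#ifndef __ASSEMBLER__     /* hide C-only code from the assembler */
static unsigned sim_last_write = 1;

/* A static inline has no symbol of its own, so asm cannot call it. */
static inline void sim_reg_write(unsigned reg, unsigned val)
{
    assert(reg == SIM_EOI_REG); /* this toy models a single register */
    sim_last_write = val;
}

void sim_eoi_entry(void);
#endif /* __ASSEMBLER__ */

/* In exactly one .c file: an ordinary function with external linkage.
   Assembly code can reach the inline through it with `call sim_eoi_entry`. */
void sim_eoi_entry(void)
{
    sim_reg_write(SIM_EOI_REG, 0);
}
```

GCC defines `__ASSEMBLER__` when it preprocesses assembly sources such as `.S` files, which is what makes the guard work for a header included from both kinds of file.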

<damo22>im getting closer, but no cigar

<damo22>$ nproc

<damo22>6

<damo22>EIP c100a613(iret) EFLAGS 00010286

<damo22>interesting, i got an actual stack overflow

<damo22>(gdb) l *0xc100a619

<damo22>0xc100a619 is at ../i386/i386/locore.S:761.

<damo22>760 stack_overflowed:

<damo22>761 ud2

<damo22>Kernel General protection trap, eip 0xc100a619, code 0, cr2 c10aae70

<damo22>no i didnt

<damo22>source out of sync

<damo22>:(

<damo22>youpi: is there an asm instruction i can use that will break into gdb?

<damo22>the problem is i get a kernel trap and it tries to recover and ruins the original backtrace

<youpi>damo22: just thinking of it: the percpu rework will be useful anyway, so patches are welcome

<youpi>one almost never wants a [NCPUS] array, that's horrible for cache coherency between CPUs :)

<youpi>(exceptions include read-only arrays)
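
To illustrate the cache-coherency point: with a plain `[NCPUS]` array of small items, entries for different CPUs share a cache line, so a write by one CPU invalidates that line for the others. One common mitigation, sketched below, is to align each CPU's data to its own cache line. The names, the CPU count, and the 64-byte line size are assumptions for illustration; this is not the gnumach percpu design, which is not described in this log.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define SIM_NCPUS      4   /* hypothetical CPU count */
#define SIM_CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Aligning the member forces the whole struct to cache-line alignment,
   and its size is rounded up to one full line. */
struct sim_percpu {
    alignas(SIM_CACHE_LINE) unsigned long count;
};

/* Each element now sits on its own cache line. */
static struct sim_percpu sim_percpu_data[SIM_NCPUS];

/* Each CPU only ever writes its own slot. */
static void sim_count_event(int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NCPUS);
    sim_percpu_data[cpu].count++;
}
```

The cost is memory: every slot takes a full line even though it holds a single counter, which is why read-only arrays, as noted above, do not need this treatment.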

<janneke>grub-install fails on guix, like so:

<janneke>/gnu/store/b0ani8jjgp21qkgr514880081hizyap5-grub-minimal-2.06/sbin/grub-install --no-floppy --target=i386-pc --boot-directory //boot /dev/hd0' exited with status 1; output follows:

<janneke> Installing for i386-pc platform.

<janneke> /gnu/store/b0ani8jjgp21qkgr514880081hizyap5-grub-minimal-2.06/sbin/grub-install: error: cannot find a GRUB drive for part:1:device:hd0. Check your device.map.

<janneke>what is a device.map, where does it live?

<janneke>ideas?

<youpi>there is a section about it in the grub documentation

<youpi>it's in /boot/grub like other grub configuration files

<youpi>normally grub doesn't need it because it can use uuid etc.

<youpi>so it's a bit worrying that it'd need it

<youpi>but you can indeed try to add a line for your mapping between grub and hurd

<janneke>it's not used/needed on debian?

<janneke>hmm, doesn't look like it

<janneke>hmm, running grub-probe on debian also gives that error, whether I use /dev/hd0 or /dev/wd0

<janneke>maybe it's a qemu thing

<janneke>hmm, on debian, fdisk -l says