IRC channel logs
2025-07-19.log
<damo22>youpi: ok i read your review, it seems the second patch is not needed?
<youpi>for level-triggered, we do need to have the io/apic masked before doing the eoi
<damo22>but we arent really masking in ioapic_irq_eoi
<youpi>we are masking in the device/intr.c management
<damo22>i dont understand how the device/intr management flows to eoi
<youpi>well, as written in the comment?
<youpi>could you re-send your mail to my mail?
<youpi>I don't know why I didn't receive it
<damo22>if the masking only takes place upon user intr registration, is that enough?
<damo22>with PIC, masking takes place wrapped around the eoi?
<youpi>for in-kernel handlers, it'll be just like in bsd
<youpi>since one eois after the handler, not before
<youpi>yes but for edge we don't need to mask, we don't risk getting interrupted again on early eoi
<damo22>i think we still have a problem with interrupts after my v3 2/2 patch
<damo22>because it doesnt fix the failure to identify on real hw
<youpi>are we sure that this identification failure is an interrupt issue?
<damo22>i saw the same problem on qemu when i was developing this changeset
<damo22>when i put a breakpoint on __disable_irq in gdb it continued
<damo22>i might be able to reproduce even now
<damo22>Breakpoint 1 at 0xc1025630: file ../i386/i386/irq.c, line 60.
<damo22>Thread 1 hit Breakpoint 1, __disable_irq (irq_nr=10) at ../i386/i386/irq.c:60
<damo22>before starting gdb, the boot was stuck with -smp 2
<damo22>i am running gnumach full smp master + reverted slave pset
<damo22>then i delete the breakpoints and the boot continues
<damo22>does attaching a breakpoint cause it to trap instead of interrupt?
<youpi>breakpoints are implemented as traps, yes
<damo22>does IF change if a trap is entered?
<youpi>really, don't hope for proper exact semantics on this with qemu+gdb
<youpi>it's probably not possible for them to get things exactly right
<damo22>considering the changes we just made, do we perhaps not need __disable_irq in queue_intr?
<youpi>we do, otherwise the interrupt will raise again on eoi
<damo22>but IF is cleared throughout the handler
<damo22>could we move the masking to be wrapped just around the eoi, then there isnt such a large window when we can miss interrupts?
<youpi>no, for user-level handlers the handler is called in userspace, with IF
<youpi>so we *have* to keep the interrupt masked until all userland drivers have coped with the devices raising it
<youpi>otherwise it'll just keep re-raising
<damo22>what if the driver expects to reraise it once or twice
<youpi>the driver is supposed to cope with anything the board has to say
<youpi>and if the board still has stuff to say, unprocessed by the driver, it'll keep the line up
<damo22>if we are masking it very early, the line can be high but wont raise an interrupt until it is unmasked
<damo22>maybe the line will go down before it has a chance to raise
<youpi>if the device has something to say, it'll have the line up
<youpi>that will trigger an interrupt *anyway*
<youpi>(IF is cleared way before that, on interrupt entry)
<damo22>so IF is clear while __disable_irq is called, therefore the cpu does not respond to any interrupts or get notified of irqN even if the line is bouncing like crazy, so we can miss some
<youpi>interrupts are not something that one has to count
<youpi>the interrupt handlers are not supposed to be called as many times as the board raised it
<youpi>they are only supposed to be called at least once after a board raised it
<damo22>my point is, if a particular irq is masked out, the cpu can not see it being raised while that is the case, so how can it respond at least once to that one?
<youpi>because we eventually unmask
<youpi>and then the interrupt raises
<youpi>a board *keeps* the line up until it is handled
<youpi>that's very different from edge, for which you could miss indeed
<youpi>and thus want to eoi before handling to avoid missing new interrupts
<damo22>could we call handler -> eoi -> call handler again for level triggered ones?
<youpi>eoi will raise the interrupt, if there is one
<damo22>somehow we are missing interrupts
<youpi>(not sure which "handler" you are talking about, a kernel or user one, I was assuming kernel above)
<damo22>rumpdisk hangs during probe of ahcisata
<damo22>when i put breakpoint on __disable_irq and step through that, it recovers
<damo22>probably because i gave it a chance to reraise the intr
<youpi>does the probing need several interrupts to happen?
<youpi>is the code that produces the second interrupt in the interrupt handler, or something else?
<youpi>does the second interrupt not happen for sure? (breakpoint on __disable_irq could simply have hidden a race in code that is completely unrelated to the interrupt mechanism)
<youpi>did you also try with an smp kernel but just one core?
<youpi>so it's most probably a race condition when the userland handler runs on an AP
<damo22>pci interrupts are only raising on BSP, can the userland handler get scheduled onto an AP?
<youpi>it's just a userland thread receiving an RPC
<damo22>could it be a synchronisation issue in device/intr.c?
<youpi>or in the userland driver itself
<damo22>i dont see how that could be the case, if its only running on one core it shouldnt matter which one
<youpi>but you can have concurrency between the driver running on an AP, and interrupt management running on the BSP
<damo22>yes, that is what i meant, maybe we need to wait somewhere in intr.c?
<youpi>does rump really use only one thread at all?
<youpi>afaik it uses at least a separate thread for intr processing
<damo22>it has many threads but only knows about one kernel virtual cpu
<youpi>if it means it assumes that there's no concurrency, that's wrong
<damo22>i think it means it does locking and threading differently
<youpi>and thus races are only to be expected
<damo22>RUMP_LOCKS_UP=1 at compile time would make it assume less concurrency but we dont do that
<damo22>i believe we are using the smp version of locking in rump but only consuming one core at a time
<youpi>if rump has several threads, it'll use more than one core
<damo22>from what i can tell, we dont let multiple interrupts raise at all until each one is handled
<youpi>my point is that the kernel interrupt handling code runs on the BSP, while the rump intr thread and the other rump threads may run concurrently on APs
<damo22>can we mitigate that in gnumach alone?
<youpi>races in rump are races in rump
<youpi>unless you force it to run on just one core, its threads will spread
<damo22>maybe we could have a way to pin tasks to single cores
<youpi>pthread_setaffinity_np is generally useful yes
<azert>saw Milos work on the mailing list, dunno if he is here?
<azert>I think that it is nice to put journaling support into libdiskfs. But the journal filesystem itself should be a separate component
<azert>since its format is what most commonly changes
<azert>I’m not even sure that the way you plug a specific journal format into a specific filesystem such as ext3 can be abstracted away, and is not fs specific. Would be nice to hear the opinion of an expert
<azert>ext3 for instance has two versions of its journal format
<azert>if we end up using a “Hurd” journal format, you still want to be eventually able to mount and use Linux ext3 filesystems