IRC channel logs

2025-07-19.log


<damo22>youpi: ok i read your review, it seems the second patch is not needed?
<youpi>which second patch?
<damo22>or did i misunderstand it
<damo22>oh
<damo22> https://lists.gnu.org/archive/html/bug-hurd/2025-07/msg00051.html
<youpi>I didn't receive this one
<youpi>yes it is needed
<damo22>v3 2/2
<damo22>ok
<youpi>for level-triggered, we do need to have the io/apic masked before doing the eoi
<damo22>but we arent really masking in ioapic_irq_eoi
<youpi>we are masking in the device/intr.c management
<damo22>yes
<damo22>i dont understand how the device/intr management flows to eoi
<youpi>well, as written in the comment?
<youpi>could you re-send your mail to my mail?
<damo22>sure
<youpi>I don't know why I didn't receive it
<damo22>sent
<damo22>if the masking only takes place upon user intr registration, is that enough?
<damo22>with PIC, masking takes place wrapped around the eoi?
<youpi>for in-kernel handlers, it'll be just like in bsd
<youpi>no masking needed
<youpi>since one eois after the handler, not before
<damo22>not for edge
<youpi>yes but for edge we don't need to mask, we don't risk getting interrupted again on early eoi
<damo22>ok
<damo22>yes level are very fast
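For reference, a minimal C sketch of the ordering being discussed. ioapic_irq_eoi and __disable_irq are the names used in this conversation; is_level_triggered and the overall structure are illustrative assumptions, not the actual gnumach source:

    extern void ioapic_irq_eoi (unsigned int irq);
    extern void __disable_irq (unsigned int irq);
    extern int  is_level_triggered (unsigned int irq);  /* hypothetical helper */

    void
    handle_irq (unsigned int irq)
    {
      if (is_level_triggered (irq))
        {
          /* Level-triggered: mask before the EOI, so the still-asserted
             line cannot re-raise the instant we acknowledge it.  */
          __disable_irq (irq);
          ioapic_irq_eoi (irq);
          /* ... run or queue the handler; the irq is unmasked later,
             once the handler has quiesced the device ... */
        }
      else
        {
          /* Edge-triggered: EOI early and don't mask.  A new edge after
             the EOI is a genuinely new event we must not miss, and an
             early EOI cannot re-trigger on a quiet line.  */
          ioapic_irq_eoi (irq);
          /* ... run the handler ... */
        }
    }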
<damo22>i think we still have a problem with interrupts after my v3 2/2 patch
<damo22>because it doesnt fix the failure to identify on real hw
<youpi>are we sure that this identification failure is an interrupt issue?
<damo22>i saw the same problem on qemu when i was developing this changeset
<damo22>when i put a breakpoint on __disable_irq in gdb it continued
<damo22>it became unstuck and booted
<damo22>i might be able to reproduce even now
<damo22>with -smp 2
<damo22>yep
<damo22>(gdb) b __disable_irq
<damo22>Breakpoint 1 at 0xc1025630: file ../i386/i386/irq.c, line 60.
<damo22>(gdb) c
<damo22>Continuing.
<damo22>Thread 1 hit Breakpoint 1, __disable_irq (irq_nr=10) at ../i386/i386/irq.c:60
<damo22>60 assert (irq_nr < NINTR);
<damo22>before starting gdb, the boot was stuck with -smp 2
<damo22>i am running gnumach full smp master + reverted slave pset
<damo22>then i delete the breakpoints and the boot continues
<damo22>does attaching a breakpoint cause it to trap instead of interrupt?
<youpi>breakpoints are implemented as traps, yes
<damo22>does IF change if a trap is entered?
<youpi>probably
<youpi>really, don't hope for proper exact semantics on this with qemu+gdb
<youpi>it's probably not possible for them to get things exactly right
<damo22>ok
<damo22>considering the changes we just made, do we perhaps not need __disable_irq in queue_intr?
<youpi>we do, otherwise the interrupt will raise again on eoi
<damo22>but IF is cleared throughout the handler
<damo22>could we move the masking to be wrapped just around the eoi, then there isnt such a large window when we can miss interrupts?
<youpi>no, for user-level handlers the handler is called in userspace, with IF set
<youpi>so we *have* to keep the interrupt masked until all userland drivers have coped with the devices raising it
<youpi>otherwise it'll just keep re-raising
<youpi>cf the comment
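A rough sketch of the user-level path youpi is describing. queue_intr, __disable_irq, and ioapic_irq_eoi are named in this discussion, but the signature and queue_notification below are assumptions, not the actual device/intr.c code:

    extern void __disable_irq (unsigned int irq);
    extern void ioapic_irq_eoi (unsigned int irq);
    extern void queue_notification (unsigned int irq);  /* hypothetical */

    /* IF is already clear here: it was cleared on interrupt entry.  */
    static void
    queue_intr (unsigned int irq)
    {
      /* Mask the line first: the userland handler runs later, in
         userspace, with IF set, so an unmasked level-triggered line
         would just keep re-raising until the driver quiesces the
         board.  */
      __disable_irq (irq);
      ioapic_irq_eoi (irq);       /* EOI with the line safely masked */
      queue_notification (irq);   /* wake the userland driver */
    }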
<damo22>what if the driver expects to reraise it once or twice
<youpi>you mean the board?
<damo22>yeah
<youpi>the driver is supposed to cope with anything the board has to say
<youpi>and if the board still has stuff to say, unprocessed by the driver, it'll keep the line up
<youpi>thus an interrupt again
<damo22>if we are masking it very early, the line can be high but wont raise an interrupt until it is unmasked
<damo22>maybe the line will go down before it has a chance to raise
<damo22>hence the driver gets stuck
<youpi>we always eventually unmask
<youpi>if the device has something to say, it'll have the line up
<youpi>that will trigger an interrupt *anyway*
<damo22>user_irq_handler clears IF
<damo22>then calls deliver_user_intr
<youpi>(IF is cleared way before that, on interrupt entry)
<damo22>so IF is clear while __disable_irq is called, therefore the cpu does not respond to any interrupts or get notified of irqN even if the line is bouncing like crazy, so we can miss some
<youpi>that is *fine*
<youpi>interrupts are not something that one has to count
<youpi>the interrupt handlers are not supposed to be called as many times as the board raised it
<damo22>ok
<youpi>they are only supposed to be called at least once after a board raised it
<damo22>my point is, if a particular irq is masked out, the cpu cannot see it being raised while that is the case, so how can it respond at least once to that one?
<youpi>because we eventually unmask
<youpi>and then the interrupt raises
<youpi>a board *keeps* the line up until it is handled
<youpi>so it won't disappear
<damo22>i see
<youpi>that's very different from edge, for which you could miss indeed
<youpi>and thus want to eoi before handling to avoid missing new interrupts
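The flip side, sketched: why masking cannot lose a level-triggered interrupt. __enable_irq is assumed here as the counterpart of __disable_irq from the log, and the ack entry point is hypothetical:

    extern void __enable_irq (unsigned int irq);  /* assumed counterpart */

    /* Hypothetical ack path, called once the userland driver has
       handled the device.  A board keeps a level-triggered line
       asserted until serviced, so if anything is still pending, the
       unmask below re-raises the interrupt immediately: nothing is
       lost while the line was masked.  */
    void
    user_intr_ack (unsigned int irq)
    {
      __enable_irq (irq);
    }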
<damo22>could we call handler -> eoi -> call handler again for level triggered ones?
<youpi>what for?
<youpi>eoi will raise the interrupt, if there is one
<damo22>somehow we are missing interrupts
<youpi>(not sure which "handler" you are talking about, a kernel or user one, I was assuming kernel above)
<youpi>in which testcase?
<damo22>-smp 2 with full smp
<youpi>what is the symptom?
<damo22>rumpdisk hangs during probe of ahcisata
<damo22>when i put breakpoint on __disable_irq and step through that, it recovers
<damo22>probably because i gave it a chance to reraise the intr
<youpi>does the probing need several interrupts to happen?
<damo22>yes
<youpi>is the code that produce the second interrupt in the interrupt handler, or something else?
<damo22>i will check
<youpi>does the second interrupt not happen for sure? (breakpoint on __disable_irq could simply have hidden a race in code that is completely unrelated to the interrupt mechanism)
<youpi>did you also try with an smp kernel but just one core?
<damo22>yes, -smp 1 works fine
<youpi>so it's most probably a race condition when the userland handler runs on an AP
<damo22>pci interrupts are only raising on BSP, can the userland handler get scheduled onto an AP?
<youpi>sure
<youpi>it's just a userland thread receiving an RPC
<damo22>could it be a synchronisation issue in device/intr.c?
<youpi>or in the userland driver itself
<damo22>i dont see how that could be the case, if its only running on one core it shouldnt matter which one
<youpi>but you can have concurrency between the driver running on an AP, and interrupt management running on the BSP
<damo22>we use RUMP_NCPU=1
<youpi>what does this actually do?
<damo22>yes, that is what i meant, maybe we need to wait somewhere in intr.c?
<youpi>does rump really use only one thread at all?
<damo22>no
<youpi>afaik it uses at least a separate thread for intr processing
<damo22>it has many threads but only knows about one kernel virtual cpu
<youpi>what does that mean?
<youpi>if it means it assumes that there's no concurrency, that's wrong
<damo22>i think it means it does locking and threading differently
<youpi>then it'll be wrong
<youpi>and thus races are only to be expected
<damo22>RUMP_LOCKS_UP=1 at compile time would make it assume less concurrency but we dont do that
<damo22>i believe we are using smp version of locking in rump but only consuming one core at a time
<youpi>if rump has several threads, it'll use more than one core
<damo22>from what i can tell, we dont let multiple interrupts raise at all until each one is handled
<youpi>we mask them separately
<youpi>but that's unrelated
<youpi>my point is that the kernel interrupt handling code runs on the BSP, while the rump intr thread, and the other rump threads may run concurrently on APs
<damo22>can we mitigate that in gnumach alone?
<youpi>most probably not
<youpi>races in rump are races in rump
<youpi>unless you force it to run on just one core, its threads will spread
<damo22>ok
<damo22>maybe we could have a way to pin tasks to single cores
<youpi>pthread_setaffinity_np is generally useful yes
<damo22>is that implemented in hurd?
<youpi>nope
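For illustration, this is how the call looks with glibc on Linux (per youpi above, it is not implemented on the Hurd); pinning rump's threads like this would keep them on a single core:

    /* Pin the calling thread to CPU 0 -- glibc/Linux semantics.  */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    int
    main (void)
    {
      cpu_set_t set;
      CPU_ZERO (&set);
      CPU_SET (0, &set);           /* allow only CPU 0 */
      int err = pthread_setaffinity_np (pthread_self (),
                                        sizeof (set), &set);
      if (err != 0)
        fprintf (stderr, "pthread_setaffinity_np: %d\n", err);
      return 0;
    }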
<azert>saw Milos' work on the mailing list, dunno if he is here?
<azert>I think that it is nice to put journaling support into libdiskfs. But the journal filesystem itself should be a separate component
<azert>since its format is what changes most commonly
<azert>I’m not even sure that the way you plug a specific journal format into a specific filesystem such as ext3 can be abstracted away, and is not fs specific. Would be nice to hear the opinion of an expert
<azert>ext3 for instance has two versions of its journal format
<azert>if we end up using a “Hurd” journal format, you still want to eventually be able to mount and use Linux ext3 filesystems