IRC channel logs

<junlingm>youpi: I intend to use the reply port of the client side device_open/read/write calls to identify userland programs. I think a userland program uses a unique port for all the reply port for client side device_* calls. So if a device opens two irq devices, there would be two user_intr_t with the same dst_port (storing the reply port), so search_intr cannot distinguish them just from the port. We need either the device_t or the interrupt id.

<youpi>if you call device_open() twice, you get two different ports

<youpi>don't you?

<junlingm>you get two different device ports. But the same device port is used for all client programs opening the same irq device.

<junlingm>so device ports cannot be used to identify userland callers.

<youpi>ah, you mean the same mach_device_t

<junlingm>yes, and the device_t is referenced in the mach_device_t, so there is a unique device_port per device, am I right?

<junlingm>device port, i.e., device_t

<youpi>I have to say that this is bringing a mess

<youpi>how do you then iterate over user processes which need a notification?

<junlingm>each device_open/read/write call from userland must provide a reply port for receiving results in case the call is blocking. We use that reply port to identify which program called.

<youpi>err, but then that requires programs to have already called device_read when the interrupt comes

<youpi>I wasn't understanding why your patch, now I start understanding, and I don't like that

<youpi>if userland applications are not actually getting a separate port *from the point of view of the kernel*, the device_read() approach looks hairy to me

<junlingm>we can register a notification from device_open if we want, but even then, if there is no read, there is no request.

<youpi>and I'd rather use a separate set of IPC (which doesn't have to belong to device.defs, though)

<youpi>when you register for notification, you don't need to make requests

<youpi>that's the point

<youpi>userland provides the port that serves as identifier for the notification

<junlingm>Yeah. That is probably a simpler solution: just move device_intr_register/ack to something like irq_register/ack, and move the routines to something like irq.defs
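
[A minimal sketch of the registration idea discussed above, not code from gnumach or from the patch. It assumes a fixed-size table and a plain integer standing in for a port name; irq_register/irq_ack borrow the names floated in the log, while irq_raise, irq_deliver, struct irq_user and MAX_USERS are hypothetical. The point it illustrates: once the user supplies its own notification port at registration time, the kernel side can count an interrupt for every registered user, whether or not that user has made any request yet.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USERS 8

/* Stand-in for a Mach port name; 0 marks a free slot (assumption). */
typedef unsigned int port_name_t;

struct irq_user {
    port_name_t notify_port; /* supplied by userland, identifies the user */
    int irq;                 /* interrupt line this entry is registered on */
    unsigned pending;        /* interrupts raised but not yet delivered */
    int awaiting_ack;        /* one notification delivered, not yet acked */
};

static struct irq_user users[MAX_USERS];

static struct irq_user *find_user(int irq, port_name_t port)
{
    if (port == 0)
        return NULL;
    for (size_t i = 0; i < MAX_USERS; i++)
        if (users[i].notify_port == port && users[i].irq == irq)
            return &users[i];
    return NULL;
}

/* Register (irq, port); returns 0, or -1 if duplicate or table full. */
int irq_register(int irq, port_name_t notify_port)
{
    assert(notify_port != 0);
    if (find_user(irq, notify_port) != NULL)
        return -1;
    for (size_t i = 0; i < MAX_USERS; i++) {
        if (users[i].notify_port == 0) {
            users[i] = (struct irq_user){ notify_port, irq, 0, 0 };
            return 0;
        }
    }
    return -1;
}

/* An interrupt fired: count it for every registered user of that line.
   Returns how many users were marked. */
unsigned irq_raise(int irq)
{
    unsigned marked = 0;
    for (size_t i = 0; i < MAX_USERS; i++) {
        if (users[i].notify_port != 0 && users[i].irq == irq) {
            users[i].pending++;
            marked++;
        }
    }
    return marked;
}

/* Hand one pending interrupt to a user; returns 1 if a notification would
   be sent, 0 if nothing is pending or the previous one is still unacked. */
int irq_deliver(int irq, port_name_t notify_port)
{
    struct irq_user *u = find_user(irq, notify_port);
    if (u == NULL || u->pending == 0 || u->awaiting_ack)
        return 0;
    u->pending--;
    u->awaiting_ack = 1;
    return 1;
}

/* Ack the last delivered notification; returns 0, or -1 if none was due. */
int irq_ack(int irq, port_name_t notify_port)
{
    struct irq_user *u = find_user(irq, notify_port);
    if (u == NULL || !u->awaiting_ack)
        return -1;
    u->awaiting_ack = 0;
    return 0;
}
```

[In this sketch an interrupt raised between an ack and the next delivery is simply kept in the pending counter, so nothing depends on the user having a request outstanding at the moment the irq fires.]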

<junlingm>another way to do the read/write would be requesting an exclusive read/write from the kernel, and do the multiplexing from /dev/irq* nodes. But that may be an overkill, as you pointed out earlier.

<jrtc27>isn't irq.defs what we were proposing right at the start?..

<junlingm>jrtc27: I wouldn't know. I was not in the discussion.

<youpi>jrtc27: that was an alternative

<youpi>if device_read/write() could be done, it would have been nicer

<youpi>(done in a simple way, I mean)

<junlingm>it actually simplifies userland code by simply doing a while read ... write loop. And it works, as I tried it. The program is that it makes implicit assumptions on the client side behavior that is not documented anywhere.

<junlingm>the problem is...

<junlingm>not program.

<youpi>userland drivers are usually not programmed as a while read ... write loop for interrupts

<youpi>they get an asynchronous notification, and call something to ack the notification

<junlingm>the notification thread is, I think. We replace message passing loop by a read/write loop.

<youpi>plus, if your userland program has *several* irqs to monitor, you don't have such a read/write loop

<youpi>yes but then the notification thread is not much more complex than a read/write loop

<youpi>some more code, yes, but in the principle it's really about the same

<jrtc27>(and you _force_ everyone to be linked with pthreads)

<youpi>?

<jrtc27>if you need a thread

<youpi>no, you can receive messages the way you want

<jrtc27>previously you could have used signal handlers + longjmp

<youpi>device_read() vs notification doesn't change that

<jrtc27>now you either have to poll or have a thread

<junlingm>the client side notification is typically done in a thread, isn't it?

<youpi>jrtc27: ?

<jrtc27>oh, maybe I'm confused

<youpi>you can mach_msg a device_read() the way you want

<youpi>to send the request

<youpi>and get the response later, as you wish

<jrtc27>I thought this was trying to expose it through read(2)/write(2)

<youpi>junlingm: typically, but there's no obligation

<youpi>jrtc27: no, no

<junlingm>I am actually thinking about read(2)/write(2), but that required filesystem part on the hurd side for /dev/irq* nodes

<junlingm>the good thing is that a client may simply select and then read/write.

<junlingm>youpi: no that does not require the read to be called before interrupt is sent.

<youpi>how do you know you will have to notify that user?

<youpi>how do you know which user you have notified or not?

<junlingm>If the read is not called, then there is no registration. The first read call registers a notification.

<jrtc27>youpi's point is then you can miss interrupts

<junlingm>the problem is in the small time interval between the write (acking) and the read (queuing).

<youpi>and such a problem usually means there's a basic issue in the overall approach

<youpi>"the first read call registers a notification": you mean an actual mach notification or something else? on which port?

<jrtc27>if you want a read/write interface, open to register, close to unregister, read to get the next interrupt and write to ack an interrupt is the only really sane way to do it

<jrtc27>whatever you call those operations

<junlingm>without a first read, there is no registration. If there is a registration, then the client may miss an interrupt, but the interrupt is stored in the user_intr->interrupts, and will be delivered later.

<jrtc27>what do you mean by that second bit?

<jrtc27>does the client miss the interrupt or not?

<youpi>I don't understand why you would need a first read to register anything

<jrtc27>and if so how does the kernel know whether to store it to deliver later or to drop it because the client's never reading it again

<youpi>device_open() should be already doing it, can't it?

<junlingm>the undeliverable interrupts are stored, and when notification is possible, it will be delivered.

<junlingm>I believe this is the whole point of user_intr_t's interrupts field.

<youpi>it's the idea yes

<youpi>but then again, how do you know to which user you have delivered an interrupt, and to which you haven't? How do you identify users?

<junlingm>so now the missed ones are still there in interrupts counter.

<youpi>yes, sure

<youpi>we already had the issue

<youpi>but what I'm talking about, is how you make sure that the interrupt gets delivered to all users calling device_read

<junlingm>The read blocks by storing an io_req_t, which we store in user_intr_t, it behaves like the old dst_port (notification port).

<youpi>at some point your users won't have called read yet

<youpi>and still you could have an irq getting raised

<youpi>you need to know that you'll have to answer to all users' device_read() request

<youpi>how do you know that?

<junlingm>but to know which user_intr_t to store, we need the client reply port and the interrupt id and the dst_port to match against.

<youpi>how do you know which ones have done such device_read() (and thus you already signalled), and the ones that haven't

<youpi>what port?

<youpi>the reply port of the device_read()?

<junlingm>once we signal, we clear the io_req_t stored in user_intr_t,

<junlingm>yes. that port.

<youpi>but that doesn't necessarily exist for all users

<youpi>how do you identify them then?

<junlingm>they do. They have to provide a port in case the call is blocking.

<youpi>again, you need to notify *all* users when you get an irq

<youpi>not just one

<youpi>so you need to reply to exactly *one* device_read() request from each user, for each irq

<junlingm>each user is independent. We use that reply port to distinguish which user called.

<youpi>that reply port doesn't exist until device_read() is called

<youpi>at some point you'll have users that have already called device_read(), and others that don't, and then you get the irq, what do you do?

<junlingm>that reply port never changes,

<youpi>how do you know that you'll have to reply immediately to those that haven't called device_read() yet?

<youpi>?

<youpi>sure it does

<junlingm>it is not created anew. All device read uses the same port.

<youpi>the user can choose whatever reply port it likes

<youpi>it can be created anew

<youpi>quite often glibc caches it, but there is *no* guarantee here

<youpi>(I guess we at last got to the point that you have missed)

<junlingm>yes that is my complaint earlier that the code depends on undocumented behavior of device read

<youpi>??

<youpi>which undocumented behavior?

<junlingm>that all device_* supply the same reply port for a client.

<jrtc27>you're thinking the wrong way round

<junlingm>the code means my proposed patch.

<jrtc27>if it's not documented then you can't assume it

<jrtc27>e.g. to take a stupid example you don't document that read(2) always takes the same buffer

<jrtc27>because that's not true

<youpi>it's not undocumented actually

<youpi>it is documented that the reply port can be whatever

<jrtc27>even better :)

<youpi>so you can't assume anything about it

<junlingm>I know. so I think without the help of filesystem, we cannot do so. for read(2)/write(2), we have peropen structs and we can distinguish. For kernel side device_* code, we do not.

<youpi>jrtc27's example is the same

<youpi>if you know MPI, MPI_Send tends to behave the same, and the MPI library does implement optimizations to make it fast, but it still has to cope with changing buffers

<youpi>junlingm: yes, because mach_device_t are singletons actually

*jrtc27 doesn't know the details of MPI and hopes to never get near that kind of thing... :)

<youpi>jrtc27: it's basically just like BSD sockets' send()/recv()

*junlingm hope I never need to deal with MPI in the future, too.

<jrtc27>yeah, but with more magic I assume

<youpi>in high performance computing code, you'll often use the same buffer over and over when doing send()/recv()

<youpi>not much more magic actually

<youpi>you have tags which act like udp/tcp ports

<jrtc27>I see

<youpi>it's very much like udp with message ordering

<junlingm>another approach I was thinking is that read returns a request ID, and write uses that request id to ack.

<jrtc27>so UDT?

<junlingm>it can be a simple integer.

<youpi>you just don't need to care about IP addresses, nodes are just numbered from 0 to n-1

<youpi>junlingm: that becomes hairy again

<youpi>jrtc27: UDT ?

<jrtc27>UDP-based Data Transfer Protocol

<youpi>well yes

<junlingm>because other programs can blank write and screw legit programs?

<junlingm>blanket write

<youpi>plus you have interoperability on values

<youpi>you can send an array of floats

<youpi>and get an array of floats on the other side, even with differing archs

<jrtc27>ah now there's the magic I was talking about :P

<youpi>yes, but in practice we don't use it :)

<jrtc27>sneaky ser/des stuff in the middle

<youpi>we just send bytes, so the layer doesn't have to care

<youpi>and the layer optimizes for that

<youpi>so that's what people continue doing

<jrtc27>yeah so long as it's layered properly it's fine you can just bypass the higher-level magic

<junlingm>another approach is to use async open to register, which must provide a meaningful port. But that makes things even more ugly.

<youpi>ah, also, you have easy async requests

<youpi>you can MPI_Irecv several times to get several packets

<youpi>and then MPI_Test or Wait to test/wait when they're finished

<youpi>sort of like the overlapped operations of windows

<youpi>(though I'm unsure if you can actually make several overlapped calls over the same handle, on windows)

<junlingm>youpi: ^^^ use async open to register a unique notification port?

<youpi>there's a mechanism for that: notifications :)

<junlingm>ok irq.defs then.

<junlingm>I guess the making deliver_intr_user a linux irq handler part may stay?

<youpi>I'm having a look now, yes

<youpi>since we'll want that long-term anyway

<junlingm>I am actually not very familiar with mig. So I prefer to leave the irq.defs change to other more knowledgeable people.

***Server sets mode: +nt

<damo22>junlingm: i hope you intend to fix rumpkernel and libddekit once you completely change the irq device handling, i am not understanding what you guys are doing on the ML

<youpi>jrtc27: btw, "only" ~1000 binnmus for the errno_location relocation

<jrtc27>youpi: cool, that's not so bad

<youpi>yeah

<junlingm>damo22: we decided not to change the irq handling. Maybe rename device_intr_ack/register to another file.

<junlingm>I am working on decoupling irqdev handling and linux irq handling, but for now, that would not change user land interface.

2020-08-05.log