IRC channel logs

2020-08-05.log


<junlingm>youpi: I intend to use the reply port of the client side device_open/read/write calls to identify userland programs. I think a userland program uses one unique port as the reply port for all client side device_* calls. So if a device opens two irq devices, there would be two user_intr_t with the same dst_port (storing the reply port), so search_intr cannot distinguish them just from the port. We need either the device_t or the interrupt id.
<youpi>if you call device_open() twice, you get two different ports
<youpi>don't you?
<junlingm>you get two different device ports. But the same device port is used for all client programs opening the same irq device.
<junlingm>so device ports cannot be used to identify userland callers.
<youpi>ah, you mean the same mach_device_t
<junlingm>yes, and the device_t is referenced in the mach_device_t, so there is a unique device_port per device, am I right?
<junlingm>device port, i.e., device_t
<youpi>I have to say that this is bringing a mess
<youpi>how do you then iterate over user processes which need a notification?
<junlingm>each device_open/read/write call from userland must provide a reply port for receiving results in case the call is blocking. We use that reply port to identify which program called.
<youpi>err, but then that requires programs to have already called device_read when the interrupt comes
<youpi>I wasn't understanding why your patch, now I start understanding, and I don't like that
<youpi>if userland applications are not actually getting a separate port *from the point of view of the kernel*, the device_read() approach looks hairy to me
<junlingm>we can register a notification from device_open if we want, but even then, if there is no read, there is no request.
<youpi>and I'd rather use a separate set of IPC (which doesn't have to belong to device.defs, though)
<youpi>when you register for notification, you don't need to make requests
<youpi>that's the point
<youpi>userland provides the port that serves as identifier for the notification
<junlingm>Yeah. That is probably a simpler solution: just move device_intr_register/ack to something like irq_register/ack, and move the routines to something like irq.defs
<junlingm>another way to do the read/write would be requesting an exclusive read/write from the kernel, and do the multiplexing from /dev/irq* nodes. But that may be overkill, as you pointed out earlier.
<jrtc27>isn't irq.defs what we were proposing right at the start?..
<junlingm>jrtc27: I wouldn't know. I was not in the discussion.
<youpi>jrtc27: that was an alternative
<youpi>if device_read/write() could be done, it would have been nicer
<youpi>(done in a simple way, I mean)
<junlingm>it actually simplifies userland code by simply doing a while read ... write loop. And it works, as I tried it. The program is that it makes implicit assumptions on the client side behavior that is not documented anywhere.
<junlingm>the problem is...
<junlingm>not program.
<youpi>userland drivers are usually not programmed as a while read ... write loop for interrupts
<youpi>they get an asynchronous notification, and call something to ack the notification
<junlingm>the notification thread is, I think. We replace the message passing loop by a read/write loop.
<youpi>plus, if your userland program has *several* irqs to monitor, you don't have such a read/write loop
<youpi>yes but then the notification thread is not much more complex than a read/write loop
<youpi>some more code, yes, but in the principle it's really about the same
<jrtc27>(and you _force_ everyone to be linked with pthreads)
<youpi>?
<jrtc27>if you need a thread
<youpi>no, you can receive messages the way you want
<jrtc27>previously you could have used signal handlers + longjmp
<youpi>device_read() vs notification doesn't change that
<jrtc27>now you either have to poll or have a thread
<junlingm>the client side notification is typically done in a thread, isn't it?
<youpi>jrtc27: ?
<jrtc27>oh, maybe I'm confused
<youpi>you can mach_msg a device_read() the way you want
<youpi>to send the request
<youpi>and get the response later, as you wish
<jrtc27>I thought this was trying to expose it through read(2)/write(2)
<youpi>junlingm: typically, but there's no obligation
<youpi>jrtc27: no, no
<junlingm>I am actually thinking about read(2)/write(2), but that requires a filesystem part on the hurd side for /dev/irq* nodes
<junlingm>the good thing is that a client may simply select and then read/write.
<junlingm>youpi: no that does not require the read to be called before interrupt is sent.
<youpi>how do you know you will have to notify that user?
<youpi>how do you know which user you have notified or not?
<junlingm>If the read is not called, then there is no registration. The first read call registers a notification.
<jrtc27>youpi's point is then you can miss interrupts
<junlingm>the problem is in the small time interval between the write (acking) and the read (queuing).
<youpi>and such a problem usually means there's a basic issue in the overall approach
<youpi>"the first read call registers a notification": you mean an actual mach notification or something else? on which port?
<jrtc27>if you want a read/write interface, open to register, close to unregister, read to get the next interrupt and write to ack an interrupt is the only really sane way to do it
<jrtc27>whatever you call those operations
<junlingm>without a first read, there is no registration. If there is a registration, then the client may miss an interrupt, but the interrupt is stored in the user_intr->interrupts, and will be delivered later.
<jrtc27>what do you mean by that second bit?
<jrtc27>does the client miss the interrupt or not?
<youpi>I don't understand why you would need a first read to register anything
<jrtc27>and if so how does the kernel know whether to store it to deliver later or to drop it because the client's never reading it again
<youpi>device_open() should be already doing it, can't it?
<junlingm>the undeliverable interrupts are stored, and when notification is possible, they will be delivered.
<junlingm>I believe this is the whole point of user_intr_t's interrupts field.
<youpi>it's the idea yes
<youpi>but then again, how do you know to which user you have delivered an interrupt, and to which you haven't? How do you identify users?
<junlingm>so now the missed ones are still there in interrupts counter.
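The registration idea discussed above (the user supplies a port once, and undeliverable interrupts are counted until they can be delivered) can be sketched roughly as follows. This is only an illustrative model, not GNU Mach code: ports are plain integers, and `irq_user`, `irq_register`, `irq_raise` and `irq_ack` are hypothetical names that do not correspond to the real user_intr_t or device_intr_register/ack.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USERS 8

/* Hypothetical model: a port is just an integer name here. */
typedef int port_t;

struct irq_user {
    port_t notify_port;  /* supplied at registration; identifies the user */
    int pending;         /* interrupts raised but not yet delivered */
    int awaiting_ack;    /* 1 while a delivered interrupt is unacked */
    int delivered;       /* total notifications sent so far */
};

static struct irq_user users[MAX_USERS];
static size_t n_users;

/* Register a user; returns its slot, or NULL if the table is full. */
struct irq_user *irq_register(port_t notify_port)
{
    if (n_users == MAX_USERS)
        return NULL;
    struct irq_user *u = &users[n_users++];
    u->notify_port = notify_port;
    u->pending = 0;
    u->awaiting_ack = 0;
    u->delivered = 0;
    return u;
}

/* Deliver one pending interrupt unless an earlier one is still unacked. */
static void try_deliver(struct irq_user *u)
{
    if (u->pending > 0 && !u->awaiting_ack) {
        u->pending--;
        u->awaiting_ack = 1;
        u->delivered++;  /* stands in for sending a message to notify_port */
    }
}

/* An interrupt fires: every registered user is credited exactly once. */
void irq_raise(void)
{
    for (size_t i = 0; i < n_users; i++) {
        users[i].pending++;
        try_deliver(&users[i]);
    }
}

/* The user acks; an interrupt that arrived meanwhile is delivered now. */
void irq_ack(struct irq_user *u)
{
    assert(u != NULL);
    u->awaiting_ack = 0;
    try_deliver(u);
}
```

Because the set of users is fixed at registration time, an interrupt that arrives while a user is still busy is counted rather than lost, and no outstanding request from that user is needed.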
<youpi>yes, sure
<youpi>we already had the issue
<youpi>but what I'm talking about, is how you make sure that the interrupt gets delivered to all users calling device_read
<junlingm>The read blocks by storing an io_req_t, which we store in user_intr_t, it behaves like the old dst_port (notification port).
<youpi>at some point your users won't have called read yet
<youpi>and still you could have an irq getting raised
<youpi>you need to know that you'll have to answer to all users' device_read() requests
<youpi>how do you know that?
<junlingm>but to know which user_intr_t to store, we need the client reply port and the interrupt id and the dst_port to match against.
<youpi>how do you know which ones have done such device_read() (and thus you already signalled), and the ones that haven't
<youpi>what port?
<youpi>the reply port of the device_read()?
<junlingm>once we signal, we clear the io_req_t stored in user_intr_t,
<junlingm>yes. that port.
<youpi>but that doesn't necessarily exist for all users
<youpi>how do you identify them then?
<junlingm>they do. They have to provide a port in case the call is blocking.
<youpi>again, you need to notify *all* users when you get an irq
<youpi>not just one
<youpi>so you need to reply to exactly *one* device_read() request from each user, for each irq
<junlingm>each user is independent. We use that reply port to distinguish which user called.
<youpi>that reply port doesn't exist until device_read() is called
<youpi>at some point you'll have users that have already called device_read(), and others that don't, and then you get the irq, what do you do?
<junlingm>that reply port never changes,
<youpi>how do you know that you'll have to reply immediately to those that haven't called device_read() yet?
<youpi>?
<youpi>sure it does
<junlingm>it is not created anew. All device read uses the same port.
<youpi>the user can choose whatever reply port it likes
<youpi>it can be created anew
<youpi>quite often glibc caches it, but there is *no* guarantee here
<youpi>(I guess we at last got to the point that you have missed)
<junlingm>yes that is my complaint earlier that the code depends on undocumented behavior of device read
<youpi>??
<youpi>which undocumented behavior?
<junlingm>that all device_* supply the same reply port for a client.
<jrtc27>you're thinking the wrong way round
<junlingm>the code means my proposed patch.
<jrtc27>if it's not documented then you can't assume it
<jrtc27>e.g. to take a stupid example you don't document that read(2) always takes the same buffer
<jrtc27>because that's not true
<youpi>it's not undocumented actually
<youpi>it is documented that the reply port can be whatever
<jrtc27>even better :)
<youpi>so you can't assume anything about it
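The point about reply ports can be illustrated with a small model. It is a sketch under stated assumptions, not kernel code: ports are plain integers, and `table`, `lookup` and `users_seen` are hypothetical names. It shows that a table keyed on the per-call reply port sees one "user" per distinct port, whereas a table keyed on a port registered once sees a single user.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 16

/* Hypothetical model: a port is just an integer name here. */
typedef int port_t;

struct entry {
    port_t key;   /* the port used to identify a caller */
    int pending;  /* interrupts owed to that caller */
};

struct table {
    struct entry e[MAX_ENTRIES];
    size_t n;
};

/* Find the entry for key, creating it if it does not exist yet. */
struct entry *lookup(struct table *t, port_t key)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].key == key)
            return &t->e[i];
    assert(t->n < MAX_ENTRIES);
    t->e[t->n].key = key;
    t->e[t->n].pending = 0;
    return &t->e[t->n++];
}

/* Number of distinct callers the table believes it has seen. */
size_t users_seen(const struct table *t)
{
    return t->n;
}
```

If one client issues three reads with three freshly created reply ports, the reply-port table ends up with three entries for what is really a single user, so pending interrupts cannot be attributed reliably.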
<junlingm>I know. so I think without the help of the filesystem, we cannot do so. for read(2)/write(2), we have peropen structs and we can distinguish. For kernel side device_* code, we do not.
<youpi>jrtc27's example is the same
<youpi>if you know MPI, MPI_Send tends to behave the same, and the MPI library does implement optimizations to make it fast, but it still has to cope with changing buffers
<youpi>junlingm: yes, because mach_device_t are singletons actually
*jrtc27 doesn't know the details of MPI and hopes to never get near that kind of thing... :)
<youpi>jrtc27: it's basically just like BSD sockets' send()/recv()
*junlingm hopes I never need to deal with MPI in the future, too.
<jrtc27>yeah, but with more magic I assume
<youpi>in high performance computing code, you'll often use the same buffer over and over when doing send()/recv()
<youpi>not much more magic actually
<youpi>you have tags which act like udp/tcp ports
<jrtc27>I see
<youpi>it's very much like udp with message ordering
<junlingm>another approach I was thinking is that read returns a request ID, and write uses that request id to ack.
<jrtc27>so UDT?
<junlingm>it can be a simple integer.
<youpi>you just don't need to care about IP addresses, nodes are just numbered from 0 to n-1
<youpi>junlingm: that becomes hairy again
<youpi>jrtc27: UDT ?
<jrtc27>UDP-based Data Transfer Protocol
<youpi>well yes
<junlingm>because other programs can blank write and screw legit programs?
<junlingm>blanket write
<youpi>plus you have interoperability on values
<youpi>you can send an array of floats
<youpi>and get an array of floats on the other side, even with differing archs
<jrtc27>ah now there's the magic I was talking about :P
<youpi>yes, but in practice we don't use it :)
<jrtc27>sneaky ser/des stuff in the middle
<youpi>we just send bytes, so the layer doesn't have to care
<youpi>and the layer optimizes for that
<youpi>so that's what people continue doing
<jrtc27>yeah so long as it's layered properly it's fine you can just bypass the higher-level magic
<junlingm>another approach is to use async open to register, which must provide a meaningful port. But that makes things even more ugly.
<youpi>ah, also, you have easy async requests
<youpi>you can MPI_Irecv several times to get several packets
<youpi>and then MPI_Test or Wait to test/wait when they're finished
<youpi>sort of like the overlapped operations of windows
<youpi>(though I'm unsure if you can actually make several overlapped calls over the same handle, on windows)
<junlingm>youpi: ^^^ use async open to register a unique notification port?
<youpi>there's a mechanism for that: notifications :)
<junlingm>ok irq.defs then.
<junlingm>I guess the part making deliver_intr_user a linux irq handler may stay?
<youpi>I'm having a look now, yes
<youpi>since we'll want that long-term anyway
<junlingm>I am actually not very familiar with mig. So I prefer to leave the irq.defs change to other more knowledgeable people.
***Server sets mode: +nt
<damo22>junlingm: i hope you intend to fix rumpkernel and libddekit once you completely change the irq device handling, i am not understanding what you guys are doing on the ML
<youpi>jrtc27: btw, "only" ~1000 binnmus for the errno_location relocation
<jrtc27>youpi: cool, that's not so bad
<youpi>yeah
<junlingm>damo22: we decided not to change the irq handling. Maybe rename device_intr_ack/register to another file.
<junlingm>I am working on decoupling irqdev handling and linux irq handling, but for now, that would not change the userland interface.