IRC channel logs
2020-08-05.log
<junlingm>youpi: I intend to use the reply port of the client-side device_open/read/write calls to identify userland programs. I think a userland program uses a single port as the reply port for all its client-side device_* calls. So if a program opens two irq devices, there would be two user_intr_t with the same dst_port (storing the reply port), so search_intr cannot distinguish them just from the port. We need either the device_t or the interrupt id.
<youpi>if you call device_open() twice, you get two different ports
<junlingm>you get two different device ports. But the same device port is used for all client programs opening the same irq device.
<junlingm>so device ports cannot be used to identify userland callers.
<youpi>ah, you mean the same mach_device_t
<junlingm>yes, and the device_t is referenced in the mach_device_t, so there is a unique device_port per device, am I right?
<youpi>I have to say that this is bringing a mess
<youpi>how do you then iterate over user processes which need a notification?
<junlingm>each device_open/read/write call from userland must provide a reply port for receiving results in case the call is blocking. We use that reply port to identify which program called.
<youpi>err, but then that requires programs to have already called device_read when the interrupt comes
<youpi>I wasn't understanding your patch; now I start understanding, and I don't like that
<youpi>if userland applications are not actually getting a separate port *from the point of view of the kernel*, the device_read() approach looks hairy to me
<junlingm>we can register a notification from device_open if we want, but even then, if there is no read, there is no request.
<youpi>and I'd rather use a separate set of IPC (which doesn't have to belong to device.defs, though)
<youpi>when you register for notification, you don't need to make requests
<youpi>userland provides the port that serves as identifier for the notification
<junlingm>Yeah. That is probably a simpler solution: just move device_intr_register/ack to something like irq_register/ack, and move the routines to something like irq.defs
<junlingm>another way to do the read/write would be requesting an exclusive read/write from the kernel, and doing the multiplexing from /dev/irq* nodes. But that may be overkill, as you pointed out earlier.
<jrtc27>isn't irq.defs what we were proposing right at the start?..
<junlingm>jrtc27: I wouldn't know. I was not in the discussion.
<youpi>jrtc27: that was an alternative
<youpi>if device_read/write() could be done, it would have been nicer
<youpi>(done in a simple way, I mean)
<junlingm>it actually simplifies userland code to a simple while read ... write loop. And it works, as I tried it. The problem is that it makes implicit assumptions about client-side behavior that are not documented anywhere.
<youpi>userland drivers are usually not programmed as a while read ... write loop for interrupts
<youpi>they get an asynchronous notification, and call something to ack the notification
<junlingm>the notification thread is, I think. We replace the message-passing loop with a read/write loop.
<youpi>plus, if your userland program has *several* irqs to monitor, you don't have such a read/write loop
<youpi>yes but then the notification thread is not much more complex than a read/write loop
<youpi>some more code, yes, but in principle it's really about the same
<jrtc27>(and you _force_ everyone to be linked with pthreads)
<youpi>no, you can receive messages the way you want
<jrtc27>previously you could have used signal handlers + longjmp
<youpi>device_read() vs notification doesn't change that
<jrtc27>now you either have to poll or have a thread
<junlingm>the client-side notification is typically done in a thread, isn't it?
<youpi>you can mach_msg a device_read() the way you want
<youpi>and get the response later, as you wish
<jrtc27>I thought this was trying to expose it through read(2)/write(2)
<youpi>junlingm: typically, but there's no obligation
<junlingm>I am actually thinking about read(2)/write(2), but that requires a filesystem part on the Hurd side for /dev/irq* nodes
<junlingm>the good thing is that a client may simply select and then read/write.
<junlingm>youpi: no, that does not require the read to be called before the interrupt is sent.
<youpi>how do you know you will have to notify that user?
<youpi>how do you know which user you have notified or not?
<junlingm>If the read is not called, then there is no registration. The first read call registers a notification.
<jrtc27>youpi's point is then you can miss interrupts
<junlingm>the problem is in the small time interval between the write (acking) and the read (queuing).
<youpi>and such a problem usually means there's a basic issue in the overall approach
<youpi>"the first read call registers a notification": you mean an actual mach notification or something else? on which port?
<jrtc27>if you want a read/write interface, open to register, close to unregister, read to get the next interrupt and write to ack an interrupt is the only really sane way to do it
<jrtc27>whatever you call those operations
<junlingm>without a first read, there is no registration. If there is a registration, then the client may miss an interrupt, but the interrupt is stored in user_intr->interrupts, and will be delivered later.
<jrtc27>what do you mean by that second bit?
<jrtc27>does the client miss the interrupt or not?
<youpi>I don't understand why you would need a first read to register anything
<jrtc27>and if so how does the kernel know whether to store it to deliver later or to drop it because the client's never reading it again
<youpi>device_open() could already be doing that, couldn't it?
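The open/close/read/write semantics jrtc27 describes can be modelled in a few lines of C. This is a hypothetical sketch, not GNU Mach code: `struct irq_reg`, `irq_open`, `irq_fire`, `irq_read`, `irq_write` and `irq_close` are invented names, a per-registration counter stands in for something like the `user_intr->interrupts` field junlingm mentions, and `irq_read` returns false where a real implementation would block. It illustrates why counting per registration matters: an interrupt raised between a read and its ack is kept rather than missed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the proposed semantics; not GNU Mach code. */
#define MAX_REGS 8

struct irq_reg {
    bool in_use;
    int irq;            /* IRQ line this client registered for */
    unsigned pending;   /* interrupts raised but not yet returned by a read */
    bool awaiting_ack;  /* a read returned; the client still owes a write */
};

static struct irq_reg regs[MAX_REGS];

static bool valid(int h)
{
    return h >= 0 && h < MAX_REGS && regs[h].in_use;
}

/* "open": register interest in an IRQ line; returns a handle or -1. */
int irq_open(int irq)
{
    for (int i = 0; i < MAX_REGS; i++) {
        if (!regs[i].in_use) {
            regs[i] = (struct irq_reg){ .in_use = true, .irq = irq };
            return i;
        }
    }
    return -1;
}

/* Called when the line fires: every registration for it gets a count. */
void irq_fire(int irq)
{
    for (int i = 0; i < MAX_REGS; i++)
        if (regs[i].in_use && regs[i].irq == irq)
            regs[i].pending++;
}

/* "read": consume one pending interrupt. Returns false if none is
   available or the previous one is not acked yet (a real kernel would block). */
bool irq_read(int h)
{
    if (!valid(h) || regs[h].awaiting_ack || regs[h].pending == 0)
        return false;
    regs[h].pending--;
    regs[h].awaiting_ack = true;
    return true;
}

/* "write": acknowledge the interrupt returned by the last read. */
bool irq_write(int h)
{
    if (!valid(h) || !regs[h].awaiting_ack)
        return false;
    regs[h].awaiting_ack = false;
    return true;
}

/* "close": drop the registration together with anything still pending. */
void irq_close(int h)
{
    if (valid(h))
        regs[h].in_use = false;
}
```

For example, with two handles open on the same line and two calls to `irq_fire`, each handle can read and ack exactly twice, even if the second interrupt arrives before the first ack.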
<junlingm>the undeliverable interrupts are stored, and when notification is possible, they will be delivered.
<junlingm>I believe this is the whole point of user_intr_t's interrupts field.
<youpi>but then again, how do you know to which user you have delivered an interrupt, and to which you haven't? How do you identify users?
<junlingm>so now the missed ones are still there in the interrupts counter.
<youpi>but what I'm talking about is how you make sure that the interrupt gets delivered to all users calling device_read
<junlingm>The read blocks by storing an io_req_t, which we store in user_intr_t; it behaves like the old dst_port (notification port).
<youpi>at some point your users won't have called read yet
<youpi>and still you could have an irq getting raised
<youpi>you need to know that you'll have to answer all users' device_read() requests
<junlingm>but to know which user_intr_t to store it in, we need the client reply port, the interrupt id and the dst_port to match against.
<youpi>how do you know which ones have done such a device_read() (and thus you already signalled), and which ones haven't?
<youpi>the reply port of the device_read()?
<junlingm>once we signal, we clear the io_req_t stored in user_intr_t,
<youpi>but that doesn't necessarily exist for all users
<youpi>how do you identify them then?
<junlingm>they do. They have to provide a port in case the call is blocking.
<youpi>again, you need to notify *all* users when you get an irq
<youpi>so you need to reply to exactly *one* device_read() request from each user, for each irq
<junlingm>each user is independent. We use that reply port to distinguish which user called.
<youpi>that reply port doesn't exist until device_read() is called
<youpi>at some point you'll have users that have already called device_read(), and others that haven't, and then you get the irq; what do you do?
<youpi>how do you know that you'll have to reply immediately to those that haven't called device_read() yet?
<junlingm>it is not created anew. All device reads use the same port.
<youpi>the user can choose whatever reply port it likes
<youpi>quite often glibc caches it, but there is *no* guarantee here
<youpi>(I guess we at last got to the point that you have missed)
<junlingm>yes, that is my complaint from earlier: the code depends on undocumented behavior of device_read
<youpi>which undocumented behavior?
<junlingm>that all device_* calls supply the same reply port for a client.
<jrtc27>you're thinking the wrong way round
<jrtc27>if it's not documented then you can't assume it
<jrtc27>e.g. to take a stupid example, you don't document that read(2) always takes the same buffer
<youpi>it's not undocumented actually
<youpi>it is documented that the reply port can be whatever
<youpi>so you can't assume anything about it
<junlingm>I know. So I think without the help of the filesystem, we cannot do so. For read(2)/write(2), we have peropen structs and we can distinguish. For the kernel-side device_* code, we do not.
<youpi>jrtc27's example is the same
<youpi>if you know MPI, MPI_Send tends to behave the same, and the MPI library does implement optimizations to make it fast, but it still has to cope with changing buffers
<youpi>junlingm: yes, because mach_device_t are singletons actually
*jrtc27 doesn't know the details of MPI and hopes to never get near that kind of thing... :)
<youpi>jrtc27: it's basically just like BSD sockets' send()/recv()
*junlingm hopes never to need to deal with MPI in the future, too.
<jrtc27>yeah, but with more magic I assume
<youpi>in high performance computing code, you'll often use the same buffer over and over when doing send()/recv()
<youpi>not much more magic actually
<youpi>you have tags which act like udp/tcp ports
<youpi>it's very much like udp with message ordering
<junlingm>another approach I was thinking of is that read returns a request ID, and write uses that request ID to ack.
<youpi>you just don't need to care about IP addresses, nodes are just numbered from 0 to n-1
<youpi>junlingm: that becomes hairy again
<jrtc27>UDP-based Data Transfer Protocol
<junlingm>because other programs can blank write and screw legit programs?
<youpi>plus you have interoperability on values
<youpi>you can send an array of floats
<youpi>and get an array of floats on the other side, even with differing archs
<jrtc27>ah now there's the magic I was talking about :P
<youpi>yes, but in practice we don't use it :)
<jrtc27>sneaky ser/des stuff in the middle
<youpi>we just send bytes, so the layer doesn't have to care
<youpi>and the layer optimizes for that
<youpi>so that's what people continue doing
<jrtc27>yeah so long as it's layered properly it's fine, you can just bypass the higher-level magic
<junlingm>another approach is to use an async open to register, which must provide a meaningful port. But that makes things even uglier.
<youpi>ah, also, you have easy async requests
<youpi>you can MPI_Irecv several times to get several packets
<youpi>and then MPI_Test or MPI_Wait to test/wait until they're finished
<youpi>sort of like the overlapped operations of Windows
<youpi>(though I'm unsure if you can actually make several overlapped calls over the same handle, on Windows)
<junlingm>youpi: ^^^ use async open to register a unique notification port?
<youpi>there's a mechanism for that: notifications :)
<junlingm>I guess the part making deliver_intr_user a Linux irq handler may stay?
<youpi>since we'll want that long-term anyway
<junlingm>I am actually not very familiar with MIG. So I prefer to leave the irq.defs change to other, more knowledgeable people.
***Server sets mode: +nt
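youpi's alternative identifies each registration by a port the client hands over explicitly when it registers, instead of by the reply port of device_read(), which the client is free to change. Below is a hypothetical sketch of such a kernel-side table: `fake_port_t`, `intr_register`, `intr_unregister` and `intr_deliver` are invented names rather than the GNU Mach API, ports are plain integers, and delivery is an optional callback instead of a Mach message. Lookups match on both the port and the IRQ number, which also covers the case raised at the start of the log where one port is used for two IRQs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Mach port name; illustration only. */
typedef unsigned fake_port_t;

#define MAX_ENTRIES 16

struct intr_entry {
    int in_use;
    fake_port_t notify_port;  /* port the client supplied when registering */
    int irq;
};

static struct intr_entry table[MAX_ENTRIES];

/* Find the entry for (port, irq). Matching on the port alone would be
   ambiguous if a client registered the same port for two IRQs. */
static struct intr_entry *lookup(fake_port_t port, int irq)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (table[i].in_use && table[i].notify_port == port && table[i].irq == irq)
            return &table[i];
    return NULL;
}

/* Returns 0 on success, -1 if already registered or the table is full. */
int intr_register(fake_port_t port, int irq)
{
    if (lookup(port, irq) != NULL)
        return -1;
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (!table[i].in_use) {
            table[i] = (struct intr_entry){ 1, port, irq };
            return 0;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if no such registration exists. */
int intr_unregister(fake_port_t port, int irq)
{
    struct intr_entry *e = lookup(port, irq);
    if (e == NULL)
        return -1;
    e->in_use = 0;
    return 0;
}

/* Notify every registration for irq (via notify, if non-NULL) and
   return how many registrations were notified. */
int intr_deliver(int irq, void (*notify)(fake_port_t port))
{
    int n = 0;
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (table[i].in_use && table[i].irq == irq) {
            if (notify != NULL)
                notify(table[i].notify_port);
            n++;
        }
    }
    return n;
}
```

Because every user is known from registration time, an interrupt reaches all of them whether or not they have issued a request yet, which is the property youpi asks for above.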
<damo22>junlingm: I hope you intend to fix rumpkernel and libddekit once you completely change the irq device handling; I am not understanding what you guys are doing on the ML
<youpi>jrtc27: btw, "only" ~1000 binNMUs for the errno_location relocation
<junlingm>damo22: we decided not to change the irq handling. Maybe move device_intr_ack/register to another file.
<junlingm>I am working on decoupling irqdev handling and Linux irq handling, but for now, that will not change the userland interface.