IRC channel logs
2025-06-16.log
<damo22>the problem with rumpnet is the immediate reading of a single packet, i need to turn that off and let it queue up multiple packets
<damo22>getting lots of these errors on rcvd packets
<damo22>youpi: where is the mach_msg_server for network rcv packets?
<damo22>heh, i got it to work with 1.5MB/s on smp 8
<azert>damo22: is rumpnet multithreaded?
<azert>Would switching to multiple rcv threads solve the timeout problems?
<damo22>it seems to hang if i make the mach_msg timeout nonzero
<azert>Indeed, it depends where the bottleneck is
<damo22>it hard locks up the smp system, even ddb does not work
<damo22>actually i was scping from host and pulling packets back to the host
<azert>Interrupts are handled on the rcv thread, right?
<damo22>all the receive thread does is deliver the messages coming from the nic
<damo22>i just got 267MB copied at 1.8MB/s and then it locked up
<azert>Does it lock up because the message queue is full?
<azert>points to gnumach being stuck
<azert>and gnumach sends a message for each interrupt, right?
<azert>I hope you can solve this without having to mess with scheduling
<damo22>i set the timeout to 1 second for each mach_msg() it delivers per packet
<damo22>im not getting timeouts anymore but i get the hangs
<azert>I would keep the timeout at zero
<damo22>why does the timeout of zero actually cause timeouts
<azert>The kernel shouldn't block for any reason
<azert>Then I'd try to solve the timeouts
<azert>those are performance related
<azert>maybe moving to a multithreaded rcv helps
<azert>It's better to get a timeout than a hang
<damo22>yes it is, but i dont understand why
<damo22>shouldnt a non-zero timeout cause a timeout, not an infinite timeout
<azert>queue, unless the MACH_SEND_TIMEOUT option is used. If a port has several blocked senders, then any of them may queue the next message when space in the queue becomes available, with the proviso that a blocked sender will not be indefinitely starved. These options modify MACH_SEND_MSG. If MACH_SEND_MSG is not also specified, they are ignored.
<azert>If the kernel blocks, you get the hang
<damo22>i dont understand the MACH_SEND_TIMEOUT option
<damo22>what options should i use to ensure the kernel never blocks
<azert>I understand from the docs that the timeout should be zero
<azert>and I think you need to be as quick as possible to handle the interrupts on the rumpnet side
<youpi>damo22: why are you setting MACH_SEND_TIMEOUT?
<youpi>along with timeout=0, that requests immediate send
<damo22>i dont know, i copied the code from the other net driver
<youpi>then it's the reception part that needs fixing
<youpi>but don't be surprised to get EMACH_SEND_TIMED_OUT if you set MACH_SEND_TIMEOUT, it's meant for it
<damo22>yes, i am trying to understand the mach_msg() call
<damo22>its been a long time, i dont recall
<damo22>i think i got it from netdde but im not sure
<youpi>eth-multiplexer happens to set it, I don't know why
<youpi>that being said, losing packets is fine, that tells tcp to back off
<damo22>so do i just use MACH_SEND_MSG with a timeout of 0?
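[Editor's note: a minimal sketch of the send side being discussed, not code from rumpnet. The helper name deliver_net_packet() and the message layout are assumptions; only the mach_msg() options mirror the discussion. With MACH_SEND_MSG | MACH_SEND_TIMEOUT and a timeout of 0, the call returns MACH_SEND_TIMED_OUT immediately when the destination queue is full instead of blocking, so the receive thread can drop the packet and let TCP back off.]

    #include <mach.h>
    #include <mach/message.h>

    /* Non-blocking delivery of one received packet (sketch).  */
    static kern_return_t
    deliver_net_packet (mach_msg_header_t *msg, mach_msg_size_t size)
    {
      mach_msg_return_t mr;

      mr = mach_msg (msg,
                     MACH_SEND_MSG | MACH_SEND_TIMEOUT,
                     size,
                     0,                 /* no receive */
                     MACH_PORT_NULL,
                     0,                 /* timeout 0: fail at once if the queue is full */
                     MACH_PORT_NULL);

      if (mr == MACH_SEND_TIMED_OUT)
        /* Destination queue full: drop the packet rather than block the
           receive thread; TCP will retransmit and back off.  */
        return KERN_SUCCESS;

      return mr;
    }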
<youpi>and ignore the errors which are merely due to too fast bandwidth for the software stack, yes
<youpi>may I remind you that smp is *difficult*
<youpi>don't bother trying to optimize for smp first
<youpi>to get actual parallelism on smp, you need a multi-channel network card, distributed irqs, distributed message queuing and whatnot
<youpi>going smp or multithreaded is never a simple solution to performance
<damo22> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
<damo22> 552 root 17 -3 394544 264224 0 R 33.4 12.8 0:53.31 rumpnet.s+
<damo22> 7 root 6 -14 343076 211352 0 R 27.2 10.2 0:49.84 rumpdisk
<damo22> 803 demo 20 0 171776 4624 0 R 17.9 0.2 0:32.12 sshd-sess+
<damo22> 549 root 5 -15 187348 2468 0 S 6.3 0.1 0:09.90 pfinet
<damo22> 14 root -1 -21 170352 948 0 S 6.0 0.0 0:07.46 pflocal
<azert>youpi: maybe deliver_intr in gnumach also needs a timeout of zero?
<youpi>you do *not* want to lose interrupt delivery messages
<youpi>otherwise the interrupt count will get bogus
<azert>what happens when the queue is full?
<azert>how is that fine? gnumach will block, right?
<damo22>2GB copied at 5.0MB/s with UP so far
<youpi>the interrupt is masked until userland unblocks it
<azert>maybe gnumach could write down some state, bail out and retry later instead
<azert>damo22: can you profile rumpnet?
<damo22>i dont know if we compile rump with profiling enabled
<youpi>it was working at some point for ext2fs etc.
<youpi>that's indeed useful to track down overhead
<damo22>why would wire_task_self() fix vm_pages_phys returning 0
<youpi>because it forces *all* pages to be always allocated
<damo22>is that why it uses so much ram?
<youpi>rumpnet can probably just wire down the buffer for which it wants to dma
<damo22>why cant pci-userspace do that in dmalloc
<damo22>whenever it asks for dma it could wire that down
<youpi>because nobody implemented it?
<youpi>in rumpdisk we have to wire down the whole process anyway
<damo22>ah because it provides the disk for swap memory
<damo22>2.5GB file matches sha256sums on each end so the copy worked
<damo22>is there any reason we cant wire down all calls to vm_allocate_contiguous in gnumach?
<damo22>is there a case where you want a chunk of contiguous memory but dont mind if its swapped out?
<youpi>they are already made non-pageable
<damo22> kr = vm_map_pageable(map, vaddr, vaddr + size, VM_PROT_READ | VM_PROT_WRITE, TRUE, TRUE);
<damo22>so only VM_PROT_NONE memory is pageable?
<damo22>why do we unwire the requested memory just below that
<youpi>see d6030cdfc49e9aa10819a5438a5ae313a4538f42
<damo22>ok so why are you not unwiring the memory pages between vm_page_atop(size) and npages
<damo22>since they were extra pages we didnt need
<youpi>because we didn't wire them?
<damo22>so why are the desired pages wired in the first place if you just unwire them after?
<youpi>I don't know, I didn't write that code
<damo22>so do we have a contradiction in the memory management? non-pageable memory must be unwired, but contiguous requests for physical memory need to be non-pageable and wired down?
<youpi>"non-pageable memory must be unwired" ??
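[Editor's note: a hedged sketch of "wire down the buffer for which it wants to dma" rather than wiring the whole task with wire_task_self(). wire_dma_buffer() is a hypothetical helper; it assumes the caller can obtain the privileged host port via get_privileged_ports() and uses the vm_wire() RPC so the buffer's pages stay resident and non-pageable for the duration of the DMA.]

    #include <mach.h>
    #include <hurd.h>

    /* Wire just one DMA buffer instead of the whole address space (sketch).  */
    static kern_return_t
    wire_dma_buffer (vm_address_t addr, vm_size_t size)
    {
      mach_port_t host_priv;
      kern_return_t err;

      err = get_privileged_ports (&host_priv, NULL);
      if (err)
        return err;

      /* Make the range resident and non-pageable so the physical pages
         backing the DMA transfer cannot be swapped out or moved.  */
      err = vm_wire (host_priv, mach_task_self (), addr, size,
                     VM_PROT_READ | VM_PROT_WRITE);

      mach_port_deallocate (mach_task_self (), host_priv);
      return err;
    }

[Passing VM_PROT_NONE as the access argument unwires the range again once the DMA has completed.]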
<youpi>d6030cdfc49e9aa10819a5438a5ae313a4538f42 is not saying that
<damo22>the commit adds a call to vm_page_unwire() for the pages just allocated, that memory is non-pageable (wired)
<damo22>so im confused why we need to unwire the pages
<damo22>ah we are "releasing one wiring" of the pages
<azert>If i understand correctly, I'd revert that commit and instead make sure that wired memory doesn't get passed around by device_read
<youpi>that'd bring yet more data copies
<youpi>while it should be feasible to get wiring right
<azert>Is it feasible? Can you make wiring and copy-on-write work together?
<azert>But to maximize performance, using shared memory would still be a win
<damo22>if a task wires a memory region, gnumach keeps a ref count of how many tasks wired the page?
<damo22>does gnumach call vm_page_wire() against its own map? it shouldnt right?
<damo22>unless the pages are for gnumach to use
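[Editor's note: an illustrative sketch of the wire-count semantics under discussion, not the actual gnumach code. The idea is that each resident page carries a wire count: vm_page_wire() increments it, vm_page_unwire() "releases one wiring", and the page only becomes pageable again when the count drops back to zero, so wiring is reference-counted rather than a boolean.]

    /* Sketch of reference-counted wiring (field and function names are
       illustrative, not gnumach's).  */
    struct page_sketch
    {
      unsigned int wire_count;   /* number of outstanding wirings */
      /* ... */
    };

    static void
    page_wire (struct page_sketch *p)
    {
      /* The first wiring takes the page off the pageout queues.  */
      p->wire_count++;
    }

    static void
    page_unwire (struct page_sketch *p)
    {
      /* Release one wiring; the page stays non-pageable until the
         last wiring is gone.  */
      if (--p->wire_count == 0)
        {
          /* Page may go back onto the active (pageable) queues.  */
        }
    }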