IRC channel logs
2023-07-26.log
<damo22>interesting, timer_lock is hogging a lot of time
<Guest25>damo22: are you aware of the bugs I reported in rumpusbdisk?
<damo22>can you please use a proper nickname when using this IRC channel for reporting bugs
<damo22>how can we follow up with guestX
<azert>so, one thing is that if rumpUSBdisk check if gnumach is driving SATA
<azert>that prevents using it in a condition that is legit
<azert>this is the error: Kernel is already driving a SATA device, skipping probing rump USB disks
<azert>I think that part needs to be conditionally excluded if RUMP is driving USB
<damo22>yeah, currently it shares device name with sata, we probably need to make new nodes for it im not sure
<azert>because you can name it as you want on the hurd side
<azert>maybe you need to make nodes for the usb controllers, I don't know about that
<damo22>i cant split the usb stack into 2 very easily
<azert>you could just compile most of the usb stack into a single translator
<damo22>im not sure how to make the controller into a separate translator
<azert>yeah that would need a new API, it's a very difficult task since USB is very difficult
<damo22>because the driver probes the devices and only probes successfully when the device driver is linked in
<azert>why not linking in the whole rump usb stack?
<damo22>but that wont work for other usb devices
<azert>yea but things like audio cards
<damo22>you cant drive the usb stack twice
<damo22>it needs to be a separate translator that can attach devices
<azert>ok, this is a separate issue, something hard to solve
<damo22>i think netbsd has concept of ugenhc
<azert>yes I've read about that, it could be a solution
<azert>the most easy solution is to not split the stack at all
<damo22>yeah but it only supports one kind of device
<damo22>theres a bug with the memory mapping
<azert>I have a second bug to report
<damo22>i think it needs page allocation with alignment to more than a page
<azert>my second bug is the following: if you type settrans -ap NOT_EXISTING_FILE /hurd/rumpusbdisk
<azert>"i think it needs page allocation with alignment to more than a page" seems like an easy fix
<damo22>settrans: NOT_EXISTING: No such file or directory
<damo22>it also prints a bunch of bootup for rumpusbdisk
<damo22>which looks like start up of mach
<azert>for me it ends in a kernel trap
<damo22>in terms of lock contention in SMP, i am getting (highest contention first) timer_lock -> db_lock -> vm_page_queue_lock -> vm_page_queue_free_lock
<damo22>but the first two are probably because i am using lock monitoring
<azert>can you get how much time is spent waiting for these locks?
<damo22>with smp 2, i get 88 seconds of locking in timer_lock and db_lock, but that is irrelevant, the 3rd biggest is 5 seconds vm_page_queue_lock
<damo22>for a 88 second bootup procedure
<damo22>i think the 88 seconds in timer lock are actually wasted measuring the timing of locks
<azert>so you are convinced that most time is spent waiting for locks?
<damo22>i think a lot of time is spent synchronising the timer
<azert>what is the code that does that?
<damo22>theres a bunch of __sync_synchronize() calls as well
<azert>maybe those parts needs to be done only if the cpu is the master cpu?
<damo22> * Time-of-day and time-out list are updated only
<azert>i don't see how the lock is contended then
<azert>could be an issue with the timeouts?
<damo22>maybe the timeouts only work on cpu0
<damo22>so the rest of the cpus have to wait for cpu0 to get timer cycles
<damo22>so what happens if a timeout occurs but cpu0 is stuck in a spinlock?
<damo22>does the timeout happen after the spinlock is released?
<azert>I don't see that issue in the code
<azert>are timeouts often reset or inited by something?
<damo22>i think they are used quite a bit
<azert>because maybe the timer_lock is held by these actors and this is what slows down clock synchronisation
<azert>by the way, are timeouts only handled by cpu0?
<damo22> * Depress thread's priority to lowest possible for specified period.
<damo22> * Intended for use when thread wants a lock but doesn't know which
<damo22> * other thread is holding it. As with thread_switch, fixed
<damo22> * priority threads get exactly what they asked for. Users access
<damo22> * this by the SWITCH_OPTION_DEPRESS option to thread_switch. A Time
<damo22> * of zero will result in no timeout being scheduled.
<damo22>lapic timer is calibrated and set to expire on all cpus
<damo22>but they fire independently and call hardclock on each cpu
<damo22>hardclock calls clock_interrupt and only services timeouts on cpu0
<damo22>when we figure out what is wrong with smp it will help a lot
<damo22>im working on it now, but its not easy
<toaster5>well im not really good at programming a kernel so don't think I could help with that lol
<damo22>i found that softclock is being serviced by all cpus.... that is bad
<luckyluke>toaster5: if you can compile gnumach with some patch you can already try it in qemu, see the ml
<luckyluke>damo22: I think a timer interrupt will almost always preempt kernel code, even if it's serving another interrupt (there are a few critical regions, e.g. syscall entry/exit on x86_64)
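Editor's sketch: the behaviour damo22 describes (the timer fires on every cpu, but only cpu0 advances the time of day and services timeouts) can be modelled with a small toy in C. This is not gnumach code. Every name here (sim_clock, sim_hardclock, sim_set_timeout, SIM_NCPUS, SIM_MASTER_CPU) is hypothetical, and the single pending timeout is a simplifying assumption standing in for the real time-out list.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NCPUS      2   /* hypothetical cpu count, matching the "smp 2" test */
#define SIM_MASTER_CPU 0   /* assumption: cpu0 is the master cpu */

/* Toy state for the sketch; not gnumach's real data structures. */
struct sim_clock {
    unsigned long ticks_seen[SIM_NCPUS]; /* per-cpu tick accounting */
    unsigned long time_of_day;           /* advanced only by the master cpu */
    unsigned long timeout_deadline;      /* 0 means no timeout is pending */
    unsigned long timeouts_fired;        /* number of timeouts serviced */
};

/* Arm one timeout `delay` master ticks from now; a delay of 0 schedules nothing. */
static void sim_set_timeout(struct sim_clock *c, unsigned long delay)
{
    c->timeout_deadline = delay ? c->time_of_day + delay : 0;
}

/* Model of one timer interrupt on `cpu`. Returns true if a timeout was serviced. */
static bool sim_hardclock(struct sim_clock *c, int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NCPUS);
    c->ticks_seen[cpu]++;          /* every cpu does its own accounting */

    if (cpu != SIM_MASTER_CPU)
        return false;              /* other cpus never touch the timeout state */

    c->time_of_day++;
    if (c->timeout_deadline != 0 && c->time_of_day >= c->timeout_deadline) {
        c->timeout_deadline = 0;   /* one-shot: disarm, then count it as serviced */
        c->timeouts_fired++;
        return true;
    }
    return false;
}
```

In this model, ticks on cpu1 never make a pending timeout fire. Only ticks on cpu0 do, so if cpu0 stops taking timer interrupts, every timeout waits no matter how many ticks the other cpus see. Whether that matches what actually happens in gnumach is the open question in the discussion above.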