IRC channel logs

2023-07-26.log


<damo22>interesting, timer_lock is hogging a lot of time
<Guest25>damo22: are you aware of the bugs I reported in rumpusbdisk ?
<damo22>can you please use a proper nickname when using this IRC channel for reporting bugs
<damo22>how can we follow up with guestX
<azert>is this ok?
<damo22>sure
<azert>so, one thing is that if rumpUSBdisk check if gnumach is driving SATA
<azert>that prevents using it in a condition that is legit
<azert>this is the error: Kernel is already driving a SATA device, skipping probing rump USB disks
<azert>I think that part needs to be conditionally excluded if RUMP is driving USB
<azert>do you agree?
<damo22>yeah, currently it shares device name with sata, we probably need to make new nodes for it im not sure
<azert>that's not a big issue
<azert>because you can name it as you want on the hurd side
<damo22>yea
<azert>maybe you need to make nodes for the usb controllers, I don't know about that
<damo22>i cant split the usb stack into 2 very easily
<azert>you could just compile most of the usb stack into a single translator
<damo22>im not sure how to make the controller into a separate translator
<azert>yeah that would need a new API, it's a very difficult task since USB is very difficult
<damo22>because the driver probes the devices and only probes successfully when the device driver is linked in
<azert>why not link in the whole rump usb stack?
<damo22>i did
<damo22>but that wont work for other usb devices
<azert>yea but things like audio cards
<azert>why not?
<damo22>you cant drive the usb stack twice
<damo22>it needs to be a separate translator that can attach devices
<azert>ok, this is a separate issue, something hard to solve
<damo22>i think netbsd has concept of ugenhc
<damo22>i need to read about it
<azert>yes I've read about that, it could be a solution
<azert>the easiest solution is to not split the stack at all
<azert>just keep it monolithic
<damo22>yeah but it only supports one kind of device
<damo22>or all of them
<azert>with all of them
<azert>monolithic with all of them
<azert>and call it a day
<damo22>ok that can work
<damo22>not ideal
<azert>of course not ideal
<azert>but temporary
<damo22>anyway it wont boot on EHCI yet
<azert>why?
<damo22>theres a bug with the memory mapping
<azert>ah ok
<azert>I have a second bug to report
<damo22>i think it needs page allocation with alignment to more than a page
<damo22>mach doesnt support it yet
<azert>my second bug is the following: if you type settrans -ap NOT_EXISTING_FILE /hurd/rumpusbdisk
<azert>it kills gnumach
<azert>"i think it needs page allocation with alignment to more than a page" seems like an easy fix
<damo22>no it doesnt
<azert>try it
<damo22>settrans: NOT_EXISTING: No such file or directory
<azert>ok, then is my vm
<damo22>it also prints a bunch of bootup for rumpusbdisk
<damo22>which looks like start up of mach
<azert>for me it ends in a kernel trap
<damo22>ok
<damo22>that is weird
<damo22>in terms of lock contention in SMP, i am getting (highest contention first) timer_lock -> db_lock -> vm_page_queue_lock -> vm_page_queue_free_lock
<damo22>but the first two are probably because i am using lock monitoring
<azert>can you get how much time is spent waiting for these locks?
<damo22>yes
<azert>is it significant?
<damo22>yes
<damo22>with smp 2, i get 88 seconds of locking in timer_lock and db_lock, but that is irrelevant, the 3rd biggest is 5 seconds vm_page_queue_lock
<damo22>for a 88 second bootup procedure
<azert>what about the 4th?
<damo22>i think the 88 seconds in timer lock are actually wasted measuring the timing of locks
<damo22>and synchronising the timer
<damo22>the fourth is 2.7seconds
<azert>so you are convinced that most time is spent waiting for locks?
<damo22>im not sure
<damo22>i think a lot of time is spent synchronising the timer
<azert>what is the code that does that?
<damo22>kern/mach_clock.c
<damo22>s= splsched();
<damo22>simple_lock(timer_lock);
<damo22>...
<damo22>splx(s)
<damo22>theres a bunch of __sync_synchronize() calls as well
<azert>those are memory fences
<damo22>clock_interrupt calls at 100Hz
<damo22>and it calls update_mapped_time
<azert>maybe those parts need to be done only if the cpu is the master cpu?
<damo22> /*
<damo22> * Time-of-day and time-out list are updated only
<damo22> * on the master CPU.
<damo22> */
<damo22> if (my_cpu == master_cpu) {
<azert>i don't see how the lock is contended then
<damo22>thats an interesting point
<azert>could be an issue with the timeouts?
<damo22>maybe the timeouts only work on cpu0
<damo22>so the rest of the cpus have to wait for cpu0 to get timer cycles
<damo22>??
<azert>plausible theory
<damo22>so what happens if a timeout occurs but cpu0 is stuck in a spinlock?
<damo22>does the timeout happen after the spinlock is released?
<azert>I don't see that issue in the code
<azert>are timeouts often reset or inited by something?
<damo22>i think they are used quite a bit
<damo22>to implement delays
<azert>because maybe the timer_lock is held by these actors and this is what slows down clock synchronisation
<damo22>ok
<azert>by the way, are timeouts only handled by cpu0?
<damo22>maybe not
<damo22> * thread_depress_priority
<damo22> *
<damo22> * Depress thread's priority to lowest possible for specified period.
<damo22> * Intended for use when thread wants a lock but doesn't know which
<damo22> * other thread is holding it. As with thread_switch, fixed
<damo22> * priority threads get exactly what they asked for. Users access
<damo22> * this by the SWITCH_OPTION_DEPRESS option to thread_switch. A Time
<damo22> * of zero will result in no timeout being scheduled.
<damo22>lapic timer is calibrated and set to expire on all cpus
<damo22>but they fire independently and call hardclock on each cpu
<damo22>hardclock calls clock_interrupt and only services timeouts on cpu0
<toaster5>when will Hurd be ready for x64
<damo22>when we figure out what is wrong with smp it will help a lot
<toaster5>hm okay
<damo22>im working on it now, but its not easy
<toaster5>well im not really good at programming a kernel so don't think I could help with that lol
<damo22>i found that softclock is being serviced by all cpus.... that is bad
<luckyluke>toaster5: if you can compile gnumach with some patch you can already try it in qemu, see the ml
<luckyluke>damo22: I think a timer interrupt will almost always preempt kernel code, even if it's serving another interrupt (there are a few critical regions, e.g. syscall entry/exit on x86_64)