IRC channel logs

<damo22>interesting, timer_lock is hogging a lot of time

<Guest25>damo22: are you aware of the bugs I reported in rumpusbdisk ?

<damo22>can you please use a proper nickname when using this IRC channel for reporting bugs

<damo22>how can we follow up with guestX

<azert>is this ok?

<damo22>sure

<azert>so, one thing is that rumpusbdisk checks whether gnumach is driving SATA

<azert>that prevents using it in a condition that is legit

<azert>this is the error: Kernel is already driving a SATA device, skipping probing rump USB disks

<azert>I think that part needs to be conditionally excluded if RUMP is driving USB

<azert>do you agree?

<damo22>yeah, currently it shares device name with sata, we probably need to make new nodes for it im not sure

<azert>that's not a big issue

<azert>because you can name it as you want on the hurd side

<damo22>yea

<azert>maybe you need to make nodes for the usb controllers, I don't know about that

<damo22>i cant split the usb stack into 2 very easily

<azert>you could just compile most of the usb stack into a single translator

<damo22>im not sure how to make the controller into a separate translator

<azert>yeah that would need a new API, it's a very difficult task since USB is very difficult

<damo22>because the driver probes the devices and only probes successfully when the device driver is linked in

<azert>why not link in the whole rump usb stack?

<damo22>i did

<damo22>but that wont work for other usb devices

<azert>yea but things like audio cards

<azert>why not?

<damo22>you cant drive the usb stack twice

<damo22>it needs to be a separate translator that can attach devices

<azert>ok, this is a separate issue, something hard to solve

<damo22>i think netbsd has concept of ugenhc

<damo22>i need to read about it

<azert>yes I've read about that, it could be a solution

<azert>the easiest solution is to not split the stack at all

<azert>just keep it monolithic

<damo22>yeah but it only supports one kind of device

<damo22>or all of them

<azert>with all of them

<azert>monolithic with all of them

<azert>and call it a day

<damo22>ok that can work

<damo22>not ideal

<azert>of course not ideal

<azert>but temporary

<damo22>anyway it wont boot on EHCI yet

<azert>why?

<damo22>theres a bug with the memory mapping

<azert>ah ok

<azert>I have a second bug to report

<damo22>i think it needs page allocation with alignment to more than a page

<damo22>mach doesnt support it yet
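The usual way to get an allocation aligned to more than one page is to over-allocate and round the start address up. The following is a minimal user-space sketch of that idea in C, not gnumach code: the names align_up and alloc_aligned are hypothetical, malloc stands in for the kernel's page allocator, and whether this is what the EHCI mapping bug actually needs is only damo22's guess above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: over-allocate by `align` bytes, then return the
 * first suitably aligned address inside the raw block. The raw pointer
 * is handed back through raw_out so the caller can free() it later. */
static void *alloc_aligned(size_t size, size_t align, void **raw_out)
{
    void *raw = malloc(size + align);
    if (raw == NULL)
        return NULL;
    *raw_out = raw;
    return (void *)align_up((uintptr_t)raw, (uintptr_t)align);
}
```

A caller would request, say, one 4 KiB page aligned to 64 KiB with alloc_aligned(4096, 65536, &raw) and release it with free(raw). The cost of the approach is the wasted slack of up to `align` bytes per allocation.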

<azert>my second bug is the following: if you type settrans -ap NOT_EXISTING_FILE /hurd/rumpusbdisk

<azert>it kills gnumach

<azert>"i think it needs page allocation with alignment to more than a page" seems like an easy fix

<damo22>no it doesnt

<azert>try it

<damo22>settrans: NOT_EXISTING: No such file or directory

<azert>ok, then it is my vm

<damo22>it also prints a bunch of bootup for rumpusbdisk

<damo22>which looks like start up of mach

<azert>for me it ends in a kernel trap

<damo22>ok

<damo22>that is weird

<damo22>in terms of lock contention in SMP, i am getting (highest contention first) timer_lock -> db_lock -> vm_page_queue_lock -> vm_page_queue_free_lock

<damo22>but the first two are probably because i am using lock monitoring

<azert>can you get how much time is spent waiting for these locks?

<damo22>yes

<azert>is it significant?

<damo22>yes

<damo22>with smp 2, i get 88 seconds of locking in timer_lock and db_lock, but that is irrelevant, the 3rd biggest is 5 seconds vm_page_queue_lock

<damo22>for a 88 second bootup procedure

<azert>what about the 4th?

<damo22>i think the 88 seconds in timer lock are actually wasted measuring the timing of locks

<damo22>and synchronising the timer

<damo22>the fourth is 2.7seconds

<azert>so you are convinced that most time is spent waiting for locks?

<damo22>im not sure

<damo22>i think a lot of time is spent synchronising the timer

<azert>what is the code that does that?

<damo22>kern/mach_clock.c

<damo22>s= splsched();

<damo22>simple_lock(timer_lock);

<damo22>...

<damo22>splx(s)

<damo22>theres a bunch of __sync_synchronize() calls as well

<azert>those are memory fences

<damo22>clock_interrupt calls at 100Hz

<damo22>and it calls update_mapped_time

<azert>maybe those parts need to be done only if the cpu is the master cpu?

<damo22> /*

<damo22> * Time-of-day and time-out list are updated only

<damo22> * on the master CPU.

<damo22> */

<damo22> if (my_cpu == master_cpu) {
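The guard quoted above can be modelled in a few lines of C to see what it implies for the lock. This is a single-threaded sketch under stated assumptions, not the code in kern/mach_clock.c: model_clock_interrupt, model_run, and the counters are hypothetical names, and a counter stands in for taking timer_lock.

```c
#include <assert.h>

#define NCPUS      4
#define MASTER_CPU 0

/* Counter standing in for acquisitions of timer_lock (assumption:
 * the lock is only taken inside the master-CPU branch). */
static unsigned long timer_lock_acquisitions;
static unsigned long time_of_day_ticks;
static unsigned long ticks_per_cpu[NCPUS];

/* Hypothetical model of a per-CPU clock interrupt handler. */
static void model_clock_interrupt(int my_cpu)
{
    /* Per-CPU accounting touches only this CPU's slot, so it needs
     * no shared lock. */
    ticks_per_cpu[my_cpu]++;

    /* Time-of-day and the time-out list are updated only on the
     * master CPU, so only the master takes the lock here. */
    if (my_cpu == MASTER_CPU) {
        timer_lock_acquisitions++;
        time_of_day_ticks++;
    }
}

/* Deliver `ticks` clock interrupts to every modelled CPU. */
static void model_run(int ticks)
{
    for (int t = 0; t < ticks; t++)
        for (int cpu = 0; cpu < NCPUS; cpu++)
            model_clock_interrupt(cpu);
}
```

In this model the lock is acquired once per tick regardless of the number of CPUs, so contention on it would have to come from other callers that take the same lock outside the interrupt path.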

<azert>i don't see how the lock is contended then

<damo22>thats an interesting point

<azert>could be an issue with the timeouts?

<damo22>maybe the timeouts only work on cpu0

<damo22>so the rest of the cpus have to wait for cpu0 to get timer cycles

<damo22>??

<azert>plausible theory

<damo22>so what happens if a timeout occurs but cpu0 is stuck in a spinlock?

<damo22>does the timeout happen after the spinlock is released?

<azert>I don't see that issue in the code

<azert>are timeouts often reset or inited by something?

<damo22>i think they are used quite a bit

<damo22>to implement delays

<azert>because maybe the timer_lock is held by these actors and this is what slows down clock synchronisation

<damo22>ok

<azert>by the way, are timeouts only handled by cpu0?

<damo22>maybe not

<damo22> * thread_depress_priority

<damo22> *

<damo22> * Depress thread's priority to lowest possible for specified period.

<damo22> * Intended for use when thread wants a lock but doesn't know which

<damo22> * other thread is holding it. As with thread_switch, fixed

<damo22> * priority threads get exactly what they asked for. Users access

<damo22> * this by the SWITCH_OPTION_DEPRESS option to thread_switch. A Time

<damo22> * of zero will result in no timeout being scheduled.

<damo22>lapic timer is calibrated and set to expire on all cpus

<damo22>but they fire independently and call hardclock on each cpu

<damo22>hardclock calls clock_interrupt and only services timeouts on cpu0

<toaster5>when will Hurd be ready for x64

<damo22>when we figure out what is wrong with smp it will help a lot

<toaster5>hm okay

<damo22>im working on it now, but its not easy

<toaster5>well im not really good at programming a kernel so don't think I could help with that lol

<damo22>i found that softclock is being serviced by all cpus.... that is bad

<luckyluke>toaster5: if you can compile gnumach with some patch you can already try it in qemu, see the ml

<luckyluke>damo22: I think a timer interrupt will almost always preempt kernel code, even if it's serving another interrupt (there are a few critical regions, e.g. syscall entry/exit on x86_64)

2023-07-26.log