<damo22>i cant figure out why my code hangs
<damo22>maybe smp is not configured for interrupts?
<damo22>im not sure if it boots into that
<damo22>its failing on detecting devices
<AlmuHS>wait, I'll try to explain how to do it
<AlmuHS>you need a copy of the gnumach binary which is running in the VM
<AlmuHS>and, if you have the sources directory, it will work better
<AlmuHS>then, in qemu, you have to configure some options in the script
<AlmuHS>I've set -no-reboot and -no-shutdown to be able to see the error when gnumach hangs
<AlmuHS>to start debugging, you need to turn on the VM
<AlmuHS>then, you have to enter gdb, using remote debugging
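The QEMU side of this setup can be sketched roughly as follows. This is a minimal sketch, not AlmuHS's actual script: the helper name `qemu_debug_flags` and the gdb stub port 1234 are assumptions; `-no-reboot`, `-no-shutdown` and the monitor on 127.0.0.1:5555 are the settings mentioned in the conversation.

```sh
#!/bin/sh
# Hypothetical helper: print the debugging-related QEMU flags, one word per line.
qemu_debug_flags() {
    printf '%s\n' \
        -no-reboot \
        -no-shutdown \
        -gdb tcp::1234 \
        -monitor telnet:127.0.0.1:5555,server,nowait
}
```

A possible way to use it, with `hurd.img` as a placeholder image name: start the VM with `qemu-system-i386 -m 1G -hda hurd.img $(qemu_debug_flags)`, then attach from the gnumach build/ directory with `gdb gnumach -ex 'target remote :1234'`. With KVM enabled, breakpoints would be set with `hbreak *ADDRESS` rather than `break`.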
<AlmuHS>then, you will know how to debug after this point
<AlmuHS>as a note: if you start gdb from the build/ directory, you can see the exact line of code
<AlmuHS>I usually use gdb to find the origin of some hangs
<AlmuHS>another useful thing is the qemu monitor
<AlmuHS>with my script, you can enter it using "telnet 127.0.0.1 5555"
<damo22>i proved that ioapic is set up correctly using info lapic and info ioapic
<AlmuHS>in qemu monitor you can see the registers
<AlmuHS>I usually check registers to try to find problems
<AlmuHS>you can select a certain cpu using the "cpu x" command
<damo22>but i set breakpoint on ioapic_configure
<AlmuHS>with kvm, the VM is running via hardware
<AlmuHS>so, to set a break, you have to use a hardware break, using the physical address of the function
<AlmuHS>you can use "objdump -d gnumach | less" to find the address, but it's so lazy
<AlmuHS>and, with a hardware break, you have to remove the break to continue running
<AlmuHS>you have to set the new hardware break, and remove the previous one
<AlmuHS>because, with a hardware break, gdb stays at the same address, even when you type "next" or "continue"
<AlmuHS>but, if you disable kvm in the machine, you can use a simple break
<damo22>it takes a lot of effort to set up
<AlmuHS>but, if you enable reboot, when gnumach hangs, it will reboot very fast
<AlmuHS>and you won't be able to see the problem
<AlmuHS>I set -no-shutdown and -no-reboot
<AlmuHS>another point is that you can use the EIP value to find the last instruction before the hang
<AlmuHS>using objdump -d gnumach | grep value
<AlmuHS>but you need to use a copy of the same binary you are running
<AlmuHS>a different compilation can produce different addresses
<AlmuHS>can you find the exact instruction?
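The EIP lookup described above can be wrapped in a small helper. This is only a sketch: the function names `normalize_eip` and `find_insn` are made up for illustration, and `grep -B` is an extension (available in GNU, BSD and busybox grep) used here to show two instructions of context before the match.

```sh
#!/bin/sh
# Normalize an EIP value as printed by gdb or the QEMU monitor
# (e.g. 0xC1008CC9 or c1008cc9) to objdump's lowercase, prefix-less form.
normalize_eip() {
    printf '%s\n' "$1" | sed 's/^0[xX]//' | tr 'ABCDEF' 'abcdef'
}

# Print the disassembly line at that address plus the two lines before it.
# $1 = kernel binary (must be the same build that is running), $2 = EIP value.
find_insn() {
    addr=$(normalize_eip "$2")
    objdump -d "$1" | grep -B 2 "^ *${addr}:"
}
```

For example, `find_insn gnumach 0xc1008cc9` would search the disassembly of `gnumach` for the instruction at that address.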
<AlmuHS>enter into the function, to find which instruction crashes
<damo22>request_region (0x00, 0x20, "dma1");
<AlmuHS>or even enter in this new function
<AlmuHS>objdump -d gnumach | grep 00010060
<damo22>after a while debugging i get randomly:
<damo22>Thread 1 received signal SIGQUIT, Quit.
<damo22>delay (n=1000000) at ../i386/i386/loose_ends.c:46
<AlmuHS>you can set more breaks, and take care doing next
<AlmuHS>advancing carefully, maybe you can find the instruction which crashes
<damo22>that is not causing the crash, but might be the cause of the delay
<damo22>i386/i386at/kd.c: delay(1000000);
<AlmuHS>when gnumach enters loose_ends.c with delay, it is hung
<AlmuHS>you have to find the instructions previous to these
<AlmuHS>set more breaks, and advance with next
<AlmuHS>you can even use "step into" (s) to enter the functions
<damo22>how do you display all breakpoints
<AlmuHS>you can delete a break with "d" and the number of the break
<AlmuHS>and set a new break with "b function"
<AlmuHS>you can even set a break at an exact line in the function, I think that is with "b function+line"
<damo22>its weird when i break on the function where it crashed last time it crashes without breaking
<damo22>when i call spl0(), it crashes regardless of the code after it
<AlmuHS>you can set a hardware breakpoint using the function address, but I don't know if it will solve the problem
<AlmuHS>also, you can set a break in the caller of the function
<damo22>ahh linux is trying to calibrate the delay
<damo22>but ive already started apic timer
<damo22>the smp code in linux assumes interrupts are routed only to BSP
<damo22>but i have configured the apic to be physically delivering the interrupts to whichever lapic grabs it
<AlmuHS>the IPI can be sent as broadcast (not recommended) or unicast
<AlmuHS>but I don't know about the rest of APIC interrupts
<damo22>im using physical fixed interrupts
<damo22>i am delivering to ioapic[apic].apic_id
<AlmuHS>each IRQ is sent to a specific cpu
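To make "physical fixed" delivery concrete, here is a small calculator for the two 32-bit words of an I/O APIC redirection entry. It is a sketch under assumptions: the bit positions follow the Intel 82093AA layout (vector in bits 0-7, mask in bit 16, destination in bits 56-63), the function name is invented, and edge-triggered active-high is assumed. The destination field takes the ID of the target local APIC, not the ID of the I/O APIC itself.

```sh
#!/bin/sh
# Print the low and high 32-bit words of an I/O APIC redirection entry for
# fixed delivery, physical destination mode, edge-triggered, active-high.
# $1 = interrupt vector, $2 = destination local APIC ID, $3 = 1 to mask the entry.
ioredtbl_words() {
    vector=$1
    lapic_id=$2
    masked=${3:-0}
    low=$(( ($vector & 0xff) | (($masked & 1) << 16) ))   # bits 8-15 stay 0
    high=$(( ($lapic_id & 0xff) << 24 ))                   # entry bits 56-63
    printf '%08x %08x\n' "$low" "$high"
}
```

For instance, `ioredtbl_words 0x30 0` describes an unmasked entry that delivers vector 48 to the local APIC with ID 0.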
<damo22>yeah you can set it however you want
<damo22>ioapic[apic].apic_id is not the lapic id
<AlmuHS>maybe you can bypass that, in linux code, the apic irq only will be sent to bsp
<damo22>i need to deliver to lapic not to an ioapic
<damo22>the interrupts source is the ioapic and destination is lapic
<AlmuHS>you can do that, in linux code, the destination will be always bsp lapic
<AlmuHS>nope, but it's always the first in cpus array
<AlmuHS>to get the apic_id of BSP you can use "machine_slot[0].apic_id"
<damo22>i think its something to do with switching stacks for interrupts
<AlmuHS>each cpu has its own interrupt stack
<AlmuHS>in APs, the interrupt stacks are these
<AlmuHS> * Addresses of bottom and top of interrupt stacks.
<AlmuHS>vm_offset_t interrupt_stack[NCPUS];
<AlmuHS>vm_offset_t _int_stack_top[NCPUS];
<AlmuHS>extern char _intstack[]; /* bottom */
<AlmuHS>extern char _eintstack[]; /* top */
<AlmuHS>the AP intstack are reserved in model_dep.c, with a call to interrupt_stack_alloc()
<AlmuHS>be careful to avoid using these stacks before they are initialized
<damo22>my timer starts delivering interrupts early
<damo22>and i guess it tries to enable interrupts
<AlmuHS>these stacks are necessary to execute the routine associated with the irq
<damo22>so its jumping to some garbage address then
<AlmuHS>the interrupts have their own stack to avoid sharing the default stack
<AlmuHS>but, by default, each cpu has loaded the default stack
<damo22>yeah but if your stack is not inited before i start using interrupts, it will break
<AlmuHS>at beginning of each interrupt, you might reload the eip with the interrupt stack
<damo22>i have a custom handler for 0xff spurious interrupt
<AlmuHS>maybe youpi can explain better (but now I think he is sleeping)
<damo22>now its hung calibrating the timer
<damo22>the linux timer is not advancing
<damo22>but how do you force it to read a single lapic timer value
<damo22>if the timer is stored inside each core
<damo22>since the bsp always services all the interrupts, i can make the EOI update the time value
<damo22>how do i ensure im on a specific cpu when i enable the lapic timer?
<damo22>the time stamp seems to be reading from locore.S VA_ETC
<damo22>but i cant find that symbol anywhere
<damo22>ahh the lapic counter DECREMENTS until zero and then generates a timer irq
<damo22>225 lapic->spurious_vector.r |= LAPIC_ENABLE | IOAPIC_SPURIOUS_BASE;
<damo22>t_page_fault () at ../i386/i386/locore.S:439
<damo22>i cant seem to use a global during early setup
<damo22>Thread 1 hit Breakpoint 1, lapic_init_ioapic () at ../i386/i386at/ioapic.c:225
<damo22>225 lapic->spurious_vector.r |= LAPIC_ENABLE | IOAPIC_SPURIOUS_BASE;
<damo22>#0 lapic_init_ioapic () at ../i386/i386at/ioapic.c:225
<damo22>#1 ioapic_configure () at ../i386/i386at/ioapic.c:345
<damo22>#2 0xc10409a7 in extra_setup () at ../i386/i386at/acpi_rsdp.c:350
<damo22>#3 0xc1022bc7 in setup_main () at ../kern/startup.c:141
<damo22>#4 0xc1000074 in iplt_done () at ../i386/i386at/boothdr.S:92
<damo22>lapic-> points to some address that causes a page fault
<damo22>PHYS=0xfee00000 virtual = 0xf9690000
<damo22>why do i get a page fault when i try to access this address
<damo22>Backtrace stopped: Cannot access memory at address 0xfee00004
<damo22>Thread 1 hit Breakpoint 1, lapic_init_ioapic () at ../i386/i386at/ioapic.c:340
<damo22>$1 = {void (void)} 0xc100ca06 <ioapic_configure+134>
<damo22>0xc100ca06 <ioapic_configure+134>: 0x0cc24ca1
<damo22>t_page_fault () at ../i386/i386/locore.S:439
<damo22>calling the function page faults?
<damo22>Thread 1 hit Breakpoint 1, lapic_init_ioapic () at ../i386/i386at/ioapic.c:340
<damo22>#0 lapic_init_ioapic () at ../i386/i386at/ioapic.c:340
<damo22>#1 ioapic_configure () at ../i386/i386at/ioapic.c:340
<damo22>#2 0xc1022b9c in setup_main () at ../kern/startup.c:157
<damo22>#3 0xc1000074 in iplt_done () at ../i386/i386at/boothdr.S:92
<damo22>youpi: the LAPIC registers are not updating, somehow the VA is not mapped
<youpi>I mean, which address exactly
<youpi>did you use pmap_get_mapwindow to access it?
<youpi>"its mapped to" -> how did it get mapped?
<damo22>it got saved to a global i think
<youpi>I mean which feature does extra_setup use to map it?
<youpi>where is your tree actually?
<damo22>i need to reboot my vm and then push
<damo22>the last commit is trying to fix the timer but its broken
<damo22>vm_map_physical(&virt, lapic_addr, sizeof(ApicLocalUnit), 0);
<youpi>I don't see anything wrong in vm_map_physical
<youpi>but you'd probably want to check that what pmap_enter does is indeed what you need
<damo22>at what point should i init the apic?
<damo22>and it has to be after the lapic and ioapic addresses are mapped
<youpi>i386at_init and its callee are on the BSP
<youpi>the thing is: you need vm mapping to map apics, so you can't do that before vm_mem_bootstrap() anyway
<damo22>LVT0 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVT1 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVTPC 0x00000400 active-hi edge NMI
<damo22>LVTERR 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVTTHMR 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVTT 0x00020030 active-hi edge periodic Fixed (vec 48)
<damo22>Timer DCR=0x3 (divide by 16) initial_count = 14669
<damo22>SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
<damo22>i needed to disable the IOAPIC from the beginning because it was enabled at boot
<damo22>so move lapic->spurious_vector.r setting to the top of the function
<damo22>i might try a lower divide setting now
<damo22>when i use timer divide 8 i get a lower count
<damo22>$ objdump -d gnumach |grep c1008cc9
<damo22>c1008cc9: 8b 04 95 20 df 0b c1 mov -0x3ef420e0(,%edx,4),%eax
<damo22>yeah but i have a working timer now
<damo22>c1008cc9: 8b 04 95 20 df 0b c1 mov -0x3ef420e0(,%edx,4),%eax
<AlmuHS>be careful in IRQ redirection.
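The raw register values in the "info lapic" dump above can be decoded by hand. The following sketch does so with shell arithmetic; the function names are invented for illustration, and the field positions (vector in bits 0-7, mask in bit 16, timer mode in bits 17-18, divisor encoded in DCR bits 0, 1 and 3) are the standard local APIC layout as documented by Intel.

```sh
#!/bin/sh
# Decode fields of a local APIC LVT register value.
lvt_vector() { echo $(( $1 & 0xff )); }           # bits 0-7
lvt_masked() { echo $(( ($1 >> 16) & 1 )); }      # bit 16
lvt_timer_mode() {                                # bits 17-18, timer LVT only
    case $(( ($1 >> 17) & 3 )) in
        0) echo one-shot ;;
        1) echo periodic ;;
        2) echo tsc-deadline ;;
        *) echo reserved ;;
    esac
}

# Divide Configuration Register: the divisor is encoded in bits 0, 1 and 3.
dcr_divisor() {
    v=$(( (($1 >> 1) & 4) | ($1 & 3) ))
    if [ "$v" -eq 7 ]; then echo 1; else echo $(( 2 << $v )); fi
}
```

Applied to the dump, `lvt_vector 0x00020030` and `lvt_timer_mode 0x00020030` reproduce the "periodic ... (vec 48)" annotation, and `dcr_divisor 0x3` gives the divide-by-16 setting; a divide-by-8 setting would correspond to a DCR value of 0x2.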
<AlmuHS>at the point that you show, the AP hasn't been enabled yet
<AlmuHS>so, at this point, you need to redirect all IRQ to BSP
<damo22>all IRQs are enabled in BSP only
<damo22>c1008cc0: 83 fa 07 cmp $0x7,%edx
<damo22>c1008cc3: 0f 84 3f ff ff ff je c1008c08 <spl7>
<damo22>c1008cc9: 8b 04 95 20 df 0b c1 mov -0x3ef420e0(,%edx,4),%eax
<damo22>the last instruction is the one that crashed
<damo22>its trying to put 0xc10bdf20+%edx*4 into %eax
<damo22>but edx contains a pointer to some other region instead of the spl index
<AlmuHS>yes, but I don't know what this value means
<damo22>i need to figure out why %edx is getting corrupted
<damo22>ecx looks like a better value to use
<damo22>i think its because we use sti in SETIPL
<damo22>so an interrupt can occur from within there
<damo22>i removed the sti from SETIPL and it crashes at the same place
<AlmuHS>if paging is not enabled when you call this function, you have to use phystokv(address) to access a physical address
<AlmuHS>but phystokv() only works with addresses below 0xC0000...
<damo22>the edx register is set to 0xf9691000 which is the VA of the ioapic
<AlmuHS>then you have waited until paging is enabled to map the address
<damo22>yeah i mapped it exactly at the same place where lapic is mapped
<damo22>but for some reason, the edx register is being clobbered inside <spl>
<damo22>maybe when i enable the timer i should disable interrupts?
<AlmuHS>with the pic, how was it solved?
<damo22>so i implemented the same drop in replacement
<AlmuHS>who wrote the linux drivers port?
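The address arithmetic behind the faulting `mov -0x3ef420e0(,%edx,4),%eax` can be checked with a few lines of shell. This is only an illustrative sketch (the function name is made up); it computes the 32-bit effective address the instruction loads from for a given %edx.

```sh
#!/bin/sh
# Effective address of "mov -0x3ef420e0(,%edx,4),%eax", wrapped to 32 bits.
# $1 = value of %edx (the table index).
mov_effective_addr() {
    base=$(( -0x3ef420e0 & 0xffffffff ))   # displacement as unsigned: 0xc10bdf20
    printf '%08x\n' $(( ($base + $1 * 4) & 0xffffffff ))
}
```

With a small index such as 6 the load stays within a few bytes of 0xc10bdf20, whereas `mov_effective_addr 0xf9691000` (the clobbered %edx value reported above) wraps around to an address far outside that table.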
<damo22>i dont know why it cant just service an interrupt as soon as it arrives
<damo22>instead of masking them back and forth all the time
<damo22>its not a problem with the porting
<damo22>its just that the irqs are considered so critical that it has 8 priority levels for them
<damo22>i need to check if i masked them the right way around
<damo22>since apic might be different masking polarity
<youpi>damo22: you have to mask interrupts somehow to avoid seeing the interrupt handler being itself interrupted
<youpi>masking all interrupts was not a good thing in the past, because the handling might take time, and other interrupts might have to be handled fast enough
<youpi>nowadays' hardware behaves much better in that regard, so possibly we don't need to care any more
<youpi>and just mask all interrupts during interrupt handling
<damo22>i dont understand spl7, it just calls cli?
<youpi>yes, i.e. it disables all interrupts
<damo22>how would i test it with just those two spls
<damo22>move all the entrypoints to point to ENTRY(spl7) except spl0?
<damo22>booted super fast almost instantly, and crashed at EIP 0x3
<damo22>AHCI SATA 00:1f.2 BAR 0xfebf1000 IRQ 16
<damo22>im not sure if 16 is the correct irq
<damo22>lspci shows PCI_INTERRUPT = 0x0a
<damo22>arrgh linux irq.c only supports 16
<damo22>haha i can just let it write to pic
<damo22>the problem is, pic only has 16 bits for masking
<AlmuHS>but, is it a hack or a normal case?
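The 16-bit limit mentioned above is easy to see numerically. The sketch below assumes the mask is kept as a single 16-bit word with one bit per IRQ line (the function name is hypothetical): IRQ 10 (0x0a) has a bit in it, while IRQ 16 falls outside the word entirely.

```sh
#!/bin/sh
# Bit that IRQ line $1 occupies in a 16-bit PIC-style mask, as 4 hex digits.
# A result of 0000 means the IRQ does not fit in the mask at all.
pic_mask_bit() {
    printf '%04x\n' $(( (1 << $1) & 0xffff ))
}
```

So `pic_mask_bit 10` yields a non-zero bit, while `pic_mask_bit 16` yields all zeroes, which is why an IRQ 16 device cannot be masked through a PIC-shaped interface.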
<damo22>i disabled all spls except 0 and 7
<damo22>it needs more work for the linux drivers in hurd to work
<youpi>if you want to just try getting rid of spl 1-6, I'd say modify a non-patched debian kernel, to avoid mixing the issues
<damo22>it was literally commenting out a few lines
<damo22>to make the thing boot a ramdisk i just commented out the ahci_probe_pci call
<damo22>so it wouldnt probe the pci card
<damo22>ive pushed my latest code that kinda boots but breaks due to linux probing
<damo22>i'll test this on real hw and then go to sleep
<damo22>on real hw, but gets pretty far, all the way to enumerating device nodes
<damo22>strange thing is, when i press a keyboard button it crashes immediately
<youpi>damo22: if dropping spl1-6 seems to work fine when applied to a debian kernel, I can commit that to master
<youpi>so that the rest can be cleaned up
<gnu_srs2>youpi: I'm working on cross-compiling gnumach, hurd, glibc, etc. Seems like tg-hurdsig-SA_SIGINFO.diff is not upstreamed yet, why?
<youpi>because nobody took the time to review it
<gnu_srs2>damo22: Doesn't combining your work with the smp development make it unnecessarily complicated?