IRC channel logs

2019-11-05.log


<damo22>AlmuHS: !
<AlmuHS>damo22 what do you need?
<damo22>i need a miracle
<AlmuHS>XD
<AlmuHS>what is your problem?
<damo22>i cant figure out why my code hangs
<damo22>it traps on invalid opcode
<AlmuHS>compilation or execution?
<damo22>maybe smp is not configured for interrupts?
<damo22>runtime
<AlmuHS>do you know how to debug with gdb?
<damo22>nope
<AlmuHS>you can use remote debugging
<damo22>im not sure if it boots into that
<damo22>its failing on detecting devices
<AlmuHS>wait, I'll try to explain how to do it
<AlmuHS>you need a copy of the gnumach binary that is running in the VM
<AlmuHS>and, if you have the source directory, it will work better
<AlmuHS>then, in qemu, you have to configure some options in the script
<AlmuHS>you can see my script here: http://dpaste.com/1FDMWVK
<AlmuHS>I've set -no-reboot and -no-shutdown so you can see the error when gnumach hangs
<AlmuHS>and -S to debug
<AlmuHS>to start debugging, you need to turn on the VM
<AlmuHS>the VM will start paused
<AlmuHS>then you have to attach gdb to it, using remote debugging
***Server sets mode: +nt
<AlmuHS>do you know how to use gdb?
<damo22>yes
<damo22>thx
<AlmuHS>then, you will know how to debug after this point
<AlmuHS>as a note: if you start gdb from the build/ directory, you can see the exact line of code
<damo22>this will be very useufl
<damo22>useful
<AlmuHS>I usually use gdb to find the origin of some hangs
<AlmuHS>another useful thing is the qemu monitor
<damo22>yea i use that too
<AlmuHS>with my script, you can connect to it using "telnet 127.0.0.1 5555"
<damo22>i proved that ioapic is set up correctly using info lapic and info iopaic
<AlmuHS>in qemu monitor you can see the registers
<damo22>info ioapic
<AlmuHS>yes
<AlmuHS>I usually check registers to try to find problems
<AlmuHS>you can select a specific cpu using the "cpu x" command
<AlmuHS>cpu 0, cpu 1...
<damo22>it didnt break
<AlmuHS>what?
<damo22>i saw IOAPIC configure...done
<damo22>but i set breakpoint on ioapic_configure
<AlmuHS>have you disabled kvm?
<damo22>no
<damo22>why is kvm related
<AlmuHS>with kvm, the VM runs on the hardware
<AlmuHS>so, to set a break, you have to use a hardware break, using the address of the function
<AlmuHS>you can use "objdump -d gnumach | less" to find the address, but it's tedious
<damo22>nice!!
<damo22>i got it to break
<AlmuHS>and, with a hardware break, you have to remove the break to continue running
<AlmuHS>you have to set the new hardware break, and remove the previous one
<AlmuHS>because, with a hardware break, gdb stays at the same address, even when you type "next" or "continue"
<damo22>i wont use kvm
<AlmuHS>but, if you disable kvm in the machine, you can use a simple break
<damo22>can i enable reboot
<AlmuHS>I prefer not
<damo22>it takes a lot of effort to set up
<damo22>every time
<AlmuHS>but, if you enable reboot, when gnumach hangs, it will reboot very fast
<AlmuHS>and you won't be able to see the problem
<damo22>ok
<damo22>good point
<AlmuHS>I set -no-shutdown and -no-reboot
<AlmuHS>another point is that you can use the EIP value to find the last instruction before the hang
<AlmuHS>using objdump -d gnumach | grep value
<damo22>oh wow
<AlmuHS>but you need to use a copy of the same binary you are running
<AlmuHS>same gnumach binary
<damo22>cool i found it
<AlmuHS>a different compilation can produce different addresses
<damo22>it crashes in init_IRQ
<AlmuHS>can you find the exact instruction?
<AlmuHS>step into the function, to find which instruction crashes
<damo22>request_region (0x00, 0x20, "dma1");
<damo22>that function crashes
<AlmuHS>can you check these addresses?
<AlmuHS>or even step into this new function
<damo22>i got a sigquit
<damo22>i think theres a timer running
<AlmuHS>maybe
<damo22>eip 00010060
<AlmuHS>objdump -d gnumach | grep 00010060
<damo22>empty
<AlmuHS>replace first zero with a c
<AlmuHS> c0010060
<damo22>empty
<AlmuHS>i don't know then
<AlmuHS>continue debugging with gdb
<damo22>after a while debugging i get randomly:
<damo22>Thread 1 received signal SIGQUIT, Quit.
<damo22>delay (n=1000000) at ../i386/i386/loose_ends.c:46
<damo22>46 DELAY(n)
<AlmuHS>yes, the crash is before that
<AlmuHS>this is what happens when gnumach hangs
<damo22>why not make an infinite loop
<damo22>instead of a huge delay
<AlmuHS>I don't know
<AlmuHS>you can set more breaks, and take care when doing next
<AlmuHS>advancing carefully, maybe you can find the instruction that crashes
<damo22> delay(1000000); in mp_desc.c
<AlmuHS>this is after the hang
<damo22>that is not causing the crash, but might be the cause of the delay
<damo22>i386/i386at/kd.c: delay(1000000);
<damo22>or that
<AlmuHS>when gnumach ends up in delay in loose_ends.c, it has already hung
<AlmuHS>it is the same with kd.c
<AlmuHS>you have to find the instructions executed before these
<damo22>ok
<AlmuHS>set more breaks, and advance with next
<AlmuHS>you can even use "step" (s) to enter the functions
<damo22>how do you display all breakpoints
<AlmuHS>info break
<damo22>thanks
<AlmuHS>you can delete a break with "d" and the number of the break
<AlmuHS>and set a new break with "b function"
<AlmuHS>you can even set a break on an exact line, I think that is with "b file.c:line"
<AlmuHS> https://www.tutorialspoint.com/gnu_debugger/gdb_commands.htm
<AlmuHS> https://web.eecs.umich.edu/~sugih/pointers/summary.html
<damo22>its weird, when i break on the function where it crashed last time it crashes without breaking
<damo22>when i call spl0(), it crashes regardless of the code after it
<AlmuHS>you can set a hardware breakpoint using the function address, but I don't know if it will solve the problem
<AlmuHS>sometimes, gdb skips the break
<AlmuHS>also, you can set a break in the caller of the function
<damo22>ahh linux is trying to calibrate the delay
<damo22>but ive already started apic timer
<AlmuHS>you will have to bypass that
<AlmuHS>or imagine a solution to fix It
<damo22>dammit
<damo22>the smp code in linux assumes interrupts are routed only to BSP
<damo22>but i have configured the apic to be physically delivering the interrupts to whichever lapic grabs it
<AlmuHS>the IPI can be sent as broadcast (not recommended) or unicast
<AlmuHS>but I don't know about the rest of APIC interrupts
<damo22>im not using IPIs
<damo22>im using physical fixed interrupts
<AlmuHS>with apic_id
<AlmuHS>?
<damo22>yeah
<AlmuHS>then not broadcast
<damo22>i am delivering to ioapic[apic].apic_id
<AlmuHS>each IRQ is sent to a specific cpu
<damo22>yeah you can set it however you want
<AlmuHS>ok
<damo22>actually that makes no sense
<damo22>ioapic[apic].apic_id is not the lapic id
<AlmuHS>maybe you can bypass that: in linux code, the apic irqs will only be sent to the bsp
<AlmuHS>but only in linux code
<damo22>i need to deliver to lapci not to an ioapic
<damo22>lapic
<damo22>the interrupts source is the ioapic and destination is lapic
<damo22>i think i wired it up wrong
<AlmuHS>you can do that; in linux code, the destination will always be the bsp lapic
<AlmuHS>if you need it, I can point you to references
<damo22>dont worry i have all the docs
<damo22>is the BSP lapic id always 0
<AlmuHS>nope, but It's always the first in cpus array
<AlmuHS>machine_slot[0]
<AlmuHS>to get the apic_id of BSP you can use "machine_slot[0].apic_id"
<damo22>ok i have sent all irqs to BAS
<damo22>BSP
<damo22>same problem
<damo22>EIP 0010060
<damo22>00010060
<damo22>its not even a valid address
<damo22>when i use kvm it says EIP 0x3
<AlmuHS>EIP is wrong then
<damo22>i think its something to do with switching stacks for interrupts
<AlmuHS>there is an interrupt stack
<AlmuHS>each cpu has its own interrupt stack
<AlmuHS>in APs, the interrupt stacks are this
<AlmuHS>/*
<AlmuHS> * Addresses of bottom and top of interrupt stacks.
<AlmuHS> */
<AlmuHS>vm_offset_t interrupt_stack[NCPUS];
<AlmuHS>vm_offset_t _int_stack_top[NCPUS];
<AlmuHS>in bsp this
<AlmuHS>/*
<AlmuHS> * First cpu`s interrupt stack.
<AlmuHS> */
<AlmuHS>extern char _intstack[]; /* bottom */
<AlmuHS>extern char _eintstack[]; /* top */
<AlmuHS>the AP interrupt stacks are reserved in model_dep.c, with a call to interrupt_stack_alloc()
<AlmuHS>be careful not to use these stacks before they are initialized
<damo22>err
<damo22>my timer starts delivering interrupts early
<damo22>and i guess it tries to enable interrupts
<AlmuHS>these stacks are necessary to execute the routine associated with the irq
<damo22>oh ok
<damo22>so its jumping to some garbage address then
<damo22>when the interrupt happens
<AlmuHS>not really
<damo22>i mean currently
<AlmuHS>the interrupts have their own stack to avoid sharing the default stack
<AlmuHS>but, by default, each cpu has the default stack loaded
<damo22>yeah but if your stack is not inited before i start using interrupts, it will brak
<damo22>break
<AlmuHS>at the beginning of each interrupt, you may need to reload esp with the interrupt stack
<damo22>its complicated for me
<damo22>i have a custom handler for 0xff spurious interrupt
<damo22>it skips calling the ivect
<AlmuHS>maybe youpi can explain better (but now I think he is sleeping)
<AlmuHS>ask him tomorrow
<damo22>ok
<damo22>im going back to work tomorrow
<AlmuHS>tomorrow is 5-6 hours later
<AlmuHS>now is 3:02 AM
<damo22>oh
<damo22>thanks for your help
<damo22>go to sleep!
<AlmuHS>ur welcome
<damo22>getting past the crash
<damo22>now its hung calibrating the timer
<damo22>the linux timer is not advancing
<damo22>ahh its reading the wrong lapic
<damo22>but how do you force it to read a single lapic timer value
<damo22>if the timer is stored inside each core
<damo22>arrgh
<damo22>since the bsp always services all the interrupts, i can make the EOI update the time value
<damo22>:D
<damo22>how do i ensure im on a specific cpu when i enable the lapic timer?
<damo22>yay jiffies has a value
<damo22>the time stamp seems to be reading from locore.S VA_ETC
<damo22>but i cant find that symbol anywhere
<damo22>ahh the lapic counter DECREMENTS until zero and then generates a timer irq
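The countdown behaviour described in the line above can be modelled in a few lines of C. This is a minimal user-space sketch, not GNU Mach code: the struct and function names are hypothetical, and it assumes periodic mode, where the current count is reloaded from the initial count each time it reaches zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of a local APIC timer in periodic mode. */
struct lapic_timer_model {
    uint32_t initial_count; /* value reloaded after each expiry */
    uint32_t current_count; /* counts down towards zero */
    unsigned irqs_raised;   /* timer interrupts generated so far */
};

/* Start the timer: a non-zero initial count begins the countdown. */
static void lapic_timer_model_start(struct lapic_timer_model *t, uint32_t initial)
{
    assert(initial != 0); /* in this model a zero initial count means "stopped" */
    t->initial_count = initial;
    t->current_count = initial;
    t->irqs_raised = 0;
}

/* Advance the model by one (already divided) timer clock tick. */
static void lapic_timer_model_tick(struct lapic_timer_model *t)
{
    if (t->current_count == 0)
        return;                              /* timer not running */
    t->current_count--;                      /* the counter decrements */
    if (t->current_count == 0) {
        t->irqs_raised++;                    /* reaching zero raises the timer irq */
        t->current_count = t->initial_count; /* periodic mode: reload and go again */
    }
}
```

With an initial count of 3, seven ticks produce two interrupts and leave the current count at 2.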
<damo22>225 lapic->spurious_vector.r |= LAPIC_ENABLE | IOAPIC_SPURIOUS_BASE;
<damo22>(gdb)
<damo22>t_page_fault () at ../i386/i386/locore.S:439
<damo22>i cant seem to use a global during early setup
<damo22>Thread 1 hit Breakpoint 1, lapic_init_ioapic () at ../i386/i386at/ioapic.c:225
<damo22>225 lapic->spurious_vector.r |= LAPIC_ENABLE | IOAPIC_SPURIOUS_BASE;
<damo22>(gdb) bt
<damo22>#0 lapic_init_ioapic () at ../i386/i386at/ioapic.c:225
<damo22>#1 ioapic_configure () at ../i386/i386at/ioapic.c:345
<damo22>#2 0xc10409a7 in extra_setup () at ../i386/i386at/acpi_rsdp.c:350
<damo22>#3 0xc1022bc7 in setup_main () at ../kern/startup.c:141
<damo22>#4 0xc1000074 in iplt_done () at ../i386/i386at/boothdr.S:92
<damo22>lapic-> points to some address that causes a page fault
<damo22>PHYS=0xfee00000 virtual = 0xf9690000
<damo22>why do i get a page fault when i try to access this address
<damo22>Backtrace stopped: Cannot access memory at address 0xfee00004
<damo22>(gdb) c
<damo22>Continuing.
<damo22>Thread 1 hit Breakpoint 1, lapic_init_ioapic () at ../i386/i386at/ioapic.c:340
<damo22>340 lapic_init_ioapic();
<damo22>(gdb) p lapic_init_ioapic
<damo22>$1 = {void (void)} 0xc100ca06 <ioapic_configure+134>
<damo22>(gdb) x lapic_init_ioapic
<damo22>0xc100ca06 <ioapic_configure+134>: 0x0cc24ca1
<damo22>(gdb) s
<damo22>t_page_fault () at ../i386/i386/locore.S:439
<damo22>?????
<damo22>calling the function page faults?
<damo22>Thread 1 hit Breakpoint 1, lapic_init_ioapic () at ../i386/i386at/ioapic.c:340
<damo22>340 lapic_init_ioapic();
<damo22>(gdb) bt
<damo22>#0 lapic_init_ioapic () at ../i386/i386at/ioapic.c:340
<damo22>#1 ioapic_configure () at ../i386/i386at/ioapic.c:340
<damo22>#2 0xc1022b9c in setup_main () at ../kern/startup.c:157
<damo22>#3 0xc1000074 in iplt_done () at ../i386/i386at/boothdr.S:92
<damo22>the function calls itself??
<damo22>youpi: the LAPIC registers are not updating, somehow the VA is not mapped
<youpi>damo22: which VA?
<damo22>"lapic"
<damo22>i need to push my latest commit
<youpi>I mean, which address exactly
<damo22>0xf9690000
<youpi>that's very high
<youpi>did you use pmap_get_mapwindow to access it?
<damo22>its mapped to 0xfee00000
<damo22>where the lapic is
<damo22>err
<youpi>"its mapped to" -> how did it get mapped?
<damo22>extra_setup()
<damo22>in acpi_rsdp.c
<damo22>it got saved to a global i think
<youpi>I mean which feature does extra_setup use to map it?
<youpi>where is your tree actually?
<damo22>i need to reboot my vm and then push
<damo22> https://github.com/zamaudio/GNUMach_SMP/tree/feat-ioapic
<damo22>the last commit is trying to fix the timer but its broken
<damo22>vm_map_physical(&virt, lapic_addr, sizeof(ApicLocalUnit), 0);
<damo22>that is how it maps the lapic
<youpi>I don't see anything wrong in vm_map_physical
<youpi>but you'd probably want to check that what pmap_enter does is indeed what you need
<damo22>at what point should i init the apic?
<damo22>it has to be before linux_init
<damo22>and it has to be after the lapic and ioapic addresses are mapped
<damo22>but i need to be on BSP
<damo22>how do i know if im on BSP?
<youpi>i386at_init and its callee are on the BSP
<youpi>(c_boot_entry actually)
<youpi>the thing is: you need vm mapping to map apics, so you can't do that before vm_mem_bootstrap() anyway
<damo22>woot
<damo22>ahci irq16
<damo22>kernel page fault
<damo22>timer works!
<damo22>LVT0 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVT1 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVTPC 0x00000400 active-hi edge NMI
<damo22>LVTERR 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVTTHMR 0x00010000 active-hi edge masked Fixed (vec 0)
<damo22>LVTT 0x00020030 active-hi edge periodic Fixed (vec 48)
<damo22>Timer DCR=0x3 (divide by 16) initial_count = 14669
<damo22>SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
<youpi>what was missing?
<damo22>i needed to disable the IOAPIC from the beginning because it was enabled at boot
<damo22>so move lapic->spurious_vector.r setting to the top of the function
<damo22>that is all
<damo22>pushed the fix
<damo22>i might try a lower divide setting now
<damo22>more accurate timer!
<damo22>when i use timer divide 8 i get a lower count
<damo22>wth
<damo22>i was using timer divide 16
<damo22>is that expected?
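By arithmetic alone, halving the divisor makes the counter tick twice as fast, so the count covering the same interval would be expected to double rather than drop. The sketch below decodes a divide configuration value such as the DCR=0x3 shown in the register dump above, and rescales a count between divisors. The helper names are hypothetical, and the sketch says nothing about why a lower count was observed here.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the LAPIC divide configuration register (DCR) into a divisor.
 * Bits 0, 1 and 3 form a 3-bit code: codes 0..6 select 2, 4, ..., 128
 * and code 7 selects divide-by-1. */
static unsigned lapic_dcr_to_divisor(uint32_t dcr)
{
    unsigned code = (dcr & 0x3u) | ((dcr >> 1) & 0x4u);
    return code == 7 ? 1u : (2u << code);
}

/* Hypothetical helper: the initial count that keeps the same interrupt
 * interval when the divisor changes from old_div to new_div. */
static uint32_t lapic_rescale_count(uint32_t count, unsigned old_div, unsigned new_div)
{
    assert(old_div != 0 && new_div != 0);
    /* 64-bit intermediate so count * old_div cannot overflow */
    return (uint32_t)(((uint64_t)count * old_div) / new_div);
}
```

For the values in the log, lapic_dcr_to_divisor(0x3) gives 16, and lapic_rescale_count(14669, 16, 8) gives 29338.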
<damo22> https://i.imgur.com/yKRBU9t.png
<damo22>$ objdump -d gnumach |grep c1008cc9
<damo22>c1008cc9: 8b 04 95 20 df 0b c1 mov -0x3ef420e0(,%edx,4),%eax
<damo22>c10bdf20 <pic_mask>
<damo22>but EDX = some large value
<damo22>AlmuHS: :) https://i.imgur.com/yKRBU9t.png
<AlmuHS>kernel panic
<damo22>yeah but i have a working timer now
<AlmuHS>yes
<damo22>c1008cc9: 8b 04 95 20 df 0b c1 mov -0x3ef420e0(,%edx,4),%eax
<AlmuHS>be careful with IRQ redirection. At the point you show, the APs haven't been enabled yet
<AlmuHS>so, at this point, you need to redirect all IRQs to the BSP
<damo22>all IRQs are enabled in BSP only
<AlmuHS>ok
<damo22>c1008cc0 <spl>:
<damo22>c1008cc0: 83 fa 07 cmp $0x7,%edx
<damo22>c1008cc3: 0f 84 3f ff ff ff je c1008c08 <spl7>
<damo22>c1008cc9: 8b 04 95 20 df 0b c1 mov -0x3ef420e0(,%edx,4),%eax
<AlmuHS>what is this?
<damo22>the last instruction is the one that crashed
<AlmuHS>a simple jump?
<damo22>no the mov
<AlmuHS>i see
<damo22>its trying to put 0xc10bdf20+%edx*4 into %eax
<damo22>but edx contains a pointer to some other region instead of the spl index
<AlmuHS>yes, but I don't know what this value means
<damo22>pic mask index
<damo22>i need to figure out why %edx is getting corrupted
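The address computed by the faulting mov can be worked out by hand. The sketch below reproduces the 32-bit `disp(,index,scale)` calculation in plain C. The function name is hypothetical, and the large index in the usage note is only an example of a pointer-sized value.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a 32-bit x86 operand of the form disp(,index,scale). */
static uint32_t x86_effective_address(int32_t disp, uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps modulo 2^32, like 32-bit address generation. */
    return (uint32_t)disp + index * scale;
}
```

With the displacement -0x3ef420e0 from the disassembly, an index of 0 gives 0xc10bdf20 (pic_mask) and an index of 3 gives 0xc10bdf2c, still inside the table. A pointer-like index such as 0xf9691000 wraps around to 0xa6b01f20, far outside it.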
<damo22>ecx looks like a better value to use
<damo22>lol
<damo22>i think its because we use sti in SETIPL
<damo22>so an interrupt can occur from within there
<AlmuHS>maybe you need to use cli
<damo22>gah
<damo22>i removed the sti from SETIPL and it crashes at the same place
<AlmuHS>check the addressing
<AlmuHS>if paging is not enabled when you call this function, you have to use phystokv(address) to access a physical address
<AlmuHS>but phystokv() only works with address below 0xC0000...
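A fixed-offset translation of the kind phystokv() performs can be sketched as below. This is an illustration under assumptions, not the GNU Mach macro: the base 0xC0000000 is taken from the "replace first zero with a c" hint earlier in the log, the function name is hypothetical, and the cut-off used is simply where 32-bit addition would wrap, which may not match the real limit.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: kernel direct-map base; the real value in GNU Mach may differ. */
#define KERNEL_BASE_ASSUMED 0xC0000000u

/* Returns 1 and stores the kernel virtual address in *kv if phys can be
 * reached by adding the fixed base; returns 0 if the sum would wrap. */
static int phys_to_kv_sketch(uint32_t phys, uint32_t *kv)
{
    assert(kv != 0);
    if (phys > UINT32_MAX - KERNEL_BASE_ASSUMED)
        return 0; /* phys + base does not fit in 32 bits */
    *kv = phys + KERNEL_BASE_ASSUMED;
    return 1;
}
```

Under these assumptions 0x00010060 translates to 0xc0010060, while a device address such as the LAPIC's 0xfee00000 is out of reach of the offset mapping. That is consistent with the LAPIC being mapped explicitly through vm_map_physical() in this log.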
<damo22>the edx register is set to 0xf9691000 which is the VA of the ioapic
<damo22>i mapped it like lapic
<AlmuHS>ok
<damo22>but edx should not be used
<AlmuHS>then you waited until paging was enabled to map the address?
<damo22>yeah i mapped it exactly at the same place where lapic is mapped
<damo22>same function
<damo22>but for some reason, the edx register is being clobbered inside <spl>
<damo22>it should not be possible
<damo22>maybe when i enable the timer i should disable interrupts?
<damo22>with cli
<AlmuHS>try it
<damo22>linux drivers suck in hurd
<AlmuHS>linux is vero chaotic
<AlmuHS>*very
<damo22>it rearranges all the irqs
<damo22>reschedules them
<damo22>flipping the irq masks all day
<AlmuHS>with the pic, how was it solved?
<damo22>it just toggles them like crazy
<damo22>so i implemented the same drop in replacement
<damo22>using apic
<damo22>hoping it would work
<AlmuHS>who wrote the linux drivers port?
<damo22>i dont know why it cant just service an interrupt as soon as it arrives
<damo22>instead of masking them back and forth all the time
<damo22>its not a problem with the porting
<damo22>its just that the irqs are considered so critical that it has 8 priority levels for them
<AlmuHS>youpi can you help damo22 ?
<damo22>i need to check if i masked them the right way around
<damo22>since apic might be different masking polarity
<youpi>damo22: you have to mask interrupts somehow to avoid seeing the interrupt handler being itself interrupted
<youpi>masking all interrupts was not a good thing in the past, because the handling might take time, and other interrupts might have to be handled fast enough
<youpi>nowadays' hardware behaves much better in that regard, so possibly we don't need to care any more
<youpi>and just mask all interrupts during interrupt handling
<youpi>thus spl0 and spl7 only
<damo22>i dont understand spl7, it just calls cli?
<youpi>yes, i.e. it disables all interrupts
<youpi>(except NMI of course)
<damo22>how would i test it with just those two spls
<damo22>move all the entrypoints to point to ENTRY(spl7) except spl0?
<youpi>probably, yes
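The two-level scheme discussed above can be modelled in user space as follows. This is a sketch only: the `_sketch` names and the two global variables are hypothetical stand-ins, a boolean replaces the CPU interrupt flag, and the real entry points live in assembly.

```c
#include <assert.h>
#include <stdbool.h>

typedef int spl_t;

static bool interrupts_enabled = true; /* stands in for the CPU interrupt flag */
static spl_t current_ipl = 0;          /* current interrupt priority level */

/* Lowest level: all interrupts enabled (what sti does on real hardware). */
static spl_t spl0_sketch(void)
{
    spl_t old = current_ipl;
    current_ipl = 0;
    interrupts_enabled = true;
    return old;
}

/* Highest level: all interrupts disabled except NMI (what cli does). */
static spl_t spl7_sketch(void)
{
    spl_t old = current_ipl;
    current_ipl = 7;
    interrupts_enabled = false;
    return old;
}

/* With levels 1-6 dropped, an intermediate entry point just forwards to spl7. */
static spl_t spl4_sketch(void)
{
    return spl7_sketch();
}

/* Restore a level previously returned by one of the functions above. */
static void splx_sketch(spl_t level)
{
    assert(level == 0 || level == 7); /* only two levels exist in this model */
    if (level == 0)
        spl0_sketch();
    else
        spl7_sketch();
}
```

A caller that used to raise to an intermediate level still follows the usual save and restore pattern: `spl_t s = spl4_sketch(); ... splx_sketch(s);`.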
<damo22>booted super fast almost instantly, and crashed at EIP 0x3
<damo22>AHCI SATA 00:1f.2 BAR 0xfebf1000 IRQ 16
<damo22>im not sure if 16 is the correct irq
<damo22>as i have no ACPI parser
<damo22>lspci shows PCI_INTERRUPT = 0x0a
<damo22>im guessing that is PIRQA
<damo22>hmm irq17 seems to work better
<damo22>arrgh linux irq.c only supports 16
<damo22>lol
<damo22>oh crap, it writes to pic
<damo22>its a real mess
<damo22>haha i can just let it write to pic
<damo22>its useless but does nothing
<damo22>since i am routing through apic
<damo22>the problem is, pic only has 16 bits for masking
<damo22>and i want to enable 24 irqs
<damo22>lol
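The width problem mentioned above is easy to show in isolation. In this sketch the names are hypothetical and the 24-line limit is taken from the conversation. It only demonstrates that a mask bit for irq 16 or above does not survive in a 16-bit PIC-style mask.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_IRQS_SKETCH 24u /* number of irq lines wanted in the log */

/* Mark an irq as masked in a 32-bit mask word, wide enough for 24 lines. */
static uint32_t irq_mask_set(uint32_t mask, unsigned irq)
{
    assert(irq < IOAPIC_IRQS_SKETCH);
    return mask | (UINT32_C(1) << irq);
}

/* What is left when the same mask is stored in the 16 bits a PIC mask has. */
static uint16_t irq_mask_truncate_to_pic(uint32_t mask)
{
    return (uint16_t)(mask & 0xFFFFu);
}
```

Masking irq 17 sets bit 17 (0x20000) in the 32-bit word, but the truncated 16-bit value is 0, so the PIC-style mask cannot represent it.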
<damo22> https://i.imgur.com/EUIOydd.png WOT
<damo22>youpi: ^^
<youpi>wow
<damo22>i think just timer irq
<youpi>looks so yes
<damo22>i ran it off a ramdisk
<damo22>AlmuHS: it boots off a ramdisk (almost) https://i.imgur.com/EUIOydd.png
<AlmuHS>but, is it a hack or a normal case?
<damo22>i disabled all spls except 0 and 7
<damo22>it needs more work for the linux drivers in hurd to work
<youpi>if you want to just try getting rid of spl 1-6, I'd say modify a non-patched debian kernel, to avoid mixing the issues
<damo22>ok
<damo22>it was literally commenting out a few lines
<damo22>to make the thing boot a ramdisk i just commented out the ahci_probe_pci call
<damo22>so it wouldnt probe the pci card
<damo22>ive pushed my latest code that kinda boots but breaks due to linux probing
<damo22>i'll test this on real hw and then go to sleep
<AlmuHS>ok, good night
<damo22>page fault EIP=0
<damo22>on real hw, but gets pretty far, all the way to enumerating device nodes
<damo22>strange thing is, when i press a keyboard button it crashes immediately
<damo22>night
<youpi>damo22: if dropping spl1-6 seems to work fine when applied to a debian kernel, I can commit that to master
<youpi>so that the rest can be cleaned up
<AlmuHS_>hi
<AlmuHS>anyone talked me in private
***Glider_IRC__ is now known as Glider_IRC
<gnu_srs2>youpi: I'm working on cross-compiling gnumach,hurd,glibc, etc. Seems like tg-hurdsig-SA_SIGINFO.diff is not upstreamed yet, why?
<gnu_srs2>At least not in 2.28 or 2.29.
<youpi>because nobody took the time to review it
<z3ntu>youpi: Do you know of any other documentation on how to set up the hurd console from scratch (other than https://www.gnu.org/software/hurd/hurd/console.html )?
<youpi>no
<gnu_srs2> damo22: Isn't combining your work with the smp development unnecessarily complicated?