IRC channel logs

2023-01-25.log

<damo22>start acpi:
<damo22>Sending IPI(0) to call TLB shootdown...done
<damo22>Kernel General protection trap, eip 0xc10315fd
<damo22>kernel: General protection (13), code=0
<damo22>Stopped at all_intrs+0x9: movl 0xc11e070c,%ebx
<damo22>all_intrs()
<damo22>db{1}>
<damo22>c10315fc: 50 push %eax
<damo22>c10315fd: 8b 1d 0c 07 1e c1 mov 0xc11e070c,%ebx
<damo22>that last address holds the "lapic" value
<damo22>it didnt fail on pushing to the stack because the stack pointer was not 0
<damo22>i dont know why it general protection faulted
<damo22>hmm now its crashing in splx:
<damo22>int3 /* Oops, interrupts got enabled?! */
<damo22>it could be switching to an available task during iret instruction
<damo22>?
<damo22>that could cause general protection fault
<damo22>youpi1: is it safe to call "sti" directly when i want to enable interrupts for the first time on an AP?
<damo22>or should i use spl() or something
<damo22>i think i should call spl0() ?
<youpi1>well, all the variables concerning interrupts need to be initialized somehow
<youpi1>the main CPU's code seems to use spl0() eventually to enable interrupts
<youpi1>but these variables still need to be properly initialized
<damo22>it boots when i dont enable the lapic timer on AP, but then the AP cpu sits in loop while bsp boots
<damo22>otherwise if i enable the lapic timer on AP, it now hits all_intrs ud2
<damo22>invalid opcode
<damo22>i can push my branch soon
<youpi>damo22: ud2 is not actually "invalid", in that it's meant to stop the cpu, because we've gotten to a bogus point, here $esp doesn't have the expected value
<youpi>so either it's actually a stack overflow (which I doubt), or $esp is just not properly initialized
<damo22>ok
<damo22>in cpu_setup, after all the steps, do i need to reset esp to int stack?
<damo22>paging is enabled
<damo22>or do i need to set esp in the cpuboot.S
<youpi>see how it's done for the main cpu
<youpi>it's set inside all_intrs
<youpi>(and it has nothing to do with paging)
<damo22> http://git.zammit.org/gnumach-sv.git/tree/i386/i386/mp_desc.c?h=feat-smp2-faults&id=2aa0427708f648b80c731520637f41f3420e289f#n259 it crashes at this point
<youpi>but AP need to have $esp set at some point, yes
<youpi>for the first value at least
<damo22>i do set it in cpuboot.S and it seems to pass that because it reaches the c code
<damo22>CPU_NUMBER(%edx)
<damo22>movl CX(EXT(int_stack_top), %edx), %esp
<damo22>andl $0xfffffff0, %esp
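[Editor's note] The `andl $0xfffffff0, %esp` line above clears the low four bits of the stack pointer, which rounds it down to a 16-byte boundary. A minimal C sketch of the same masking follows; the function name `align_down_16` is hypothetical and is not part of gnumach.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors `andl $0xfffffff0, %esp` by clearing
   the low four bits, i.e. rounding a 32-bit stack pointer down to
   the nearest 16-byte boundary. */
uint32_t align_down_16(uint32_t sp)
{
    return sp & UINT32_C(0xfffffff0);
}
```

With the int_stack_top[1] value printed later in this log (0xc11d9ffc), the mask would give 0xc11d9ff0, so the masking itself moves the pointer by at most 15 bytes.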
<damo22>When interrupts are pending in the IRR register, the local APIC dispatches them to the processor one at a time,
<damo22>based on their priority and the current processor priority in the PPR (see Section 10.8.3.1, “Task and Processor
<damo22>Priorities”).
<damo22>huh??
<damo22>so a timer interrupt on lapic 1 can go to cpu0 ??
<damo22>how do you make LVTT interrupts only go to the processor where the lapic is?
<youpi>I don't know, never programmed x86 smp
<youpi>but i'd be surprised that the lapic interrupt could go to another cpu
<youpi>+timer
<damo22>the docs are long, and yet so incomplete
<damo22>esp 0xc11d8e1c
<stixp>Hi there, when performing a `arm-none-eabi-objcopy -O binary`, the resulting file is totally empty, and I don't have any error code. Do you know cases where this would happen ?
<damo22>hurd isnt ported to arm afaik
<stixp>The generic `objcopy` is very different from the arm one ? Would you have in mind another channel more suited for this question to be asked ?
<damo22>AP=(1) reset page dir done
<damo22>lapic_enable();
<damo22>for(;;);
<damo22>ESP=c11d8fd0
<damo22>int_stack_base[1]=0xc11d9000
<damo22>int_stack_top[1]=0xc11d9ffc
<damo22>stack overflow
<damo22>should the int stacks be more than 0x1000 per cpu?
<damo22>maybe the ktss temp stack was allocated on the stack?!
<damo22>and filled it
<damo22>looks like esp is not set up right, it still overflows if i make the INTSTACK_SIZE bigger
<damo22>argh i had to run autoreconf -fi or my tree was out of whack
<damo22>some constants were not updated
<damo22>i verified there is a stack positioning problem
<damo22>but i checked the initial stack values and they appear correct
<damo22>stack pointers*
<damo22>somehow, just before the AP goes into lapic_enable_timer(), esp is pointing at something outside of its int stack
<damo22>which the interrupt handler determines is a stack overflow
<damo22>interrupt routines
<damo22>but at that point there have been no interrupts
<damo22>it must be that the routines in cpu_setup() used a lot of the stack and didnt release it?
<damo22>if i dont need to ever return from a function, can i reset the stack pointer in c?
<damo22>it seems the function i am in has chewed a lot of stack
<youpi>I don't understand: when you boot an AP, don't you tell which stack it should use?
<damo22>yes
<damo22>i have booted the AP with the correct stack
<damo22>but when it comes to turn on interrupts, the stack has overflowed
<youpi>then you shouldn't need more than that, and setting the int_stack_top array for all_intrs to copy that into esp
<youpi>note the comparison at the beginning of all_itrs
<youpi>all_intrs*
<youpi>« on an interrupt stack?  »
<youpi>that test looks bogus since it doesn't care about the CPU number
<youpi>it's supposed to test whether one is on the interrupt stack of the current CPU
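[Editor's note] A rough C sketch of the per-CPU test youpi describes, i.e. checking the stack pointer against the bounds of the current CPU's interrupt stack rather than a single global range. The function names and the array parameters are hypothetical stand-ins, not gnumach's actual all_intrs code; the inclusive upper bound assumes that the "top" value is the highest usable slot, as the values printed earlier suggest (base 0xc11d9000, top 0xc11d9ffc).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: is esp inside the interrupt stack of `cpu`?
   base[] and top[] stand in for per-CPU arrays such as
   int_stack_base / int_stack_top; both bounds are inclusive. */
bool on_cpu_int_stack(uint32_t esp, const uint32_t *base,
                      const uint32_t *top, unsigned cpu)
{
    return esp >= base[cpu] && esp <= top[cpu];
}

/* Hypothetical helper: how many bytes esp lies below the base of
   that CPU's interrupt stack (0 if it is not below the base). */
uint32_t bytes_below_base(uint32_t esp, const uint32_t *base, unsigned cpu)
{
    return esp < base[cpu] ? base[cpu] - esp : 0;
}
```

Applied to the values from the log for CPU 1 (ESP=c11d8fd0, int_stack_base[1]=0xc11d9000, int_stack_top[1]=0xc11d9ffc), the check reports that ESP is not on that CPU's interrupt stack and sits 0x30 bytes below its base, which matches the "stack overflow" reading.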
<damo22>i have changed my code
<damo22>pushed
<damo22>feat-smp2-faults
<damo22>something must be switching to int_stack_base instead of int_stack_top
<damo22>i can work on this tomorrow as its a public holiday
<damo22>but its baffling
<softwar>I happened to see a roadmap at the HP conference debconf, is there a recommendation for ext4 yet?
<Pellescours>damo22: when you compile for smp, do you enable apic or not?
<Pellescours>because I just tried your branch feat-smp2-fault, and without apic, it seems to boot with rumpdisk IDE. Except that it seems to hang (or a step really takes time, idk)
<Pellescours>Oh nevermind it’s feat-smp2-works
<Pellescours>I just tried to use gnumach with apic (and rumpdisk piixide, latest build myself) and it works too. I’m able to ssh on it. I just have some message "lost interupt" and "type: ata tc_bcount: 4096 tc_skip: 0"
<Pellescours>Ah poweroff command did not work correctly, I had to hard reboot the VM
<Pellescours>gnumach with apic and no smp (for the 2 previous messages)