IRC channel logs

2019-11-02.log


***Server sets mode: +nt
<youpi>AlmuHS: as I mentioned, entering the debugger to print the list of threads would be insightful
<AlmuHS>really, I forgot this
<youpi>if you tend to forget, take the habit of taking notes
<AlmuHS>yes
<AlmuHS>youpi: the kdb output http://paste.debian.net/1113018/
<AlmuHS> http://paste.debian.net/1113019/
<AlmuHS>couldn't we check the cpus' execution with gdb?
<AlmuHS>sometimes, I use gdt to see what thread are executing in each cpu
<AlmuHS>*gdb
<AlmuHS>mmm... but with gdb I need to know where put the breakpoint
<AlmuHS>I go to sleep. I'll continue tomorrow
<youpi>from your information, I'm thinking that perhaps gnumach is stuck between only 2 of the 3 threads of ext2fs
<youpi>that for some reason the scheduler in the bound_processor case doesn't actually properly share cpu time evenly between threads
<youpi>and thus the third thread of ext2fs can't progress, thus keeping the whole boot stuck
<youpi>I'll read the thread_quantum_update() function, it seems to behave differently for bound_processor queues and pset queues
<AlmuHS>ok. I'm thinking in the cpu 1 configuration too
<AlmuHS>I have many doubts about if APs stack is assigned properly
<AlmuHS>I was talking with jrtc27 about this
<damo22>what is stuck?
<damo22>btw, the lapic is addressed at the same location on all CPUs
<AlmuHS>stack
<youpi>damo22: when we bind threads on cpu0, the system doesn't boot
<damo22>but you have to know which cpu is accessing the lapic
<youpi>(cpu1 never takes threads yet anyway)
<damo22>oh i see
<damo22>maybe interrupts are b0rked
<youpi>damo22: "at the same location": you mean that the cpu itself catches memory accesses where its lapic is?
<AlmuHS>damo22: yes, because the lapic pointer is common to all cpus. The lapic pointer shows the local apic of the processor who is running the thread
<damo22>the lapic has a common address on all cpus
<youpi>no, things go well if we don't bind threads (and they get executed on cpu0 only since cpu1 doesn't take them)
<damo22>but there is 1x lapic per core
<damo22>iiuc
<AlmuHS>damo22: I've just explained that
<AlmuHS>when you access to lapic pointer, you can see the Local APIC of the processor which is running your thread
<damo22>right
<AlmuHS>each processor has Its own lapic, but the pointer is common, by this reason
<damo22>im going to try removing spl1-6
<damo22>and just have 2 priorities
<damo22>it will highly simplify the interrupt handling
<AlmuHS>I don't understand about interrupts, sorry :(
<damo22>dont worry, me either
<damo22>all i know is what i read in the code
<AlmuHS>we are playing like kids ;)
<AlmuHS>we are playing with the code
<damo22>thats what hacking is
<AlmuHS>youpi: I have many doubts about whether the stack of cpu 1 (and later) is assigned properly. See https://github.com/AlmuHS/GNUMach_SMP/blob/wip/i386/i386/cpuboot.S#L88-L90
<damo22>i think its valid to call "apic_local_unit.someregister.r = value" for setting lapic regs
<youpi>it looks good, provided that stack_ptr is a variable containing the address of the stack to be used
<AlmuHS>the memory reserve is here: https://github.com/AlmuHS/GNUMach_SMP/blob/wip/i386/i386/mp_desc.c#L576-L597
<youpi>that should be working for now
<damo22>to flag end of an interrupt, is that done in the lapic?
<AlmuHS>in lapic, I haven't configured anything
<damo22>how does setting "apic_local_unit.eoi.r = 0" know which interrupt it is ending?
<AlmuHS>I only used ICR register to send IPIs
<damo22>does the lapic only handle one interrupt at a time?
<AlmuHS>I suppose, but I don't remember
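On damo22's question about how an EOI write knows which interrupt it ends: as the Intel SDM describes it, the local APIC tracks accepted interrupts in an in-service register (ISR), and an EOI write retires the highest-priority vector set there, so the write itself carries no vector number. Several vectors can be in service at once when a higher-priority interrupt nests inside a lower one. The following is only a toy C model of that bookkeeping; `lapic_model`, `model_accept`, `model_highest_in_service` and `model_eoi` are hypothetical names, not gnumach code.

```c
#include <stdint.h>

/* Toy model of the local APIC in-service register (ISR):
   256 bits, one per interrupt vector. Hypothetical, not gnumach code. */
struct lapic_model {
    uint32_t isr[8];
};

/* Mark a vector as in service (the cpu started handling it). */
static void model_accept(struct lapic_model *l, unsigned vector)
{
    vector &= 0xff;
    l->isr[vector / 32] |= UINT32_C(1) << (vector % 32);
}

/* Highest vector currently in service, or -1 if none. */
static int model_highest_in_service(const struct lapic_model *l)
{
    for (int word = 7; word >= 0; word--)
        for (int bit = 31; bit >= 0; bit--)
            if (l->isr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
    return -1;
}

/* An EOI write carries no vector: it retires the highest in-service one. */
static void model_eoi(struct lapic_model *l)
{
    int v = model_highest_in_service(l);
    if (v >= 0)
        l->isr[v / 32] &= ~(UINT32_C(1) << (v % 32));
}
```

In this model, accepting vectors 0x30 and then 0x51 and issuing one EOI leaves 0x30 in service, which is why a handler that forgets its EOI keeps lower-priority interrupts blocked.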
<AlmuHS>read Intel guides, chapter 10, which explain how APIC works
<AlmuHS> https://software.intel.com/sites/default/files/managed/a4/60/325384-sdm-vol-3abcd.pdf
<AlmuHS>we are working with xAPIC, assuming Pentium 4 or later. Ignore x2APIC or Pentium III configurations
<AlmuHS>or P6 configurations
<damo22>its very confusing doc
<damo22>it says there are two ways to enable/disable lapic
<damo22>but the first way cannot be reenabled
<AlmuHS>the lapic is enabled
<damo22>so you have to ignore it
<AlmuHS>you don't worry about this
<damo22>its not enabled you need to set the spurious vector to 0x1ff
<AlmuHS>you can print spurious vector after reserve lapic
<AlmuHS>and check Iy
<AlmuHS>check It
<damo22>0x100 | 0xff
<damo22>enable | spurious_choice
<AlmuHS>ok, then you can set It
<damo22>i have
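The value 0x1ff that damo22 mentions is the software-enable bit (bit 8) of the spurious-interrupt vector register combined with spurious vector 0xff. A minimal C sketch of that composition follows; the macro and function names are hypothetical, and the sketch only builds and checks the value, it does not touch hardware.

```c
#include <stdint.h>

/* Hypothetical names for the two fields combined in the log. */
#define SVR_APIC_ENABLE  0x100u  /* bit 8: software-enable the local APIC */
#define SVR_VECTOR_MASK  0x0ffu  /* bits 0-7: spurious interrupt vector */

/* Build the value to store in the spurious-interrupt vector register. */
static uint32_t svr_value(uint8_t spurious_vector)
{
    return SVR_APIC_ENABLE | (spurious_vector & SVR_VECTOR_MASK);
}

/* Check whether a value read back from the register has the enable bit set. */
static int svr_is_enabled(uint32_t svr)
{
    return (svr & SVR_APIC_ENABLE) != 0;
}
```

Printing the register after mapping the lapic, as AlmuHS suggests, and passing the result to something like `svr_is_enabled()` would show whether the enable bit is already set.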
<AlmuHS>you can also read the old MP Specification https://web.archive.org/web/20121002210153/http://download.intel.com/design/archives/processors/pro/docs/24201606.pdf
<damo22>im not going to use MP tables
<AlmuHS>you must ignore all MP Tables chapter
<AlmuHS>but the rest of chapters has useful info
<damo22>ah ok
<AlmuHS>you can read It as overview, and then review with Intel guide
<damo22>i want to run my ideas past someone
<AlmuHS>read chapter 3 of MP spec
<AlmuHS>or even chapter 2 and 3
<youpi>AlmuHS: one doubtful line is myprocessor->quantum = min_quantum; in thread_select()
<AlmuHS>youpi: I remember this line. Didn't I put a print there?
<youpi>you did
<youpi>I'm saying that perhaps this line should be removed
<youpi>because it refills the processor's quantum each time it switches between threads bound to it
<youpi>while it should be the hardclock which refills it
<AlmuHS>yes
<damo22>1. Set ISA interrupts active high edge triggered
<damo22>2. Set PCI interrupts active low level triggered
<damo22>3. Enable the lapic spurious vector to start receiving interrupts
<damo22>Create a mask function that masks/unmasks interrupts
<damo22>Create a EOI function that sends end of interrupt
<damo22>what else am i missing?
<AlmuHS>youpi: now I remember that cpu frequency is not well calculated. Some time ago, I saw a calculation of this, and It was very dirty
<AlmuHS>but I don't remember where I saw It
<youpi>that depends what the calculation is used for
<AlmuHS>see i386/i386/loose_ends.c
<youpi>bogomips, for instance, are completely suited to their own use
<AlmuHS>the delay function is so so dirty
<AlmuHS>int cpuspeed = 4;
<AlmuHS>#define DELAY(n) { volatile int N = cpuspeed * (n); while (--N > 0); }
<AlmuHS>void
<AlmuHS>delay(int n)
<youpi>the implementation actually is correct
<AlmuHS>{
<AlmuHS> DELAY(n);
<AlmuHS>}
<youpi>it's establishing the value of "cpuspeed" which is not
<youpi>but again it's what I mentioned above: bogomips
<AlmuHS>but cpuspeed might not be a fixed value, I think
<youpi>sure, that's what the comment says
<AlmuHS>maybe, when APIC timer will be configured, we can use It to define a real delay
<damo22>how many clock cycles does it take to decrement a register?
<AlmuHS>damo22: It depends of the architecture, I think
<damo22>you could use rdtsc?
<AlmuHS>what is this?
<damo22>i think the cpu has a timer
<damo22>with its own cpu op code to read it
<AlmuHS>It's possible
<damo22>not ideal for small delays most likely
<AlmuHS>the cpu has a clock, obviously. It's a synchronous system
<youpi>yes, rdtsc is the instruction to read the tsc (time stamp counter)
<youpi>which counts in terms of the processor's smallest unit of time
<youpi>that indeed allows to have very precise time measurement
<AlmuHS>the timer is needed to calculate the quantum, is not?
<youpi>... provided that you know precisely the cpu speed
<youpi>the pit is needed for that yes
<youpi>or equivalently, an apic
<AlmuHS>yes, apic has its own timer
<damo22>the TSC just counts number of cycles? so you still need to know the speed of the clock
<AlmuHS>but damo22 asked about the cpu pit, I think
<youpi>damo22: yes
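The point youpi confirms here is that a TSC delta is only a cycle count, so turning it into time needs the clock frequency. A minimal C sketch of the conversion, assuming a constant and known `tsc_hz` (exactly what the discussion says cannot be taken for granted); `cycles_to_ns` is a hypothetical helper, not a gnumach function.

```c
#include <stdint.h>

/* Convert a TSC cycle delta to nanoseconds.
   Assumes tsc_hz is nonzero, constant, and below roughly 18 GHz
   so the intermediate product fits in 64 bits. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
    /* Handle whole seconds and the remainder separately
       so that cycles * 1e9 cannot overflow. */
    uint64_t secs = cycles / tsc_hz;
    uint64_t rem  = cycles % tsc_hz;
    return secs * UINT64_C(1000000000)
         + rem * UINT64_C(1000000000) / tsc_hz;
}
```

The same 1000 cycles come out as 1000 ns at 1 GHz and 500 ns at 2 GHz, which is why a frequency change at runtime invalidates any delay derived this way.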
<youpi>the cpu doesn't have a timer
<youpi>only a clock
<damo22>you can read it out of the MC
<damo22>bus speed?
<youpi>you still need to know when it changes
<youpi>sometimes it's even the motherboard which changes it
<AlmuHS>yes
<damo22>yes but the memory controller will have to change too
<damo22>so the register in the chipset will be accurate
<AlmuHS>maybe It's a hardware change
<youpi>damo22: yes but you need to know _when_ it changes
<damo22>it changes at runtime?
<youpi>yes
<damo22>...
<youpi>to accommodate e.g. too high temperature
<damo22>ohh
<damo22>maybe APIC timer is calibrated to this already
<AlmuHS>apic has a LVT with some interrupts, as temperature
<youpi>AIUI apic has its own clock
<AlmuHS>lapic, exactly
<youpi>so it's not disturbed by such changes
<AlmuHS>yes
<youpi>just like the pit had
<AlmuHS> https://pasteboard.co/IEKLraE.png
<AlmuHS>lapic structure, get from Intel guide
<AlmuHS>the lapic table is so long to screenshot ;)
<youpi>AlmuHS: did you try to comment out the line I mentioned?
<AlmuHS>no yet. But I goes
<AlmuHS>pushed
<damo22>test it first!
<youpi>in his workflow he needs to push before testing
<damo22>oh ok
<AlmuHS>yes, I use git to transfer to VM
<damo22>fair enough
<AlmuHS>I can share with my contributors too
<youpi>I'm wondering though why not building gnumach outside the VM and just transferring the built kernel
<damo22>cross build is difficult you need a cross-hurd toolchain
<youpi>no
<youpi>for a kernel you don't need a cross toolchain
<damo22>oh
<youpi>here you just need a 32bit environment
<youpi>either a 32bit chroot
<youpi>or pass -m32 to gcc
<youpi>./configure --host=i686-gnu CC='gcc -m32' LD='ld -melf_i386'
<damo22>interesting
<AlmuHS>I do this to test compilation. But, for the definitive compilation, I preferred to compile inside Hurd, to avoid architecture compilation problems
<damo22>i just develop on the hurd box
<damo22>using vim
<damo22>over ssh
<AlmuHS>by example, the damo22 patches only compiles inside Hurd VM
<youpi>it's mentioned on the hurd wiki, in the "building" page of gnumach
<damo22>AlmuHS: they are debian patches not mine
<damo22>(mostly)
<AlmuHS>when I tried to compile the code with damo22 patches from Debian GNU/Linux, It doesn't generate a required library
<youpi>really, gnumach should be compilable on any GNU-ish system
<youpi>I do this all the time when hacking gnumach
<AlmuHS>I can compile my gnumach from Debian GNU/Linux
<damo22>maybe the include paths are different
<damo22>on hurd
<youpi>there is no include path involved
<AlmuHS>but, adding the debian patches, don't generate this library
<youpi>since it's a freestanding build
<damo22>hmm
<damo22>which library AlmuHS
<AlmuHS>I said you in the PR comments
<AlmuHS>I go to recover It
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/pull/10
<AlmuHS>kern/experimental.server.h
<AlmuHS>It doesn't exist as a file, It's automatically generated in compilation
<AlmuHS>directly in build/
<damo22>do you need mig to compile gnumach?
<youpi>yes
<damo22>AlmuHS: maybe you are missing mig
<youpi>a 32bit mig
<AlmuHS>I have installed It in Debian GNU/Linux
<AlmuHS>if not, I couldn't compile anything from gnumach
<AlmuHS>I couldn't compile upstream gnumach if I hadn't mig
<damo22>maybe check the build log to see if there was a failure
<damo22>generating the file
<AlmuHS>mig/testing,now 1.8-5 i386 [installed]
<AlmuHS> GNU Mach Interface Generator
<AlmuHS>youpi: new log http://paste.debian.net/1113024/
<AlmuHS>damo22: compile log http://dpaste.com/15NGX2N
<AlmuHS>this file, inside Debian GNU/Hurd, It's generated without any problem
<AlmuHS>but, in Debian GNU/Linux, not
<damo22>is Makefile.in supposed to be committed?
<damo22>i thought it is generated from Makefile.am
<AlmuHS>Makefrag.am:565: kern/experimental.server.h \
<youpi>AlmuHS: did you try to comment out these prints? (possibly it's actually booting but you cannot see that)
<youpi>(or it's getting too slow due to the prints)
<AlmuHS>ok, I go to disable all these prints
*youpi acpi off
<AlmuHS>acpi_setup() off ?
<damo22>AlmuHS: i think you need to run "autoreconf -fi" in the source tree before you start
<damo22>the build
<AlmuHS>I do autoreconf --install
<AlmuHS>but wait, I'm in other battle
<damo22>all good
<AlmuHS>3:12 AM in spain
<damo22>:|
<damo22>1:12pm in australia
<AlmuHS>what timezone?
<damo22>+11
<AlmuHS>in france is 3:13 AM too
<damo22>poor guys
<AlmuHS>my workaround https://pasteboard.co/IEKZpot.png
<AlmuHS> https://pasteboard.co/IEL03dv.png
<damo22>mach_ncpus=2 in configfrag.ac
<damo22>is that on purpose?
<AlmuHS>yes
<AlmuHS>is the way to enable smp support
<damo22>ok
<damo22>but NCPUS =2 does not work for 4 cores?
<damo22>oh its overridden to 255
<AlmuHS>in origin, the number of cores might be set in mach_ncpus, which define NCPUS
<AlmuHS>change all of this It was so complex
<damo22>yeah ok makes sense
<AlmuHS>so I preferred that, if the user sets mach_ncpus > 1, NCPUS is overridden to 255
<AlmuHS>the maximum number of cores allowed in xAPIC
<damo22>yep
<AlmuHS>at this way, furthermore, we can enable or disable smp easily
<AlmuHS>(excuse my spanglish)
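The override AlmuHS describes can be pictured as a compile-time switch: any configured value above 1 selects the xAPIC maximum she quotes, and anything else gives a uniprocessor build. The C preprocessor sketch below is only an illustration of that idea; `MACH_NCPUS` stands in for whatever the configure step defines, and `cpu_online` and `max_cpus` are hypothetical.

```c
/* Stand-in for the value the configure option would provide (assumption). */
#ifndef MACH_NCPUS
#define MACH_NCPUS 2
#endif

#if MACH_NCPUS > 1
#define NCPUS 255   /* SMP requested: size tables for the xAPIC maximum */
#else
#define NCPUS 1     /* uniprocessor build */
#endif

/* Hypothetical per-cpu table sized at compile time from NCPUS. */
static int cpu_online[NCPUS];

/* Number of slots the build reserved, independent of the cores present. */
static int max_cpus(void)
{
    return (int)(sizeof cpu_online / sizeof cpu_online[0]);
}
```

With this shape, the configured number acts as an on/off switch for SMP rather than as the real core count, which matches damo22's observation that a value of 2 still works on 4 cores.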
<AlmuHS>youpi: after comment the prints, gnumach continues freezing after ext2fs
<AlmuHS>I'm exhausted, now really I need to go sleeo
<AlmuHS>*sleep
<AlmuHS>I keep IRC in my mobile
<damo22>goodnight
<AlmuHS>I keep my IRC online in my mobile, to avoid lost messages
<AlmuHS>I shutdown my PC now
<damo22>i made a shit ton of changes to gnumach and it compiled almost first shot! what the hell
<damo22>ahh failed to link, two missing symbols
<damo22>can gnumach access physical addresses at 0xfec00000?
<damo22>or does it need to be vm mapped?
<damo22>i think it needs to be vm mapped like the lapic
<damo22>lol i forgot to call the main routine i wrote
<youpi>damo22: due to the address shifting, it needs to be vm mapped indeed
<damo22>youpi: i have a kernel with smp + apic + linux devices, but it hangs in linux_init
<youpi>do you need linux' device drivers?
<damo22>i just fixed something i am recompiling now
<damo22>i need disk driver
<youpi>k
<damo22>and ethernet uses linux irqs
<youpi>does it?
<damo22>netdde is not disconnected from linux irqs
<youpi>urgl
<youpi>I hadn't realized that, we need to fix that
<damo22>yeah
<damo22>i started it but commented it out
<youpi>I guess that was to hopefully implement irq sharing
<youpi>but AIUI that's not working anyway
<youpi>so we don't need that actually
<damo22>indeed
<damo22>dang still stuck in linux_init
<youpi>init_IRQ programs the PIT as well
<damo22>pit?
<damo22>is that a timer?
<youpi>it's the hardware timer yes
<damo22>so if i disabled the pic, i need to disable the pit too?
<youpi>I don't know
<youpi>perhaps the pit is also connected to the apic
<youpi>I'm just raising that this could be an issue
<damo22>yeah i didnt even think about it
<damo22>it got past init_timers
<youpi>some driver probe functions might need linux timers to be working
<youpi>i.e. the PIT interrupt handlers to be properly called for linux timers to work
<youpi>I guess you could try to disable all drivers except ahci
<youpi>ahci will at least need its irq working
<damo22>if the pit timer is not working will the system function?
<youpi>no
<youpi>because time will not advance
<damo22>ok
<youpi>but I believe that the ide/ahci driver can work even if you don't manage to plug the pit timer to the linux timer
<youpi>what mach needs is hardclock() to be called regularly
<damo22>ok
<damo22>i did not initialise the LAPIC to a good working state
<damo22>the PIT is a separate circuit that can be used to measure the LAPIC timer frequency
<damo22>if the pit is a separate circuit, then the hardclock can be based on the pit
<youpi>yes, though in the long run with smp we will need the lapic to provide it, because all processors need to have such regular interruption
<youpi>(that's what is missing for cpu1 to take tasks)
<damo22>if the cpu frequency changes, does the lapic timer fluctuate in speed?
<damo22>if i calibrate the apic timer with the pit, will i need to do it regularly when the cpu frequency changes?
<damo22>or can i just do it once
<damo22>since the pit runs off a clock that is always 1,193,182Hz i can tell it to sleep for a known amount of time and then measure the number of ticks of the apic timer that elapsed
<damo22>all i have left is to write a function that tells the PIT to sleep for 10ms
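The calibration damo22 outlines is mostly arithmetic: program the PIT for a known window, count how many APIC timer ticks elapse in it, and scale. A hedged C sketch of just that arithmetic follows; the function names are hypothetical, and reading the real counters is left out entirely.

```c
#include <stdint.h>

#define PIT_HZ 1193182u  /* PIT input clock, as quoted in the log */

/* PIT counts needed for a delay of `ms` milliseconds, rounded to nearest. */
static uint32_t pit_counts_for_ms(uint32_t ms)
{
    return (uint32_t)(((uint64_t)PIT_HZ * ms + 500) / 1000);
}

/* APIC timer frequency estimated from the ticks seen during the window. */
static uint64_t apic_timer_hz(uint32_t apic_ticks_elapsed, uint32_t window_ms)
{
    return (uint64_t)apic_ticks_elapsed * 1000 / window_ms;
}

/* Initial count giving periodic interrupts at `hz_wanted` per second. */
static uint32_t apic_initial_count(uint64_t timer_hz, uint32_t hz_wanted)
{
    return (uint32_t)(timer_hz / hz_wanted);
}
```

A 10 ms window needs 11932 PIT counts, which fits the PIT's 16-bit counter. If, say, 1000000 APIC ticks elapsed in that window, the timer would run at 100 MHz and an initial count of 1000000 would give 100 interrupts per second; those tick numbers are illustrative, not measured.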
<damo22>the apic timer should now interrupt regularly on irq0, it doesnt need to be triggered manually
<damo22>omg it booted with Linux drivers and APIC timer
<damo22>slowww
<damo22>AlmuHS: hi
<AlmuHS>hi
<AlmuHS>did you get to configure APIC timer?
<damo22>yes
<AlmuHS>and It works?
<damo22>i will push now
<damo22>it boots but something is wrong
<AlmuHS>what's?
<damo22>the interrupts are not quite right for the disk driver
<damo22>so i have no disk
<AlmuHS>oops
<damo22>i wish i could get a boot log
<AlmuHS>you can redirect the output to the terminal
<damo22>i am on native hw
<AlmuHS>youpi explained me how to do It yesterday
<damo22>not qemu
<AlmuHS>test It in Qemu to a better hardware control
<AlmuHS>to discarding incompatibility problems
<AlmuHS>the Mach SATA controller is originally from Linux
<AlmuHS>but may IDE controller not
<AlmuHS>drivers I said
<damo22>i rebased back onto my old branch without your thread_info changes
<damo22>i have not tried with your changes
<damo22>i wanted to start from a smp that worked
<damo22>i just pushed my latest branch
<AlmuHS>It's not problem
<AlmuHS>my thread_info patch is not related with this
<damo22>but it stopped my smp booting on x200
<damo22>i tried with your latest branch i think
<AlmuHS>my latest branch is wip
<damo22>ok
<AlmuHS>smp is production branch
<AlmuHS>but wip is a bit unstable, because I use this to synchronize with my VM
<damo22>but you merged wip with zamaudio-gcc9-smp-debian
<damo22>which one should i use to test my apic code?
<AlmuHS>zamaudio branch is only for test
<damo22>i added the debian patches on top of smp branch
<damo22>as one patch
<AlmuHS>If I merged back zamaudio inside wip It was an accident
<damo22>ah ok
<AlmuHS>smp is the stable branch
<AlmuHS>I haven't merged my thread_info changes into smp branch
<AlmuHS>but it's not a problem
<damo22>you should try my branch
<damo22>it compiles and almost runs
<AlmuHS>now I have my laptop shutdown
<damo22>ok
<AlmuHS>but I can turn over if necessary
<AlmuHS>turn on
<damo22>i have two laptops
<damo22>one for testing and one for developing
<AlmuHS>to testing smp I have T60 and R60e
<damo22>but i use the developer one as a dumb terminal into the testing one
<AlmuHS>to developing, x230
<damo22>hehe
<damo22>i have x220
<AlmuHS>I also have an T410, but Hurd hasn't NIC drivers for It
<damo22>i think netdde should support that
<AlmuHS>but I don't get It
<AlmuHS>wait, I change to laptop
<AlmuHS>now in laptor
<AlmuHS>*laptop
<damo22>i disabled clkstart()
<damo22>because i was using the apic timer
<AlmuHS>yes
<damo22>im cheating by using the initrd from the installer, so i dont need a disk
<AlmuHS>ok
<damo22>but the screen disappears after it boots
<AlmuHS>maybe It missing any driver
<damo22>the linux ahci is almost working
<damo22>it just fails to detect a disk
<damo22>interrupts....
<AlmuHS>cloning your repo in my qemu VM
<damo22>:)
<AlmuHS>compiling
<AlmuHS>you missed this patch, I think
<AlmuHS>../linux/src/include/linux/compiler-gcc.h:96:1: fatal error: linux/compiler-gcc9.h: No such file or directory
<AlmuHS> 96 | #include gcc_header(__GNUC__)
<AlmuHS> | ^~~~
<damo22>you need to check out feat-ioapic
<damo22>the default clone doesnt give the right branch
<AlmuHS>what is the path of compiler-gcc9.h ?
<damo22>linux/src/include/linux/compiler-gcc9.h
<AlmuHS>ty
<AlmuHS>ok, now I'll change branch
<damo22>ive done tons of work in branch feat-ioapic
<damo22>git diff gcc9-smp-debian..feat-ioapic
<damo22>i found a couple of bugs in your code too
<damo22>missing prototypes which could cause missing symbols
<damo22>check that diff
<AlmuHS>configure: error: failed to patch using `config.status.dep.patch'.
<AlmuHS> You have a serious problem. Please contact <bug-hurd@gnu.org>.
<damo22>huh
<damo22>what does git status return?
<AlmuHS>On branch feat-ioapic
<AlmuHS>Your branch is up to date with 'origin/feat-ioapic'.
<damo22>ok
<AlmuHS>wait
<damo22>autoreconf -fi
<AlmuHS>yes, I've just remember It
<AlmuHS>solved
<damo22>yea because i added some config
<AlmuHS>pruebas@debian:~/GNUMach_SMP_zamaudio/build$ make clean
<AlmuHS>linux/src/drivers/scsi/.deps/liblinux_a-aha152x.Po:1: *** missing separator. Stop.
<damo22>err
<damo22>im not sure you can do make clean after autoreconf
<AlmuHS>same error with "make gnumach.gz"
<damo22>i usually clobber the build dir
<AlmuHS>remove build/ worked
<damo22>yeah
<AlmuHS>compiling
<damo22>i think enable_irq and disable_irq have extra prototypes that are causing issues
<damo22>im compiling too
<AlmuHS> http://dpaste.com/0Q65DSK
<AlmuHS>damo22: the log of your gnumach
<damo22>woooho
<damo22>nice log
<damo22>its definitely interrupt related
<AlmuHS>yes
<AlmuHS>at first lines of the log, we can see many IRQ probe failed
<damo22> /* Mmmm.. multiple IRQs.. don't know which was ours */
<AlmuHS>all IRQ failed are disks related
<damo22>that is because the only driver using an irq is the disk at that point
<damo22>maybe the timer is not sending the EOI
<damo22>end of interrupt
<damo22>so its flooding the interrupts
<damo22>the timer interrupt handler
<AlmuHS>It's possible
<AlmuHS>youpi: tonight, after disabled all prints, the system continues freezing in ext2fs
<AlmuHS> http://dpaste.com/1BQ9TJ2
<AlmuHS>checking the log, I've just find this lie
<AlmuHS>line
<AlmuHS>probing scsi 18/22: Western Digital WD-7000 Failed initialization of WD-7000 SCSI card!
<damo22>jz means jump if condition is zero?
<youpi>if last computation resulted to zero
<AlmuHS>yes
<AlmuHS>if some op returns zero
<AlmuHS>of if zero flag is enabled
<AlmuHS>*or if
<AlmuHS>I've just managed to compile my smp kernel with your patches from linux
<youpi>AlmuHS: the WD detection failure is not a problem, you most probably don't have a WD card :)
<AlmuHS>yes
<AlmuHS>but we continues with the same ext2fs freeze
<youpi>what you need to find out is why gnumach only executes two of the three threads of ext2fs
<AlmuHS>if I could to find a good breakpoint, I could use gdb to debug
<AlmuHS>but I don't know where put the breakpoint
<youpi>well, where the thread is being chosen?
<youpi>i.e. what we have been observing, thread_select()
<youpi>and more precisely, choose_thread()
<AlmuHS>oh, ok
<damo22>patches from linux?
<AlmuHS>damo22: tonight, I was talking about I couldn't compile my smp microkernel with your patches, from my Debian GNU/Linux
<damo22>ohh
<AlmuHS>because an compilation error
<AlmuHS>now I've just get It
<damo22>good
<AlmuHS>this is interesting
<AlmuHS>(gdb) info threads
<AlmuHS> Id Target Id Frame
<AlmuHS> 1 Thread 1.1 (CPU#0 [running]) 0xc1007798 in delay (n=1000000)
<AlmuHS> at ../i386/i386/loose_ends.c:46
<AlmuHS>* 2 Thread 1.2 (CPU#1 [running]) choose_thread (
<AlmuHS> myprocessor=0xc122022c <processor_array+588>) at ../kern/sched_prim.c:1519
<AlmuHS>in the first stop with gdb
<damo22>cool!
<AlmuHS>youpi: It seems that, in this point, It's cpu 1 which is executing choose_thread(
<AlmuHS>meanwhile cpu 0 is in a big delay
<AlmuHS>ok, after a couple of next, cpu 1 is in cpu_launch_first_thread
<AlmuHS>and It seems that cpu 1 calls startrtclock()
<AlmuHS>cpu 1 never wake up after machine_idle
<damo22>have you ever got into a weird state where it compiles but wont boot at all? maybe i need to clobber build
<damo22>btw ../i386/i386/pcb.c:276:2: warning: #warning SMP support missing (avoid races with io_perm_modify). [-Wcpp]
<damo22>AlmuHS: i suggest looking at my diff and there are two things you could try to apply from that diff
<damo22>nothing to do with APIC
<damo22>-int
<damo22>-cpu_ap_main()
<damo22>+void
<damo22>+cpu_ap_main(void)
<damo22> {
<damo22>- if(cpu_setup()) return -1;
<damo22>+ cpu_setup();
<damo22>because if it has an int return value it will corrupt the stack when you call it from asm i think
<AlmuHS>this #warning SMP is not the only
<AlmuHS>there are many functions which has the same warning
<AlmuHS>It's a check to make sure this function finishes correctly
<damo22>also you are missing +extern void interrupt_stack_alloc(void);
<damo22>in i386/i386/mp_desc.h
<AlmuHS>if cpu_setup() runs well, this if might not apply
<damo22>if you grep for cpu_ap_main, you are calling it from asm, and it has no return value expected
<AlmuHS>but there are some cases in which cpu_setup() could fails
<AlmuHS>yes, It is
<damo22>yeah but you are not handling the error
<AlmuHS>It's true
<AlmuHS>I don't know how to do this
<damo22>also, it has no default return value
<damo22>it hits end of function with no return value
<damo22>i fixed it by making it return void
<AlmuHS>I can replace It with a "return (cpu_setup());"
<damo22>not really
<damo22>if you do that, you need to handle the return value in asm
<damo22>where does it go?
<AlmuHS>I remember that return value is stored in %eax
<damo22>ok
<AlmuHS>but I'm not sure
<damo22>so is it supposed to clobber eax?
<AlmuHS>wait, I'm reading about how call works
<damo22>if your asm caller code needs to keep eax, then you will be destroying its value on return
<damo22>?
<damo22>if you dont care about the return value, make it a void function
<AlmuHS>but then I can't know if It fails
<damo22>damn i got into a bad state it fails to print GNUMach at boot
<AlmuHS>yes, the return value is stored in %eax
<AlmuHS> https://stackoverflow.com/questions/6171172/return-value-of-a-c-function-to-asm
<damo22>so the question is, what is in eax before it calls that function and is it supposed to update eax
<damo22>because you are calling it from asm
<AlmuHS>we can do this
<AlmuHS> call cpu_ap_main
<AlmuHS> cmpl $0, %eax
<AlmuHS> jnz halt
<AlmuHS>eax is here
<AlmuHS>movw $KERNEL_DS,%ax
<AlmuHS> movw %ax,%ds
<AlmuHS> movw %ax,%es
<AlmuHS> movw %ax,%ss
<AlmuHS>but as source
<AlmuHS>ok, I go to try to changes as void
<youpi>damo22: in x86, the returned value is in a register, not on the stack
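One way to reconcile the two positions above (a void entry point for the asm caller, but still reacting when setup fails) is to handle the failure inside the C function itself, so nothing has to be read back from %eax. The C sketch below uses stand-ins only: `cpu_setup_stub`, `ap_park`, `ap_failed` and `cpu_ap_main_sketch` are hypothetical and are not the functions in the GNUMach_SMP tree.

```c
/* Hypothetical stand-ins, not the gnumach functions. */
static int fake_setup_result;  /* 0 = success, nonzero = failure */
static int ap_failed;          /* set when an AP could not be set up */

/* Stand-in for the real per-cpu setup, returning a status code. */
static int cpu_setup_stub(void)
{
    return fake_setup_result;
}

/* Stand-in for parking the cpu; real code would likely loop on hlt. */
static void ap_park(void)
{
    ap_failed = 1;
}

/* void entry point: the asm caller does not need to inspect %eax. */
static void cpu_ap_main_sketch(void)
{
    if (cpu_setup_stub() != 0) {
        ap_park();   /* failure is handled here, in C */
        return;
    }
    /* ... the rest of the AP bring-up would continue here ... */
}
```

The alternative sketched in the log, keeping the int return and testing %eax right after the call, is equally valid; the point is only that one side or the other has to act on the status.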
<damo22>ok
<damo22>i destroyed the console by trying to print too early
<damo22>lol
<AlmuHS>XD
<AlmuHS>btw, in latest Debian gnumach, the recovery terminal has disappeared
<AlmuHS>when fsck fails, the system continues booting
<damo22>indeed i noticed this
<AlmuHS>or in latest Debian hurd, I don't know exactly
<damo22>i need to fsck my /
<damo22>but i cant
<AlmuHS>you can do fsck after login, but It's risky
<damo22>no you need to have read only
<AlmuHS>It works, but with risk
<youpi>gnumach is not involved in that step
<AlmuHS>then is Hurd
<damo22>if / is mounted ro you can
<youpi>most probably it's e2fsprogs or initscripts which got a bug
<damo22>now its my turn to sleep, 4am
<damo22>goodnight
<AlmuHS>goodnight
<AlmuHS>this timezone difference is dangerous to talk
<AlmuHS>i changed cpu_ap_main() to void, hoping that this would solve the enabling with ncpus > 2, but it didn't
<AlmuHS>youpi: this is my gdb log http://paste.debian.net/1113183/
<AlmuHS>but I don't see any important thing
<youpi>AlmuHS: it is expected that cpu1 stays stuck on machine_idle, since there is no interrupt to get out of the hlt instruction. I guess the mere attachment of gdb to it interrupts hlt, however, and that's why you see it call thread_select.
<youpi>in your traces, take the habit of typing bt, to know where you are
<youpi>here I'm wondering what makes cpu0 call delay
<youpi>what would be useful to debug in choose_thread, is where the three threads of ext2fs are
<youpi>i.e. when you are within choose_thread on cpu0, first make sure with "show all threads" that the three threads are in state R, and then inspect the runq, to check whether the three threads are indeed there on the runq
<AlmuHS>gdb only shows 2 threads, one for each cpu
<AlmuHS>do you suggest me open gdb and kdb at same time?
<youpi>you can do that yes
<youpi>first interrupt with kdb, check with trace that you are inside a not so dirty place
<youpi>and that the three ext2fs threads are there
<youpi>then you can use gdb (without even exiting kdb)
<youpi>and inspect the per-processor queues
<AlmuHS>ok
<youpi>(processor_ptr[0]->runq)
<youpi>(note that what gdb presents you as "thread" is just the cpus themselves, not the gnumach threads)
<AlmuHS>yes, I said It before
<AlmuHS>I go afk a few hours
***Glider_IRC__ is now known as Glider_IRC
***janneke_ is now known as janneke
<damo22>well ahci probably doesnt use irq 11 anymore
<damo22>its a pci interrupt
<damo22>16-23