IRC channel logs

<jrtc27>eh, I deliberately default to 4 spaces per tab because I find it more readable/looks nicer (even for low depth), but will quickly recognise files for which I should change to 8 :P

<AlmuHS>I usually use 4 space per tab in my personal code

<AlmuHS>but, if GNU standard is 8 tabs, I'll have to follow It

<jrtc27>yeah (and beware of the extra indentation when {} is involved)

<jrtc27>(and the spaces before ('s in function calls)

<jrtc27>which perhaps comes from certain early developers' use of lisp, where it's `(f (a b))`, so translates more to `f (a, b)` in C

<AlmuHS>I don't use lisp

<AlmuHS>but It can be true

<AlmuHS>youpi: I've just sent the patch to maillist

<AlmuHS>I don't tested It with hurd, I advice

<AlmuHS>I only test to compile

<youpi>(also note that comments contain sentences, and should thus start with a capital, and finish with a dot and two spaces)

<AlmuHS>ok, I'll take notes

<jrtc27>I'd also suggest you use git-format-patch so you can bundle the changelog in with the diff rather than them being 2 separate files

<AlmuHS>oh, I didn't know this! ty

<youpi>I didn't notice in your previous change log: what is in parenthesis is the name of the function being modified, not the exactly code location

<AlmuHS>because the file is said before

<AlmuHS>in the parenthesis I said not the modified function, instead I said that I modified the callers of this function

<AlmuHS>I might had been more clear, It's true

<AlmuHS>but It's so late

<youpi>it's not a question of being clear, it's a question that it is the convention

<youpi>all other changelog entries are made this way

<AlmuHS>ok, I didn't know the convention either

<youpi>see how it is done in git log

<youpi>general rule: look around how it is done, mimic it

<AlmuHS>ok

<youpi>that'll save you a lot of GNU Coding Style reading

<AlmuHS>I feel as a kid, hahaha

<AlmuHS> http://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=d8671bc2a0fead7655b9e80736db33d84f14025c

<AlmuHS>oops, I notice a typo, sorry

<AlmuHS>+ [#include <mach/thread_info.h]) must be + [#include <mach/thread_info.h>])

<AlmuHS>youpi: can you fix It or I resend the patch?

<youpi>I'mll fix

<youpi> sending a modified patch is useless once the previous is commited

<AlmuHS>youpi: I sent a new patch. I found a new error (I'm embarrased, sorry)

<AlmuHS>now compiles without problems

<damo22>youpi: how do you install an interrupt handler from userspace?

<damo22>i found a bunch of symbols in gnumach that are pointers to functions *ivect[NINTR]

<damo22>im a bit confused how that can translate to something a userspace driver can implement

<youpi>damo22: you don't

<youpi>damo22: you just forward the event through a notification

<youpi>see the 70_dde.patch in the debian package that forwards interrupts to netdde

<youpi>it's possibly full of issues, but at least the principle is correct

<damo22>so every interrupt is going to be implemented as a separate rpc message

<youpi>yes

<damo22>ok thats kind of easier to handle

<damo22>i could have a single function that takes a parameter of the IRQ number

<damo22>that fired

<damo22>but i think i need to extend *ivect to 48 interrupts

<damo22>for possibly 2x ioapics?

<youpi>sure

<damo22>i probably dont even need all that

<damo22>but i want to allow scheduling of irqs

<damo22>it seems like we are inventing a new interrupt mechanism

<damo22>by passing it as a message

<damo22>it can be interposed

<youpi>that's useful for giving a PCI device to a subhurd, yes

<damo22>do i just set them all as intnull?

<damo22>the problem is, this is all very hardware dependent

<youpi>sure, intnull is fine

<youpi>waiting for anything to register a handler

<youpi>be it the user-land forwarding handler

<damo22>how does it decide which interrupt to register

<youpi>pic_isa.c is in i386/i386at/pic_isa.c, so it's fine to depend on x86 :)

<youpi>see the patch, the register function says which interrupt should be used

<damo22>ok

<youpi>netdde determines that from reading the PCI config space

<damo22>nice

<youpi>a proper sub-hurd forwarded would make sure the subhurd only requests interrupts for the PCI device it's given

<youpi>to avoid abuses

<youpi>though getting more interrupts than needed is not really that much a problem

<youpi>of course this model doesn't support the driver doing cli/sti

<youpi>but nowadays nobody does this any more

<damo22>does rump do cli/sti?

<youpi>no

<damo22>gooood

<youpi>afaik

<youpi>it's too dangerous to be doing this in userland anyway, you could get stuck in a blocking system call

<damo22>yeah

<youpi>and then never have the chance of calling sti

<damo22>is that why interrupts.S has cli in the middle?

<damo22>to stop interrupts from interrupting interrupt handlers?

<youpi>yes

<youpi>otherwise you may have unbound stacking

<damo22>stack overflow

<youpi>that does happen with very busy systems

<damo22>ok

<damo22>im learning a lot from reading this code

<youpi>it's much simpler to read than linux' :)

<damo22>interesting that irq2 is intnull, its not even connected to a line

<damo22>youd never receive an interrupt on irq2

<damo22>because the pics are plugged together on that line

<youpi>better have a null handler than a hang if for whatever reason the hardware invented an interrupt on irq2

<damo22>lol

<mabox>Did you saw ideologing at Cristosan.com

<mabox>Did ya ideology like at Cristosan.com

<AlmuHS>are there any secure way to test and upstream hurd compilation? Sometimes, after do "make install", hurd hang at boot. And I have to reinstall entire system

<AlmuHS>**test an

<youpi>using a sub-hurd yes

<AlmuHS>ok, I go to search about this

<youpi>note that you usually don't need to reinstall the entier system

<youpi>just unpack hurd_foo.deb

<youpi>either by booting from the debain installer, chrooting into your system and running dpkg -i

<youpi>or by mounting the disk, and unpacking by hand with ar p hurd_foo.deb | tar x

<AlmuHS>yes, but in this time, the filesystem crashed

<youpi>depends how it crashed

<AlmuHS>the translator which crashed was procfs

<youpi>I have never had to reinstalled the buildd boxes for a decade

<AlmuHS>btw, can anyone test the latest gnumach with the latest hurd? from upstream repositories. Yesterday hurd crashed after do It, but may I did any step wrong

<AlmuHS>I'm repeating the test now

<youpi>I uploaded them in debian yesterday

<AlmuHS>tonight?

<youpi>no, yesterday

<youpi>early yesterday

<AlmuHS>then only gnumach

<youpi>see http://qa.debian.org/gnumach

<youpi>see http://qa.debian.org/hurd

<AlmuHS>gnumach seems runs well, the problem seems to be in hurd

<youpi>ah, the hurd binaries may have come this morning only

<AlmuHS>I'm repeating the test now, as I said

<youpi>1140 oct. 30 08:17 hurd_0.9.git20191029-1_hurd-i386.upload

<AlmuHS>ok

<youpi>+archive delay

<AlmuHS>but, has you test It this configuration?

<AlmuHS>first test: upstream gnumach with default debian hurd. OK

<youpi>I have, sure

<AlmuHS>now I have to test this with upstream Hurd

<AlmuHS>my debian gnu/hurd is fully updated, btw

<youpi>note that upstream hurd doesn't have the debianish startup changes

<youpi>it'll expect scripts from /libexec etc.

<youpi>so in general it can't boot

<youpi>that might work by luck, but there is really no guarantee

<youpi>GNU and Debian just disagree on where files should be put

<youpi>if you want to test something in particular, just copy over that part in your system

<AlmuHS>upstream gnumach boot well, but without network

<youpi>upstream gnumach doesn't have the dde changes

<youpi>so netdde can't work

<AlmuHS>yes, I know

<AlmuHS>but boot

<AlmuHS>(excuse my spanglish)

<AlmuHS>compiling hurd. It takes a while

<AlmuHS>excuse me, I was eating lunch

<AlmuHS>youpi: upstream gnumach with upstream hurd: FAILED https://pasteboard.co/IEwY3Xq.png

<AlmuHS>or partial failed

<AlmuHS> https://pasteboard.co/IEwYFuI.png

<AlmuHS> https://pasteboard.co/IEwZmE0.png The fail is in procfs

<AlmuHS>same problem with default Debian gnumach

<AlmuHS> https://pasteboard.co/IEx0gdJ.png

<AlmuHS>in my SMP gnumach, also fails https://pasteboard.co/IEx1dHt.png

<AlmuHS>youpi: I finished my tests

<youpi>as I said, upstream's init system is very different from debian's init system, so it's not surprising that it fails

<youpi>just copy over what you actually need

<youpi>similarly, installing an upstream glibc over debian's glibc would be deemed to fail

<AlmuHS>yes, I forgot this

<AlmuHS>has procfs any debian patch?

<AlmuHS>and proc

<youpi>nothing that would be necessary

<AlmuHS>ok

<AlmuHS>and, where is proc and procfs? To copy manually

<AlmuHS>instead "make install"

<youpi>in proc/ and procfs/

<AlmuHS>in "/proc" and "/procfs"

<AlmuHS>in "/servers" I don't see It

<youpi>from proc/ and procfs/ to /hurd/

<youpi>see showtrans /proc

<AlmuHS>ok

<AlmuHS>ok, I see

<AlmuHS>ok, now Debian's gnumach with upstream proc and procfs (over Debian's Hurd) runs well

<AlmuHS>but, in my SMP gnumach, something goes wrong

<AlmuHS> https://pasteboard.co/IExcbv5.png

<AlmuHS>ok, with Debian's gnumach, the top command shows this crash

<AlmuHS>similar to my SMP gnumach

<youpi>you can attach gdb to the running procfs before triggering the crash

<AlmuHS> https://pasteboard.co/IExdm8m.png

<AlmuHS>ok, I'll try It

<jrtc27> http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/proc/info.c?id=7719472e3410d009c89d775ea05dc7f305d9ebce#n714 looks like that should have a `continue;` now too?

<jrtc27>not that I would expect that to be a segfault

<youpi>ergl, sure, the added code should be protected by an if (!err)

<jrtc27>not going to help, err gets cleared

<youpi>or continue in the if (err)

<youpi>note that that's the proc source code

<youpi>he has procfs dying, not proc

<jrtc27>ah right yes

<jrtc27>also the format string for last_process is %d yet the local is declared as long unsigned

<jrtc27>which is fine on i386, and probably harmless on x86_64

<jrtc27>(I'm assuming varargs stack slots on x86_64 are 64-bit)

*jrtc27 wonders whether libps needs to be in sync with procfs

<AlmuHS>excuse me, I was afk

<youpi>oh yes sure

<youpi>you also need to install libps

<AlmuHS>oh, ok. I go to try It

<AlmuHS>where is libps?

<jrtc27>/lib/i386-gnu/libps.so.0.3

<AlmuHS>Is It a package?

<jrtc27>built by hurd

<AlmuHS>out of hurd

<AlmuHS>?

<jrtc27>same package

<youpi>it's in the hurd-libs0.3 package

<youpi>same source package (hurd), but different binary package (hurd vs hurd-lib0.3)

<AlmuHS>ok

<AlmuHS>hurd-libs0.3/unstable,now 1:0.9.git20191029-1 hurd-i386 [instalado]

<AlmuHS>It's installed by default

<youpi>by "you also need to install libps", I mean like proc and procfs

<AlmuHS>but, Is It generated in "make", in hurd sources?

<youpi>yes

<youpi>see in libps

<AlmuHS>oh, I've just find It

<AlmuHS>libps.so ?

<youpi>yes, copy that into your lib/i386-gnu/libps.so.0.3

<AlmuHS>ok

<AlmuHS>done

<AlmuHS>in Debian's gnumach, continues same problem

<youpi>getting a backtrace with gdb would be really insightful

<AlmuHS>gdt must be executed inside the machine? or out

<youpi>inside the machine

<youpi>(fun typo, looks like your head is full of x86 :) )

<youpi>gdb /hurd/procfs the_pid_of_procfs

<AlmuHS>oh, It's strange. "ps -e | grep procfs" don't shows anything

<AlmuHS>only the same grep: pruebas 728 p0 S 0:00.03 grep procfs

<AlmuHS>I'm in my smp gnumach

<AlmuHS>ls /proc is empty too

<AlmuHS>It seems procfs if off

<AlmuHS>*is off

<youpi>showtrans /proc

<youpi>to be sure that it's still recorded

<youpi>possibly it just fails to start early

<youpi>and thus it doesn't show up

<AlmuHS>showtrans /proc don't shows anything

<youpi>then that's why

<youpi>ah wait that's perhaps mounted by init scripts nowadays

<youpi>but you can as well make it passive as well

<youpi>settrans /proc /hurd/procfs

<AlmuHS>but, If this, ps might show It, is not?

<youpi>not if it crashed already

<AlmuHS>ok, now /proc is not empty

<AlmuHS>and "showtrans /hurd/procfs" shows /hurd/procfs

<AlmuHS>but the error continues

<AlmuHS>now I can try gdb

<AlmuHS>nope, "ps -e" don't show procfs anyway

<AlmuHS>now

<youpi>if it crashed you need to let it restart by looking in /proc

<AlmuHS>ok, I'm in gdb

<AlmuHS>what I have to do

<AlmuHS>?

<youpi>xc

<youpi>grr

<youpi>c

<youpi>to let it continue

<youpi>(c as in "continue")

<youpi>then trigger the crash

<youpi>then bt

<youpi>(bt as in "backtrace")

<youpi>to get the backtrace

<AlmuHS>warning: Can't wait for pid 783: No child processes

<youpi>not a problem

<AlmuHS>gdb freeze

<youpi>that's expected

<youpi>it's letting the program continue

<AlmuHS>waiting

<youpi>waiting what ?

<youpi>did you trigger the crash?

<AlmuHS> https://pasteboard.co/IExImjHP.png

<youpi>like I said: trigger the crash

<youpi>on another console

<youpi>or through ssh

<youpi>I don't really understand what you expected, actually

<youpi>to get the backtrace of a crash, you need to trigger the crash

<AlmuHS>but I don't know how

<youpi>wasn't ps triggering the crash?

<youpi>or top, or whatever you did that triggered the crash

<AlmuHS>oh, I understand

<AlmuHS>wait, I go to reboot

<AlmuHS>I found the error

<AlmuHS>0x08033676 in process_file_gc_stat (ps=0x803f870, contents=0x1fffaa8)

<AlmuHS> at ../../procfs/process.c:237

<AlmuHS>237 last_processor = thsi->last_processor;

<AlmuHS>It's my code :(

<youpi>oh, you need to add PSTAT_THREAD_SCHED in the entries array

<youpi>along the others

<youpi>otherwise libps will not have filledit

<AlmuHS>in process.c ? (in stat writting)

<AlmuHS>or in info.c? calling thread_info()

<youpi>in the same file

<youpi>see where process_file_gc_stat is referenced

<AlmuHS>info.c, I remember

<AlmuHS>process.c is what has crash

<AlmuHS>but, what is "entries array"? stat array?

<youpi>look for "entries[]"

<youpi>or, like I said, look where process_file_gc_stat is referenced

<AlmuHS>hurd hasn't any caller for "process_file_gc_stat"

<youpi>no, but it's referenced in the file

<youpi>look for the function name

<AlmuHS>procfs/process.c:497: .get_contents = process_file_gc_stat,

<AlmuHS>or the own declaration

<AlmuHS>procfs/process.c:219:process_file_gc_stat (struct proc_stat *ps, char **contents)

<youpi>.get_contents, that's it, it's a reference to the function

<youpi>see the .needs just below

<AlmuHS>ok

<youpi>you need to add PSTAT_THREAD_SCHED, to express that it needs that information

<AlmuHS>ok

<AlmuHS>done

<AlmuHS> http://dpaste.com/03R4SDT

<AlmuHS>compiled with this mod. Booting

<AlmuHS>now works!! :-)

<AlmuHS>although top only shows 3 process

<AlmuHS>but all process are in cpu 0, anyway

<AlmuHS> https://pasteboard.co/IExVWc2.png

<AlmuHS>forcing "make -j2" to gen 2 threads, both runs over cpu 0 https://pasteboard.co/IExWLMa.png

<AlmuHS>youpi: fixed. But bad news for my smp work

<youpi>AlmuHS: could you post the output of ps -feM

<AlmuHS>ok

<youpi>I'm wondering if an idle thread shows up for the second processor

<youpi>rootdir_gc_stat would need to be fixed to include all processors information

<AlmuHS> https://pastebin.com/5RmX387T

<youpi>ah, sorry I missed a flag

<youpi>ps -feMj

<AlmuHS>ok

<AlmuHS> https://pastebin.com/8XrU0VTq

<youpi>Mmm, perhaps even ps -feMT

<AlmuHS>ok...

<youpi>ps -feMTj

<youpi>even :)

<youpi>so we have all informations

<AlmuHS> https://pastebin.com/RrfaXrYs

<youpi>yay, there are two gnumach thread with "N"

<youpi>so there is an idle thread for the second processor indeed

<AlmuHS>maybe I can know the cause. wait

<youpi>I wonder how get_idletime can determine which one is for which processor, though

<youpi>it's expected to have these two idle threads

<youpi>it's however surprising that one of them has no time

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/wip/i386/i386/cpuboot.S#L105-L109

<AlmuHS>maybe is this

<youpi>does cpu_ap_main terminate?

<AlmuHS>I added this hack to avoid that, when the processor is not added to the system, It returns to the snippet and finish

<AlmuHS>ideally, after "slave_main()" the processor never regret to cpuboot.S

<AlmuHS>because slave_main() generate a new thread over this

<AlmuHS>but maybe It regret anyways and does "hlt"

<youpi>you should put printfs to make sure what happens

<AlmuHS>yes

<youpi>normally slave_main doesn't return, so put printf there

<AlmuHS>I added this hack because, when I disable pagging and slave_main(), and the processor don't enter to the system, this regret to cpuboot.S, finish the thread and dead

<AlmuHS>ok

<youpi>you'll need to release ap_config_lock before calling it

<youpi>otherwise that lock will never be released

<AlmuHS>what?

<youpi>cpu_launch_first_thread() doesn't return

<youpi>(at least it's not supposed to)

<youpi>so slave_main won't either

<AlmuHS>It might not

<youpi>it's not supposed to

<youpi>put printfs to be suer

<AlmuHS>ok

<youpi>but in the long run it won't

<youpi>so releasing a mutex after it doesn't make sense

<youpi>that'll never happen

<AlmuHS>the mutex in cpu_setup() is useless, I think

<AlmuHS>I added this as an experiment

<youpi>does the second CPU have its own PIT?

<AlmuHS>PIT?

<youpi> cpu_launch_first_thread calls startrtclock

<youpi>you wouldn't want to do that again for the second processor, if you have only one PIT

<AlmuHS>what is PIT? :-(

<youpi>the programmable interrupt timer

<AlmuHS>ok, ok

<AlmuHS>in APIC, the Local APIC has Its own timer, but Mach don't use It

<youpi>I'd say in cpu_launch_first_thread put an if (th != THREAD_NULL) before calling startrtclock

<AlmuHS>I haven't implemented It yet

<AlmuHS>yes

<youpi>so only the boot cpu does it

<AlmuHS>added

<youpi>in idle_thread, put a printf

<youpi>to make sure that it's getting started

<AlmuHS>ok

<youpi>also put prints before thread_block, and between thread_block and idle_thread_continue

<youpi>to be sure what happens there

<AlmuHS> printf("cpu %d entering in idle_thread", cpu_number());

<youpi>and also at the begining of idle_thread_continue

<youpi>concerning get_idletime in procfs, the answer is: idle threads are created in the cpu number order

<youpi>so get_idletime can assume that

<youpi>note that for later :)

<youpi>that'll be needed to get proper cpu stats in top

<AlmuHS>I added this printf in the own idle_thread() function, not in the callet

<AlmuHS>*caller

<AlmuHS> printf("cpu %d continue in idle thread", mycpu);

<AlmuHS>?

<youpi>for instance

<youpi>the message doesn't really matter as long as it's different for each print, and contains the cpu number

<AlmuHS>what do you say about get_idletime ?

<youpi>err, well, what I said

<youpi>you'll see that later

<youpi>just copy/paste what I said

<youpi>for when you get to that function to get proper cpu stats in top

<youpi>so, about that idle thread not getting time in ps, that might be "normal": clock_interrupt() is supposed to be called in order to update that

<youpi>but since there's only the pit, for which cpu0 catches the interrupt, cpu1 never gets such interrupt and never calls clock_interrupt

<youpi>so you need to program the local apic insteda of the pit, to get these interrupts

<AlmuHS>ok

<AlmuHS>damo22 said that he could help me with local apic configuration

<youpi>yes, that's it: in the ivect array, irq0 is hardwired on the hardclock() function, which calls clock_interrupt

<youpi>you want to make the local apic call that as interupt handler

<youpi>that might also actually help to make thread go to cpu1

<youpi>without it, only scheduling IPIs would get that

<youpi>I guess you haven't implemented IPIs for scheduling yet?

<AlmuHS>I've implemented a generic function to write ICR

<AlmuHS>and the startup IPI sequence

<AlmuHS>anymore

<youpi>but that's only used at startup, not by the scheduler, right?

<AlmuHS>yes

<AlmuHS>only in startup

<AlmuHS>the sequence is start_other_cpus() -> intel_startCPU() -> startup_cpu()

<AlmuHS>---> (from interrupt) cpuboot.S -> cpu_ap_main() -> cpu_setup() - ... -> slave_main()

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/46c4bc76ac9b0dffa5105f29a11cffbba9f88e20

<youpi>so apparently what you need to do is to define an aston() function and define MACHINE_AST, so that in kern/sched_prim.c's thread_setrun(), the ast_on call does something

<youpi>aston would have to send the ipi

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/e1c6fac3fcc1ac0ff7c88402057f3566e8458913

<AlmuHS>ast.c

<AlmuHS>there are a ast_check() function empty https://github.com/AlmuHS/GNUMach_SMP/blob/master/i386/i386/ast_check.c

<AlmuHS>and here there are some NCPU > 1 cases without implement (I think) https://github.com/AlmuHS/GNUMach_SMP/blob/master/kern/act.c

<youpi>basically, your IPI interrupt handler wouldn't have to do anything

<youpi>ast_on only has to record the ast reason in the need_ast array, and send the IPI

<AlmuHS>I haven't an IPI interrupt handler by own

<AlmuHS>only a startUP IPI sequence

<youpi>the interrupt handler does nothing, but on return it will call all_intrs, which checks need_ast, and if so calls take_ast

<youpi>sure, that's to be set up

<youpi>just telling you what needs t obe done for this to work

<AlmuHS>ok

<AlmuHS>but this ast is not a architecture-defined IPI, is a fixed-type IPI , is not?

<AlmuHS>It's not like INIT IPI or StartUP IPI

<youpi>ast is just a software layer

<youpi>which only requires from the hardware to manage to raise an IPI

<youpi>whatever the form that has

<AlmuHS>but I haven't clear what is the purpose of a AST. Or, even, what is exactly an AST

<youpi>it's just a software layer for various stuff in the kernel to notify another cpu to do something

<youpi>since that's all software, you don't need to implement anything but the IPI

<AlmuHS>only send an IPI with the content indicated in the call ?

<youpi>really, as I said, all it takes to implement it is to write an aston function which records the reason in the need_ast array and sends an IPI

<youpi>and add an IPI handler which does nothing

<youpi>no even need for passing the content in the IPI, just record it in need_ast

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp/i386/i386/mp_desc.c#L232-L236

<youpi>ah, ast_on() already does the recording

<youpi>so aston() only has to trigger the IPI

<youpi>and the handler do nothing

<AlmuHS>my send_ipi() function only fills IDT (to send the IPI)

<youpi>that should be enough yes

<AlmuHS>but this need the IDT content to write

<AlmuHS>you can see an example of use just below send_ipi(), in startup_cpu()

<AlmuHS>ok, I start understand a few

<AlmuHS>but then, we need to define some IPI types (to each reason)

<AlmuHS>different signals

<AlmuHS>where I said "IDT content", I wanted to say "ICR content"

<youpi>put whatever you like, it doesn't matter

<youpi>the ast layer already manages the reasons

<youpi>so you don't need to manage them at the IPI level

<youpi>i.e. you'd have only one reason at the IPI level: ast

<AlmuHS>ok, as message passing

<youpi>note that it's when you'll start executing processes on cpu1 that mayhem may happen

<youpi>for now you do not have concurrency in drivers etc.

<youpi>once executing on cpu1 you will

<youpi>notably, the disk driver wil lbe called there

<AlmuHS>yes. I'm not an SMP expert, I'm programming a bit blindy (I only knowed until slave_main() )

<AlmuHS>I'm embarrased to say that, but It's the true :-(

<youpi>everybody started like that

<youpi>I did

<youpi>just a longer time ago than you did

<AlmuHS>then, I need a new step sequence, with the next steps to solve

<youpi>yeah, the ahci/ide disk driver will not be smp safe

<youpi>it'd probably be safer to use a rumpdisk driver

<AlmuHS>I need a new task list

***Emulatorman__ is now known as Emulatorman

<youpi>what i said for now: local apic, IPI

<AlmuHS>damien said ioapic (to share IRQ)

<youpi>then you'll want to implement cpu stats in proc for top to show them

<youpi>but in terms of implementation that should be about it

<youpi>what is next is to debug issues that will come out

<youpi>or ioapic, I don't remember the details. BAscially, something that interrupts cpus periodically

<AlmuHS>ioapic is for devices IRQ

<AlmuHS>at moment, keyboard IRQ (the tty don't reply keyboard)

<AlmuHS>Mach currently use PIC8259 controller

<AlmuHS>but damo22 said that, to share I/O IRQ between processors, could be needed to configure IOAPIC

<AlmuHS> http://paste.debian.net/plain/1111966

<youpi>we don't really need to share IRQs between processors for now

<youpi>it's fine to let cpu0 handle them all

<AlmuHS>then, why tty don't reply keyboard?

<youpi>possibly due to the pit thing, iirc they are related

<AlmuHS>It's possible

<AlmuHS> https://pasteboard.co/IEyBIUw.png

<youpi>ok, put an if to avoid these :)

<AlmuHS> if(self != THREAD_NULL) printf("cpu %d entering in idle_thread\n", cpu_number()); ?

<youpi>you can as well just test cpu_number()

<AlmuHS>yes

<AlmuHS>if(cpu_number() != 0)

<AlmuHS> printf("cpu %d entering in idle_thread\n", cpu_number());

<youpi>thinking about it on my way back home: in idle_thread_continue() you could replace machine_idle() with machine_relax(), to avoid making the cpu enter sleep mode and need an irq to be awakened

<youpi>that'll suck power, but it should allow for proper cpu scheduling

<AlmuHS>in cpu_launch_first_thread() ?

<youpi>? no, that one doesn't call machine_idle ?

<AlmuHS>idle_thread_continue() then?

<youpi>like I said, yes

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/wip/kern/sched_prim.c#L1665

<AlmuHS>this?

<youpi>well, yes, there is no other call to machine_idle in idle_thread_continue

<AlmuHS>ok

<AlmuHS>compiling...

<AlmuHS>something crash https://pasteboard.co/IEyKODJ.png

<youpi>ok. To avoid issues I was thinking that you could, in thread_init(), set bound_processor to cpu_to_processor(0) instead of PROCESSOR_NULL

<youpi>normally that'd make all tasks remain on cpu0

<youpi>so you can choose what you want to possibly put on cpu1

<youpi>that'd be notably useful to make ext2fs remain on cpu0 and avoid smp issues in the ide/ahci driver

<AlmuHS>yes, I go to try It

<AlmuHS>do you say thread_init() function, or the call to this?

<AlmuHS>ok, done

<AlmuHS>there are a special SMP usecase, disabled

<AlmuHS>#if NCPUS > 1

<AlmuHS> /* thread_template.last_processor (later) */

<AlmuHS>#endif /* NCPUS > 1 */

<youpi>when I say "in thread_init", I mean inside it

<AlmuHS>yes, I've just find It

<youpi>otherwise I would say "in callers of thread_init"

<AlmuHS>compiling...

<AlmuHS>same problem

<AlmuHS>wait

<AlmuHS>It's booting, but so slow

<youpi>how are you emulating the system?

<youpi>qemu?

<AlmuHS>yes

<youpi>with kvm acceleratoin?

<AlmuHS>I think yes

<youpi>do you have several CPUs on your host?

<youpi>does top on your host report 200% cpu usage?

<AlmuHS>yes, dual cpu, 4 thread

<AlmuHS>I have enable-kvm, in fact

<youpi>ok, you could perhaps try to put a for loop around calling machine_relax()

<youpi>to call it like a thousand times

<AlmuHS>ok

<youpi>if that's not enough, try a million times :)

<AlmuHS>ok

<AlmuHS>you can follow my commits here, btw https://github.com/AlmuHS/GNUMach_SMP/commits/wip

<youpi>already checked that out :)

<AlmuHS>latest commit don't solve problem

<AlmuHS>oh, wait. I had a fail in my latest commit. I forgot i++

<youpi>that said, it still means that calling machine_relax in a loop makes gnumach slow

<youpi>with i++ that'll be the same

<AlmuHS>yes

<AlmuHS>really, what makes slow is Hurd servers boot

<AlmuHS>the boot start freezing in ext2fs and finally freeze in Hurd server bootstrap, after load exec

<AlmuHS> https://pasteboard.co/IEz0nvP.png

<youpi>did you add the bound_processor thing?

<AlmuHS>yes

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/e9d9c00ccef0fc02fd5e94e1fc985d70b7163621

<AlmuHS>what can I do?

<youpi>did you try with the bound_processor thing but keep machine_idle()?

<youpi>in case it's actually the bound_processor thing which poses problem

<AlmuHS>I don't test It, I remember

<AlmuHS>I go to try It

<youpi>oh, wait, I didn't realize: you want to replace machine_idle with machine_relax only on cpu1

<youpi>cpu0 should continue using machine_idle

<youpi>and the infinite while loop for sure made cpu0 somehow hang :)

<AlmuHS>I go to check It

<AlmuHS>same: https://pasteboard.co/IEzqoCU.png

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/56d8508c10196a8df80e822e73ab767c2a557b73

<youpi>I was thinking: to make sure to know what is happening, put a print there when cpu is not 0

<youpi>to make sure whether cpu1 is just stuck in machine_idle, or something else

<AlmuHS>after or before than machine_relax() ?

<AlmuHS>before better

<AlmuHS>I go to dinner

<youpi>before and after

<AlmuHS>compiling

<AlmuHS> https://pasteboard.co/IEzHW8H.png

<AlmuHS>youpi:

<youpi>ah, I meant to use the machine_idle version

<youpi>it's not surprising that it gets out of machine_relax

<AlmuHS>ok, XD

<youpi>I was rather wondering whether the machine_idle version does return or not

<youpi>did you look what machine_idle/relax do?

<AlmuHS>I didn't look It yet

<youpi>it's just a few lines

<youpi>you'll better understand what I'm after

<AlmuHS>in model_dep.c, I see

<AlmuHS>machine_relax() is a "nop" loop

<youpi>do you use tags in your editor?

<AlmuHS>no

<youpi>to be able to quickly jump to the definition of a function?

<youpi>you *really* should

<youpi>you won't have to care about where is what any more

<youpi>that's a so huge time saving

<AlmuHS>I usually use CodeBlocks when I want to find any symbol

<AlmuHS>but now I'm in a simple Gedit

<AlmuHS>ok. machine_idle() is a hlt

<AlmuHS>I've just opened CodeBlocks

<youpi>why using gedit?

<AlmuHS>because It's light

<AlmuHS>to quick changes, It's more simple than a fully IDE

<AlmuHS>but the slower task is transfer the changes to the VM. I have to commit and push to my repo, then fetch and merge in my VM, checkout to the Damien's branch (with Debian patches), and remerge It

<youpi>if the IDE makes it slow to make small changes, change IDE :)

<AlmuHS>no, simply I'm so lazy to open the IDE, unless It's really necessary

<AlmuHS>ok, the new print don't shows any result

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/cbd182e9e5c9b9adfc60514ea31f28008d8de3eb

<youpi>there's a misunderstanding

<youpi>I'm not saying to look at what is happening on cpu0

<youpi>that one is fine since your system can usually boot with it

<youpi>I'm saying to use machine_idle on cpu1 too, and observe there whether it does return

<AlmuHS> https://pasteboard.co/IEzQaNf.png

<AlmuHS>oh, ok

<AlmuHS>compiling

<AlmuHS> https://pasteboard.co/IEzTuCM.png

<youpi>does it boot fast?

<youpi>(note that sometimes boot gets stuck at exec startup, and it takes a couple reboots to get it past that)

<AlmuHS>no boot

<AlmuHS>simply freeze

<youpi>just to make sure: apart from the prints, there is no difference with the version that was booting fine?

<AlmuHS>nope

<youpi>so I believe it's just the boot that gets stuck, try to reboot it

<AlmuHS>the slow is not gnumach

<AlmuHS>the problem starts when ext2fs (and hurd after this) trying to boot

<youpi>ah yes there is a difference: the bound_processor thing

<youpi>did you try to reboot several times?

<AlmuHS>yes

<AlmuHS>It's the 4th

<youpi>try to revert the bound_processor thing then

<AlmuHS>ok

<youpi>to get back to something that was working before

<AlmuHS>ok

<youpi>otherwise one can't make sure what is due to what

<AlmuHS>ok, reverted bound_processor, It seems works

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/f47300b95f3560e7e61aa0abbf606a429282f5e0

<youpi>ok, so let's keep that for now, does it print cpu 1 out of idle?

<AlmuHS>I didn't see It

<AlmuHS>I go to reboot to check better

<youpi>if you don't see it, it means it doesn't happen, normally you would see loads of it :)

<youpi>you could try to replace machine_idle(mycpu) with the infinite loop while(1) ;

<youpi>just to make it hang in a cpu-busy with that should get emulated fine

<youpi>perhaps machine_relax() actually bothers virtualization

<AlmuHS> https://pasteboard.co/IEzYTxs.png

<AlmuHS>currently, both processors has machine_idle()

<youpi>if the while(1); loop boots fine as well, you can try to make int i a volatile int i, and use for (i = 0; i < 10000000; i++) ; to make it a small delay loop without using machine_relax()

<youpi>I'm all talking about cpu1

<youpi>cpu0 works just fine since it has its irqs

<AlmuHS>yes, I go to try It

<AlmuHS>compiling

<youpi>err

<youpi>one step at a time

<AlmuHS>what?

<youpi>I said first try with the while(1); loop

<AlmuHS>I misunderstand

<youpi>if you don't try each step, you don't understand what is happening

<AlmuHS>yes, It's true

<AlmuHS>recompiling

<AlmuHS>with while(1) It works well

<youpi>ok

<youpi>so many the cpu spin is not a problem

<AlmuHS>while(1) is like stop the processor

<youpi>so try the for loop

<AlmuHS>ok

<AlmuHS>compiling

<AlmuHS>with for-loop, It seems to freeze

<AlmuHS> https://pasteboard.co/IEA7ZQS.png

<youpi>ok, so it's really letting cpu1 run other threads which poses problem

<youpi>which is only to be expected, actually :)

<AlmuHS>concurrency?

<youpi>all sorts of yes

<AlmuHS>non mutual exclusion

<youpi>I'm thinking that you should first set back the bound_processor thing

<youpi>with machine_idle() on cpu1

<youpi>and make that work, before going further

<AlmuHS>set cpu1 as bound_processor ?

<youpi>no, cpu0

<AlmuHS>I did It before

<AlmuHS>do I keep cpu0's machine_idle() ?

<youpi>yes

<youpi>the idea is to keep almost similar to what was working

<youpi>and only change the bound_processor thing

<youpi>which is not supposed to b reak anything (but apparently does)

<youpi>and once working, should allow to let cpu1 use machine_relax(), for it only to find that it has nothing to execute

<youpi>and then enable running some tasks on cpu1 one by one

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/9a0bfb25b22a2ec9c2c170daf4416aecda4c4b59

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/653a1f7ba3d5ab0f47a52853ad55d6802325cdfb

<AlmuHS>my IDE broken some tabs

<AlmuHS>the system hang again

<AlmuHS>in the same place than before

<youpi>which place ?

<AlmuHS> https://pasteboard.co/IEAgb3s.png

<AlmuHS>maybe hlt is shutdown entirely the cpu1

<youpi>that's not a problem

<youpi>cpu0 shoul dbe able to continue working without problem

<youpi>I'm checking what setting bound_processor would change

<youpi>I don't see anything obvious

<youpi>I'd say put prints in thread_setrun and in thread_select to make sure that evertything happens as appropriate

<youpi>i.e. thread_setrun puts it on cpu0's runq

<youpi>and thread_select finds it there

<AlmuHS>have I to put these prints ?

<youpi>that's waht I'm saying yes

<AlmuHS>ok

<AlmuHS>here? if (myprocessor->runq.count > 0) {

<youpi>anywhere where it makes sense

<AlmuHS>It's interesting

<AlmuHS>if ((thread->state == TH_RUN) &&

<AlmuHS>#if MACH_HOST

<AlmuHS> (thread->processor_set == pset) &&

<AlmuHS>#endif /* MACH_HOST */

<AlmuHS> ((thread->bound_processor == PROCESSOR_NULL) ||

<AlmuHS> (thread->bound_processor == myprocessor))) {

<AlmuHS> simple_unlock(&pset->runq.lock);

<AlmuHS> thread_lock(thread);

<AlmuHS> if (thread->sched_stamp != sched_tick)

<AlmuHS> update_priority(thread);

<AlmuHS> thread_unlock(thread);

<AlmuHS> }

IRC channel logs

2019-10-31.log