IRC channel logs

2023-02-14.log

<damo22>ok, all my code is upstream :D
<damo22> 1 (f59a6d20) R.....
<damo22> 2 (f59a6bd0) R.....
<damo22> 3 (f59a6a80) R.....
<damo22> 4 (f59a6930) R.....
<damo22> 5 (f59a67e0) R.....
<damo22> 6 (f59a6690) R.....
<damo22>is this the correct state for idle threads?
<damo22>maybe the scheduler thinks the cpus are all running and doesnt select anything for them to run
<gnu_srs1>youpi: Can you resend the mail about go/unix.TIOCGETA I cannot find it.
<youpi>damo22: the idle thread is always running, yes
<youpi>gnu_srs1: about golang build issues, note that the ‘unix.TIOCGETA’ issue is already solved, it's just that debian got a newer package that dropped the cherry-picked patch
<damo22>i put printf in the idle_thread_continue loop, only one of them is running on one cpu
<damo22>but the rest of the cores are idle
<damo22>sitting in machine_idle
<damo22>that is probably expected
<youpi>do they get clock interrupts ?
<youpi>idle should be getting interrupted by the clock at least
<damo22>yes
<youpi>(or by IPIs)
<damo22>clock and IPI
<damo22>1 acpi (f59a8dd0): (f5998bd8) ..SO..(thread_bootstrap_return) i need to figure out what happens after this
<damo22>it seems to die
<youpi>did you set debug_all_traps_with_kdb?
<damo22>boolean_t debug_all_traps_with_kdb = FALSE;
<youpi>better enable it to see such issue
<damo22>start acpi: kernel: Invalid opcode (6), code=0
<damo22>Stopped at 0x80eb880: ???
<damo22>>>>>> user space <<<<<
<damo22>0x80eb880(0,0,0,0,0)
<damo22>0x91()
<damo22>db{1}>
<damo22>that makes more sense, its somehow hitting an invalid opcode and crashing the task
<damo22>hi almuhs
<almuhs>hi
<damo22>all the changes we have for smp are upstream
<almuhs>yes, i saw it
<almuhs>have you checked the bound processor topic?
<damo22>(20:06:16) damo22: start acpi: kernel: Invalid opcode (6), code=0
<damo22>(20:06:16) damo22: Stopped at 0x80eb880: ???
<damo22>i turned on debug_all_traps_with_kdb and now it throws a debug trap
<damo22>i havent looked at bound_processor yet
<damo22>it seems to be switching to a bogus thread?
<almuhs>when youpi1 and i were checking, some years ago, ext2fs had a thread that was never assigned to a cpu
<damo22>almuhs: it hits an invalid opcode and crashes the task
<almuhs>who?
<damo22>start acpi: <crash>
<damo22>(who's on first)
<almuhs>make an objdump -d gnumach
<almuhs>and find this instruction
<damo22>0x80eb880 is not in the code
<damo22>its in memory
<almuhs>then you probably have to debug it using gdb
<damo22>i dont know how to debug more than kernel task with gdb
<almuhs>you will have to set a remote debugging session with gdb, and put a breakpoint in this
<damo22>kernel runs to the end
<youpi1>you can gdb the acpi binary
<youpi1>to know what's there
<youpi1>it won't be live, but at least you know what's at that address
<damo22>ok
<youpi1>(possibly the binary is just not getting properly mapped into memory)
<damo22> 80eb880: 66 0f 6e c0 movd %eax,%xmm0
<almuhs>then you have to find the function that executes this instruction
<youpi1>damo22: possibly it's the fpu state which is not properly managed among cpus
<youpi1>see i386/i386/fpu.c
<youpi1>notably, enabling mmx etc. instructions
<youpi1>that probably has to be done on APs as well
<damo22>aha
<youpi1>see init_fpu
<almuhs>maybe by setting a simpler cpu in qemu we can check if this is the problem
<youpi1>which is supposed to be called on each cpu
<youpi1>almuhs: without fpu, userland will just not work
<youpi1>it has always assumed to be there
<almuhs>yes, i referred to a cpu which has no mmx instructions?
<damo22>almuhs: we would have to recompile userland with no mmx
<almuhs>oops
<almuhs>wait for me 10 minutes. My class has finished and i have to leave the classroom
<damo22>its easier to call init_fpu()
<youpi1>not only no mmx, userland will still try to use floats
<youpi1>ergo no way
<almuhs>it's true
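The point above, that FPU setup has to run on every CPU and not only on the boot processor, can be illustrated with a toy user-space model. On x86 the control registers that gate SSE instructions are private to each core, so a core that skipped its setup raises an invalid opcode trap (vector 6) on an instruction such as movd %eax,%xmm0. Everything below (toy_cpu, toy_init_fpu, toy_exec_sse, toy_boot, NCPUS) is a hypothetical stand-in and not GNU Mach code.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4  /* hypothetical CPU count, matching "-smp 4" */

/* Toy per-CPU state. On real x86 hardware this lives in control
   registers, which are private to each core. */
struct toy_cpu {
    bool fpu_ready;  /* set once this CPU has run its own FPU setup */
};

static struct toy_cpu cpus[NCPUS];

/* Stand-in for init_fpu(): it only affects the CPU it runs on. */
static void toy_init_fpu(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    cpus[cpu].fpu_ready = true;
}

/* Stand-in for executing an SSE instruction on a given CPU.
   Returns 0 on success, or 6 (the x86 invalid opcode trap number)
   if that CPU never ran its FPU setup. */
static int toy_exec_sse(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return cpus[cpu].fpu_ready ? 0 : 6;
}

/* Bring up all CPUs. With init_on_aps == false only the boot
   processor (CPU 0) is set up, which models the bug discussed above. */
static void toy_boot(bool init_on_aps)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        cpus[cpu].fpu_ready = false;

    toy_init_fpu(0);  /* boot processor */

    if (init_on_aps)
        for (int cpu = 1; cpu < NCPUS; cpu++)
            toy_init_fpu(cpu);  /* what the AP setup path must also do */
}
```

With toy_boot(false), a thread scheduled on CPU 0 works while the same thread on CPU 1 traps, which matches a task dying only once it lands on an application processor.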
<almuhs>i go out 10 minutes
<almuhs>hi again
<damo22>ext2fs: part:2:device:sd0: No such device or address
<youpi1>\o/
<damo22>it booted with -smp 4
<youpi1>now you need to use rumpdisk etc.
<damo22>rumpdisk worked
<almuhs>:-)
<almuhs>i can try it in a real machine, but i need to prepare some things
<damo22>its very slow
<almuhs>check with ps command
<almuhs>what processor is executing each process
<damo22>it hangs randomly while bootstrapping the first tasks
<almuhs>i remember a similar problem some months ago
<almuhs>it was related with stack reserve, if i remember well
<youpi1>it'd probably be good to make an extensive review of the code to check that things are locked
<almuhs>yes, good idea
<damo22>with -smp 2 it got to Hurd server bootstrap: ext2fs[part:2:device:wd0] exec startup proc auth.
<damo22>and hanging there with 200% load
<youpi1>you can look where it's stuck
<almuhs>my error of some months ago was when i was using the harddisk
<almuhs>for example, using apt, sometimes i had kernel panic
<damo22>Stopped at pmap_put_mapwindow+0xd2: jmp pmap_put_mapwindow+0xbe
<damo22>pmap_put_mapwindow(c10a84cc,0,1000,f8458760,0)+0xd2
<damo22>pmap_zero_page(6f4c4000,f5efb6a8,f599bd3c,f6119a50,0)+0x6d
<damo22>vm_page_zero_fill(f8458760,f6119a50,0,c105e44e,f5997ec8)+0x19
<damo22>vm_fault_page(f6119a50,0,3,0,0,f599bddc,f599bde0,f599bde4,0,0,f599bddc,f599bdd0)
<damo22>...
<almuhs>like this https://pasteboard.co/zGQVKQhsPwmh.jpg
<damo22>syscall_vm_allocate
<almuhs>memory related then
<almuhs>it's not?
<youpi1>damo22: did you take the addition of simple_lock there?
<youpi1>simple_lock(&pmapwindows_lock);
<damo22>i am running master
<damo22>@@ -258,6 +259,7 @@ cpu_setup(int cpu)
<damo22> machine_slot[cpu].cpu_subtype = CPU_SUBTYPE_AT386;
<damo22> machine_slot[cpu].cpu_type = machine_slot[0].cpu_type;
<damo22>
<damo22>+ init_fpu();
<damo22> lapic_enable();
<damo22> cpu_launch_first_thread(THREAD_NULL);
<damo22> }
<damo22>* d6ff5ba7 (HEAD -> master, zammit/master, origin/master, origin/HEAD) linux: Fix non-SMP build
<almuhs>is it your latest commit?
<damo22>i havent committed the init_fpu() change yet
<almuhs>true
<damo22>the rest is in master
<almuhs>i will try to compile upstream with smp
<damo22>i dont think the (pmap == kernel_pmap) change is very good, it sends many many cpu updates, is there a way to reduce them?
<youpi1>if the kernel mapping changes, all cpus have to be aware of it
<youpi1>aka: don't blame the change, blame the code that triggers the case
<almuhs>XD
<almuhs>maybe it's necessary some optimization
<damo22>but if it sets the cpus_active and cpus_using on the pmap, cant it already know
<youpi1>what do you mean by "know"?
<youpi1>just to make sure: do you know about TLB?
<damo22>i think its updating even though nothing is changing
<damo22>i need to read more
<youpi1>then look for the callers of signal_cpus
<youpi1>it's supposed to be called only when something changed
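The two statements above (every CPU must learn about a changed kernel mapping, but signalling should happen only when something changed) can be sketched with a toy model of per-CPU TLBs. All names here (page_table, tlb, toy_signal_cpus, toy_translate, toy_map, shootdowns) are hypothetical and only loosely mirror the signal_cpus idea; frame value 0 is assumed to mean "unmapped / not cached".

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS  4  /* hypothetical CPU count */
#define NPAGES 8  /* hypothetical number of virtual pages */

static uint32_t page_table[NPAGES];  /* shared mapping: page -> frame */
static uint32_t tlb[NCPUS][NPAGES];  /* per-CPU cached translations */
static int shootdowns;               /* how often the CPUs were signalled */

/* Stand-in for signalling the CPUs: drop every cached copy of the entry. */
static void toy_signal_cpus(int page)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        tlb[cpu][page] = 0;  /* 0 means "not cached" */
    shootdowns++;
}

/* A CPU translates through its own TLB and refills it from the
   shared page table on a miss. */
static uint32_t toy_translate(int cpu, int page)
{
    assert(cpu >= 0 && cpu < NCPUS && page >= 0 && page < NPAGES);
    if (tlb[cpu][page] == 0)
        tlb[cpu][page] = page_table[page];
    return tlb[cpu][page];
}

/* Change a mapping. CPUs are signalled only if the entry really
   changes; rewriting the same value costs nothing. */
static void toy_map(int page, uint32_t frame)
{
    assert(page >= 0 && page < NPAGES);
    if (page_table[page] == frame)
        return;  /* nothing changed, no signal needed */
    page_table[page] = frame;
    toy_signal_cpus(page);
}
```

Without the signal, a CPU that had cached the old frame would keep using it after the mapping changed. A shootdown counter like this one is also a cheap way to spot callers that signal when nothing has changed.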
<almuhs>damo22: what are the new configure flags for smp?
<damo22>in pmap_put_mapwindow, should there be a PMAP_READ_LOCK()/UNLOCK() around the PMAP_UPDATE_TLBS() ?
<youpi1>probably better replace the separate slock with the mere PMAP_WRITE_LOCK/UNLOCK
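The locking question above comes down to a shared pool of temporary mapping windows that several CPUs borrow from at once. Here is a minimal user-space sketch of such a pool guarded by a single spinlock. A C11 atomic_flag stands in for the kernel lock, and all names (toy_window, toy_get_window, toy_put_window, NWINDOWS) are hypothetical; this is not the actual pmap code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NWINDOWS 2  /* hypothetical pool size */

/* Toy mapping window: a slot that one user at a time may borrow. */
struct toy_window {
    int in_use;
};

static struct toy_window windows[NWINDOWS];

/* Spinlock standing in for the kernel's lock around the pool. */
static atomic_flag windows_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&windows_lock,
                                             memory_order_acquire))
        ;  /* spin until the holder releases the lock */
}

static void toy_unlock(void)
{
    atomic_flag_clear_explicit(&windows_lock, memory_order_release);
}

/* Borrow a free window, or return NULL if all are taken. The scan
   and the in_use update happen under the lock, so two CPUs cannot
   claim the same slot. */
static struct toy_window *toy_get_window(void)
{
    struct toy_window *w = NULL;

    toy_lock();
    for (int i = 0; i < NWINDOWS; i++) {
        if (!windows[i].in_use) {
            windows[i].in_use = 1;
            w = &windows[i];
            break;
        }
    }
    toy_unlock();
    return w;
}

/* Give a window back, under the same lock used to hand it out. */
static void toy_put_window(struct toy_window *w)
{
    toy_lock();
    assert(w != NULL && w->in_use);
    w->in_use = 0;
    toy_unlock();
}
```

The release path takes the same lock as the allocation path. Any extra work done on release, such as a TLB update, needs to be covered consistently as well, which is the concern raised in the exchange above.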
<almuhs>youpi: i have many syntax error when i try to "make gnumach.gz" from Debian GNU/Linux
<gnu_srs1>youpi1: Can you just resend the mail to me to bug-hurd about unix.TIOCGETA I cannot find it.
<youpi1>which mail?
<youpi1>what I pasted above is all I can remember
<youpi1>almuhs: "many syntax error" is not enough of a bug report
<damo22>almuhs: autoreconf -fi && mkdir build && cd build && ../configure --enable-apic --enable-kdb --enable-ncpus=8 && make gnumach.gz
<almuhs> https://pastebin.com/QN8eckJW
<almuhs>damo22: thanks. I will try it
<youpi1>probably you need to upgrade your mig
<almuhs>compiling it?
<youpi1>err, no, just upgrade?
<youpi1>1.8+git20221221-2
<almuhs>1.8+git20200618-5
<youpi1>see
<youpi1>that's elder
<almuhs>yes
<almuhs>i will add sid repository
<youpi1>no need for si
<youpi1>d
<youpi1>it's in testing
<almuhs>yes. I forgot to add testing repositories in my new machine
<almuhs>or not
<almuhs>i have testing repositories, i don't know why apt only offers the version from stable
<youpi1>you possibly have a preference
<youpi1>you can force with apt install mig/testing
<almuhs> https://packages.debian.org/search?keywords=mig&searchon=names&suite=all&section=all
<youpi1>mig-i686-gnu
<youpi1>please learn to use apt-cache
<youpi1>and drop websites
<almuhs>thanks '=D
<almuhs>now compile
<damo22>i'll send in the patch for init_fpu tomorrow, bedtime
<almuhs>good night damo22
<almuhs>gnumach compiled
<almuhs>waiting to finish Debian GNU/Hurd installation
<almuhs>i'm testing in a real machine (thinkpad r60e) without rumpdisk. It's really slow
<almuhs>crashed after sending IPIs
<almuhs>i can enable rumpdisk in a real machine: the latest Debian GNU/Hurd installation image crashes when try to boot from dvd
<almuhs>can't
<almuhs>testing in qemu with IDE disk, my VM is in loop https://pasteboard.co/ox6iGogzjY3M.png
<almuhs>i go to upgrade my installation, and repeat the test
<almuhs>same
<Pellescours>I think before having smp + rumpdisk, we need to ensure that interrupts on apic works correctly
<Pellescours>when I last tested the rump+apic, irq were lost and I got messages similar to what almuhs just had
<almuhs>it's a possibility
<almuhs>now i go to try again in real hardware
<fowler>On Debian I have deb-src lines, but for some packages like netdde I'm unable to do an apt-get source. I can install the binary version 0.0.20200330-11 no issues, but the source doesn't seem to be in the package repos for some odd reason. Can anyone reproduce this?
<youpi1>that's a shortcoming of debian-ports indeed
<youpi1>you can however run "debcheckout netdde" to get the git repo
<youpi1>(from the devscripts package)
<fowler>Thanks for that. Will do, thanks :)
<fowler>W: Unable to locate package netdde
<fowler>unknown package 'netdde'
<youpi1>ah, perhaps debcheckout uses deb-src too
<youpi1>anyway it's just at the same place as other hurd debian packages
<fowler>salsa?
<youpi1>https://salsa.debian.org/hurd-team/netdde.git
<fowler>Got it, cheers :)
<fowler>The reason I'm poking around at this is because I've done something similar to netdde before. Prof Andrew Tanenbaum hired me 10 years ago to port a linux kernel driver to Minix3. He wanted a gigabit card working on the system for real hardware workstations and chose the Broadcom 5752. The linux kernel driver was ~ 13,000 LOC, but I got it working and managed to get really decent speeds out of it.
<fowler>It was a really fun project and I've been playing with Hurd recently so I thought I'd take a look at the current driver ports
<youpi1>netdde is not really "current", we'll rather head to rump
<youpi1>we haven't worked on the network part of it since we have netdde which is fine enough for various needs, but long-run we'll rather use rump for network too
<fowler>The concept of the rump kernel is definitely appealing, but taking linux drivers "off the shelf" and reusing them looks like it will support a lot more hardware.
<youpi1>except that "reusing" is terribly not easy
<youpi1>for linux drivers
<youpi1>and *way easier
<youpi1>for bsd drivers
<fowler>I do agree, after all, I do have some experience in the matter :D
<fowler>The "terribly not easy" part may be enjoyable to me. I did enjoy the challenge the last time
<youpi1>well, the initial port is fun
<youpi1>and then you have the maintenance part
<youpi1>and that part is not joyful
<youpi1>I've seen such projects die one after the other
<fowler>Oh yes, very true of course. Such is the life of a volunteer developer and a volunteer dev team. People kind of expect the project to be taken on by the dev team after they've done their "fun part" which is in many ways not fair or practical
<youpi1>exactly
<fowler>For now I'm just poking around at things to find something challenging and fun and possibly practical. It may be that I end up doing nothing and it may be that I end up doing something challenging, fun AND practical.