IRC channel logs

***alMalsam1 is now known as alMalsamo

***FragByte_ is now known as FragByte

<clarity_>From what I understand, hurd is cooperative multitasking. I thought that programs have to be designed to work in that type of system. How does hurd run programs not designed for cooperative multitasking?

<youpi>no it's not cooperative multitasking

<youpi>where did you read that?

<clarity_>it's not? I was looking at gnu mach I think,

<clarity_>There currently is no kernel preemption in GNU Mach.

<clarity_>If GNU Mach were made a preemptive kernel, using continuations would probably no longer make sense as the kernel itself, that is, kernel threads can be preempted, and then their full state needs to be preserved.

<youpi>*kernel* preemption

<youpi>doesn't mean no user preemption

<clarity_>ah, so the gnu's are cooperative, but the programs running on hurd are preemptive?

<youpi>no

<youpi>the hird of servers is preemptive too

<youpi>only the kernel is cooperative

<clarity_>you mean, just mach?

<youpi>yes

<clarity_>ic, thanks for explaining that

<clarity_>Another question, why is IPC slow with Mach? Is it because of memory pages being swapped out during context switches?

<youpi>from what are you saying that IPC is slow with Mach?

<clarity_>I was reading that the slow IPC/context switching is the big performance issue with hurd?

<clarity_>There was an effort to port Hurd to L4 I think it was called which is a microkernel with better performance for IPC/context switching?

<youpi>where did you read that?

<youpi>I mean, if it's a paper from 10 years ago, computers have changed so much

<clarity_>yeah, it was likely that old at least

<clarity_>I read, "the critique," and that's from 2007

<damo22>i dont understand how biblio's patch makes any difference on his system. the function fails before his code changes are executed

<youpi>yes, processors have changed so much

<youpi>what kills performance is I/O

<youpi>real I/O, disk I/O

<youpi>so missing prefetching etc.

<damo22>which layer is supposed to prefetch?

<damo22>the driver?

<youpi>it's not that clear

<youpi>probably the memory object

<youpi>so the kernel

<damo22>hmm

<youpi>which does see pages accesses etc.

<youpi>possibly that can be delegated

<damo22>how would the kernel know its a disk access?

<damo22>or you mean prefetch everything

<youpi>yes prefetch

<youpi>and the memory object provider can work on providing it

<youpi>it's useful for disks

<youpi>but also ftp, etc.

<damo22>yes

<damo22>so then we should disable caching in rump disk, and prefetch in the kernel instead

<youpi>yes

<youpi>I don't think caching is needed at all in rump disk actually

<youpi>(except buffer cache)

<youpi>a page cache in rump disk is duplicate with the caching already done by ext2fs

<damo22>right

<damo22>pretty sure i enabled rwdX

<damo22> http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/rumpdisk/block-rump.c#n95

<damo22>the "r" prefix uses it as a character device

<damo22>i think

<damo22>no vfs caching

<damo22>Pellescours: did you try my latest acpica-nothread branch? i cant seem to figure out why the root table works but the rest doesnt

<damo22>(20:08:19) damo22: ACPI: RSDT 0x000000007FFE2337 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)

<damo22>(20:08:19) damo22: ACPI: ? 0x4350584200000001 C3F000FF (v226 ?S? ? 53F000FF ? 87F000FE)

<damo22>crash

<damo22>i am using qemu with -smp 2

<clarity_>neato, I'm looking at the source for gnumach/hurd

<clarity_>is gnumach still only supporting i386? I see amd64 assembly files in the source?

***Noisytoot is now known as [

<Pellescours>damo22: not yet, but I don't understand why that was working before and not now...except if that actually never worked but it wasn't crashing/covered

<Pellescours>clarity_: some work was done, but the kernel is not yet able to boot in amd64. someone ent patch for that but they are not upstreamed

<Pellescours>s/ent/sent/

<clarity_>ah, I see

<clarity_>it seems like development is picking up on hurd

<clarity_>I'm going to look into the source a bit more and see if I can contribute after I understand it a bit more

<clarity_>I've been getting really interested in operating system kernels lately

<clarity_>it's been a good 15 years since I took the operating systems course in college, and even longer since I've coded in assembly

<damo22>youpi: did the kernel memory start address change recently?

<damo22>how can any userspace process get access to physical addresses?

<damo22>is it possible to mmap() a physical address to a virtual address?

<damo22>/dev/mem seems to be different now

<damo22>if (off >= 0xa0000 && off < 0x100000) the mem device returns i386_btop(off)

<damo22>instead of vm_page_lookup_pa(off)

<damo22>#define i386_btop(x) (((unsigned long)(x)) >> I386_PGSHIFT)

<damo22>is that correct?

<Pellescours>damo22: your latest commit on acpica will have conflict with upstream master if ever

<damo22>ah ok

<damo22>i needed to test it though

<damo22>Pellescours: we dont need that commit if we can figure out how to access physical memory properly

<Pellescours>ok, the issue is in gnumach then, no?

<damo22>i dont know

<damo22>acpi is a pain

<damo22>we are duplicating code in gnumach and hurd

<damo22>the acpi tables need to be parsed from physical memory addresses

<damo22>the ACPICA code assumes it has access to raw physical memory

<damo22>(i think)

<Pellescours>SMP needs ACPI so it’s complicated to remove it from gnumach

<damo22>yes we cannot remove that part from gnumach

<damo22>it just needs to read the irq overrides

<damo22>and parse the table regarding cpu cores

<damo22>but its also tricky because in userspace we need acpi to parse the AML without a root filesystem present

<Pellescours>why would that be tricky?

<damo22>acpi needs access to the physical addresses where the tables are stored

<Pellescours>oh the /dev/mem

<damo22>we cant use /dev

<Pellescours>is there other way to get this? the get_device() maybe?

<damo22>i guess we just device_open the mem device

<Pellescours>s/get_device/device_open/

<damo22>but using /dev/mem i thought was the same thing (for testing now)

<Pellescours>cool

<damo22>but the acpica code tries to use 16 bit paragraph address

<damo22>for the very first lookup

<damo22>like we do in gnumach

<damo22>whereas, we want to skip that region and try 0xe0000 instead

<damo22>as a 32 bit address

<damo22>i dont think 0x40e works in 32 bit mode?

<damo22>what is confusing is that this function acpi_find_root_pointer() used to work

<Pellescours>Do you know if it stopped working due to recent patches or to something else ?

<damo22>i upgraded my hurd system from something very old to current

<Pellescours>ok, I gonna check an old commit of acpica-nothread to see if parsing is working

<damo22>maybe we can just slog through it with gdb

<damo22>no point going backwards

<Pellescours>if 880526c1182e8b1f8d3f17ec7ebedd73d90388cc was able to find root table pointer in the past it’s no longer the case

<damo22>oh

<Pellescours>I think biblio has not updated his hurd/gnumach and that’s why he is able to get it working

<damo22>that does not seem to be a hurd commit

<Pellescours>If it’s possible to downgrade gnumach to confirm that

<damo22>my testacpi.c program does not depend on any hurd libs except libacpica, which i have in a hurd branch just for ease of versioning with the rest of acpi

<damo22>it literally calls acpi_init(); and then the irq function

<Pellescours>can it be gnumach commit 230d7726ce55114c5c32c440c5928f104a085ba6 that change the behavior ?

<youpi>damo22: not so recently

<youpi>damo22: of course you can mmap a physical address, that's what page tables are for :)

<youpi>the mem device has changed behavior, though: now it refuses to map any non-reserved memory (that is used for normal allocations)

<damo22>would i be allowed to map just under 1M?

<youpi>> (08:11:51) damo22: we cant use /dev

<youpi>why?

<damo22>bootstrapping the disk requires knowledge of irq

<youpi>you're allowed to map anything which is not RAM used for normal allocations

<damo22>so therefore acpi needs to be available before disk

<youpi>so all bios-reserved areas, acpi areas etc. are fine

<youpi>see memmmap

<youpi>that uses biosmem_addr_available to check whether the address is marked as available in e820

<damo22>memmmap has a special case for under 1M

<youpi>no

<youpi>not any more

<damo22>ok

<damo22>maybe that is what changed

<damo22>biosmem: 00000000000009fc00:0000000000000a0000, reserved

<damo22>biosmem: 0000000000000f0000:000000000000100000, reserved

<damo22>i think e0000 is not mentioned

<damo22>it seems to be a hole

<youpi>then mem should be allowing its mmap

<youpi>since it's not available memory

<damo22>i see

<youpi>+already

<damo22>biosmem: 0000000000feffc000:0000000000ff000000, reserved

<damo22>isnt that part of APIC?

<Pellescours>if mem was not able to memmap, then the call should have returned a non zero value to notify there was an error. And the call returns 0

<damo22>Thread 4 hit Breakpoint 1, acpi_os_map_memory (phys=4850473839169110017, size=36)

<damo22>thats way too big

<youpi>acpi uses 32bit values, doesn't it? how do you end up with such number?

<damo22>(gdb) p *rsdp

<youpi>perhaps a phys_addr_t type that is 64 while it should be 32bit?

<damo22>$1 = {signature = "RSDT8\000\000", checksum = 1 '\001', oem_id = "EBOCHS", revision = 32 ' ',

<damo22> rsdt_physical_address = 1129338946, length = 538976288, xsdt_physical_address = 4850473839169110017,

<damo22> extended_checksum = 1 '\001', reserved = "\000\000"}

<damo22>revision should be

<damo22>1

<youpi>is the structure type really properly defined ?

<youpi>with proper sizes and aligns

<youpi>and the packed attribute

<damo22>thats using the acpica code

<damo22>Thread 4 hit Breakpoint 1, acpi_tb_parse_root_table (rsdp_address=2147361591)

<damo22> at ../../libacpica/tbutils.c:230

<damo22>ACPI: RSDT 0x000000007FFE2337 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)

<damo22>it prints that, and then i checked the rsdp contents

<youpi>> thats using the acpica code

<youpi>the acpica code could still be wrong for whatever reason

<youpi>because it assumes things that aren't true on hurd/i386

<damo22>its not the right address for the root pointer

<damo22>(qemu) xp/40c 0x7ffe2337

<damo22>000000007ffe2337: 'R' 'S' 'D' 'T' '8' '\x00' '\x00' '\x00' '\x01' 'E' 'B' 'O' 'C' 'H' 'S' ' '

<damo22>000000007ffe2347: 'B' 'X' 'P' 'C' ' ' ' ' ' ' ' ' '\x01' '\x00' '\x00' '\x00' 'B' 'X' 'P' 'C'

<damo22>it should say "RSD PTR "

<Pellescours>damo22: I don’t see any step that should correspond to the ACPI_MOVE_16_TO_32 when I do step by step in gdb, maybe it’s the macro definition which is not configured properly

<damo22>i almost fixed it

<damo22>it wants the root pointer not the sdt_base

<damo22>but we still need my custom root table searcher, because there is no 16 bit mode

<damo22>IRQ(0:1f.2) = 10

<damo22>:D

<biblio>damo22: :)

<biblio>damo22: i need to update gnumach to test your latest fix.

<damo22>biblio: i made more changes

<damo22>it was still broken

<biblio>damo22: oh ok.

<biblio>damo22: are you testing on real hardware or qemu ?

<biblio>damo22: I checked API docs and examples from Linux. I could not find anything wrong yet.

<biblio>damo22: API docs of acpi

<damo22>qemu

<damo22>i just pushed a working commit

<damo22>we will need to work out if its feasible to upstream the custom root table search

<damo22>i made it reusable for our purposes

<damo22>if you can find a way to reuse their root table finder instead of using mine, we should do that

<damo22>then we can remove /dev/mem call

<biblio>damo22: I did not get "their root table finder". You mean root table finder from acpi API call ?

<damo22>acpi_find_root_pointer()

<damo22>is part of acpi

<damo22>acpica*

<damo22>but it does not work on hurd currently

<biblio>damo22: we should try to use acpi_find_root_pointer() instead of your custom acpi_get_root_table_pointer(...) ?

<damo22>if possible, its better to use the built in code, otherwise we are reinventing the wheel

<damo22>but calling that function instead breaks

<biblio>damo22: yes agree.

<damo22>so i implemented a custom one

<biblio>damo22: just to be sure. You want to replace your acpi_get_root_table_pointer() with acpi_find_root_pointer() in future ?

<damo22>there was a hook available to override its implementation

<damo22>well, if its possible yes

<biblio>damo22: ok got it

<AlmuHS>hi. I'm trying to compile upstream gnumach, but I have some problems: https://pastebin.com/rjzgi4rz

<AlmuHS>I used this line to configure. `../configure --host=i686-gnu CC=gcc LD=ld`

<AlmuHS>have I forgotten something?

<Pellescours>AlmuHS: I tried to build with the same command as you, but that works for me. Do you have any change compared to origin/master?

<AlmuHS>I fixed a conflict in configure.ac

<AlmuHS>maybe I fixed it bad

<AlmuHS> https://pastebin.com/qnzLsqvi

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/d3a69008b473a2106fa7f21e6554f72ead2cea13

<Pellescours>AlmuHS: I built your smp_stage2 branch successfully

<AlmuHS>then I don't know the problem

<Pellescours>did you do `autoreconf -fi`?

<AlmuHS>I just did it, but it doesn't work

<Pellescours>make clean && make gnumach.gz ???

<AlmuHS>same error

<AlmuHS>i'm trying to compile "master" branch of my repository

<AlmuHS>but there are not any significant change in compile process

<Pellescours>do you have latest mig ? I know that headers changed a bit

<AlmuHS>i'm not sure

<AlmuHS>upgrading my Debian then

<AlmuHS>meanwhile, I changed my configure line to "../configure --host=i686-gnu CC='gcc -m32' LD='ld -melf_i386' ", and now I have another error

<AlmuHS>ld: relocatable linking with relocations from format elf64-x86-64 (libkernel.a(model_dep.o)) to format elf32-i386 (gnumach.o) is not supported

<AlmuHS>trying again after "make clean"

<Pellescours>I usually just do ../configure

<AlmuHS>ok, now it works

<AlmuHS>upstream kernel working now, successfully by moment

<Pellescours>niice

<Pellescours>are you close to boot multi cpu ?

<AlmuHS>i found a kernel panic in my smp kernel, and I want to check if the panic is from my work or is from upstream

<Pellescours>when is your kp?

<AlmuHS>I got to boot multicpus two years ago, but my previous implementation was very dirty and i'm refactoring

<AlmuHS>this is my kernel panic https://pasteboard.co/agiSegphNSib.png

<AlmuHS>and, after this, i found this other one after reboot https://pasteboard.co/kAxotWNc1WLz.png

<AlmuHS>oops, i sent same image

<AlmuHS>wait

<AlmuHS> https://pasteboard.co/GIFCidMY70Zo.png

<AlmuHS>this is the error after reboot

<Pellescours>if you try the upstream kernel, it works?

<AlmuHS>i'm checking it now

<AlmuHS>but, when this error appears, then i can't boot any kernel

<AlmuHS>after this, i told

<Pellescours>ok, I’m trying to boot your branch to test it on my side

<AlmuHS>you have to add a new flag

<AlmuHS>--enable-ncpus=N

<AlmuHS>replace N by the number of cpus of your preference

<AlmuHS>add this flag in configure step

<AlmuHS>upstream without this flag seems stable, no crash

<Pellescours>fun fact, I first had a kernel panic. but after the reboot, it boot correctly

<AlmuHS>are you using qemu?

<Pellescours>yes qemu with -smp 4

<AlmuHS>ok

<AlmuHS>now compiling master branch with --enable-ncpus=4

<Pellescours>I’m not able to build master due to const changes

<AlmuHS>my repo has a merged master branch

<AlmuHS>now checking master branch with ncpus flag

<Pellescours>I got an error while compiling gnumach

<Pellescours>in your master branch ./device/device.server.h:38:9: error: unknown type name ‘const_dev_name_t’; did you mean ‘dev_name_t’?

<AlmuHS>now pushed

<AlmuHS>pull again

<Pellescours>got it, it’s compiling

<Pellescours>It boot

<AlmuHS>wait some time

<AlmuHS>make harddisk operations

<Pellescours>AlmuHS: with your smp_stage2 branch, I get a kernel panic (page fault)

<Pellescours>it’s when I do harddisk operation yeah

<AlmuHS>ok

<Pellescours>I was running a configure command actually

<AlmuHS>it's strange, because i didn't modify any harddisk controller's source code

<AlmuHS>try now to boot another kernel

<AlmuHS>another gnumach

<Pellescours>disk corrupted (multiple inode claim), but after fix corruption it boot

<AlmuHS>ok

<AlmuHS>i will try again then. Crossing fingers

<AlmuHS>it boots

<AlmuHS>but panic

<Pellescours>can the kernel panic be due to linux driver?

<AlmuHS> https://pasteboard.co/jaBxmOAcnTNj.png

<AlmuHS>i'm not sure

<AlmuHS>notice that panic is after boot

<AlmuHS>upstream kernel continues booting without problems

<AlmuHS>executing fsck now

<AlmuHS>smp kernel hasn't crashed yet

<AlmuHS>(after reboot)

<AlmuHS>oops, panic again https://pasteboard.co/E0Fn4yPITDoU.png

<Pellescours>yeah, it’s a page fault

<AlmuHS>a friend is debugging the error, but we don't find the origin

<Pellescours>but it don’t happen as long as you don’t write to disk

<AlmuHS>this is the question

<AlmuHS>i write FROM the disk (the assembly routine) but not TO the disk

<AlmuHS>the only that i remember that could generate problems is the copy of this assembly routine. I copy this in a address which is not mapped

<Pellescours>I did write of file in vim, no bug

<AlmuHS>#define AP_BOOT_ADDR (0x7000)

<AlmuHS>this is the memory address in which i copy the assembly routine

<AlmuHS>memcpy((void*)phystokv(AP_BOOT_ADDR), (void*) &apboot, (uint32_t)&apbootend - (uint32_t)&apboot);

<Pellescours>I think it’s concurrent write that fails

<AlmuHS>maybe. It's a preliminary work, and i don't add concurrency controls yet

<Pellescours>I tried to write a file multiple time using vim, no kp. I did make clean, and kp after some RM

<AlmuHS>but the cpus are not added to the scheduler yet

<AlmuHS>at moment, i simply start and configure the cpus, but the only working is cpu0

<Pellescours>and the corruption is really strange because it’s not left inode but real inode being corrupted (multiple inode claim)

<Pellescours>I think the bug is hidden behind an #if NCPUS > 1

<Pellescours>not related to your work

<AlmuHS>maybe

<Pellescours>I will try to compile master with ncpu > 1 and check

<AlmuHS>ok

<Pellescours>master multi-cpu works

<Pellescours>no kernel panic

<Pellescours>it’s one of your changes that break something

<AlmuHS>yeah

<Pellescours>doing a "bisect" of your changes maybe would help, or using the kdb

<AlmuHS>these are my latest changes

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/smp.c

<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c

<luckyluke>AlmuHS: Pellescours did you try to see where is the address causing the page fault? for example, what code is executing at that moment...

<luckyluke>from the last screenshots you sent, this would mean checking address 0xc101cc4d and decoding with addr2line

<AlmuHS>i will try it

<luckyluke>but it seems the address causing the fault is 0xc101ccba, which could be some variable

<luckyluke>also, if you use qemu you can redirect the console to a serial port, so you can copy the text :)

<luckyluke>if you enable kdb, you should also be able to get a backtrace

<AlmuHS>this is my qemu script https://pastebin.com/zHWRP0v2

<AlmuHS>how can i enable kdb?

<Pellescours>at compile time you need to enable it then it’s https://www.gnu.org/software/hurd/microkernel/mach/gnumach/debugging.html

<luckyluke>you need to add --enable-kdb at configure stage

<Pellescours>control+alt+d

<luckyluke>do you use gdb? I see you launch qemu with -s -S

<AlmuHS>i use gdb

<luckyluke>in that case, you could also try to set a breakpoint at the code address above, just before the panic

<AlmuHS>i have a friend who is doing just this

<AlmuHS>he has more experience in debugging than me

<luckyluke>so you can examine the state before the issue

<luckyluke>ah ok

<AlmuHS>i've just added a loop counter after raise each IPI, but the problem continues

<luckyluke>AlmuHS: I was having a look at your code, it seems once you start the other cpus, they execute cpu_setup() but it seems that after that, cpu_ap_main() returns and I don't see a point where the cpu should "wait"... is this correct?

<AlmuHS>it could be correct

<luckyluke>I see a hlt instruction in cpuboot.S, but it's commented, could it be that the cpu just goes on and eventually it causes some mess?

<AlmuHS>i can reenable the hlt loop

<AlmuHS>i created this loop to keep the cpus in a infinite loop if they are not added to scheduler

<luckyluke>that seems reasonable

<AlmuHS>i've just reenabled the loop, but the panic continues

<luckyluke>did you check what code corresponds to the address reported?

<AlmuHS>i'm not yet

<AlmuHS>c101cc4d: 87 43 24 xchg %eax,0x24(%ebx)

<AlmuHS>not significant

<luckyluke>you should check to which function it belongs

<AlmuHS>c101cb40 <thread_quantum_update>:

<luckyluke>there are some NCPUS > 1 defines there, you could try adding some print() to see what is wrong there

<luckyluke>do you have the clock_interrupt() running on the secondary cpus?

<AlmuHS>i think not

<AlmuHS>the secondary cpus only are alive, but these are not added to the kernel yet

<AlmuHS>even, i didn't enable paging in these

<AlmuHS>yet

<AlmuHS>I go to disable this line: machine_slot[i].running = TRUE

<AlmuHS>no, it's not the problem

<curiosa>do any of you here knows the story about the string_t type? I'm looking at this https://git.savannah.gnu.org/cgit/hurd/mig.git/tree/type.c#n691

<curiosa>why there is a special array kind in mig called c_string ?

<curiosa>if then it is just array[n] of (MSG_TYPE_STRING_C, 8)

<curiosa>wouldn't it be better to get rid of it

<curiosa>?

<curiosa>it is really the only thing in gnumig that breaks the mig grammar

<curiosa>so it is not just weird, it is just plain ugly

<curiosa>apparently it was already there in the first commit in 1998

<curiosa>it is there in apple defs https://opensource.apple.com/source/autofs/autofs-270.40.1/mig/autofs_migtypes.h.auto.html

<biblio>damo22: FYI https://www.osnews.com/story/134539/a-practical-solution-for-gnu-hurds-lack-of-drivers-netbsds-rumpkernel-framework/

<curiosa>ah damo22, I've seen your talk, very interesting!

<AlmuHS>he is australian, so it can't read your messages until europe's late night

<AlmuHS>**he can't read

***biblio_ is now known as biblio

<damo22>hi, thanks for the link

<damo22>demo@zamhurd:/part3/demo/git/hurd-sv/build/acpi$ sudo ./testacpi

<damo22>ACPI: RSDP 0x00000000000F5860 000014 (v00 BOCHS )

<damo22>ACPI: RSDT 0x000000007FFE2337 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)

<damo22>....

<damo22>ACPI: MCFG 0x000000007FFE22D3 00003C (v01 BOCHS BXPC 00000001 BXPC 00000001)

<damo22>ACPI: WAET 0x000000007FFE230F 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001)

2022-02-11.log