IRC channel logs

2022-02-11.log

***alMalsam1 is now known as alMalsamo
***FragByte_ is now known as FragByte
<clarity_>hey
<clarity_>From what I understand, hurd is cooperative multitasking. I thought that programs have to be designed to work in that type of system. How does hurd run programs not designed for cooperative multitasking?
<youpi>no it's not cooperative multitasking
<youpi>where did you read that?
<clarity_>it's not? I was looking at gnu mach I think, `
<clarity_>There currently is no kernel preemption in GNU Mach.
<clarity_>If GNU Mach were made a preemptive kernel, using continuations would probably no longer make sense as the kernel itself, that is, kernel threads can be preempted, and then their full state needs to be preserved.
<clarity_>See also multithreading.` https://www.gnu.org/software/hurd/microkernel/mach/gnumach/preemption.html
<youpi>*kernel* preemption
<youpi>doesn't mean no user preemption
<clarity_>ah, so the gnu's are cooperative, but the programs running on hurd are preemptive?
<youpi>no
<youpi>the hird of servers is preemptive too
<youpi>only the kernel is cooperative
<clarity_>you mean, just mach?
<youpi>yes
<clarity_>ic, thanks for explaining that
<clarity_>Another question, why is IPC slow with Mach? Is it because of memory pages being swapped out during context switches?
<youpi>from what are you saying that IPC is slow with Mach?
<clarity_>I was reading that the slow IPC/context switching is the big performance issue with hurd?
<clarity_>There was an effort to port Hurd to L4 I think it was called which is a microkernel with better performance for IPC/context switching?
<youpi>where did you read that?
<youpi>I mean, if it's a paper from 10 years ago, computers have changed so much
<clarity_>yeah, it was likely that old at least
<clarity_>I read, "the critique," and that's from 2007
<damo22>i dont understand how biblio's patch makes any difference on his system. the function fails before his code changes are executed
<youpi>yes, processors have changed so much
<youpi>what kills performance is I/O
<youpi>real I/O, disk I/O
<youpi>so missing prefetching etc.
<damo22>which layer is supposed to prefetch?
<damo22>the driver?
<youpi>it's not that clear
<youpi>probably the memory object
<youpi>so the kernel
<damo22>hmm
<youpi>which does see pages accesses etc.
<youpi>possibly that can be delegated
<damo22>how would the kernel know its a disk access?
<damo22>or you mean prefetch everything
<youpi>yes prefetch
<youpi>and the memory object provider can work on providing it
<youpi>it's useful for disks
<youpi>but also ftp, etc.
<damo22>yes
<damo22>so then we should disable caching in rump disk, and prefetch in the kernel instead
<youpi>yes
<youpi>I don't think caching is needed at all in rump disk actually
<youpi>(except buffer cache)
<youpi>a page cache in rump disk is duplicate with the caching already done by ext2fs
<damo22>right
<damo22>pretty sure i enabled rwdX
<damo22> http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/rumpdisk/block-rump.c#n95
<damo22>the "r" prefix uses it as a character device
<damo22>i think
<damo22>no vfs caching
<damo22>Pellescours: did you try my latest acpica-nothread branch? i cant seem to figure out why the root table works but the rest doesnt
<damo22>(20:08:19) damo22: ACPI: RSDT 0x000000007FFE2337 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)
<damo22>(20:08:19) damo22: ACPI: ? 0x4350584200000001 C3F000FF (v226 ?S? ? 53F000FF ? 87F000FE)
<damo22>crash
<damo22>i am using qemu with -smp 2
<clarity_>neato, I'm looking at the source for gnumach/hurd
<clarity_>is gnumach still only supporting i386? I see amd64 assembly files in the source?
***Noisytoot is now known as [
<Pellescours>damo22: not yet, but I don't understand why that was working before and not now...except if that actually never worked but it wasn't crashing/covered
<Pellescours>clarity_: some work was done, but the kernel is not yet able to boot in amd64. someone ent patch for that but they are not upstreamed
<Pellescours>s/ent/sent/
<clarity_>ah, I see
<clarity_>it seems like development is picking up on hurd
<clarity_>I'm going to look into the source a bit more and see if I can contribute after I understand it a bit more
<clarity_>I've been getting really interested in operating system kernels lately
<clarity_>it's been a good 15 years since I took the operating systems course in college, and even longer since I've coded in assembly
<damo22>youpi: did the kernel memory start address change recently?
<damo22>how can any userspace process get access to physical addresses?
<damo22>is it possible to mmap() a physical address to a virtual address?
<damo22>/dev/mem seems to be different now
<damo22>if (off >= 0xa0000 && off < 0x100000) the mem device returns i386_btop(off)
<damo22>instead of vm_page_lookup_pa(off)
<damo22>#define i386_btop(x) (((unsigned long)(x)) >> I386_PGSHIFT)
<damo22>is that correct?
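
For reference, the arithmetic behind the `i386_btop` macro quoted above can be checked in isolation. This is a minimal sketch that assumes `I386_PGSHIFT` is 12 (4 KiB pages); `legacy_window_pfn` is a hypothetical name used only for illustration and is not taken from the gnumach mem device.

```c
#include <assert.h>

#define I386_PGSHIFT 12   /* assumption: 4 KiB pages */
#define i386_btop(x) (((unsigned long)(x)) >> I386_PGSHIFT)

/* Compile-time check: byte offset 0xa0000 is page frame 0xa0. */
static_assert(i386_btop(0xa0000) == 0xa0, "0xa0000 >> 12 must be 0xa0");

/*
 * Hypothetical helper mirroring the quoted condition: for offsets in the
 * legacy 0xa0000-0xfffff window, return the page frame number directly;
 * return -1 for anything outside the window.
 */
long legacy_window_pfn(unsigned long off)
{
    if (off >= 0xa0000 && off < 0x100000)
        return (long)i386_btop(off);
    return -1;
}
```

Under these assumptions, offset 0xe0000 lies inside the window and converts to page frame 0xe0, while 0x100000 is already outside it.
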
<Pellescours>damo22: your latest commit on acpica will have conflict with upstream master if ever
<damo22>ah ok
<damo22>i needed to test it though
<damo22>Pellescours: we dont need that commit if we can figure out how to access physical memory properly
<Pellescours>ok, the issue is in gnumach then, no?
<damo22>i dont know
<damo22>acpi is a pain
<damo22>we are duplicating code in gnumach and hurd
<damo22>the acpi tables need to be parsed from physical memory addresses
<damo22>the ACPICA code assumes it has access to raw physical memory
<damo22>(i think)
<Pellescours>SMP need ACPI so it’s complicated to remove it from gnumach
<damo22>yes we cannot remove that part from gnumach
<damo22>it just needs to read the irq overrides
<damo22>and parse the table regarding cpu cores
<damo22>but its also tricky because in userspace we need acpi to parse the AML without a root filesystem present
<Pellescours>why would that be tricky?
<damo22>acpi needs access to the physical addresses where the tables are stored
<Pellescours>oh the /dev/mem
<damo22>we cant use /dev
<Pellescours>is there other way to get this? the get_device() maybe?
<damo22>i guess we just device_open the mem device
<Pellescours>s/get_device/device_open/
<damo22>but using /dev/mem i thought was the same thing (for testing now)
<Pellescours>cool
<damo22>but the acpica code tries to use 16 bit paragraph address
<damo22>for the very first lookup
<damo22>like we do in gnumach
<damo22>whereas, we want to skip that region and try 0xe0000 instead
<damo22>as a 32 bit address
<damo22>i dont think 0x40e works in 32 bit mode?
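
The search described above, skipping the 16-bit paragraph lookup and scanning the BIOS area directly, can be sketched as follows. It relies on two facts from the ACPI specification: the RSDP starts with the 8-byte signature "RSD PTR " on a 16-byte boundary, and its first 20 bytes sum to zero modulo 256. The function names are hypothetical, and the sketch scans an in-memory buffer rather than a real mapping of 0xe0000-0xfffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG     "RSD PTR "  /* 8 bytes, note the trailing space */
#define RSDP_V1_LEN  20          /* bytes covered by the ACPI 1.0 checksum */
#define RSDP_ALIGN   16          /* the RSDP sits on a 16-byte boundary */

/* Sum of len bytes modulo 256; a valid RSDP sums to zero. */
uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/*
 * Hypothetical helper: scan a mapped copy of the BIOS area for the RSDP.
 * Returns the offset of the first match within buf, or -1 if none is found.
 */
long find_rsdp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + RSDP_V1_LEN <= len; off += RSDP_ALIGN) {
        if (memcmp(buf + off, RSDP_SIG, 8) != 0)
            continue;                      /* wrong signature */
        if (byte_sum(buf + off, RSDP_V1_LEN) == 0)
            return (long)off;              /* signature and checksum match */
    }
    return -1;
}
```

A caller would add the returned offset to the physical base of the mapped window (0xe0000 in the case discussed here) to get the physical address of the root pointer.
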
<damo22>what is confusing is that this function acpi_find_root_pointer() used to work
<Pellescours>Do you know if it stopped to work due to recent patches or to something else ?
<damo22>i upgraded my hurd system from something very old to current
<Pellescours>ok, I gonna check an old commit of acpica-nothread to see if parsing is working
<damo22>maybe we can just slog through it with gdb
<damo22>no point going backwards
<Pellescours>if 880526c1182e8b1f8d3f17ec7ebedd73d90388cc was able to find root table pointer in the past it’s no longer the case
<damo22>oh
<Pellescours>I think biblio has not updated his hurd/gnumach and that’s why he is able to get it working
<damo22>that does not seem to be a hurd commit
<Pellescours>If it’s possible to downgrade gnumach to confirm that
<damo22>my testacpi.c program does not depend on any hurd libs except libacpica, which i have in a hurd branch just for ease of versioning with the rest of acpi
<damo22>it literally calls acpi_init(); and then the irq function
<Pellescours>can it be gnumach commit 230d7726ce55114c5c32c440c5928f104a085ba6 that change the behavior ?
<youpi>damo22: not so recently
<youpi>damo22: of course you can mmap a physical address, that's what page tables are for :)
<youpi>the mem device has changed behavior, though: now it refuses to map any non-reserved memory (that is used for normal allocations)
<damo22>would i be allowed to map just under 1M?
<youpi>> (08:11:51) damo22: we cant use /dev
<youpi>why?
<damo22>bootstrapping the disk requires knowledge of irq
<youpi>you're allowed to map anything which is not RAM used for normal allocations
<damo22>so therefore acpi needs to be available before disk
<youpi>so all bios-reserved areas, acpi areas etc. are fine
<youpi>see memmmap
<youpi>that uses biosmem_addr_available to check whether the address is marked as available in e820
<damo22>memmmap has a special case for under 1M
<youpi>no
<youpi>not any more
<damo22>ok
<damo22>maybe that is what changed
<damo22>biosmem: 00000000000009fc00:0000000000000a0000, reserved
<damo22>biosmem: 0000000000000f0000:000000000000100000, reserved
<damo22>i think e0000 is not mentioned
<damo22>it seems to be a hole
<youpi>then mem should be allowing its mmap
<youpi>since it's not available memory
<damo22>i see
<youpi>+already
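
The rule described here (the mem device maps only addresses that are not ordinary RAM) can be sketched like this. It is an illustration under stated assumptions, not the gnumach code: the `avail_map` table is made up for the example rather than taken from the e820 data of this session, and `addr_available` and `mem_may_map` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One range of ordinary, allocatable RAM; end is exclusive. */
struct avail_range {
    uint64_t start;
    uint64_t end;
};

/* Assumption: an illustrative map, not the real e820 table. */
static const struct avail_range avail_map[] = {
    { 0x00000000, 0x0009fc00 },   /* low RAM below the reserved block */
    { 0x00100000, 0x7ffe0000 },   /* main RAM above 1M */
};

static_assert(sizeof avail_map / sizeof avail_map[0] > 0,
              "the availability map must not be empty");

/* Hypothetical stand-in for an e820 lookup: is addr ordinary RAM? */
int addr_available(uint64_t addr)
{
    for (size_t i = 0; i < sizeof avail_map / sizeof avail_map[0]; i++)
        if (addr >= avail_map[i].start && addr < avail_map[i].end)
            return 1;
    return 0;
}

/* Mapping is allowed only for addresses outside ordinary RAM. */
int mem_may_map(uint64_t addr)
{
    return !addr_available(addr);
}
```

With this map, 0xe0000 is a hole that belongs to no available range, so `mem_may_map` accepts it, which matches the reasoning above that the hole should already be mappable.
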
<damo22>biosmem: 0000000000feffc000:0000000000ff000000, reserved
<damo22>isnt that part of APIC?
<Pellescours>if mem was not able to memmap, then the call should have returned a non zero value to notify there was an error. And the call returns 0
<damo22>Thread 4 hit Breakpoint 1, acpi_os_map_memory (phys=4850473839169110017, size=36)
<damo22>thats way too big
<youpi>acpi uses 32bit values, doesn't it? how do you end up with such number?
<damo22>(gdb) p *rsdp
<youpi>perhaps a phys_addr_t type that is 64 while it should be 32bit?
<damo22>$1 = {signature = "RSDT8\000\000", checksum = 1 '\001', oem_id = "EBOCHS", revision = 32 ' ',
<damo22> rsdt_physical_address = 1129338946, length = 538976288, xsdt_physical_address = 4850473839169110017,
<damo22> extended_checksum = 1 '\001', reserved = "\000\000"}
<damo22>revision should be
<damo22>1
<youpi>is the structure type really properly defined ?
<youpi>with proper sizes and aligns
<youpi>and the packed attribute
<damo22>thats using the acpica code
<damo22>Thread 4 hit Breakpoint 1, acpi_tb_parse_root_table (rsdp_address=2147361591)
<damo22> at ../../libacpica/tbutils.c:230
<damo22>ACPI: RSDT 0x000000007FFE2337 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)
<damo22>it prints that, and then i checked the rsdp contents
<youpi>> thats using the acpica code
<youpi>the acpica code could still be wrong for whatever reason
<youpi>because it assumes things that aren't true on hurd/i386
<damo22>its not the right address for the root pointer
<damo22>(qemu) xp/40c 0x7ffe2337
<damo22>000000007ffe2337: 'R' 'S' 'D' 'T' '8' '\x00' '\x00' '\x00' '\x01' 'E' 'B' 'O' 'C' 'H' 'S' ' '
<damo22>000000007ffe2347: 'B' 'X' 'P' 'C' ' ' ' ' ' ' ' ' '\x01' '\x00' '\x00' '\x00' 'B' 'X' 'P' 'C'
<damo22>it should say "RSD PTR "
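
The odd values printed by gdb earlier follow directly from reading this RSDT header as if it were an RSDP. The sketch below copies the first 32 bytes of the qemu dump and reads them at the RSDP field offsets from the ACPI specification (revision at 15, RSDT address at 16, length at 20, XSDT address at 24). The struct and function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The first 32 bytes at 0x7ffe2337, copied from the dump above. */
const uint8_t dump[32] = {
    'R','S','D','T','8',0,0,0, 1,'E','B','O','C','H','S',' ',
    'B','X','P','C',' ',' ',' ',' ', 1,0,0,0,'B','X','P','C',
};

/* Selected RSDP fields; the in-memory layout they come from is packed. */
struct rsdp_fields {
    uint8_t  revision;      /* byte offset 15 */
    uint32_t rsdt_address;  /* byte offset 16 */
    uint32_t length;        /* byte offset 20 */
    uint64_t xsdt_address;  /* byte offset 24 */
};

/* Read little-endian integers byte by byte, independent of host endianness. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

uint64_t le64(const uint8_t *p)
{
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* Hypothetical helper: interpret the bytes at p as if they were an RSDP. */
struct rsdp_fields read_as_rsdp(const uint8_t *p)
{
    assert(p != 0);
    struct rsdp_fields f;
    f.revision     = p[15];
    f.rsdt_address = le32(p + 16);
    f.length       = le32(p + 20);
    f.xsdt_address = le64(p + 24);
    return f;
}
```

Calling `read_as_rsdp(dump)` gives revision 32 (an ASCII space), an RSDT address of 0x43505842 (the bytes "BXPC"), and an XSDT address of 0x4350584200000001, which are the same numbers gdb showed as 1129338946 and 4850473839169110017.
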
<Pellescours>damo22: I don’t see any step that should correspond to the ACPI_MOVE_16_TO_32 when I do step by step in gdb, maybe it’s the macro definition which is not configured properly
<damo22>i almost fixed it
<damo22>it wants the root pointer not the sdt_base
<damo22>but we still need my custom root table searcher, because there is no 16 bit mode
<damo22>IRQ(0:1f.2) = 10
<damo22>:D
<biblio>damo22: :)
<biblio>damo22: i need to update gnumach to test your latest fix.
<damo22>biblio: i made more changes
<damo22>it was still broken
<biblio>damo22: oh ok.
<biblio>damo22: are you testing on real hardware or qemu ?
<biblio>damo22: I checked API docs and examples from Linux. I could not find anything wrong yet.
<biblio>damo22: API docs of acpi
<damo22>qemu
<damo22>i just pushed a working commit
<damo22>we will need to work out if its feasible to upstream the custom root table search
<damo22>i made it reusable for our purposes
<damo22>if you can find a way to reuse their root table finder instead of using mine, we should do that
<damo22>then we can remove /dev/mem call
<biblio>damo22: I did not get "their root table finder". You mean root table finder from acpi API call ?
<damo22>acpi_find_root_pointer()
<damo22>is part of acpi
<damo22>acpica*
<damo22>but it does not work on hurd currently
<biblio>damo22: we should try to use acpi_find_root_pointer() instead of your custom acpi_get_root_table_pointer(...) ?
<damo22>if possible, its better to use the built in code, otherwise we are reinventing the wheel
<damo22>but calling that function instead breaks
<biblio>damo22: yes agree.
<damo22>so i implemented a custom one
<biblio>damo22: just to be sure. You want to replace your acpi_get_root_table_pointer() with acpi_find_root_pointer() in future ?
<damo22>there was a hook available to override its implementation
<damo22>well, if its possible yes
<biblio>damo22: ok got it
<AlmuHS>hi. I'm trying to compile upstream gnumach, but I have some problems: https://pastebin.com/rjzgi4rz
<AlmuHS>I used this line to configure. `../configure --host=i686-gnu CC=gcc LD=ld`
<AlmuHS>have I forgotten something?
<Pellescours>AlmuHS: I tried to build with the same command as you, but that’s works for me. Do you have any change compared to origin/master?
<AlmuHS>I fixed a conflict in configure.ac
<AlmuHS>maybe I fixed it bad
<AlmuHS> https://pastebin.com/qnzLsqvi
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/commit/d3a69008b473a2106fa7f21e6554f72ead2cea13
<Pellescours>AlmuHS: I built your smp_stage2 branch successfully
<AlmuHS>then I don't know the problem
<Pellescours>did you do `autoreconf -fi`?
<AlmuHS>I've just done it, but it doesn't work
<Pellescours>make clean && make gnumach.gz ???
<AlmuHS>same error
<AlmuHS>i'm trying to compile "master" branch of my repository
<AlmuHS>but there are not any significant change in compile process
<Pellescours>do you have latest mig ? I know that headers changed a bit
<AlmuHS>i'm not sure
<AlmuHS>upgrading my Debian then
<AlmuHS>meanwhile, I changed my configure line to "../configure --host=i686-gnu CC='gcc -m32' LD='ld -melf_i386' ", and now I have another error
<AlmuHS>ld: relocatable linking with relocations from format elf64-x86-64 (libkernel.a(model_dep.o)) to format elf32-i386 (gnumach.o) is not supported
<AlmuHS>trying again after "make clean"
<Pellescours>I usually just do ../configure
<AlmuHS>ok, now it works
<AlmuHS>upstream kernel working now, successfully by moment
<Pellescours>niice
<Pellescours>are you close to boot multi cpu ?
<AlmuHS>i found a kernel panic in my smp kernel, and I want to check if the panic is from my work or is from upstream
<Pellescours>when is your kp?
<AlmuHS>I got to boot multicpus two years ago, but my previous implementation was very dirty and i'm refactoring
<AlmuHS>this is my kernel panic https://pasteboard.co/agiSegphNSib.png
<AlmuHS>and, after this, i found this other one after reboot https://pasteboard.co/kAxotWNc1WLz.png
<AlmuHS>oops, i sent same image
<AlmuHS>wait
<AlmuHS> https://pasteboard.co/GIFCidMY70Zo.png
<AlmuHS>this is the error after reboot
<Pellescours>if you try the upstream kernel, it works?
<AlmuHS>i'm checking it now
<AlmuHS>but, when this error appears, then i can't boot any kernel
<AlmuHS>after this, i told
<Pellescours>ok, I’m trying to boot your branch to test it on my side
<AlmuHS>you have to add a new flag
<AlmuHS>--enable-ncpus=N
<AlmuHS>replace N by the number of cpus of your preference
<AlmuHS>add this flag in configure step
<AlmuHS>upstream without this flag seems stable, no crash
<Pellescours>fun fact, I first had a kernel panic. but after the reboot, it boot correctly
<AlmuHS>are you using qemu?
<Pellescours>yes qemu with -smp 4
<AlmuHS>ok
<AlmuHS>now compiling master branch with --enable-ncpus=4
<Pellescours>I’m not able to build master due to const changes
<AlmuHS>my repo has a merged master branch
<AlmuHS>now checking master branch with ncpus flag
<Pellescours>I got a kernel panic while compiling gnumach
<Pellescours>in your master branch ./device/device.server.h:38:9: error: unknown type name ‘const_dev_name_t’; did you mean ‘dev_name_t’?
<AlmuHS>now pushed
<AlmuHS>pull again
<Pellescours>got it, it’s compiling
<Pellescours>It boot
<AlmuHS>wait some time
<AlmuHS>make harddisk operations
<Pellescours>AlmuHS: with your smp_stage2 branch, I get a kernel panic (page fault)
<Pellescours>it’s when I do harddisk operation yeah
<AlmuHS>ok
<Pellescours>I was running a configure command actually
<AlmuHS>it's strange, because i didn't modified any harddisk controller's source code
<AlmuHS>try now to boot another kernel
<AlmuHS>another gnumach
<Pellescours>disk corrupted (multiple inode claim), but after fix corruption it boot
<AlmuHS>ok
<AlmuHS>i will try again then. Crossing fingers
<AlmuHS>it boots
<AlmuHS>but panic
<Pellescours>can the kernel panic be due to linux driver?
<AlmuHS> https://pasteboard.co/jaBxmOAcnTNj.png
<AlmuHS>i'm not sure
<AlmuHS>notice that panic is after boot
<AlmuHS>upstream kernel continues booting without problems
<AlmuHS>executing fsck now
<AlmuHS>smp kernel doesn't crashed yet
<AlmuHS>(after reboot)
<AlmuHS>oops, panic again https://pasteboard.co/E0Fn4yPITDoU.png
<Pellescours>yeah, it’s a page fault
<AlmuHS>a friend is debugging the error, but we don't find the origin
<Pellescours>but it don’t happen as long as you don’t write to disk
<AlmuHS>this is the question
<AlmuHS>i write FROM the disk (the assembly routine) but not TO the disk
<AlmuHS>the only that i remember that could generate problems is the copy of this assembly routine. I copy this in a address which is not mapped
<Pellescours>I did write of file in vim, no bug
<AlmuHS>#define AP_BOOT_ADDR (0x7000)
<AlmuHS>this is the memory address in which i copy the assembly routine
<AlmuHS>memcpy((void*)phystokv(AP_BOOT_ADDR), (void*) &apboot, (uint32_t)&apbootend - (uint32_t)&apboot);
<Pellescours>I think it’s concurrent write that fails
<AlmuHS>maybe. It's a preliminary work, and i don't add concurrency controls yet
<Pellescours>I tried to write a file multiple time using vim, no kp. I did make clean, and kp after some RM
<AlmuHS>but the cpus are not added to the scheduler yet
<AlmuHS>at moment, i simply start and configure the cpus, but the only working is cpu0
<Pellescours>and the corruption is really strange because it’s not left inode but real inode being corrupted (multiple inode claim)
<Pellescours>I think the bug is hidden behind an #if NCPUs > 1
<Pellescours>not related to your work
<AlmuHS>maybe
<Pellescours>I will try to compile master with ncpu > 1 and check
<AlmuHS>ok
<Pellescours>master multi-cpu works
<Pellescours>no kernel panic
<Pellescours>it’s one of your changes that break something
<AlmuHS>yeah
<Pellescours>doing a "bisect" of your changes maybe would help, or using the kdb
<AlmuHS>these are my latest changes
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/smp.c
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c
<luckyluke>AlmuHS: Pellescours did you try to see where is the address causing the page fault? for example, what code is executing at that moment...
<luckyluke>from the last screenshots you sent, this would mean checking address 0xc101cc4d and decoding with addr2line
<AlmuHS>i will try it
<luckyluke>but it seems the address causing the fault is 0xc101ccba, which could be some variable
<luckyluke>also, if you use qemu you can redirect the console to a serial port, so you can copy the text :)
<luckyluke>if you enable kdb, you should also be able to get a backtrace
<AlmuHS>this is my qemu script https://pastebin.com/zHWRP0v2
<AlmuHS>how can i enable kdb?
<Pellescours>at compile time you need to enable it then it’s https://www.gnu.org/software/hurd/microkernel/mach/gnumach/debugging.html
<luckyluke>you need to add --enable-kdb at configure stage
<Pellescours>control+alt+d
<luckyluke>do you use gdb? I see you launch qemu with -s -S
<AlmuHS>i use gdb
<luckyluke>in that case, you could also try to set a breakpoint at the code address above, just before the panic
<AlmuHS>i have a friend who are doing just this
<AlmuHS>he has more experience in debugging than me
<luckyluke>so you can examine the state before the issue
<luckyluke>ah ok
<AlmuHS>i've just added a loop counter after raise each IPI, but the problem continues
<luckyluke>AlmuHS: I was having a look at your code, it seems once you start the other cpus, they execute cpu_setup() but it seems that after that, cpu_ap_main() returns and I don't see a point where the cpu should "wait"... is this correct?
<AlmuHS>it could be correct
<luckyluke>I see a hlt instruction in cpuboot.S, but it's commented, could it be that the cpu just goes on and eventually it causes some mess?
<AlmuHS>i can reenable the hlt loop
<AlmuHS>i created this loop to keep the cpus in a infinite loop if they are not added to scheduler
<luckyluke>that seems reasonable
<AlmuHS>i've just reenabled the loop, but the panic continues
<luckyluke>did you check what code corresponds to the address reported?
<AlmuHS>i'm not yet
<AlmuHS>c101cc4d: 87 43 24 xchg %eax,0x24(%ebx)
<AlmuHS>not significant
<luckyluke>you should check to which function it belongs
<AlmuHS>c101cb40 <thread_quantum_update>:
<luckyluke>there are some NCPU > 1 defines there, you could try adding some print() to see what is wrong there
<luckyluke>do you have the clock_interrupt() running on the secondary cpus?
<AlmuHS>i think not
<AlmuHS>the secondary cpus only are alive, but these are not added to the kernel yet
<AlmuHS>even, i didn't enable paging in these
<AlmuHS>yet
<AlmuHS>I go to disable this line: machine_slot[i].running = TRUE
<AlmuHS>not, it's not the problem
<curiosa>do any of you here knows the story about the string_t type? I'm looking at this https://git.savannah.gnu.org/cgit/hurd/mig.git/tree/type.c#n691
<curiosa>why there is a special array kind in mig called c_string ?
<curiosa>if then it is just array[n] of (MSG_TYPE_STRING_C, 8)
<curiosa>wouldn't it be better to get rid of it
<curiosa>?
<curiosa>it is really the only thing in gnumig that breaks the mig grammar
<curiosa>so it is not just weird, it is just plain ugly
<curiosa>apparently it was already there in the first commit in 1998
<curiosa>it is there in apple defs https://opensource.apple.com/source/autofs/autofs-270.40.1/mig/autofs_migtypes.h.auto.html
<biblio>damo22: FYI https://www.osnews.com/story/134539/a-practical-solution-for-gnu-hurds-lack-of-drivers-netbsds-rumpkernel-framework/
<curiosa>ah damo22, I've seen your talk, very interesting!
<AlmuHS>he is australian, so it can't read your messages until europe's late night
<AlmuHS>**he can't read
***biblio_ is now known as biblio
<damo22>hi, thanks for the link
<damo22>demo@zamhurd:/part3/demo/git/hurd-sv/build/acpi$ sudo ./testacpi
<damo22>ACPI: RSDP 0x00000000000F5860 000014 (v00 BOCHS )
<damo22>ACPI: RSDT 0x000000007FFE2337 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001)
<damo22>....
<damo22>ACPI: MCFG 0x000000007FFE22D3 00003C (v01 BOCHS BXPC 00000001 BXPC 00000001)
<damo22>ACPI: WAET 0x000000007FFE230F 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001)