IRC channel logs

2023-11-30.log

<matrix_bridge><Andrius Štikonas> oriansj, fossy: what do you think about https://github.com/fosslinux/live-bootstrap/pull/339 I'm a bit reluctant to have those assembly functions in wrap.c...
<matrix_bridge><Andrius Štikonas> Maybe it's better to put them into M2libc?
<oriansj>stikonas: The syscalls definitely belong in M2libc; the wrap.c program minus the assembly syscalls definitely should go in mescc-tools-extra
<stikonas>do you want to leave a comment there?
<stikonas>or should I write
<oriansj>I can leave a comment
<stikonas>thanks!
<oriansj>done
<Googulator>Is it intentional that the "src" command in builder-hex0 never closes the file descriptor it writes to?
<Googulator>I tried adding a close system call to the end, but it actually breaks things
<stikonas>I think only rickmasters knows this...
<stikonas>oriansj: thanks again!
<fossy>stikonas: yeah i agree with the functions going in M2libc. I'm not sure i understand the overarching purpose of the PR yet either though. while it does reduce the binary seed, in such a mode bwrap is such a small part of the binary seed because you have a whole Linux userland underneath, unless i'm misinterpreting the PR?
<fossy>stikonas: also if you could review #337 #340 #341 (but don't merge, I'd like them to be merged at the same time as the big PR) i'd appreciate that. those 3 are relatively smaller PRs
<fossy>but i still want eyes on them
<rickmasters>Googulator: The src command doesn't call close because close was not implemented when the src command was originally written
<rickmasters>Googulator: close was implemented later as part of per-process file descriptors
<rickmasters>Googulator: I probably should have added a close call to src when per-process file descriptors were implemented.
<rickmasters>Googulator: Without the close, the kernel process (process zero) will probably overflow its file descriptor table.
<rickmasters>Googulator: From inspection, it appears that overflowing the table will go into memory that is not used yet, so technically the close does not appear to be necessary
<rickmasters>Googulator: But I don't like that so I think adding a close is a good idea.
<rickmasters>Googulator: Just looking at the source, I can't see why that would not work, so I'm surprised you ran into trouble.
<rickmasters>Googulator: close sets eax to zero but that's normal for a system call and I don't see that causing any trouble.
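The concern rickmasters describes above can be illustrated with a toy fixed-size descriptor table, where open takes the lowest free slot and only close gives one back. This is a minimal sketch, not builder-hex0's code: the names `FdTable` and `copy_files` and the table size are hypothetical, and unlike the kernel described in the log, this sketch raises an error instead of running past the end of the table.

```python
class FdTable:
    """Toy fixed-size file descriptor table (size and layout are assumptions)."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def open(self, name):
        # Take the lowest free slot; fail loudly instead of writing past the table.
        for fd, entry in enumerate(self.slots):
            if entry is None:
                self.slots[fd] = name
                return fd
        raise OverflowError("file descriptor table is full")

    def close(self, fd):
        # Release the slot so a later open can reuse it.
        self.slots[fd] = None
        return 0  # a successful close returns zero


def copy_files(table, names, close_after_write):
    """Open one descriptor per file, roughly as a src-like command would."""
    for name in names:
        fd = table.open(name)
        # ... file contents would be written to fd here ...
        if close_after_write:
            table.close(fd)
```

With `close_after_write=True` the table never holds more than one entry, however many files are copied; with `False` it fills up after `size` files.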
<Googulator>rickmasters: the effect of adding a close is that hex2-0 never completes linking catm
<Googulator>./x86/artifact/hex2-0 ./x86/catm_x86.hex2 ./x86/artifact/catm is the command that fails
<Googulator>it just seems to loop forever
<rickmasters>Googulator: Hmm that's mysterious because that comes well after src commands are done
<rickmasters>Googulator: I can try to reproduce that by inserting a close here:
<rickmasters> https://github.com/ironmeld/builder-hex0/blob/3f20b992161a1e1976549d5a74db617164b0a1ac/builder-hex0.hex2#L2020
<Googulator>that's exactly what I did
<Googulator>I have
<Googulator>:src_finish
<Googulator>EB 07 # 89 D3         # mov ebx, edx
<Googulator>B8 04 00 00 00        # mov eax, 6  ; syscall_close
<Googulator>CD 80                 # int 80
<Googulator>5F                    # pop edi
<Googulator>5E                    # pop esi
<Googulator>5A                    # pop edx
<Googulator>59                    # pop ecx
<Googulator>5B                    # pop ebx
<Googulator>58                    # pop eax
<Googulator>CB                    # ret
<Googulator>EB 07 to skip it - delete it to actually run the close
<rickmasters>shouldn't that be B8 06 00 00 00 ?
<Googulator>oh sh*t...
<Googulator>you're right
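The mismatch caught above is between the hex bytes and their comment. On x86, `mov eax, imm32` is encoded as opcode `B8` followed by the immediate as a 32-bit little-endian value, so `B8 04 00 00 00` loads 4, not 6. A small sketch that checks this; the helper names are hypothetical, and the syscall table uses the Linux i386 numbering, which is assumed here to be the one builder-hex0 follows (it matches the `6 ; syscall_close` comment in the paste).

```python
import struct

# Linux i386 syscall numbers (assumed to match builder-hex0's numbering).
SYSCALLS = {3: "read", 4: "write", 5: "open", 6: "close"}


def encode_mov_eax(imm32):
    """Encode `mov eax, imm32`: opcode B8, then a little-endian 32-bit immediate."""
    return b"\xB8" + struct.pack("<I", imm32)


def decode_mov_eax(code):
    """Return the immediate loaded by a 5-byte `mov eax, imm32` instruction."""
    if len(code) != 5 or code[0] != 0xB8:
        raise ValueError("not a mov eax, imm32 instruction")
    return struct.unpack("<I", code[1:])[0]


def hex_bytes(code):
    """Format bytes the way they appear in a hex2 listing."""
    return " ".join(f"{b:02X}" for b in code)


pasted = bytes.fromhex("B804000000")
print(SYSCALLS[decode_mov_eax(pasted)])   # the syscall the pasted bytes select
print(hex_bytes(encode_mov_eax(6)))       # the bytes the comment asks for
```

Decoding the pasted bytes gives 4 (`write` in this numbering), while encoding 6 gives the `B8 06 00 00 00` that rickmasters suggests.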
<Googulator>BTW, is scratch actually used here for anything?
<Googulator>:readwrite_loop
<Googulator>85 C9                 # test ecx, ecx
<Googulator>74 !src_finish        # jz src_finish
<Googulator>BF 00 02 04 00        # mov edi, 0x00040200     ; scratch buffer
<Googulator>57                    # push edi                ; save buffer address
<Googulator>31 DB                 # xor ebx, ebx            ; ebx=0=stdin
<Googulator>9A &read 08 00        # call read
<Googulator>89 D3                 # mov ebx, edx            ;  prepare to write
<Googulator>5E                    # pop esi                 ; restore buffer address to esi
<Googulator>9A &write 08 00       # call write
<Googulator>49                    # dec ecx   ; count--
<Googulator>EB !readwrite_loop    # jmp readwrite_loop
<Googulator>...but nothing in there seems to use edi or esi
<rickmasters>right, that looks unused. It's probably left over from old code that worked differently. I'd have to research that. Good find.
<Googulator>OK, with the correct system call number, it now works in qemu
<Googulator>but it unfortunately didn't solve the lockup on bare metal
<rickmasters>Googulator: Does the lockup happen every time? I seem to remember you said you got the bootstrap to work on your birthday...
<Googulator>It happens every time if the drive controller is in standard or enhanced IDE mode
<Googulator>Works in AHCI or RAID
<Googulator>But only one of my boards can actually enable and boot from AHCI or RAID
<Googulator>the others either lack those options in the BIOS, or require the boot drive to be configured as IDE
<rickmasters>ok
<rickmasters>To explain the unused scratch, this is left over from when builder-hex0 only supported text files and src read/wrote a line at a time into a scratch buffer
<rickmasters>That changed to a byte at a time here: https://github.com/ironmeld/builder-hex0/commit/66ca665f1645646494f4813db8c5250742649e0d
<rickmasters>I should have removed the buffer setup in that commit. I'll work on that.
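The byte-at-a-time copy rickmasters describes needs no scratch buffer, which is why the `edi`/`esi` setup in `readwrite_loop` is dead code. A rough Python analogue of that loop shape, as a sketch only: `src_copy` is a hypothetical name, and the early exit on a short read is an addition that the pasted assembly does not show.

```python
import io


def src_copy(source, dest, count):
    """Copy `count` bytes one at a time, with no intermediate scratch buffer."""
    copied = 0
    while count > 0:             # test ecx, ecx / jz src_finish
        byte = source.read(1)    # call read
        if not byte:
            break                # source ended early (not in the original loop)
        dest.write(byte)         # call write
        copied += 1
        count -= 1               # dec ecx
    return copied


source = io.BytesIO(b"hello\nworld")
dest = io.BytesIO()
src_copy(source, dest, 5)
print(dest.getvalue())
```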
<Googulator>The freeze issue is reproducible also on an HP D530 USDT with a Socket 478 Pentium 4 (with a modified builder-hex0 that puts file data @ 0x14000000 instead of 0x54000000 - of course, such a kernel will never successfully run a real build, but it's enough to test the srcfs reading code)
<Googulator>Totally different BIOS, totally different chipset
<Googulator>I don't have a period AMD system to test with, unfortunately
<Googulator>& yet it somehow works in qemu with SeaBIOS, even if I use legacy IDE as the emulated HDD interface
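For scale, the two file-data addresses mentioned above can be converted to MiB. This is plain arithmetic; the log does not say how much RAM the test machine has, so nothing here is a statement about that hardware.

```python
MIB = 1024 * 1024


def to_mib(address):
    """Convert a byte address to MiB."""
    return address / MIB


original_base = 0x54000000   # file data address in unmodified builder-hex0, per the log
test_base = 0x14000000       # address used in the modified test kernel, per the log

print(to_mib(original_base), to_mib(test_base))
print(to_mib(original_base - test_base))
```

The original base sits at 1344 MiB and the test base at 320 MiB, exactly 1 GiB lower.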
<rickmasters>Googulator: I wish I could help more. I've got hardware to test with but I won't be at that location until late December.
<rickmasters>Googulator: The nature of the problem makes it difficult to determine where the fault lies.
<rickmasters>it's hard to imagine that such varied hardware has the same kind of problem - software seems more likely.
<rickmasters>Working towards the simplest test case possible seems like the way to go.
<rickmasters>Googulator: looking back at your previous comments it looks like you said that after changing stage1 memory writes to noop you were able to read consistently?
<GoogulatorMobile>rickmasters: in stage1 only
<GoogulatorMobile>The same stage1 code with a real stage2 triggers the bug in stage2
<GoogulatorMobile>I've also seen the bug trigger as early as sector 0x96 (srcfs starts at 0x93 with my current code)
<GoogulatorMobile>If I force stage1 to read garbage data, it reads well past that
<GoogulatorMobile>Actually not the same stage1 code, since what I did there was to not store read bytes at all
<rickmasters>right, so with a hacked up stage1 you can consistently read much further into the disk without lockups?
<GoogulatorMobile>Yes, stage1 with the stosb commented out reads all the way to the cylinder boundary
<GoogulatorMobile>Even the unmodified stage1 can read up to about 600KiB (where it runs out of low memory)
<GoogulatorMobile>Stage2 fails at a random point, no more than 512 sectors in (sometimes as early as the 4th sector read)
<GoogulatorMobile>& it fails regardless of which reading code I use
<GoogulatorMobile>CHS one matching stage1
<GoogulatorMobile>Or my new LBA one
<GoogulatorMobile>Both show the same behavior
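The two addressing schemes compared above name the same sectors differently. The standard conversion is LBA = (C × heads + H) × sectors_per_track + (S − 1). A sketch under assumptions: the geometry of 16 heads and 63 sectors per track is hypothetical (a real BIOS reports its own geometry), and treating the sector numbers 0x93 and 0x96 from the log as LBA values is also an assumption.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert a CHS address (sector is 1-based) to a linear block address."""
    if not (0 <= head < heads and 1 <= sector <= sectors_per_track):
        raise ValueError("head or sector out of range for this geometry")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear block address back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


HEADS, SPT = 16, 63   # hypothetical geometry, for illustration only

print(lba_to_chs(0x93, HEADS, SPT))
print(lba_to_chs(0x96, HEADS, SPT))
```

Since both reading paths end up addressing the same physical sectors, identical failures with either one are consistent with the fault lying outside the sector addressing itself.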
<GoogulatorMobile>If I had to guess, we don't quite return to real mode properly
<GoogulatorMobile>Hmm...
<GoogulatorMobile>Do we disable Gate A20 when returning to real mode?
<GoogulatorMobile>I think not
<GoogulatorMobile>...flaky connection
<rickmasters>Googulator: something related to switching back and forth between 16-bit and 32-bit was my first guess as well.
<rickmasters>It's the biggest difference between stage1 and stage2.
<rickmasters>Just getting the switching to work at all was a big challenge. There isn't much code available as a guide so I had to piece it together with some guesswork.
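As background to the Gate A20 question raised above: when the A20 gate is disabled, address line 20 is forced to zero, so every odd megabyte aliases onto the even megabyte below it. The model below is a sketch of that masking only. Whether builder-hex0 changes the A20 state when switching modes is not established in this log, and the function names are hypothetical.

```python
A20_BIT = 1 << 20


def real_mode_linear(segment, offset):
    """Linear address formed by a real-mode segment:offset pair."""
    return (segment << 4) + offset


def physical_address(linear, a20_enabled):
    """Address seen on the bus; with A20 disabled, bit 20 is forced to zero."""
    if a20_enabled:
        return linear
    return linear & ~A20_BIT


top = real_mode_linear(0xFFFF, 0x0010)          # first byte above 1 MiB
print(hex(physical_address(top, True)))          # reaches 0x100000
print(hex(physical_address(top, False)))         # wraps around to 0x0
print(hex(physical_address(0x54100000, False)))  # aliases to 0x54000000
```

This is why leaving A20 enabled is harmless for code that stays below 1 MiB, while disabling it would corrupt any 32-bit access to an odd megabyte.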
<stikonas>fossy: I'll take a look at your PRs, but a bit later, maybe next week
<fossy>stikonas, thanks, no rush