IRC channel logs

2023-03-29.log

<aggi>seems i could narrow down one critical bug with linux-2.x/tcc-compiler/assembler
<aggi>it is this macro called early in init/main.c: #define __sti() __asm__ __volatile__("sti": : :"memory")
<avih>aggi: did you try a more recent tcc? or even mob?
<avih>if it works, maybe you can backport the tcc fix
<aggi>yes avih
<aggi>i tried different kernel versions 2.4.37.11 (seyko2), and 2.4.26 (Bellard's version for tccboot); both crash at the same spot, when sti is called, set interrupt flag, bam..
<avih>that's not what i asked. i asked about different tcc versions, not different kernel versions
<aggi>2.4.26 is interesting, because that's the one Bellard chose, and i grabbed his tccboot.iso, which does compile/boot (although i didn't check each individual file yet, if these really are identical)
<aggi>avih: moment... yes, tcc-version too differs
<aggi>i am on tcc mob branch, 95aab7d68723b9aed4d8fc36e4c4e6d5042ae56e, Sat Mar 18 14:02:52 2023 +0100
<aggi>although i wouldn't necessarily bet on a different tcc-compiler version being at fault
<aggi>avih: because, my recent tcc-version, does compile and link the kernel image, which does boot, and crashes at the mentioned spot
<avih>right
<avih>(that commit is a fix to an issue which i reported few days earlier)
<avih>aggi: also, if you identified a specific asm/whatever which you think tcc handles incorrectly, you can report it to the ML
<aggi>avih: i have no clue yet, why an almost trivial asm call crashes the kernel, in this case, setting the interrupt flag
<aggi>and i think, it's not necessarily an asm related issue (quickly changed the macro to __asm__ __volatile__("sti")... same fault)
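The two spellings of the macro tried above differ only in the "memory" clobber, which in GCC-style inline asm tells the compiler not to cache or reorder memory accesses across the statement. A minimal user-space sketch in C, assuming a GCC-compatible compiler; `barrier_like_sti` and `publish_then_barrier` are hypothetical names, and an empty asm stands in for `sti`, which is privileged and would fault outside the kernel:

```c
/* The two forms from the log. They are only defined here and never
 * expanded, because sti cannot be executed from user space. */
#define __sti()       __asm__ __volatile__("sti" : : : "memory")
#define __sti_plain() __asm__ __volatile__("sti")

/* Hypothetical stand-in: same clobber list as __sti(), but no instruction. */
#define barrier_like_sti() __asm__ __volatile__("" : : : "memory")

static int shared_flag;

/* Store, then barrier: with the "memory" clobber the compiler must
 * complete the store before the asm statement and reload afterwards. */
int publish_then_barrier(void)
{
    shared_flag = 1;
    barrier_like_sti();
    return shared_flag;
}
```

The clobber constrains the compiler only and emits no instruction of its own, so dropping it would not be expected to change the `sti` that ends up in the object code. That fits the same fault appearing with both forms.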
<aggi>however, i must rule out both, specific kernel version and/or specific tcc-version at fault, hence i'll try bellards version which he had used initially with his own tccboot.iso, which does compile and boot without a crash
<aggi>however, he did use tccboot itself, while i boot a tcc-pre-compiled vmlinuz
<aggi>really nasty
<aggi>i may give fiwix a try too... and see what happens
<avih>anyway, i'm off. night and gl
<aggi>i take good luck as an offense in this case
<aggi>because, luck is unrelated to what the problem is; it's not a lottery ticket to bet on
<aggi>anyway, gn
<aggi>another side-effect, since i replaced Kbuild with custom scripting, i can hook into both gcc-4.7 and tcc identically... to rule out any problem with this
<aggi>and replacing CC=tcc with CC=i686-tcc does work; the kernel doesn't crash then; and there's some weird symbol-resolving/linking issue involved too
<aggi>which is: tcc complains about duplicated symbols, and gcc-4.7 doesn't link correctly with -O0, it does need -O2 for unknown reasons, otherwise symbols aren't found while linking... strange
<aggi>anyhow
<aggi>i'll do something else next... copying over all kernel-source files from Bellard's tccboot.iso into the kernel-tree of the exact same version and tccboot-patches... and then diffing this, if i missed something
<aggi>and i think, there's some irregularity involved with the presence of APIC support or none, if and when kernel crashes with "unable to handle kernel paging request"...
<aggi>it's strange nonetheless, an exact same kconfig/build.sh scripting succeeds with gcc, and fails with tcc; while tcc reports zero warnings/problems while compiling/linking
<aggi>hence i suspect, it may be some linking/symbol-resolving issue with tcc, to link the kernel
<aggi>i remember, susematz, who hacked linux-4.6 for compilation with tcc, he mentioned tcc missed some linking support...
<aggi>however i could compile and link a vmlinux image with CC=i386-tcc, which does bootload and spawn init/main as it should, just like the variant with gcc
<aggi>the former crashes, the latter doesn't
<aggi>*and replacing CC=i386-tcc with CC=i686-gcc does work... typo
<aggi>anyway
<aggi>alright, that's interesting too... with APIC support enabled, the asm sti call doesn't crash the kernel with CC=i386-tcc, but it's then stuck again in "Calibrating delay loop..."
<aggi>it's merely a symptom, bellards tccboot.iso doesn't use apic and doesn't crash, and with CC=gcc no crash or block either
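A hang in "Calibrating delay loop..." would be consistent with timer interrupts never arriving: in 2.4 kernels the calibration first spins until jiffies changes, and jiffies is only advanced by the timer interrupt. A minimal user-space simulation in C, not kernel code; `fake_jiffies`, `poll_timer` and `wait_for_tick` are hypothetical names, and the poll limit exists only so the sketch terminates:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's jiffies counter. */
static volatile unsigned long fake_jiffies;

/* Simulated timer: one tick every `period` polls, but only if
 * interrupts are actually being delivered. */
static void poll_timer(bool irq_delivered, unsigned long n, unsigned long period)
{
    if (irq_delivered && n % period == 0)
        fake_jiffies++;
}

/* Spin until the tick counter changes, as the calibration does.
 * Returns the number of polls needed, or 0 if no tick was seen
 * within max_polls (the real loop would spin forever). */
unsigned long wait_for_tick(bool irq_delivered, unsigned long max_polls)
{
    unsigned long start = fake_jiffies;
    for (unsigned long n = 1; n <= max_polls; n++) {
        poll_timer(irq_delivered, n, 1000);
        if (fake_jiffies != start)
            return n;
    }
    return 0;
}
```

With `irq_delivered` set to false the counter never moves and the wait never completes, which is the same shape as a boot that stalls at the calibration message.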
<oriansj>well I think we can safely say the pinnacle of microcode'd processors has finally been reached: http://mynor.org/my4th
<oriansj>a single XOR gate for an ALU and everything else effectively in the EEPROM
<oriansj>*correction NOR gate*
<aggi>i think it's some issue related to interrupts...
<aggi>and, diffed Bellard's kernel-sources with his tccboot patch against the sources i compiled ... no difference
<aggi>except, kstart.S... i'll give that a try.. it's the only difference
<aggi>and i suspect, the tccboot sequence, which compiles kernel JIT, behaves differently
<aggi>i'm not interested in the JIT part, and it could be, it's necessary to use that, which i don't want to
<fossy>aggi, are you using qemu?
<aggi>fossy: no
<aggi>but i can confirm, Bellards original tccboot.iso which i diff and bisect against, does boot, on real hardware
<fossy>what are you testing your kernel on
<aggi>and that's another suspicion i have... include/asm-i386/hw_irq.h which too is patched by Bellard
<aggi>fossy: an old laptop, which is fine, and unrelated to the problem, because Bellards original tccboot.iso does work
<fossy>yep, i see
<aggi>i am not sure, why the kernel i compiled crashes, although kconf/sources/patches are identical
<fossy>i was going to say diffing memory dumps from qemu with crash/nocrash versions may help
<aggi>one difference remaining... the boot sequence itself (i am not using tccboot), and a different tcc compiler version
<aggi>fossy: difficult, because then i had to test tccboot sequence itself, which i wanted to avoid for now, because it complicates and increases the amount of test cases further
<fossy>fair enough
<aggi>and, i narrowed down the problem, to an IRQ handling related issue, with high certainty
<aggi>i am not sure how a trivial sti asm call could trigger a crash
<aggi>which is done in init/main: sti(); bam...
<aggi>if i comment out this part, again, an irq related issue in __delay, calibration loop
<aggi>that's rather consistent, this symptom... question remains why
<aggi>because, with the exact same build-system (removed all kbuild), gcc-compiled vmlinuz boots, and does not crash
<aggi>could be an internal macro or something
<aggi>too i tried this: compiling with tcc, and linking with gcc... crashes; so something goes wrong in the compilation phase, surrounding IRQ/asm, i suspect
<aggi>what... remains a mystery
<aggi>because, i tried the exact same sources and kconfig that bellard had used, with the only differences remaining, the tccbootloader compiling JIT, and the tcc-compiler version, mine is almost 20 years newer
<aggi>ok, i need to test an older version of tcc-compiler, the one bellard used 20 years ago
<aggi>another problem, the tccboot.iso Bellard provided is from year 2004 - and the compiler version from Bellard, that he used for tccboot, there isn't any source-bundle.tar.gz which matches this version
<aggi>so, i have to guess, which one in the git repository is the one
<aggi>amazing, the closest release tag release_0_9_22 was made _after_ he had released tccboot.iso, and between it and the preceding release tag there is almost one year of potential commits, year 2004
<aggi>there's two ways to find out: either throwing a _new_ tcc-compiler version into tccboot.iso for JIT, or pre-compiling vmlinux with the old-version
<aggi>it's probably easier, to throw in the new tcc-compiler version onto his old tccboot.iso, and then see what happens...
<aggi>in either case, if tccboot fails then i have to bisect tcc-compiler-versions, and if the new tcc-compiler version passes on the old tccboot.iso then i have to diff tccboot.iso JIT sequence against the pre-compiled vmlinux image
<aggi>both of which are complicated
<muurkha>this is exciting, aggi!
<aggi>just decided, it's easier to test an older tcc-compiler with vmlinux pre-compilation, which i got stabilized scripting for already
<aggi>i'll just pick the release tag release_0_9_22 and another commit candidate matching the date of tccboot.iso closely
<aggi>alright, dug out an old version of tcc (v0.9.21/v0.9.22/...) testing
<aggi>and one interesting blocker popped up already, the i8259.c source file errors when compiling with an old version... which is... the interrupt controller
<aggi>although, a most recent tcc compiler version didn't complain, the older ones can't digest a macro inside... so...
<aggi>and this is what i could track down problems with recently... interrupt handling
<aggi>the old compilers can't digest some macro, and the new tcc-version does process the macro, but the generated interrupt handling code may be flawed...
<aggi>so, i may try this next... preprocess this macro with gcc... dump it inside i8259.c... and hope this doesn't crash anymore
<aggi>another peculiarity with tinycc.git repository, I picked the last commit before/when tccboot.iso was released by Bellard, year 2004
<aggi>however, this commit does miss a change, which was added to the repository one day _after_ tccboot.iso was released
<aggi>so, it's a lucky guess, which commit to pick... and even tcc-0.9.22 tagged one year _after_ tccboot.iso was stamped, didn't digest i8259.c source inside linux-2.4.x
<aggi>really weird... so... what's going on with this?
<aggi>it is several questions: why/how tcc-compiler versions aren't identified or don't match with what's available on tccboot.iso?
<aggi>because, although it said 0.9.21 which is the version label on tccboot.iso - this compiler version fails - even 0.9.22 does with i8259.c source file
<aggi>could be it is some header and/or #define mismatch
<aggi>and next, what's wrong with the tcc-preprocessor and/or tcc-assembler;
<aggi>i'll try something else next... compile/link/assemble i8259.c with i686-gcc, and the rest with tcc, and then see if this errors at init_IRQ