IRC channel logs

2023-07-03.log

<muurkha>a similar comment applies a fortiori with virtual memory mechanisms including segmentation and paging
<muurkha>yes, 2 MB machines existed in 01990. 16 MiB machines existed, in fact, they were just expensive. extrapolating, the US$40/MiB plateau lasted from about 01992 to about 01996, so in 01990 2 MiB would have been about US$80
<kerravon>muurkha - sorry, had to deal with my baby. as you know, the S/3X0 doesn't have a hardware stack. But I don't see a problem in practice. You just point R13 (e.g.) to a big buffer (stack) and let it grow up whenever you enter a function
<muurkha>I didn't know it didn't have a hardware stack; I've never written any 360/370/390/zSeries code
<muurkha>correction: in 01990 2 MiB would have been about US$160
<muurkha>how does interrupt handling work on the 370?
<muurkha>things like the ARM3 had shadow registers
<oriansj>if one wanted to radically change the world of technology, it would only take a half dozen names a few technical details
<muurkha>it would only take what?
<muurkha>kerravon: I did google how interrupt handling worked on the 370 but I keep finding things that don't answer the question
<kerravon>muurkha - there is a PSW that is loaded when there is an I/O interrupt, and that can point to any address. I think it is only a little bit different from the x86. Note that I have a S/3X0 version of PDOS too
<kerravon>and there is an emulator (Hercules) if you want to run your own mainframe
<kerravon>including running MVS from the early 1980s
<muurkha>kerravon: the PC gets stored in the PSW when there's an interrupt?
<kerravon>and we have modern modifications since then
<oriansj>Think Commodore with Chuck Peddle, Jay Miner, Sophie Wilson, Richard Stallman and Jochen Liedtke; then you could have had the hurd on an ARM Amiga with a few tweaks to massively improve the resulting system. Release the software under the GPLv2 (or v3) (and the libraries under the LGPLv3) and get the free software community unified for free
<muurkha>they'd probably kill each other
<muurkha>who's Jay Miner?
<kerravon>the hardware will store the old PSW before loading the new one
<muurkha>kerravon: where, in like a shadow PSW? is there a shadow PC or something too?
<oriansj>muurkha: https://en.wikipedia.org/wiki/Jay_Miner the main engineer behind the amiga
<muurkha>aha!
<muurkha>I'm trying to figure out how it gets back to the code it was executing before the interrupt
<kerravon>old I/O PSW is at a location in memory, and will be updated before the new I/O PSW is loaded
<kerravon>there are 3 other pairs for different kinds of interrupts
<oriansj>muurkha: getting cooperation would be a miracle but an ARM amiga with an L3 microkernel and a gnu userspace would have removed the need for Linux and enabled a much cleaner system than x86 ever could be
<kerravon> https://www.ibm.com/docs/en/zos/2.3.0?topic=information-psa-mapping
<kerravon>x'78' has new psw for i/o
<kerravon>x'38' has old psw for i/o
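The old/new PSW swap kerravon describes can be modeled in a few lines. This is a rough Python sketch, not real S/370 code: the `Machine` class, the dictionary standing in for low memory, and the handler address 0x2000 are all made up for illustration. Only the offsets x'38' (old I/O PSW) and x'78' (new I/O PSW) come from the discussion above.

```python
# Toy model of the S/370 I/O interrupt PSW swap (not real S/370 code).
IO_OLD_PSW = 0x38  # where the hardware saves the interrupted PSW
IO_NEW_PSW = 0x78  # where the OS keeps the PSW for its I/O handler


class Machine:
    def __init__(self):
        self.low_memory = {}                    # hypothetical stand-in for low storage
        self.psw = {"address": 0, "flags": 0}   # PSW holds the instruction address too

    def io_interrupt(self):
        # Hardware step: save the current PSW, then load the new one.
        self.low_memory[IO_OLD_PSW] = dict(self.psw)
        self.psw = dict(self.low_memory[IO_NEW_PSW])

    def return_from_interrupt(self):
        # Handler step: reload the saved PSW (the job LPSW does on real hardware).
        self.psw = dict(self.low_memory[IO_OLD_PSW])


m = Machine()
m.low_memory[IO_NEW_PSW] = {"address": 0x2000, "flags": 0}  # made-up handler entry
m.psw = {"address": 0x1234, "flags": 0b10}                  # interrupted program
m.io_interrupt()
in_handler = m.psw["address"]
m.return_from_interrupt()
resumed = m.psw["address"]
```

Because the PSW carries the instruction address along with the status bits, reloading the old PSW is enough to resume the interrupted code; no separate shadow PC is needed in this model.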
<muurkha>I don't understand this page at all
<muurkha>it seems to be some kind of data dictionary
<muurkha>the PSW doesn't have the PC in it, does it?
<kerravon>yes it does
<muurkha>oh, okay, thanks!
<kerravon>it's low memory map
<muurkha>I thought the PSW was just like the carry flag and user/supervisor bit and stuff like that
<muurkha>oriansj: yeah, that would have been pretty great
<kerravon>here is the PSW:
<kerravon> https://en.wikipedia.org/wiki/Program_status_word#S/370_Extended_Architecture_(S/370-XA)
<kerravon>one of them, anyway
<kerravon>it has those things you mentioned too
<muurkha>right
<muurkha>a neat thing about early ARMs (up to at least ARM3) is that they packed those status flags into the same 32-bit register as the regular program counter
<muurkha>so not just interrupt handling (a huge priority in the ARM design) but function call and return would automatically save and restore the status flags
<kerravon>S/370 did that too and it caused huge issues
<kerravon>limiting the address space to 16 MiB
<kerravon>i'm not sure what was in those 8 bits - it may not have been status flags
<muurkha>yeah, I think that's why they stopped doing it in later ARMs, though they had learned from IBM's mistake
<muurkha>so it was 26 bits instead of 24 bits
<muurkha>you could have 64 MiB of code! not just 16
<kerravon>data uses up the address space too
<muurkha>yes, but you don't have to store it in the area the program counter can point at
<kerravon>unless you have a different address space, code and data combined will be limited to that 64 MiB
<muurkha>the address space was 32 bits! just the program counter was only 26
<kerravon>i see
<muurkha>also the instructions had to be 4-byte-aligned so the bottom two bits would have always been 0
<muurkha>so instead they used them to store supervisor/user/interrupt-handler/fast-interrupt-handler state
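The R15 layout muurkha describes for 26-bit ARMs can be written out as a small pack/unpack pair. This is a sketch under the usual description of that layout (six flag bits N Z C V I F on top, a word-aligned PC in the middle, two mode bits at the bottom); `pack_r15` and `unpack_r15` are hypothetical helper names.

```python
# Layout of R15 on 26-bit ARMs (ARM2/ARM3): flags, PC and mode share one word.
FLAG_SHIFT = 26          # N Z C V I F live in bits 31..26
PC_MASK = 0x03FFFFFC     # bits 25..2: word-aligned program counter
MODE_MASK = 0x3          # bits 1..0: processor mode
MODES = {0: "user", 1: "fiq", 2: "irq", 3: "supervisor"}


def pack_r15(pc, flags, mode):
    """Combine a 26-bit word-aligned PC, 6 flag bits and a 2-bit mode."""
    assert (pc & ~PC_MASK) == 0, "PC must be word-aligned and below 64 MiB"
    assert 0 <= flags < 64 and 0 <= mode < 4
    return (flags << FLAG_SHIFT) | pc | mode


def unpack_r15(r15):
    """Split R15 back into (pc, flags, mode name)."""
    return r15 & PC_MASK, r15 >> FLAG_SHIFT, MODES[r15 & MODE_MASK]


r15 = pack_r15(pc=0x8000, flags=0b100000, mode=3)  # N flag set, supervisor mode
```

Saving or restoring this one word is what lets a call, return, or interrupt carry the flags and mode along with the program counter.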
<kerravon>on the S/3X0, instructions are 2-byte aligned
<kerravon>and when IBM went to 64-bit they made use of that low bit to set 64-bit mode
<kerravon>in the BSM instruction
<muurkha>I didn't know that!
<kerravon>yeah - i thought that was very clever myself
<oriansj>muurkha: although knowing my luck, it would result in an Avro Arrow situation and tank the entire computer industry for decades
<muurkha>oriansj: heh, I was going to say something like that about kerravon's time-travel approach
<muurkha>maybe if he travels back to 01986 and writes PDOS/386 it will inspire someone to do something that keeps the 386 from becoming dominant
<oriansj>well if the Motorola 88000 was backwards compatible with the 68000; you could have avoided the entire PowerPC hard shift that killed the 68K systems (all except Apple)
<muurkha>yes! or maybe DEC would have succeeded in shipping PRISM, or AMD would have shipped a non-shitty compiler for the 29k which would have made it 68k-competitive
<muurkha>one thing that I guess we know in retrospect is that backward compatibility was a lot more important than people gave it credit for at the time
<muurkha>except IBM of course
<oriansj>or if Ken Olsen didn't reject the Apple 200K DEC Alpha order because of a refusal to do business with men who cheat on their spouses openly
<oriansj>(which drove Apple to partner with Motorola and IBM to make PowerPC a thing)
<muurkha>dunno, maybe not a bad idea to not do business with people who are dishonest or who take advantage of those who are vulnerable to them
<muurkha>if PRISM hadn't been canceled, arguably there wouldn't have been a Win32
<muurkha>certainly it would have looked very different
<muurkha>("cheat" implies we're not talking about a consensually non-monogamous relationship)
<oriansj>yeah, crazy the number of things that had to happen for x86 and Windows to occur like they did; makes one wonder who with the time machine bet on the combo?
<muurkha>well, I think they would have probably happened in some form
<muurkha>the executives at Microsoft's competition were mostly MBAs, except for a few like Novell and Digital Research and DEC
<oriansj>well Intel did have the i860 failure, which absorbed all the engineering talent which could have made the x86 architecture much cleaner
<muurkha>Ashton-Tate and all those guys didn't really stand a chance
<oriansj>Microsoft DOS only had a chance because CP/M decided to snub IBM
<muurkha>Microsoft DOS only had a chance because billg's mom was on a board with IBM's president
<oriansj>well that was the reason for the sweetheart Microsoft Basic deal; the DOS deal was an unbelievable free extra
<muurkha>but billg was in an excellent position to win one way or another
<muurkha>he was a first-class hacker with upper-class financial resources and unmatchable bloodlust
<muurkha>and from the beginning his vision was a computer on every desk and in every home running Microsoft software
<muurkha>I should say upper-class financial and social resources
<oriansj>and a hard financial license lesson from Jack Tramiel which is a master class in itself
<muurkha>hm?
<oriansj>muurkha: unlimited Microsoft licenses for only $50K (for literally all Commodore computers sold)
<oriansj>TOTAL as a single lump payment
<muurkha>nice
<muurkha>pretty sure that wasn't the first time billg saw something like that happen though
<oriansj>it is the hardest financial f*cking microsoft *EVER* received; (and continued to receive until the death of Commodore)
<muurkha> https://en.wikipedia.org/wiki/Motorola_68000_series really has quite the list: Macintosh, Atari ST, Sega Genesis, Amiga, Sun, NeXT, TI-89, TI-92, PalmPilot, and LaserWriter
<muurkha>I don't think it was an ongoing cost center for Microsoft, was it?
<oriansj>muurkha: cost them an estimated $20M in support costs and over $500M in revenue losses
<muurkha>oh, I didn't know about the support costs
<muurkha>I was thinking it was just a missed opportunity, like all the software they could have written but didn't
<muurkha>I was reading about ARM assembly last night. it seems like a pretty sweet instruction set
<muurkha>I'd written and read a tiny amount before but not really appreciated it
<oriansj>minus 3 bad design ideas, it is actually pretty great
<muurkha>apparently NXP still sells ColdFire: https://www.nxp.com/products/processors-and-microcontrollers/legacy-mpu-mcus/32-bit-coldfire-mcus-mpus/68k-processors-legacy/m680x0/low-cost-32-bit-microprocessor-including-hc000-hc001-ec000-and-sec000:MC68000
<oriansj>0) every instruction being conditional [literally 1/16 of all instructions are encoded as NOPs]
<muurkha>which are the 3 bad design ideas? flag bits in the PC, making every instruction conditional, and
<muurkha>jinx
<oriansj>and the optional shift all over the place
<oriansj>and minor disagreement on the little endian instruction ordering
<muurkha>I feel like the shifts and conditionals are pretty convenient, especially in an in-order implementation which necessarily needs at least one cycle per instruction
<muurkha>but maybe your objection is that they reduce code density?
<oriansj>my objection is not that there are ALU+Shift instructions; it is that all ALU instructions include shift in the datapath (They fixed that in AArch64)
<muurkha>is that bad because it reduces the clock speed?
<muurkha>oh apparently SGI also started out on the 68k: https://en.wikipedia.org/wiki/Silicon_Graphics#Motorola_680x0-based_systems
<muurkha>Atari actually died before they adopted the 68k, and I guess Amiga lived long enough to ship a 68040 but not a 68060 so I don't think PowerPC is what killed them
<muurkha>and TI and NeXT are alive and well today
<muurkha>and I guess Palm
<muurkha>Sun and Sega aren't but I don't think we can blame that on the PPC either
<muurkha>I feel like the ARM always offered pretty competitive clock speeds for its epoch?
<muurkha>ARM2 came out in 01986 (8 MHz), ARM3 in 01989 (25 MHz), ARM250 in 01992 (12 MHz), and ARM700 in 01993 (33 MHz)
<muurkha>contemporary to the ARM2 we have SPARC MB86900 (RISC, 17 MHz), and NEC V60 (CISC, 16 MHz), so maybe not?
<muurkha>but it's perhaps relevant that those were respectively 4× and 14× the size of the ARM2
<muurkha>contemporary to ARM3 we have 80486 (CISC, 25 MHz), i860 (VLIW, 40 MHz), and maybe POWER1 (RISC, a bit later, 30 MHz), also all much bigger than the ARM3
<muurkha>contemporary to ARM700 we have Alpha 21064 (RISC, 200 MHz), the 68060 (CISC, 50 MHz), P5 Pentium (CISC, 66 MHz), and POWER2 (RISC, 72 MHz)
<muurkha>so maybe at that point ARM's clock speeds were starting to lag? but that was also the time when ARM was refocusing on low-power mobile devices
<muurkha>by comparison, DEC's StrongARM SA-110 shipped in 01996 with a 233 MHz clock
<muurkha>a contemporary CISC part was the AMD K5 (100 MHz), and the Alpha αxp had by that time been pushed to the EV5 21164A at 400 MHz
<muurkha>so I don't see a super compelling argument that the barrel shifter in the datapath cost a lot of transistors or a big clock-speed penalty?
<muurkha>oriansj: perhaps I have misunderstood?
<oriansj>its penalty doesn't show up until superscalar processing; which would require multiple barrel shifters (one for each ALU) unless you want to add special logic to determine which instructions don't require any shifting.
<oriansj>and allocate only those instructions to the ALUs without a barrel shifter attached.
<oriansj>which as noticed by RISC-V developers does complicate those implementations but becomes a non-issue by the time you go full OoO
<oriansj>and even AArch64 walked that design detail back a good bit (a bit too far but RAM is cheap these days)
<muurkha>hmm, I guess that makes sense
<muurkha>superscalar also eliminates its big advantage
<oriansj>and 1/16 of all possible opcodes being a nop is insanely wasteful of the encoding space
<muurkha>that is, the big advantage of the barrel shifter in the ALU data path is that you don't have to waste an entire extra clock cycle on selecting whether your index register is getting shifted by 0, 1, 2, 3, or 4 bits
<muurkha>but with superscalar, you wouldn't have to waste that extra clock cycle anyway
<muurkha>yeah, ARM without Thumb isn't that great for code density compared to things like amd64 and RVC
<muurkha>but wasting 1/16 of possible opcodes is not that important
<muurkha>it costs you 0.09 bits per instruction (out of 32)
<muurkha>that is, instead of 4.2949673e+09 useful instructions, you have 4.0265318e+09, which is equivalent to 31.91 bits
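The 0.09-bit figure can be checked directly. A minimal sketch, assuming exactly 1/16 of the 32-bit instruction words are lost to the never-executed condition code:

```python
import math

TOTAL = 2 ** 32                 # all 32-bit instruction words
USEFUL = TOTAL * 15 // 16       # drop the 1/16 whose condition never executes

effective_bits = math.log2(USEFUL)   # size of the useful encoding space in bits
lost_bits = 32 - effective_bits      # cost of the wasted condition code
```

Here `USEFUL` comes out as 4026531840, and the two results round to 31.91 and 0.09, matching the numbers quoted above.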
<muurkha>and I think ARM2 is competitive in code density with its contemporaries like the SPARC and even the 386
<oriansj>well yes (x86 burned through their encoding space and started having to make more bloated instructions because of it)
<muurkha>i386 isn't that bad! Qfitzah is going to fit into 1KiB if I ever finish it
<muurkha>and that's an interpreter for a high-level language with dynamic method dispatch, flexible data containers, and pattern-matching
<muurkha>ARM could be better in that sense with less options for the barrel shifter and without the pervasive conditionalization
<oriansj>it isn't bad if you limit yourself to a clean subset of i386; it could have been much cleaner and much denser with very small tweaks
<muurkha>I mean bad in terms of code density!
<muurkha>limiting yourself to a clean subset of i386 is not the way to improve code density >;)
<muurkha>I haven't seen *anything* that's *much* denser than i386 though! Thumb2 and RVC are a little better. what are you thinking of?
<oriansj>ah, fair point; EAX-heavy code paths can definitely be quite dense when you abuse the stack
<oriansj>muurkha: VAX or PDP-11
<muurkha>I don't think VAX or PDP-11 have better code density than i386 or even as good
<muurkha>other ways to squish i386 code include stack abuse, LODSD abuse, unaligned access for string searches, xchg, indirecting procedure calls through a procedure table pointed to by a register, lea arithmetic, xor/inc to load a constant 1, doing tail calls by falling off the end of one subroutine into the beginning of another...
<muurkha>s/other //
<muurkha>rep movs, repne cmps, repne movs, operating with immediate operands on short sub-registers like %ax or %al or %dil instead of the whole register...
<muurkha>omitting cld and initially zeroing registers when you think you can take it for granted :)
<muurkha>oh, strategically positioning an unconditional jump to a faraway place that you have several conditional jumps to from more than 128 bytes away, so that your conditional jumps can jump to the nearby unconditional jump (2 bytes per conditional jump) instead of directly to the faraway destination (6 bytes)
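The byte counts behind the jump trick can be tallied as follows. The 2-byte and 6-byte conditional jump sizes are the ones given above; the 5-byte size for a near unconditional jump (E9 plus a 32-bit displacement) and the helper names are additions for this sketch.

```python
SHORT_JCC = 2   # conditional jump with 8-bit displacement
NEAR_JCC = 6    # conditional jump with 32-bit displacement
NEAR_JMP = 5    # unconditional jump with 32-bit displacement (E9 rel32)


def direct_size(n):
    """Bytes used when n conditional jumps each go straight to the far target."""
    return n * NEAR_JCC


def trampoline_size(n, jump_already_exists=False):
    """Bytes used when n short conditional jumps share one far unconditional jump."""
    extra = 0 if jump_already_exists else NEAR_JMP
    return n * SHORT_JCC + extra


# Bytes saved for 1 to 4 conditional jumps when the far jump has to be added.
savings = {n: direct_size(n) - trampoline_size(n) for n in (1, 2, 3, 4)}
```

Under these sizes the trick pays off from the second conditional jump onward, and from the first if the unconditional jump was going to be there anyway. It only works while every conditional jump stays within short-jump range of the shared jump.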
<oriansj>indeed; most common assembly sequences I can imagine writing take 16-24 bytes on x86 but only 10-14 bytes on VAX, but you are probably right in that the compiled code probably isn't much more efficient (although the 3OP instructions are the big saver in reducing the number of instructions performed).
<muurkha>maybe handwritten VAX code is different? I haven't ever written any
<oriansj>the 3 op instructions with 3 memory addresses would take 2 load instructions, an ALU instruction and a store instruction (and depending on the ALU instruction you *might* be able to mix it with the load or the store to save an instruction)
<muurkha>on i386?
<muurkha>you can pretty much always mix ALU instructions with a load or a store
<oriansj>VAX could do 3 op instructions with 3 memory addresses; i386 can only do 2 op instructions at best, and only 1 memory address max per instruction.
<muurkha>but I very rarely write code that does something like sum two vectors to form a third vector
<muurkha>almost invariably I have most of my operands already in registers except when I'm doing things like chasing pointer chains
<oriansj>3 memory addresses in an instruction really doesn't benefit cache performance and usually just makes OoO much more messy
<muurkha>apparently it also makes restarting the instruction after a page fault difficult ;)
<muurkha>> By any practical measure, the VAX family of computers is one of the most successful series of computer systems ever developed. As of this writing, over 100,000 machines have been installed, ranging in size from the MicroVAX II to the VAX 8800—a number that even surpasses that for the pioneering IBM SYSTEM 360/370 series.
<muurkha>Imagine writing this in 01987 when 3.5 million Commodore 64s had been sold by mid-01986
<muurkha>and the Apple ][ was selling 1 million a year in 01983
<muurkha>A year after Sun went public in 01986
<muurkha>and had already come out with the Sun-3, and would ship its first SPARCs that year
<muurkha> https://www.abortretry.fail/p/the-network-is-the-computer says Sun sold 500 million dollars' worth of hardware to the NSA in 01986, which would be on the order of 25000 machines
<muurkha>the facepalm VAX quote is from http://bitsavers.trailing-edge.com/pdf/dec/vax/archSpec/EY-3459E-DP_VAX_Architecture_Reference_Manual_1987.pdf
<muurkha>> we expect VAX computers to remain the backbone of Digital's product offerings for many years into the future.
<muurkha>then they shipped the DECstation 3100 two years later in 01989 and the αxp three years after that in 01992
<muurkha>possibly DEC's ostrich problems went deeper than just canceling PRISM
<muurkha>anyway what I actually wanted to say about the VAX is that it had 16 32-bit general-purpose registers, so most code ought to be able to avoid referencing memory even once per instruction and will run faster if it does avoid it; this is a little harder on the i386 but also pretty easy on amd64 where you also have 16 general-purpose 32-bit registers
<oriansj>tfgbvhrdjeugnufhfttgiigglgibkubklkcdhbnvgidj
<oriansj>sorry about that
<oriansj>but yes, AMD64 is in many ways what i386 should have been but sadly wasn't
<muurkha>heh, no worries
<muurkha>I have a correction: earlier I said the ARM2 had a 32-bit address bus, but apparently it only had a 26-bit address bus, so you couldn't physically hook up more than 64 MiB of RAM without bank-switching. I think the ARM3 did have a 32-bit address bus
<muurkha>according to https://youtu.be/KKTa54UikgE at 14'
<oriansj>this says arm6 https://en.wikipedia.org/wiki/ARM_architecture_family
<muurkha>it seems to say that ARM3 supported full 32-bit memory
<oriansj>well a 26bit PC doesn't prevent one from using 32bit registers to get data from the 4GB of memory but the programs themselves are limited to 26bit which is enough for pretty big programs in assembly
<muurkha>in this case the problem was that the physical chip only had 26 pins dedicated to the address bus
<muurkha>the actual CPU design inside the ARM3 was the same as in the ARM2, just inside a much larger chip with enough transistor budget for caches
<fossy>i've been working on https://github.com/fosslinux/live-bootstrap/issues/306 for a little bit. after a bit of hacking around in scheme and adding in a few new guile routines into mes' library, i got it regenerating the files. however upon integrating it into live-bootstrap i noticed an incredibly large problem. the regeneration uses psyntax.
<fossy>so i began attempting to port guile-psyntax-bootstrap to mes - got a fair way along, with a very large amount of just hacking and removing things, but i have reached a huge block that i don't know how to get around - mes is just gobbling up memory and i can't tell to what or why
<muurkha>aw :(
<fossy> http://0x0.st/H1qj.tar.gz -- this is my progress (on the psyntax bootstrap) -- if anyone more experienced with mes than me could have a look at what's going on, i'd appreciate it a lot
<stikonas>janneke: do you remember what exactly is the purpose of hex2:immediate8 function?
<stikonas>it's broken on 64-bit mescc (and there is a comment saying that)
<stikonas>but I'm still trying to understand why it is so complicated...
<janneke>stikonas: to write an 8-byte immediate number
<stikonas>hmm, somehow in M2-Planet we mostly use 4-byte immediates with x86_64 opcodes
<janneke>but it's a terrible function, indeed
<janneke>yeah
<stikonas>yeah, it spits out something like mov____$i64,%rdi %0x0 %0x-1
<janneke>so it writes two times a 4-byte immediate i guess
<stikonas>not sure if it's the same move opcode as in M2-Planet..., probably not...
<janneke>yes
<stikonas>yes, it's different
<stikonas>M2-Planet uses mov_rdi, 48C7C7 which expects 32-bit constant only
<stikonas>which is a limitation I guess...
<stikonas>and mescc has mov____$i64,%rdi 48bf
<janneke>yeah, i guess the proper fix should be in mescc-tools
<stikonas>hmm, but then we would have to add 8 bit constants there...
<stikonas>(8-byte)
<stikonas>oriansj might not want that
<stikonas>anyway, I now understand the difference between mescc and M2
<janneke>yeah, and until now i didn't really care that much as everything was 32bits...
<muurkha>stikonas: is there an explanation that would have enabled you to understand that difference much earlier?
<janneke>i.e., x86_64 was known to be broken anyway...
<muurkha>I'm thinking that if you can imagine one, this would be a great time to write it down and commit it to Git
<muurkha>or the Wiki
<muurkha>because probably many, many people will benefit from reading it
<stikonas>muurkha: well, I just didn't dig too deep into source, somehow didn't expect that mescc does it differently
<stikonas>janneke: yeah, I know that x86_64 is broken... though it's not too broken
<stikonas>it can already self-host itself
<janneke>true, it could/should work really...but it just hasn't produced a viable tcc
<janneke>(broken is a bit too harsh)
<stikonas>yeah, it crashes for me in set_idnum
<stikonas>(which does isidnum_table[c - CH_EOF] = val;)
<stikonas>and that's where I noticed that something strange is going on with that immediate8 function
<janneke>right
<stikonas>it compiles into movabs rdi, 0xffffffff00000000 (in intel syntax)
<stikonas>(and original instruction was mov____$i64,%rdi %0x0 %0x-1)
<stikonas>and that %0x0 seems suspicious
<janneke>yes, that looks weird
<stikonas>which I guess happens because #x100000000 is 0 on mes
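The link between the M1 line and the disassembly can be checked by hand. The sketch below assumes that each `%` word is emitted as a 32-bit little-endian value in the order written, which is consistent with the `movabs` operand quoted above. `assemble_words` is a made-up helper, and treating -1 as the intended constant is also an assumption.

```python
import struct


def assemble_words(low_word, high_word):
    """Pack two 32-bit words little-endian and read them back as one 64-bit value."""
    data = struct.pack("<II", low_word & 0xFFFFFFFF, high_word & 0xFFFFFFFF)
    return struct.unpack("<Q", data)[0]


buggy = assemble_words(0x0, -1)      # "%0x0 %0x-1": low half lost
expected = assemble_words(-1, -1)    # both halves of a 64-bit -1
```

With the low word zeroed, `buggy` is 0xffffffff00000000, the value seen in the disassembly, while `expected` is 0xffffffffffffffff.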
<janneke>(i'm unsure about how/when sign extension would work)
<janneke>ah, that could be
<oriansj>well, the only reason we don't have 64bit constants in M1/hex2 is there has not been an architecture that required them to bootstrap to the next level (same for 128/256/512/etc bit constants)
<janneke>that makes sense
<stikonas>well, in principle we should be able to spit correct instruction in mescc too
<stikonas>(I'm not even sure if tcc 0.9.26 even has any big constants...)
<oriansj>48C7C7 %-1 makes the same register value as 48BF %-1 %-1 but is 3 bytes shorter
<stikonas>indeed
<stikonas>so maybe we should fix up mescc to use 48C7C7
<stikonas>though this would break if we really use > 32-bit immediates
<stikonas>(which most likely we are not)
<oriansj>well if janneke wants to add some basic immediate optimizations
<stikonas>hmm, so far I found stuff like o(0xe5894855); in tcc
<stikonas>so it has 31-st bit set
<oriansj>then a check to see if the immediate fits in 8bits, 16bits, and 32bits can result in denser binaries and fallback logic to the current code probably will cover the ugly cases
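A size check of the kind oriansj suggests might look like the following. It is only a sketch in Python, not mescc code: `fits_signed` and `mov_rdi` are hypothetical names, and only the two opcodes mentioned above (48C7C7 with a 32-bit immediate, 48BF with a 64-bit one) are modeled.

```python
import struct


def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def mov_rdi(value):
    """Encode a load of value into rdi, using the shorter form when possible."""
    if fits_signed(value, 32):
        # 48 C7 C7 imm32: the CPU sign-extends the 32-bit immediate to 64 bits.
        return bytes.fromhex("48C7C7") + struct.pack("<i", value)
    # 48 BF imm64: fallback carrying the full 64-bit immediate.
    return bytes.fromhex("48BF") + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


short = mov_rdi(-1)            # 7 bytes
long_ = mov_rdi(0x100000000)   # 10 bytes
```

Note that a positive constant with bit 31 set, such as 0xffffffff, does not fit the signed check and correctly falls back to the long form, because the short form would sign-extend it to a different 64-bit value.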
<stikonas>except that current code is broken due to mes bug
<stikonas>(probably work on guile)
<janneke>yes, best to get it to work on guile first
<stikonas>argh, but tcc code is so ugly
<stikonas>who calls function "o"
<stikonas>I guess it's output
<stikonas>but still..
<stikonas>janneke: so I think I'm getting non-segfaulting tcc on amd64 if I build with guile
<janneke>stikonas: oh, that's _amazing_
<janneke>well done!
<stikonas>I've only tried on a very small c file
<stikonas>(just return 42;)
<janneke>well still, that's a start
<stikonas>yeah, disassembly seems alright
<stikonas> https://paste.debian.net/1284824/
<stikonas>janneke: so I think we only need to deal/workaround mes bugs
<stikonas>hmm, or did I build 32-bit tcc...
<stikonas>mov $0x2a,%eax seems a bit suspicious
<janneke>there are some 32bit instructions used in 64 bit too, but not all that much
<stikonas>hmm, no, mes-tcc-guile: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, no section header
<stikonas>that seems like 64-bit
<stikonas>maybe tcc uses them more
<stikonas>anyway, that immediate8 on guile assembles to
<janneke>hmm, push %ebp => 32bit i guess
<stikonas>mov____$i64,%rdi %0xffffffff %0x-1
<stikonas>oh, I didn't set TCC_TARGET_ARCH correctly
<stikonas>I've set TCC_TARGET_ARCH 1
<stikonas>that's wrong
<stikonas>let me retry...
<janneke>ah ;)
<janneke>cross compiling is waaay too easy with these bootstrap tools
<stikonas>yeah...
<stikonas>and my environment is a bit messy right now
<stikonas>as I've injected guile into my bootstrap chroot
<stikonas>but probably missed something and guile has trouble parsing command line...
<stikonas>so it was compiling output to file called -S.s
<janneke>oh :)
<stikonas>still, I think the main issue right now is that immediate8 function
<janneke>yeah, that seems quite plausible
<stikonas>ok, the diff between mes build and guile build https://paste.debian.net/1284827/
<stikonas>lots of noise due to wrong file name though
<stikonas>and I think remaining stuff is real diff from immediate8, e.g. %0xfffffffc instead of %0x0
<janneke>yeah, sed'ing the -S.s tcc.s bit first would help
<stikonas>janneke: ok, so there are more issues with amd64 build...
<stikonas>it looks like final binary that I've got now loops...
<stikonas>(with the memory leak)
<stikonas>I guess something is wrong in x86_64 backend then
<stikonas>anyway, probably worth to first sort out immediate8
<janneke>yes, it's often better not to guess beyond the first known bug
<stikonas>indeed and immediate8 happens earlier
<stikonas>with it solved, we should at least be able to run "tcc -v" or "tcc --help"
<stikonas>ok, after sedding it's indeed just immediate8 diff between mes and guile
<janneke>good
<stikonas>janneke: hmm, so I think mes just doesn't support anything more than 32-bits
<stikonas>and stuff with highest bit set is interpreted as negative number
<stikonas>would you be happy to restrict immediate8 to 32-bits then?
<stikonas>hmm, though on the other hand that might cause some issues with 32-bit constants in tcc...
<janneke>hmm
<janneke>ACTION was going to say...
<janneke>i think not supporting 8-byte immediates (and bombing out?) is much better than silently doing the wrong thing
<stikonas>hmm, I was thinking of just putting either 0 or #xffffffff in the first 4 bytes
<janneke>i wonder how broken mes is, would a 64bit gcc-built mes possibly do the right thing?
<stikonas>(depending on the size of o)
<stikonas>janneke: hmm, good question
<stikonas>that's worth trying
<stikonas>no, that's also broken
<janneke>otoh, if that "just putting either 0 or #xffffffff" works, that would be OK too?
<stikonas>mes> #100000000
<stikonas>[sexp=0]
<janneke>OK
<stikonas>that I'm not yet sure...
<stikonas>because tcc has some 32-bit expressions
<stikonas>stuff like o(0xf02444dd)
<stikonas>and it's hard to tell whether interpreting this as negative number
<stikonas>would break things
<janneke>yea
<stikonas>though perhaps we could patch it out on tcc side
<stikonas>I suspect something like
<stikonas>o(0x44dd); o(0xf024); would do the same thing
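Whether the split call emits the same bytes can be checked with a small model. This assumes that tcc's `o` writes its argument low byte first and stops once the remaining value is zero; that is my reading of the code generator and is worth confirming against the tcc version in use. The Python `o` below is a stand-in, not tcc's function.

```python
def o(value, out):
    """Emit the bytes of value, low byte first, stopping when none are left."""
    value &= 0xFFFFFFFF          # treat the argument as an unsigned 32-bit int
    while value:
        out.append(value & 0xFF)
        value >>= 8


single = bytearray()
o(0xF02444DD, single)            # one call with the full 32-bit constant

split = bytearray()
o(0x44DD, split)                 # low half first
o(0xF024, split)                 # then the high half
```

Under that assumption both variants produce dd 44 24 f0. The split would not be safe for a half whose upper byte is zero, since that byte would be dropped.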
<janneke>right
<stikonas>anyway, at least I have a better understanding of problems with bootstrapping x86_64 now
<stikonas>though I'm surprised that we can bootstrap mes-m2 -> mes with that immediate8 issue
<janneke>indeed...
<stikonas>hmm, I'm confused where mes 32-bit variable limit comes from...
<stikonas>it seems R1 global struct has the correct value...
<stikonas>let's instead workaround mescc...
<janneke>R1 has it right?
<janneke>so what about this, in src/display.c:
<janneke> else if (t == TNUMBER)
<janneke> {
<janneke> fdputs (itoa (x->value), fd);
<janneke> }
<janneke>together with...
<janneke>char *
<janneke>itoa (int x)
<janneke>stikonas: did you test using the REPL? (i.e., printing/display?)
<janneke>i guess that itoa here is wrong and truncates the value
<stikonas>hmm, let me check
<stikonas>it might be here
<stikonas>I didn't check display...
<janneke>display.c should use ltoa
<stikonas>janneke: yep
<stikonas>I think that fixes it
<janneke>(and seeing this, there may be other int/long problems)
<stikonas>mes> #x100000000
<stikonas>[sexp=4294967296]
<stikonas>though I have a lot of other int->long replacements
<stikonas>let me test it in isolation
<stikonas>ok, that alone fixes
<stikonas>the following command #x100000000
<stikonas>janneke: can you apply that patch to branches then?
<stikonas>I guess wip and wip-riscv
<stikonas>well, wip-x86_64...
<janneke>the itoa=>ltoa? sure
<stikonas>yes
<stikonas>so that we have
<stikonas>fdputs (ltoa (x->value), fd);
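The effect of printing through a 32-bit `int` instead of a 64-bit `long` can be imitated in Python. This sketch assumes `int` is 32 bits and `long` is 64 bits, as on x86_64 Linux; `as_c_int` is a hypothetical helper, not part of mes.

```python
def as_c_int(value, bits=32):
    """Reinterpret value as a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


through_itoa = str(as_c_int(0x100000000, 32))   # value squeezed through int
through_ltoa = str(as_c_int(0x100000000, 64))   # value kept as a 64-bit long
negative = as_c_int(0xF02444DD, 32)             # bit 31 set: reads as negative
```

The first string is "0" and the second is "4294967296", matching the two REPL results in the log. The last line shows why constants like 0xf02444dd are a worry: with the top bit set they turn negative once interpreted as a signed 32-bit number.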
<stikonas>perhaps this will need further testing...
<stikonas>I'll try to do it later today
<stikonas>since there is a small chance that something breaks
<stikonas>still, it seems the right thing to do here
<janneke>stikonas: pushed (i also changed the other usages in display.c, you never know, any value could be 64bit i guess)
<stikonas>thanks
<janneke>u2!
<stikonas>will retest now, though will take some time
<stikonas>I need to rebase my scripts, and I had hand-patched some extra tcc stuff
<stikonas>janneke: where did you push?
<stikonas>savannah or gitlab?
<stikonas>I can't see in either of them
<janneke>stikonas: both, wip-x86_64, wip-riscv
<janneke>e52f29f6 * core: Avoid displaying truncated 64bit values.
<stikonas>ok, I can see it now
<stikonas>perhaps browser cache...
<janneke>we're not using that many INTs in src/
<janneke>(...but possibly in lib/ somewhere..., we'll see...)
<stikonas>indeed. I'll check how tccpp.c is assembled now
<stikonas>in particular that immediate8 stuff...
<stikonas>janneke: so the bad news is that just fixing display did not fix compile output of hex2:immediate8
<stikonas>there must be something else...
<janneke>crap
<stikonas>but I can reproduce it outside mescc
<stikonas>so that will help
<stikonas>I just grabbed hex? mesc? dec->hex and hex2:immediate8 functions
<stikonas>and I could reproduce it
<stikonas>so with "mes> (hex2:immediate8 -1)" I get
<stikonas>$1 = "%0x0 %0x-1"
<stikonas>oh, perhaps it's the workaround if mesc? 0 that messes things up
<stikonas>which was added to avoid division by 0
<janneke>oh my, hidden in plain sight
<stikonas>indeed
<stikonas>now I've got $0 = "%0xffffffff %0x-1"
<stikonas>and I think we can completely remove mesc? function
<janneke>that would be nice, and a great fix!
<stikonas>let me prepare the patch
<stikonas>janneke: I've pushed it here https://git.stikonas.eu/andrius/mes/src/branch/wip-riscv
<stikonas>(though still testing)
<stikonas>it takes about 12 minutes to rebuild tcc
<stikonas>janneke: ok, so no more crashing when I run "mes-tcc -vv"
<stikonas>that's a good sign
<janneke>that's great!
<stikonas>anything more complicated, e.g. mes-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/${MES_ARCH} -o crt1.o lib/linux/${MES_ARCH}-mes-gcc/crt1 causes infinite loop/memory leak
<stikonas>which is what I observed with guile earlier
<stikonas>still, we solved quite a few issues today
<janneke>stikonas, yeah; pretty nice
<janneke>being on-par with guile isn't bad
<stikonas>and by the way, this is the patch I used for tinycc (but this is not for merging, at least not yet)
<stikonas> https://git.stikonas.eu/andrius/tinycc/commit/3eadcf95d88b1673a47df1ac250fa8614b32fa8e
<janneke>OK
<stikonas>(abort I think would work, I just didn't bother changing build scripts...)
<stikonas>but that typedef enum X86_64_Mode was causing some real build issues
<oriansj>janneke: you would not want to use mov $0x2a,%eax as it would not set the upper 32bits and if the previous register value was wrong, you would end up with the wrong value in eax and elsewhere
<oriansj>you need to use rax which then would be the sign extend or zero extend form which you actually want
<oriansj>(or the previous register value had any of the upper 32bits set or you wanted to populate a negative number, etc)
<muurkha>oriansj: which instructions exactly are they that clear the upper 32 bits when they operate on the lower 32 bits?
<muurkha>I know xor is one, so xor %eax, %eax is equivalent to xor %rax, %rax (which is a byte longer)
<muurkha>but it sounds like you're saying that mov-immediate is not one
<muurkha>that seems to be incorrect though
<muurkha>I just stepped through a test program:
<muurkha>Temporary breakpoint 2, main () at test.s:3
<muurkha>3 mov $0xdeadbeeffee1dead, %rax
<muurkha>(gdb) p/x $rax
<muurkha>$1 = 0x401bf5
<muurkha>(gdb) si
<muurkha>4 mov $2, %eax
<muurkha>(gdb) display/x $rax
<muurkha>1: /x $rax = 0xdeadbeeffee1dead
<muurkha>(gdb) si
<muurkha>main () at test.s:5
<muurkha>5 ret
<muurkha>1: /x $rax = 0x2
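The gdb session agrees with the documented amd64 rule: a write to a 32-bit register zero-extends into the full 64-bit register, while 16-bit and low-byte writes leave the upper bits alone. A toy model of that rule, with `write_reg` as a made-up name and the high-byte registers such as %ah left out:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_reg(old, value, width):
    """Return the new 64-bit register contents after a write of the given width."""
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFFFFFF            # upper 32 bits are zeroed
    mask = (1 << width) - 1                  # 16-bit and low-byte writes
    return (old & ~mask & MASK64) | (value & mask)


rax = write_reg(0, 0xDEADBEEFFEE1DEAD, 64)
after_eax = write_reg(rax, 2, 32)   # like mov $2, %eax
after_ax = write_reg(rax, 2, 16)    # like mov $2, %ax
after_al = write_reg(rax, 2, 8)     # like mov $2, %al
```

In this model `after_eax` is 0x2, as in the gdb output, whereas `after_ax` and `after_al` keep the upper bits of the old value.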
<oriansj>muurkha: the proper instructions would be movsx and movzx to sign or zero extend respectively
<stikonas>we only saw eax there because I accidentally made a cross-compiler...
<stikonas>our early tools can cross-compile really easily
<muurkha>oriansj: gas tells me movzx can't take an immediate argument: test.s:4: Error: unsupported syntax for `movzx'
<muurkha>so I think the proper instruction is mov, for better or worse
<oriansj>muurkha: depends on if you are using at&t x86 assembly syntax or intel assembly syntax I guess
<muurkha>is it? did you get an intel assembly syntax assembler to successfully assemble a movzx instruction with an immediate operand?
<muurkha> https://www.felixcloutier.com/x86/movzx doesn't list an immediate operand as a possibility
<oriansj>you are probably right
<muurkha>are there any amd64 instructions that affect only the lower 32 bits of a 64-bit register?
<oriansj>well all eax instructions should only touch the bottom 32bits, all ax instructions should only touch the bottom 16bits and ah/al instructions only touch 8bits(high and low half of the 16bit word respectively)
<muurkha>is there *at least one* eax instruction that behaves as you say? because above I've demonstrated that mov from immediate does not, and I know from previously that xor does not; they both zero the upper 32 bits
<muurkha>or by "should" do you mean that AMD shouldn't have defined them to do that?
<muurkha>if there is at least one such instruction, can you provide a runnable example so I can reproduce that behavior?