IRC channel logs

2022-08-24.log


<oriansj>theruran: assuming you are talking about: https://www.cl.cam.ac.uk/~mom22/jitawa/ well there is no license information so completely unusable
<oriansj>all software *MUST* be explicitly licensed under an FSF approved license to be useful in bootstrapping
<theruran>looks like an MIT or BSD license. source files are here: https://github.com/HOL-Theorem-Prover/HOL/blob/master/COPYRIGHT
<theruran>but you're right. the distributed C code is unlicensed, it seems
<theruran>oh, no, wrapper.c is there too
<theruran> https://github.com/HOL-Theorem-Prover/HOL/tree/master/examples/theorem-prover/lisp-runtime
<oriansj>it also seems to reference a verified_code.s https://github.com/HOL-Theorem-Prover/HOL/tree/master/examples/theorem-prover/lisp-runtime/bin
<oriansj>ok, it seems to be doing: cd ../implementation && Holmake && cd ../bin
<oriansj>so it depends upon a generated file which is just a hex dump of a binary blob built using Holmake
<oriansj>So if getting a lisp was a problem, it would provide an option *after* we successfully bootstrap the HOL-Theorem-Prover using a Standard ML compiler
<oriansj>The proof of correctness is neat
<stikonas>sometimes with those no-license projects I had success writing an author
<stikonas>anyway, bootstrap and proof of correctness are somewhat orthogonal
<stikonas>you have to start with code that hasn't been checked by a theorem prover
<stikonas>build theorem prover
<stikonas>and only then you can prove things with it
<stikonas>and NOT the theorem prover itself
<muurkha>proofs of correctness could be very useful for bootstrapping
<muurkha>the metamath zero approach takes your theorem prover out of the trusted computing base
<stikonas>yes, but none of the theorem provers can prove their own correctness
<muurkha>that is, the theorem prover doesn't need to be correct
<muurkha>because what it does is emit a proof witness
<stikonas>well, that is true
<muurkha>the proof witness checker does need to be correct
<stikonas>anyway, signing off for the night
<muurkha>but there are something like two dozen proof witness checkers for metamath zero written in different languages
<muurkha>and some of them are very small and simple
<muurkha>(also the underlying logical theory needs to be consistent, so that you can't "prove" things that are false)
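(A minimal sketch of the separation being described here, written in C with toy names and a toy modus-ponens rule invented for illustration; it is not any real Metamath Zero format. The point is that the untrusted prover only has to emit a step-by-step witness, and a small checker re-validates every step, so only the checker and the underlying logic need to be trusted.)

    /* Toy witness checker, invented for illustration: each step of the
     * "proof witness" is either a hypothesis/axiom (cites no earlier
     * steps) or is justified by modus ponens: from A and "A -> B",
     * conclude B.  The checker never searches for proofs; it only
     * re-checks the steps some untrusted prover emitted. */
    #include <stdio.h>
    #include <string.h>

    struct step {
        const char *formula;   /* formula claimed at this step               */
        int from_a, from_ab;   /* earlier steps cited by modus ponens, or -1 */
    };

    /* a modus ponens step is valid iff steps[from_ab] is literally
     * "<steps[from_a]> -> <this formula>" and both citations are earlier */
    static int check_step(const struct step *steps, int i)
    {
        char expected[256];
        if (steps[i].from_a < 0)
            return 1;                        /* hypothesis: accepted as given */
        if (steps[i].from_ab < 0 || steps[i].from_a >= i || steps[i].from_ab >= i)
            return 0;
        snprintf(expected, sizeof expected, "%s -> %s",
                 steps[steps[i].from_a].formula, steps[i].formula);
        return strcmp(steps[steps[i].from_ab].formula, expected) == 0;
    }

    int main(void)
    {
        /* a witness some untrusted prover might emit: p, p -> q, then q */
        const struct step witness[] = {
            { "p",      -1, -1 },
            { "p -> q", -1, -1 },
            { "q",       0,  1 },
        };
        int n = sizeof witness / sizeof witness[0];
        for (int i = 0; i < n; i++)
            if (!check_step(witness, i)) {
                printf("step %d rejected\n", i);
                return 1;
            }
        printf("witness accepted, conclusion: %s\n", witness[n - 1].formula);
        return 0;
    }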
<oriansj>muurkha: well I would certainly love to see more proofs of correctness in general as we are long past the era where such proofs should be thought of as either strange or as something needed only for very specialized reasons.
<theruran>Milawa is an ACL2-like theorem prover that proves itself, minus the ~2000 lines of 'socially verified' code
<muurkha>I wish that were true!
<theruran>I was thinking the Jitawa verified_code.s could be translated by hand to M1 assembly
<theruran>but anyway, it doesn't have setq so I doubt its practical usefulness here.
<theruran>Milawa self-verification takes over 8 GB of memory
<stikonas[m]>theruran: how does that work given Gödel's incompleteness theorem?
<muurkha>GIT shows that, in any given consistent formal system, there are some theorems that can't be proven, and thus some programs that can't be proven correct
<muurkha>it doesn't imply that no theorems can be proven or that no programs can be proven correct
<theruran>stikonas[m]: literally, the ~2000 lines of socially-verified code part
<stikonas[m]>But GIT 2 says no system can prove its own consistency
<theruran>'completely' right?
<theruran>you may have to read the thesis to dispel your doubts
<stikonas[m]>In any case, isn't self-verification the same as self-compilation? A compiler binary might be backdoored and emit malware even if the source is good
<stikonas[m]>Doesn't the same apply to theorem provers?
<stikonas[m]>Maybe I'm wrong since I'm not expert on theorem provers...
<oriansj>theruran: well converting the verified_code.s into M1 would probably result in a 10K-line project, which puts it at about double the size of cc_*, so logically about 4x harder to audit
<theruran>hmm, well it produces verification conditions in over 8 stages, with each lower level verifying the level above it
<theruran>oriansj: but it's just bytes? like hex1
<theruran>~8000 lines
<oriansj>theruran: well proper M1 wouldn't be just bytes, it would be Assembly instructions and comments
<theruran>I know that
<theruran>translate to x86_64 instructions, then convert to M1
<oriansj>and assuming the current code is only 1 instruction per line; just basic whitespacing for comments would result in an additional 2K lines
<theruran>OK - I didn't realize that's much bigger than cc_*
<oriansj>yeah: 5408 19234 210936 cc_amd64.M1
<theruran>what are the column names here
<oriansj>lines words characters
<oriansj>as standard for wc
<oriansj>8238 43581 261524 verified_code.s
<theruran>I see
<theruran>I always just wc -l
<muurkha>self-verification is not the same as self-compilation because the theorem prover can't emit malware
<oriansj>so, yeah, just basic comments would easily get verified_code.s to 10K lines and if there are more instructions per line, it could easily go much higher.
<muurkha>I mean, it can claim that a false proof of correctness is true, or vice versa
<oriansj>muurkha: assuming the output is only verified or not verified
<muurkha>but it can't actually change the software
<muurkha>and the metamath zero approach means the prover itself can't even claim that a false proof of correctness is true
<muurkha>because the metamath-zero verifier will examine the proof witness and tell you whether it's true or not. you'd have to backdoor the verifier, not the prover
<muurkha>moreover, you'd have to backdoor *every* verifier
<oriansj>in exactly the same way
<muurkha>not necessarily exactly the same way; maybe you could backdoor some of them to just always claim that any putative proof witness is valid
<muurkha>and hope nobody notices
<stikonas[m]>Yes, but the same is true for compilers: you have to backdoor every compiler
<stikonas[m]>So for verifiers it makes sense that you need to backdoor all of them
<muurkha>potentially, yes, but different compilers don't produce the same output
<muurkha>different metamath zero verifiers do, unless one of them is broken
<muurkha>as I understand it
<stikonas[m]>OK, so maybe it's easier to spot if something tries to give you a fake verifier
<muurkha>hopefully!
<muurkha>also the verifier is much, much simpler than a compiler
<oriansj>muurkha: I think we just proved that differently
<stikonas[m]>Yeah, so potentially can inspect it by hand
<stikonas[m]>Well cc_* is tiny
<oriansj>stikonas[m]: I could probably cut it in half if I was willing to get ugly and strip more features out
<muurkha> https://us.metamath.org/other.html#verifiers
<muurkha>I think I was confusing Metamath Zero with Metamath itself
<stikonas[m]>Probably not worth cutting it, that would make m2-planet harder to work with
<oriansj>muurkha: cc_amd64 is just 16,140 bytes of binary
<muurkha>oriansj: but how many lines of source code?
<oriansj>muurkha: currently: 5,408
<muurkha>verify.lua is almost the same number of bytes but it's only 380 lines of Lua
<stikonas[m]>Plus lua interpreter which is huge
<oriansj>muurkha: and cc_* always runs as close to the metal as possible
<oriansj>only the syscalls/interrupts depend upon external bits
<stikonas[m]>And it's only read byte, write byte and memory allocation
<oriansj>and when we go bare metal, it'll be byte for byte the same for all of the logic but the read_byte, malloc and write_byte functions
<stikonas[m]>Well open file/tape too
<oriansj>close and lseek if we are being pedantic
<stikonas[m]>Does cc_* use lseek?
<stikonas[m]>I thought it just reads it into memory
<stikonas[m]>Only hex1/2 lseeks
<stikonas[m]>Or am I misremembering
<stikonas[m]>Anyway, it's late, going to sleep...
<oriansj>stikonas[m]: you are right on the current version of cc_*, it doesn't lseek. A prototype used lseek to figure out the size of the files for reading, create a single buffer and dump the files into memory to improve performance
<muurkha>goodnight stikonas[m]!
<muurkha>it's true that the Lua interpreter could conceivably have a bug or backdoor that causes verify.lua to execute incorrectly
<oriansj>added about 30 instructions of complexity but cut runtime in half when double emulated
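(A rough C sketch of the prototype trick described above, assuming a plain POSIX environment rather than quoting the actual cc_* code: lseek to the end to learn the file size, allocate one buffer, rewind, and pull the whole file in with a single read.)

    /* Sketch only, not the real cc_* source: size the file with lseek,
     * then read it into a single buffer so later passes just walk memory
     * instead of doing a syscall per byte. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    char *slurp_file(const char *name, long *size_out)
    {
        int fd = open(name, O_RDONLY);
        if (fd < 0) return NULL;

        long size = lseek(fd, 0, SEEK_END);    /* offset of the end == file size */
        lseek(fd, 0, SEEK_SET);                /* rewind for the actual read     */

        char *buf = malloc(size + 1);
        if (buf == NULL || read(fd, buf, size) != size) {
            free(buf);
            close(fd);
            return NULL;
        }
        close(fd);
        buf[size] = '\0';                      /* convenient terminator for text */
        *size_out = size;
        return buf;
    }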
<muurkha>for Metamath Zero they are working on a proof of the Metamath Zero binary: https://arxiv.org/abs/1910.10703
<muurkha>this wouldn't prove the Metamath system consistent; it would only prove that the executable implements the system as specified
<muurkha>moreover, it would still be possible that, if all the verifiers capable of checking the proof were flawed in a particular way, an incorrect proof could slip through
<muurkha>but what it would show is that *if at least one* verifier correctly implements the Metamath semantics, then Metamath Zero correctly implements them as well
<muurkha>as I understand it
<oriansj>muurkha: except for the common dev pattern of assuming you wrote a bug rather than that all the other "verified" systems are bugged and wrong.
<muurkha>I think this case is probably immune to that problem
<oriansj>muurkha: that is probably one of the most dangerous phrases in our industry.
<muurkha>yeah, but it's not in our industry. it's theorem proving
<muurkha>if your theorem prover accepts incorrect proofs it's wrong, there's no gray area
<muurkha>same if it rejects correct ones. though maybe "out of memory" is a valid response in any case
<theruran>part of the motivation for the Milawa theorem prover is that ACL2 has had a number of defects (bugs) in its history that could have affected the prover system
<theruran>as I see it, systems like this and Z3 are only socially verified in a similar way to how Erlang systems are correct-ish until a defect is found in production
<oriansj>well, look, if a human verified one of them correct that would be one thing, but has that actually been done and did they publish that work?
<theruran>Milawa is verified correct down to the x86 ISA semantics
<theruran>I'm not aware of ACL2 self-verification projects. there is Meta-Coq though but I doubt they have completely verified Coq
<theruran>FStar's type system is verified with assistance from Coq. they wrote a paper about it
<theruran>nice lil diagram to go with
<theruran>but you can see it is not at all straightforward because the systems can't verify themselves
<oriansj>and errata in *ALL* architectures bring problems
<theruran>yes. Jitawa targeting a simpler ISA would've been more interesting
<theruran>if Jitawa could be run on an x86 softcore, then maybe we could have more confidence. but like, I couldn't even get it to run properly on my two computers. this computer gives no output when executing (consp)
<oriansj>well binary dumps can embed bad assumptions, even when formally verified.
<theruran>sure
<theruran>another way to cross-verify is to run it through ACL2's x86 simulator
<oriansj>makes me wonder why your computers couldn't run it properly.
<theruran>one is Void Linux-musl, and it compiles with gcc but running any expression fails with a memory address violation
<theruran>this one is Hardened Gentoo-glibc with some hardening on the kernel as well. dunno why consp doesn't work since everything else seems to work
<theruran>oh and I have also compiled it with musl here, and with CompCert
<oriansj>aslr (address space layout randomization) enabled?
<theruran>hol up.. i guess it is working but since newlines are not printed I don't see the output *sometimes*
<theruran>rlwrap helps
<theruran>ASLR enabled on both systems (I think) it is on this one anyway
<theruran>but yeah, linking in libc sucks anyway, imo. wrapper.c just allocates some memory and runs the JIT
<theruran>the stripped static-linked binary is 64K
<oriansj>and probably should do a putc('\n'); to ensure you see output
<theruran>Jitawa does not even have a concept of strings, as far as I can tell. (print "a") errors out
<theruran>no need, for Milawa!
<theruran>muurkha: I am glad to see there is a Metamath proof checker done in Ada!
<oriansj>theruran: I was referring to the output not being seen "sometimes", which would indicate your libc is buffering output for performance reasons and failing to flush the buffer prior to close
<theruran>oh buffering makes sense
<theruran>it's not really designed for interactive REPL. it was designed for reading Milawa verification conditions :P
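(A small C illustration of the buffering behaviour oriansj is diagnosing, assuming ordinary libc stdio semantics; this is not Jitawa's wrapper.c. stdout is typically line-buffered on a terminal but fully buffered on a pipe, so output with no trailing newline can linger in the buffer.)

    /* Sketch of why output without a newline may show up late or not at
     * all: it sits in the stdio buffer until a newline (when line
     * buffered), an explicit flush, or a clean exit. */
    #include <stdio.h>

    int main(void)
    {
        setvbuf(stdout, NULL, _IOLBF, BUFSIZ); /* ask for line buffering up front */
        printf("NIL");                         /* no newline: may not appear yet  */
        fflush(stdout);                        /* explicit flush makes it visible */
        putc('\n', stdout);                    /* or simply end the line, as
                                                  suggested above                 */
        return 0;                              /* normal exit also flushes        */
    }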
<oriansj>well I'm one who believes in human-centered software design. If the use of the program in the hands of a naive user results in harm or no useful information to enable proper use, it failed to do its job correctly.
<oriansj>heck even a simple: DONE\n after processing would have made the user experience much better.
<oriansj>and I'm not even asking for the technical user experience improvements like fuzzing the input processing to ensure the user can't hit a segfault
<oriansj>because if I can segfault your verifier, I can compromise its output too
<oriansj>I wonder if we could express the need for bootstrappability in terms of Andresson's corollary
<theruran>what is Andresson's corollary?
<theruran>sorry gotta go to sleep. night!
<muurkha>there's a good list of self-verification papers in the bibliography of the arxiv MM0 paper I linked above
<muurkha>they think that they can do the MM0 proofs with the semantics of only about 25 instructions
<muurkha>oriansj: there are lots of programs that are not useful to naive users, but some of them are worthwhile anyway
<muurkha>you probably aren't going to be able to learn C from GCC, for example
<muurkha>agreed about segfaulting the verifier
<oriansj>theruran: https://www.youtube.com/watch?v=0wtvQZijPzg
<oriansj>muurkha: I'm surprised they didn't reduce it to: sub eax, ebx; mov eax, [ebx]; mov ebx, eax; mov [ebx], eax; cmp eax, ebx; call eax; ret; pop eax; push ebx; inc eax
<oriansj>oh and call 5 (as a cheap way to get the pc onto the stack)
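(A tiny illustration of the call trick mentioned here, using GCC-style inline assembly on x86-64 as an assumed environment; nothing like this appears in the MM0 work itself. A call whose target is the very next instruction pushes the return address, i.e. the current program counter, which a pop can then retrieve.)

    /* Get-the-PC idiom: call the immediately following label, so the
     * "return address" pushed on the stack is just the address of the
     * next instruction, then pop it into a register. */
    #include <stdio.h>

    int main(void)
    {
        void *pc;
        __asm__ volatile ("call 1f\n\t"
                          "1: pop %0"
                          : "=r"(pc));
        printf("program counter was near %p\n", pc);
        return 0;
    }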
<ekaitz>hi all, is anyone here familiar with tinyCC's internals? I need some help
<stikonas>I doubt it...
<stikonas>I think janneke was the only one who actually was patching tcc
<stikonas>does tcc have mailing list or IRC channel? That might be a better place
<ekaitz>stikonas: I tried the mailing list with no success
<ekaitz>also tried janneke :)
<stikonas>perhaps the better approach would be to add more features to mescc
<stikonas>hmm...
<stikonas>but that's longer term project
<ekaitz>yes... I just want to fix a simple thing but the code is really obscure
<ekaitz>there was a big change in the past and I'm trying to translate that style back to Mes's tinycc fork and I'm having issues
<ekaitz>this is my specific question: https://lists.gnu.org/archive/html/tinycc-devel/2022-08/msg00026.html
<bauen1>stikonas[m]: tcc does have a mailing list (somewhere)
<stikonas_>bauen1: well, that didn't help ekaitz...
***stikonas_ is now known as stikonas
<stikonas>main developer was just asking: why don't you apply the patch to newer tcc
<bauen1>oof, but I also guess that's kind of expected, tinycc's git branch is a wild west
<ekaitz>stikonas: yep... I almost answered: because you never wanted to simplify tcc's code so we have to do it ourselves
<stikonas>well, but simplifying code is not necessarily the right thing to do
<stikonas>it is a valid argument on their side that long term it's better to improve the bootstrapping compiler
<ekaitz>ofc, but there's a reason for us to fork
<stikonas>e.g. M2-Planet improved significantly over the last couple of years
<stikonas>though we also simplified mes to bridge M2-Planet -> mes gap
<ekaitz>yup
<ekaitz>otoh, the code of tcc is very obscure too, which doesn't help at all
<ekaitz>(maybe it's just me who can't read it??)
<stikonas>it does seem to have a lot of one character variables...
<ekaitz>yes
<acrow>touching down in bootstrappable to lurk after seeing comments by oriansj in #guix that the guile maintainers have walked away from bootstrappability? Tell me it isn't so.
<stikonas>acrow: it depends
<stikonas>acrow: guile ships some pre-expanded scheme files
<stikonas>somebody here figured it out how to build those from source
<stikonas>but I think guile didn't want to incorporate it
<stikonas> https://github.com/schierlm/guile-psyntax-bootstrapping
<stikonas>but this targets specific version of guile
<stikonas>possibly can be adapted to the latest guile but doesn't work out of the box
<stikonas>acrow: this is the file we are talking about https://git.savannah.nongnu.org/cgit/guile.git/tree/module/ice-9/psyntax-pp.scm
<stikonas>so it's not like e.g. rust, which depends on itself more strongly
<acrow>I share your concerns about software integrity (and people too). I suspect that at least one former guile maintainer also shares these concerns. I skimmed the objectionable source file but claim no understanding. I'm just going to lurk and try to absorb the wisdom so I might later have a more informed opinion that I would try to defend.
<acrow>I'll also say that having seen some of the old software that your group has managed to pull into bootstrappability I think you are miracle workers. I applaud your skills. \o/
<stikonas>yes, I can't really judge that psyntax-pp.scm either since I barely know any guile
<stikonas>s/guile/scheme/
<acrow>stikonas: I have not yet seen any computer science asserting that rust must depend on itself in any way. In fact, IIUC, Turing would have none of it. If it were not for the existence of the guix recipes I wouldn't trust it. I'm not investing in them. There is some destructive hubris there.
<stikonas>I don't mean it has to depend, there is mrustc that is written in C and can bootstrap it
<stikonas>but that's just one example and we can bootstrap rust
<stikonas>some others aren't bootstrappable yet, like ghc
<stikonas>of course with enough work one can always bootstrap it
<stikonas>in the worst case you can port the existing compiler line by line to another language
<acrow>haskell? is not bootstrappable? I didn't know that. Scala is my example of bad behavior.
<stikonas>scala too
<stikonas>but haskell as well
<stikonas>there are more examples I think
<stikonas>and even more if you don't want to allow pre-generated files
<stikonas>e.g. automatic conversions into low-level C or header files
<acrow>Glad to hear the story redeeming rust. Is the rust team keeping mrustc up to date or is this team doing that?
<stikonas>no, that's #mrustc
<stikonas>but some people here (including me) have used mrustc
<stikonas>but sometimes even non compilers are in this dependency loop
<stikonas>e.g. GNU Autogen uses itself to generate some of its header files
<acrow>GNU autogen! I thought GNU was more concerned about foundations.
<stikonas>yes, and GCC uses GNU autogen to build some of its files (top-level Makefile.in if I remember correctly)
<stikonas>though you could build gcc without using top-level configure
<acrow>IIUC, bootstrappable is all about foundations. Bootstrappable is fine with any automated build tools provided that the provenance of the build tools is freely available. Please feel free to take this apart. I think in many ways bootstrappability is tied to, or akin to, proper documentation, but that may be too glib. Does that sound right to you?
<muurkha>oriansj: you mean as an instruction set rather than an instruction sequence?
<oriansj>acrow: I prefer to think of bootstrappable as just the logical conclusion of Free Software principles.
<oriansj>muurkha: as reducing the number of instruction semantics they would need in their proofs