<AlmuHS>the fix for... IRQ? I don't remember
<youpi>(buildds haven't picked it up yet though)
<AlmuHS>damo22: the glibc fix for the clock is ready
<youpi>one is busy with gcc-9, the other with qemu
<youpi>*stuck* with qemu actually, let me fix that
<AlmuHS>then I'll keep my Qemu VM without upgrading for a while
<youpi>only rump is affected by the clock_nanosleep issue
<youpi>there was also an issue with eatmydata + sudo, but that's really a niche thing
<AlmuHS>in my SMP code, on real hardware, I noticed that the tty doesn't respond to the keyboard. But mach_ncpus is set to 1, there is no concurrency yet, so I don't know the reason
<AlmuHS>in Qemu, the same SMP code works without problems, and the tty responds to the keyboard properly
<youpi>you mean with your patch but without actually setting mach_ncpus > 1 ?
<youpi>possibly try to disable pieces of code that you have introduced
<youpi>to determine which one poses problem
<AlmuHS>the only strange piece could be a kalloc()
<youpi>that can't be it if you don't actually access the data
<youpi>really, dichotomy between a working state and a non-working state is a boring, but extremely effective methodology
<youpi>the thing is: the non-boring part comes after the boring dichotomy
<AlmuHS>my code is not enabling cpus yet. Currently, it only finds and enumerates them
<youpi>just stay patient during the boring dichotomy part
<youpi>ok but possibly it buffer-overruns or such
<youpi>but dichotomy can reveal which piece exactly poses problem
<youpi>and then you can track what's actually going wrong
<AlmuHS>I have to prepare unit-tests for my code
<youpi>when you have a working state and a non-working state, you just need to disable/enable code
<AlmuHS>ok. But it will take a while.
because this problem only exists on real hardware, not in Qemu
<youpi>just be prepared to raise an eyebrow at really innocent-looking code
<youpi>then work on making the real hardware testing loop really short
<youpi>that'll be useful anyway, longterm wise
<youpi>I mean never assume any innocent-looking code is actually innocent
<youpi>- get_char(vc, (u_short *)&tmp_pos + 1, &temp) > SPACE) {
<youpi>+ get_char(vc, (u_short *)tmp_pos + 1, &temp) > SPACE) {
<youpi>this typo has been overlooked for DECADES
<youpi>and actually posed problems to people recently, bringing kernel crashes etc.
<youpi>while it should have for decades
<youpi>I have no idea why nobody really reported it before
<AlmuHS>this is a difficult line to debug
<youpi>when you have a good backtrace, it's obvious where things are wrong
<youpi>I didn't have anything more than a backtrace to track it down actually
<youpi>since I never could reproduce it myself
<AlmuHS>I like to use temporary variables, to avoid doing too many things in one line
<youpi>but after a few decades, once a user actually sent a backtrace pointing at this very line, I could review it and say "hey that's actually completely wrong"
<AlmuHS>maybe a unit-test could detect this typo, because the value is not the expected one
<youpi>no, it's very hard to settle
<youpi>because it happens in a rare condition that's actually produced by a real user typing on the keyboard
<youpi>if you wanted to track *that* case you would need thousands of unit-tests
<youpi>unit-tests are good when you know what you want to track down
<youpi>but they're not enough when you have bugs
<youpi>because bugs are almost by definition the cases that you didn't expect
<youpi>and thus didn't think you'd need a unit test for
<AlmuHS>yes.
In SMP, the biggest difficulty is the addressing
<youpi>there, glibc is at the top for buildds
<youpi>(well, almost, yet another haskell rebuild ahead of it)
<AlmuHS>I have many errors in addressing, because I try to call a physical address, or I try to access "out of range"...
<youpi>you need to be always sure of what you are managing, a pointer to the data or the data itself, yes
<youpi>and in the kernel case, physical vs virtual address, indeed :)
<youpi>and nothing like sigsegv to catch you
<youpi>when I see students depressed by a sigsegv, I claim "but that's a blessing!! you have a whole backtrace!!"
<youpi>and not a mere panic completely out of the place where the bug actually is
<AlmuHS>in a kalloc(), I had a panic because I called free() after the new kalloc()
<AlmuHS>nope, the panic came from calling free() before assigning the new memory to the pointer
<AlmuHS> apic_data.cpu_lapic_list = new_list;
<AlmuHS>if these lines are put in the inverse order, the kernel panics
<youpi>that's what I meant: the panic points you at something, but it's not really exactly what you should look at, and thus you need to be inventive in what could have been going wrong
<youpi>and even worse, quite often you have to fix *several* things until the bad effect goes
<youpi>so you shouldn't say "nah, it's not that since it doesn't seem to be fixing the issue"
<youpi>yes, it is that, but not only
<AlmuHS>in this function, I try to resize a dynamic array. I reserved 255 elements. But, after enumerating the processors, maybe I don't need 255. So I resize the array to fit the real number of processors
<youpi>heh, yet more haskell rebuilds came on the way
***Server sets mode: +nt
<AlmuHS>yes, but managing the list manually is a source of errors
<youpi>always remember that grep and find are your best friends
<damo22>i like that "git grep" only searches in current subtree
<AlmuHS>then, maybe I can avoid the array resizing. I simply store the processors in a temporary linked-list and, after enumerating, I add them to the array with the correct size
<youpi>or reallocate the array twice longer each time you need more
<youpi>so that the reallocation cost is actually constant
<AlmuHS>yes, but the resizing is a delicate process, I think. It's easy to produce errors
<damo22>is a fixed size of 255 too much for all pcs?
<youpi>managing a list is as well ;)
<AlmuHS>for this reason I was searching for a library ;)
<youpi>per-cpu data often grows quite quickly actually
<youpi>even using a linked-list library is delicate
<AlmuHS>for this reason I set NCPUS to 255 in my patch
<damo22>if there are never more than 256 cpus could you not just preallocate an array of fixed size?
<youpi>that still looks like a waste
<youpi>we can live with it initially, but meh :)
<AlmuHS>at first, I reserved 255 as a fixed size. But I want to optimize this
<damo22>but as a proof of concept, there's not much point optimising code until you have something that works
<AlmuHS>I can disable the resizing, to test if this is the cause of the hang
<AlmuHS>in my case, I add 4) make it modular and manageable
<youpi>beautiful = readable, maintainable, modular, elegant, nice, ...
<AlmuHS>richard told me that the SMP code must be modular, and independent of architecture. For this reason I added the SMP pseudoclass
<AlmuHS>the next step, once I get this working properly, will be implementing cpu_number() and CPU_NUMBER().
And the first might be architecture-independent
<AlmuHS>really, my current work is mainly to refactor the work of last year
<AlmuHS>Richard was angry when he saw my code
<youpi>the cleanup part is a boring and long, but necessary step
<youpi>otherwise the long-term maintenance is a nightmare
<AlmuHS>because there are many globals, externs, arch-dependent code...
<youpi>we suffered quite a bit after Zheng Da's work
<AlmuHS>so I took note of his advice, and I'm refactoring following it
<AlmuHS>the most delicate step will be the cpu enabling. I will have to find a way to implement atomic operations
<youpi>you can use gcc's atomic operations
<youpi>c11's atomic operations, even
<AlmuHS>thanks for this info. I didn't know
<AlmuHS>I will have to be careful to prevent a cpu from executing the assembly routine at the same time as another
<AlmuHS>I have to fully serialize the cpus enabling and configuration
<AlmuHS>even adding the cpus to the kernel might need to be serialized
<damo22>perhaps have a look at coreboot's code that inits the cpus
<AlmuHS>but coreboot works at BIOS level. It has fewer restrictions
<AlmuHS>the cpus initialization in the BIOS has other instructions
<damo22>yes but there is code in coreboot that sends IPI's and enumerates cpus
<AlmuHS>the enumeration and IPI sending are not the problem. The concurrency is
<AlmuHS>I have to go to sleep. Tomorrow I have to wake up early
<AlmuHS>good night youpi. good morning damo22 ;)
***_Posterdati_ is now known as Posterdati
***jma is now known as junlingm
<youpi>I don't know, I never really played with libddekit
<youpi>but I doubt it'd be a usage error
<youpi>(hint: printfs are extremely efficient ways to determine where the problem actually lies)
<junlingm>true. I will check out libddekit and try to debug it.
<youpi>(well, you can start with gdb, in case single-stepping does actually work)
<junlingm>it does not step into ddekit functions :(
<youpi>did you install the -dbgsym package?
<junlingm>probably a version mismatch. I recompiled libddekit from the git master branch, and the problem disappeared.