IRC channel logs

<youpi>"soon" is actually today :)

<AlmuHS>what?

<youpi>glibc

<AlmuHS>the fix for... IRQ? I don't remember

<youpi>the fix for clock_nanosleep

<AlmuHS>true

<youpi>(buildds haven't picked it up yet thoiugh)

<AlmuHS>damo22: the glibc fix for the clock is ready

<youpi>one is busy with gcc-9, the other with qemu

<youpi>*stuck* with qemu actually, let me fix that

<AlmuHS>ok

<AlmuHS>then I keep my Qemu VM without upgrade for a time

<youpi>upgrading should be fine

<youpi>only rump is affected by the clock_nanosleep issue

<youpi>there was also an issue with eatmydata + sudo, but that's really a niche thing

<AlmuHS>in my SMP code, in real hardware, I noticed that the tty doesn't reply to keyboard. But mach_ncpus is set to 1, there are not concurrency yet, so I don't know the reason

<AlmuHS>in Qemu, the same SMP code works without problems, and the tty reply to keyboard properly

<youpi>you mean with your patch but without actually setting mach_ncpus > 1 ?

<AlmuHS>yeah

<youpi>possibly try to disable pieces of code that you have introduced

<youpi>to determine which one poses problem

<AlmuHS>the only strange piece can be a kalloc()

<youpi>that can't be it if you don't actually access the data

<youpi>really, dichotomy between a working state and a non-working state is a boring, but extremely effective methodology

<youpi>the thing is: the non-boring part comes after the boring dichotomy

<AlmuHS>my code is not enabling cpus yet. Currently, its only find and enumeration

<youpi>just keep patient during the boring dichotomy part

<youpi>ok but possibly it buffer-overruns or such

<youpi>but dichotomy can reveal which piece exactly poses problem

<youpi>and then you can track what's actually happening wrong

<AlmuHS>i have to prepare unit-tests for my code

<youpi>no need for unit-test

<youpi>just #if 0

<youpi>that's fine enough

<youpi>when you have a working state and a non-working state, you just need to disable/enable code

<AlmuHS>ok. But It will be a few long. because this problem only exists over real hardware, not in Qemu

<youpi>just be prepare to put an eyebrow on really innocent-looking code

<youpi>then work on making the real hardware testing loop really short

<youpi>that'll be useful anyway, longterm wise

<AlmuHS>eyebrow?

<youpi>bad eye

<youpi>frowned eye

<AlmuHS>what do you refers exactly?

<youpi>I mean never assume any innocent-looking code is actually innocent

<AlmuHS>it's true

<youpi>- get_char(vc, (u_short *)&tmp_pos + 1, &temp) > SPACE) {

<youpi>+ get_char(vc, (u_short *)tmp_pos + 1, &temp) > SPACE) {

<youpi>this typo has been looked over for DECADES

<AlmuHS>LOL

<youpi>and actually posed problems to people recently, bringing kernel crashes etc.

<youpi>while it should have for the decades

<youpi>I have no idea why nobody really reported it before

<AlmuHS>this is a line difficult to debug

<youpi>not really

<youpi>when you have a good backtrace, it's obvious where things are wrong

<youpi>I didn't have anything more than a backtrace to track it down actually

<youpi>since I never could reproduce it myself

<AlmuHS>i like to use temporary variables, to avoid doing too things in a line

<youpi>but after a few decades, once a user actually sent a backtrace pointing at this very line, I could review it and say "hey that's actually completely wrong"

<AlmuHS>maybe an unit-test could detect this typo, because the value is not the expected

<youpi>no, it's very hard to settle

<youpi>because it happens in a rare condition that's actually produce by a real user typing on the keyboard

<youpi>if you wanted to track *that* case you would need thousands of unit-tests

<AlmuHS>real

<youpi>unit-tests are good when you know what you want to track down

<youpi>yes, for real

<youpi>but they're not enough when you have bugs

<youpi>because bugs are almost by definition the cases that you didn't expect

<youpi>and thus didn't thought you'd need a unit test for

<AlmuHS>yes. In SMP, the higher difficult is the addressing

<youpi>there, glibc is at the top for buildds

<youpi>(well, almost, yet another haskell rebuild ahead of it)

<AlmuHS>i have many errors in addressing, because I try to call a physical address, or I try to access "out of range"...

<youpi>you need to be always sure of what you are managing, a pointer to the data or the data itself, yes

<youpi>and in the kernel case, physical vs virtual address, indeed :)

<youpi>and nothing like sigsev to catch you

<AlmuHS>kernel panic, simply

<youpi>when I see students depressed by a sigsegv, I claim "but that's a blessing!! you have a whole backtrace!!"

<youpi>and not a mere panic completely out of the place where the bug actually is

<AlmuHS>in a kalloc(), I had a panic because I called free() after the new kalloc()

<AlmuHS>nope, the panic was call free() before assign the new memory to the pointer

<AlmuHS> apic_data.cpu_lapic_list = new_list;

<AlmuHS> kfree(old_list);

<AlmuHS>if this lines are put in inverse order, the kernel shows a panic

<youpi>that's what I meant: the panic points you at something, but it's not really exactly what you should look at, and thus you need to be inventive in what could have been going wrong

<AlmuHS>yes, it's true

<youpi>and even worse, quite often you have to fix *several* things until the bad effect goes

<youpi>so you shouldn't say "nah, it's not that since it doesn't seem to be fixing the issue"

<youpi>yes, it is that,but not only

<AlmuHS>in this function, I try to resize a dynamic array. I reserved 255 elements. But, after enumerate the processors, maybe I don't need 255. So I resize to fit the array size to the real number of processors

<youpi>heh, yet more haskel rebuilds came on the way

<youpi>I'll just bump glibc

<AlmuHS>haskell? the language?

***Server sets mode: +nt

<damo22>struct blob *next ?

<youpi>$ find . -name \*list\*

<youpi>./kern/list.h

<youpi>that could be it

<AlmuHS>yes, but manage the list manually it's a source of errors

<youpi>looks so, at least

<youpi>always remember that grep and find are you best friends

<damo22>i like that "git grep" only searches in current subtree

<AlmuHS>then, maybe I can avoid the array resizing. Simply I store the processors in a temporary linked-list and, after enumerate, I add them to the array with the correct size

<youpi>that's a way, yes

<youpi>or reallocate the array twice longer each time you need more

<youpi>so that the reallocation cost is actually constant

<AlmuHS>yes, but the resizing is a delicate process, I think. It's easy to produce errors

<damo22>is a fixed size of 255 too much for all pcs?

<youpi>managing a list is as well ;)

<AlmuHS>by this reason I was searching a library ;)

<youpi>per-cpu data often grows quite quickly actually

<youpi>even using a linked-list library is delicate

<AlmuHS>xAPIC has the limit in 256 cpu

<AlmuHS>by this reason I set the NCPUS to 255 in my patch

<damo22>if there are never more than 256 cpus could you not just preallocate an array of fixed size?

<youpi>that still looks like a waste

<youpi>we can live with it initially, but meh :)

<AlmuHS>at first, i reserved 255 as fixed size. But I want to optimize this

<damo22>ok

<damo22>but as a proof of concept, theres not much point optimising code until you have something that works

<youpi>sure

<youpi>1) make it work

<AlmuHS>i can disable the resizing, to test if this is the cause of the hang

<youpi>2) make it beautiful

<youpi>3) make it fast

<AlmuHS>in my case, I add 4) add it modular, and manageable

<youpi>that's part of 2)

<youpi>beautiful = readable, maintainable, modular, elegant, nice, ...

<AlmuHS>richard told me that the SMP code must be modular, and independant of architecture. For this reason I added the SMP pseudoclass

<AlmuHS>the next step, once I got work this properly, will be implementing cpu_number() and CPU_NUMBER(). And the first might be architecture-independent

<AlmuHS>really, my current work is mainly to refactor the work of last year

<AlmuHS>Richard was angry when he saw my code

<youpi>the cleanup part is a boring and long , but necessary step

<youpi>otherwise the long-term maintenance is a nightmare

<AlmuHS>because there are many globals, extern, arch-dependent code...

<youpi>we suffered quite a bit after Zheng Da's work

<AlmuHS>so I take note about his advices, and I'm refactoring following these

<AlmuHS>the most delicated step will be the cpu enabling. I will have to find a way to implement atomic operations

<youpi>you can use gcc's atomic operations

<youpi>no need to reimplement them

<youpi>c11's atomic operations, even

<AlmuHS>thanks by this info. I didn't know

<AlmuHS>I will have to be careful to avoid a cpu tries to execute the assembly routine at this time to other

<AlmuHS>**at same time

<AlmuHS>I have to fully serialize the cpus enabling and configuration

<AlmuHS>even the adding cpus to the kernel might be serialized

<damo22>perhaps have a look at coreboot's code that inits the cpus

<AlmuHS>but coreboot works in BIOS level. It has less restrictions

<damo22>yes, so it has simpler code

<AlmuHS>the cpus initialization into the BIOS has other instructions

<damo22>yes but there is code in coreboot that sends IPI's and enumerates cpus

<AlmuHS>the enumeration and IPI sending it's not the problem. The concurrency is

<damo22>you can take ideas from that

<AlmuHS>yes, this is true

<AlmuHS>I have to go sleep. Tomorrow I have to wakeup early

<AlmuHS>good night youpi. good morning damo22 ;)

<damo22>ok bye!

***_Posterdati_ is now known as Posterdati

***jma is now known as junlingm

<junlingm>youpi: when I use ddekit_large_malloc twice in a row, the send allocation failed with an invalid argument error. Here is a minimal test case: http://pastebin.com/embed_js/82unk12r

<junlingm>Did I forget to init something?

<youpi>I don't know, I never really played with libddekit

<youpi>but I doubt it'd be a usage error

<youpi>rather a bug inside libdde

<youpi>libddekit*

<junlingm>ok. thanks.

<youpi>(hint: printfs are extremely efficient ways to determine where the problem actually lies)

<junlingm>true. I will checkout libddekit and try to debug it.

<youpi>(well, you can start with gdb, in case single-stepping does actually work)

<junlingm>it does not step into ddekit functions :(

<youpi>did you install the -dbgsym package?

<junlingm>I did

<junlingm>probably a version mismatch. I recompiled the libddekit from the git master branch, and the problem disappeared.

IRC channel logs

2020-07-23.log