IRC channel logs

2016-05-25.log

back to list of logs

<amz3>héllo
<amz3>I just met an occasion to use the recursive fold but in javascript
<amz3>does anyone one a good implementation of the recursive fold in javascript?
<galex-713>Hi, what is or-map ? I can’t find doc for it
<lloda>according to boot-9.scm, '(or-map fn lst) is like (or (fn (car lst)) (fn (cadr lst)) (fn...) ...)'
<lloda>sounds a lot like (srfi srfi-1) any
<galex-713>yeah, people on #scheme talked me about when I asked for a potentially-parallel any
<galex-713>lloda: why isn’t there doc for in either in r5rs, srfi or guile doc?
<galex-713>Erm, I’ve another question: are the basic procedures/syntax like “map”, etc. which do their things in “unspecified order” according spec, parallel in guile?
<davexunit>galex-713: no, but there is par-map
<galex-713>davexunit: is it standard or guile-specific?
<davexunit>guile specific
<galex-713>ok
<galex-713>and is there anything such as par-any or par-every?
<davexunit>refer to the manual
<avoine> https://www.gnu.org/software/guile/manual/html_node/Parallel-Forms.html
<avoine>there is a par-for-each
<davexunit>if you like to live dangerously
<galex-713>avoine: oh thanks :D is that standardized or to be standardized?
<davexunit>no
<galex-713>even in a srfi or in a future rnrs?
<galex-713>is there a special reason for that?
<davexunit>no
<davexunit>basically, someone thought guile should have these parallel forms, and they added them.
<davexunit>I don't even know if any scheme standard says anything about threads.
<galex-713>ah ok, so if someone brings the idea maybe one day the standard could really say some implementation could make something parallel right?
<davexunit>yeah if the folks making the standards were to agree on that
<davexunit>but if you're suggesting that 'map' should just be parallel by default, that would never happen.
<davexunit>threads are dangerous.
<galex-713>why?
<galex-713>why are they I mean
<mthl>davexunit: I am not an expert but threads seems safe when using immutable data structures, no?
<davexunit>mthl: a persistent data structure is not necessarily thread-safe, see guile's vlists
<galex-713>if they are that’s cool because I was searching something like a parallel any or or-map or or procedure to calculate primality of numbers
<galex-713>I mean, what can go wrong using threads to calculate math things?
<davexunit>well, it depends.
<justin_smith>galex-713: something needs to consume the value surely, and that's where the ugly usually comes in
<davexunit>if you're using pure functions then you're well on your way to having something safely parallelizable
<justin_smith>davexunit: super weird to have a non-threadsafe immutable data structure (to my clojure eyes that is), but good to know
<galex-713>davexunit: and is there an automatable way of knowing if something is going to be thread-safe so it could be automatically parralelizable in that case?
<davexunit>justin_smith: I'm just trying to demonstrate that one doesn't imply the other.
<davexunit>galex-713: no
<galex-713>oh ok
<galex-713>I see
<davexunit>galex-713: or rather, maybe there's a phd paper that needs writing
<davexunit>;)
<justin_smith>davexunit: understood - I'd just never thought of that possibility, clearly I'm spoiled.
<galex-713>yeah ok, I see, like still nobody did that
<davexunit>spoiled?
<davexunit>not the way I'd describe using Clojure but OK :)
<justin_smith>davexunit: in my day job I have a language where I'd never have to worry about that
<justin_smith>davexunit: in some ways clojure is a huge pain in the ass, admitted, but data structures is not one of them
<davexunit>well to be fair you could also not worry about it in Guile
<galex-713>So I wonder the best way of doing that is doing local “(set! map par-map)” etc. when you do thread-safe things
<davexunit>Guile admittedly doesn't have a great concurrency story right now, but it's certainly possible.
<davexunit>galex-713: that would most likely have a lot of negative side effects.
<davexunit>your programs might crash, and they might also run slower.
<justin_smith>davexunit: sadly I'm too stupid to write correct code that uses locks and mutexes, but yes I did establish it was at least possible for some people :)
<davexunit>justin_smith: I'm not talking about that stuff.
<davexunit>that stuff is terrible.
<justin_smith>oh, OK
<davexunit>my point is that Guile can just as well have persistent vectors, hashes, etc. that are thread-safe
<galex-713>davexunit: oh wait you mean the binding would also affect calls made by procedure you call… is there a way to avoid that, at least locally?
<justin_smith>does "can" mean if I go and write them, or if I import the right library?
<davexunit>it's just that it hasn't been a focus for Guile, but that's starting to change.
<justin_smith>davexunit: that's good to hear
<davexunit>justin_smith: there's nothing baked into guile, but a few people have implemented various purely functional data structures.
<justin_smith>cool - is there a good writeup somewhere describing the sanest way to do concurrency in guile as of today?
<davexunit>not sure
<davexunit>using those parallel forms is probably a good start
<davexunit>you can map and fold over lists all you want
<galex-713>davexunit: maybe making more parallel forms and implement imutable types in a C module then modify the implementation of map, etc. to use the parallel forms when operating on immutable objects? or that would be horrible for efficiency/terribly complex to implement?
<justin_smith>what about coordination between threads? eg. maybe a good concurrent queue or channel system?
<davexunit>justin_smith: I don't know about those things.
<davexunit>I just write stuff when I can't find an existing implementation.
<justin_smith>galex-713: my design-sense tingles at that - because then you end up with behaviors that are different without visible differences in code and experience warns me that can go very badly
<davexunit>galex-713: writing more C modules is not what we want.
<davexunit>it wouldn't make sense to modify map
<davexunit>because *most* problems do not need parallel anythiong
<davexunit>anything*
<davexunit>most of the time, threads are the wrong thing.
<galex-713>davexunit: I thought designing new type to make imutable things would require such low level thing as to require C
<davexunit>but sometimes they are exactly what the doctor ordered.
<davexunit>galex-713: most likely not, but I don't understand enough about what you propose to say one way or the other.
<davexunit>all I know is that changing the semantics of 'map' would be very bad.
<davexunit>ACTION goes afk
<galex-713>I was just thinking to the thing you said when you said “my point is that Guile can just as well have persistent vectors, hashes, etc. that are thread-safe”, I thought it would require C code
<justin_smith>galex-713: I could imagine a variant on map that knows how to handle immutable values, or even "polymorphic", using the regular map for lists, and then calling other code for other types, but conditionally replacing map itself seems iffy.
<galex-713>justin_smith: yeah what I was thinking about was some kind of polymorphism, but by modifying the code bound to “map” directly
<galex-713>Also I supposed that if semantic is changed only with some new thread-safe type, most of cases (i.e. all other cases) would continue to do the same
<galex-713>but maybe there are some good arguments against polymorphism I still don’t know that davexunit could say us about when coming back ^^
<justin_smith>galex-713: polymorphism is nice, implementing it by replacing code with other code implicitly can lead to bad debugging experiences for me
<galex-713>ahhhhh I see
<galex-713>that’s a point :)
<galex-713>justin_smith: then is there some other cleaner way to do polymorphic stuff in guile?
<justin_smith>galex-713: it's definitely possible via goops
<galex-713>Ok, so via goops as I thought
<justin_smith>I don't know if there's another option or not.
<galex-713>But would it be a nice thing to implement parallelization via goops?
<galex-713>Btw, why is polymorphism implemented directly into goops? is polymorphism so linked to oop?
<justin_smith>galex-713: it's much more fundamental than eg. inheritence is
<justin_smith>(if the inventors of OOP are to be trusted on the subject)
<galex-713>to oop you mean?
<justin_smith>on the other hand, the inventors of OOP would say polymorphism is just a weird way to do message passing
<justin_smith>yes, to OOP
<galex-713>ok
<galex-713>but is it fundamental only to oop? that’s my point
<galex-713>Like is it more fundamental to oop than to anything else than inheritence could be?
<justin_smith>I think so, as do the people who invented OOP. Most others probably disagree.
<galex-713>Because, I mean, inheritence doesn’t have sense outside of oops, while polymorphism has
<justin_smith>galex-713: all the more proof of how important polymorphism is, and how nearly useless inheritence is :P
<galex-713>like in our case, where we could use polymorphism to implement better concurrency without needing oop at all
<galex-713>inheritence is useful when you make new types/objects, otherwise I don't see how you could use it
<justin_smith>galex-713: to me, if you are polymorphic, you are already doing the interesting part of OOP, and you don't need the rest (beyond maybe encapsulation, but since fortran everyone has that)
<justin_smith>galex-713: but we were talking about new data structures - those are new types/objects right?
<galex-713>justin_smith: yeah, that’s where I can’t say more :) I thought that was low level stuff needing C code (thus no oop thing) but davexunit said that could made with scheme, so I wonder if effectively inheritence can be used at that level…
<galex-713>But that seems weird to me since lisp is all about axioms and sequences
<galex-713>so I thought making a new type would just mean same lines of C defining current mutable lists/vectors/etc. but with some “const” keyword somewhere, so with no inheritence need
<justin_smith>well, to use inheritence to do a new map, map would need to be a thing you could inherit. I think the cleaner thing given the existing world / API here is to make a new thing (a polymorphic map) and use that.
***petercom1and is now known as petercommand
<davexunit>guile supports generic functions via goops, so you could indeed redefine map as a generic function if you wanted
<davexunit>for example, when goops is initialized, procedures like + become generic procedures
<justin_smith>oh, so this is already an established way of doing things
<justin_smith>OK
<galex-713>davexunit: but would that be a clean way of doing to make goops a requirement to easy/simple way of making things more concurrent?
<davexunit>galex-713: I believe the two are orthogonal
<galex-713>which two?
<galex-713>davexunit:
<galex-713>*^
<davexunit>goops and concurrency
<galex-713>ah ok, as I thought
<galex-713>davexunit: so do you see any clean way of doing such things?
<davexunit>use the parallel forms that guile provides
<davexunit>I don't understand why this is not an acceptable solution
<galex-713>well, it means you have to be aware of threads, their usage, what parallel forms does exist, how to use them, when to use them, when not to use them, etc. to doing anything concurrent, i.e. most of code will continue not to be concurrent, while making more code concurrent can help to exploit resources of a growing number of CPUs
<galex-713>(since moore law expire next year and industries already starts to do multicore everywhere)
<galex-713>*expires
<davexunit>well par-map and co. are pretty friendly introductions to parallel computing because they do the hard work for you.
<galex-713>yeah, that’s a good start as you said :)
<galex-713>for me the best example of parallel-stuff-where-you-don’t-even-need-to-know-about-thread are makefiles
<davexunit>and yet many makefiles don't successfully work if you build in parallel :)
<galex-713>well written makefiles (that mean makefile are poor enough to make possible to write them bad though) are automatically parralel
<galex-713>Yeah that’s when you put too much shell code in proportion to makefile code i.e. when things aren’t functional enough
<galex-713>big and complex projects fail in parralel, but I built guile, emacs, etc. in parralel and that worked fine
<davexunit>yeah they are good projects that do things right ;)
<galex-713>^^
<galex-713>Is there anything like python/elisp’s docstring with scheme/guile?
<mark_weaver>galex-713: yes, guile has docstrings
<galex-713>mark_weaver: how do you access them? and how do you define them in scheme?
<davexunit>galex-713: use procedure-documentation to get the docstring of a procedure, or #f if there is none.
<mark_weaver>galex-713: you can use the ",describe" REPL command, geiser from emacs, or 'object-documentation' from (ice-9 documentation) to access
<davexunit>that's a more complete answer :)
<mark_weaver>davexunit: unfortunately, it seems that 'procedure-documentation' doesn't work for some kinds of procedures.
<mark_weaver>galex-713: as for defining them, if the body of a procedure starts with a string literal, that's taken to be the docstring
<galex-713>Oh cooool :D
<galex-713>Like in elisp ^^
<mark_weaver>right
<galex-713>:)
<davexunit>mark_weaver: oh, bummer.
<galex-713>Also, now I’m asking myself, why is there or-map and and-map while we can do (any identity ...) (every identity ...)?
<mark_weaver>galex-713: it's historical. most likely or-map and and-map were conceived before SRFI-1 was produced
<galex-713>oh I see
<galex-713>so it’s better to use any and every now right?
<mark_weaver>it's certainly more portable
<mark_weaver>galex-713: btw, I should mention that things like 'par-map' and 'par-for-each' have a significant overhead for synchronization of threads, and generally only make sense to use when the amount of work done by the procedure passed to them is non-trivial.
<mark_weaver>if you were to do something like (par-map 1+ <big list of numbers>) you would find that most of the execution time of wasted doing thread sychronization
<mark_weaver>s/of wasted/is wasted/
<galex-713>mark_weaver: Ah ok, I was going to use it to try to test the remained of some number divided by all numbers inferior to its square root, and that “some number” will go from 1 to some millions, is that sufficient amount of work?
<mark_weaver>no
<galex-713>or maybe would it be better to parallelize the test for each number?
<galex-713>mark_weaver: then, for each potential prime but not for each potential divisor? since I’m going to divide each potential prime by a lot of number on average?
<mark_weaver>even with C, you'd find that spawning so many threads and joining them would be very expensive
<mark_weaver>in order for something like 'par-map' to pay off, you need to do a lot more work in each call to the procedure that you pass to it
<galex-713>I thought par-map wasn’t making one thread per list all the time but did that in a portable way according the number of cpu?
<mark_weaver>well, yes, it does use a thread pool, but still there are overheads in the thread syncrhonization
<mark_weaver>which is generally very expensive
<galex-713>oh ok
<mark_weaver>also, doing so many divisions is going to suck anyway
<galex-713>Yes I know, latter I will implement atkin crible
<mark_weaver>better to use eratosthenes sieve
<galex-713>*sieve
<galex-713>I read atkin is better than eratosthene
<mark_weaver>oh, I don't know the atkin method, maybe that's better
<galex-713>found that on wikipedia x)
<galex-713>Anyway, from which complexity threads do pay?
<galex-713>*what complexity
<mark_weaver>I'm sorry, I don't understand the question
<galex-713>I mean, from what “amount of work” precisely making threads will improve efficiency?
<mejja>no one knows, basically an open question
<mark_weaver>there is a non-trivial administrative overhead associated with each call to the procedure passed to 'par-map'. you can measure it by using a trivial procedure and measuring the total runtime
<galex-713>ok I see
<mark_weaver>it varies by architecture
<mark_weaver>you need to ensure that the amount of work done by 'proc' is large enough that the administrative overhead is a small percentage
<galex-713>ah, so there’s no automated way to make concurrent code with guile?
<galex-713>ah, so it needs to be a lot of work whatever the architecture
<galex-713>Don’t know how to know that (I got only x86 here)
<mark_weaver>no, there's no automated way
<galex-713>is that inherent to concurency or just to guile implementation of it?
<mark_weaver>it's inherent to concurrency, but admittedly in guile the overheads are higher because our thread synchronization primitives have some added features such as fairness that make them a bit slower than they need to be.
<mark_weaver>but basically, shared memory multiprocessing simply doesn't scale in practice, so what is actually implemented is significantly different than what you intuitively expect.
<galex-713>ok
<mark_weaver>what is implemented in hardware is essentially message passing, but it's done automatically int he hardware.
<galex-713>is all that sync necessary even when dealing with constants? (i.e. dividing stuff)
<mark_weaver>and it sort of tries to approximate your intuitive notions of mutating a shared memory, except that it fails badly
<mark_weaver>e.g. different cpus observe writes in a different order than they were done, and so things that you wouldn't expect could happen in fact happen.
<mark_weaver>e.g. things like this: you initialize a data structure and then, after it's fully initialized, make a global variable point to it.
<mark_weaver>then another thread fetches the global variable and finds that from its point of view, the structure hasn't yet been initialized.
<mark_weaver>because the writes to RAM are done out of order
<galex-713>ok
<mark_weaver>basically, the semantics of modern weakly ordered memory architectures is kind of a nightmare, and I suspect that's the reason that some languages such as scheme and haskell avoid introducing this mess into their language standards.
<galex-713>oh, even haskell isn’t parallel?
<mark_weaver>GHC uses green threads
<galex-713>what’s green threads?
<mark_weaver>it means that only one kernel-level thread is used, and it switches between threads at certain "safe points"
<mark_weaver>so it can't make use of multiple cores within a single process
<mark_weaver>but on the other hand, the semantics become sane, and the synchronization overheads become much less.
<galex-713>ok
<mark_weaver>here's a talk that mentions some of the difficulties: https://media.ccc.de/v/31c3_-_6574_-_en_-_saal_1_-_201412301245_-_why_are_computers_so_and_what_can_we_do_about_it_-_peter_sewell
<galex-713>do lisp has a cache for things such as (sqrt 50)? like is it better in terms of perfs to do the sqrt once instead of putting it inside the do loop test?
<mark_weaver>there's no cache for that, no
<mark_weaver>definitely better to do the sqrt just once
<galex-713>ok
<mark_weaver>better yet, avoid the square root and instead keep track of the square of the divisor you're testing.
<galex-713>once again: would a such automated cache thing would be a nightmare to implement too?
<mark_weaver>which you can do without multiplication
<galex-713>even better than doing sqrt once you mean?
<mark_weaver>well, automatic caching raises several issues. it would either have to be a per-thread cache, or each there would have to be thread sychronization to protect the cache, which would be too expensive given the cost of the operation
<galex-713>so if per-thread it’s possible
<galex-713>*?
<mark_weaver>and then there's the fact that in many numerical applications, the cache is not likely to pay off, if you are rarely asking for the same sqrt.
<mark_weaver>but you'd still be paying for the cache, in terms of both memory usage and time spent checking it
<galex-713>would that be expansive to check you’ve done the same things several times before to cache?
<mark_weaver>IMO, it doesn't make sense to build caches into numeric primitives like this
<galex-713>I thought into compiler more than numeric primitives
<galex-713>like for all functions treating constant values
<galex-713>i.e. all the purely functional things
<mark_weaver>oh, sure, we have constant propagation in guile
<mark_weaver>so yeah, if you really do write (sqrt 50) with a literal '50' in there, it might do the sqrt at compile-time, if 'sqrt' is one of the things that we assume will not be changed by the user. I don't remember offhand
<galex-713>ohhh ok
<galex-713>cool
<mark_weaver>we *do* make assumptions that basic math operations like + and - won't be changed by the user
<galex-713>oh wait, it only works for my test thing in my case…
<mark_weaver>galex-713: another issue with 'sqrt' is that it involves inexact math, which is best to avoid if you want things to work reliably
<galex-713>But do the compiler at least do things like exiting the most things out of loops/recursive calls?
<mark_weaver>that's the other reason why it would be better to square the trial divisor and compare it with the prime you're testing.
<galex-713>but wouldn’t that mean to square each divisor instead of sqrt-ing only once the number to divide?
<mark_weaver>that's true, that would be more expensive. but if you're trying to divide by every integer (or odd integer) in a range, then you can avoid the multiplication. the difference between successive squares are the odd integers.
<mark_weaver>but even that might be more expensive than what you suggested.
<galex-713>so there’s no way a compiler looks at a loop which sqrt the same number each time and decide to sqrt once at the beginning?
<mark_weaver>btw, your trial divisors can be limited to the *primes* <= the sqrt, which can be used to speed things up significantly.
<mark_weaver>galex-713: I don't know off-hand if our compiler can do that yet
<mark_weaver>galex-713: you can use the ,optimize REPL command to see what our partial evaluation pass is able to do
<mark_weaver>the compiler in guile-2.1.x, which are the prereleases leading up to 2.2.x, is much better than the one in guile-2.0.x, but I don't know off-hand if it can do what you're looking for here.
<random-nick>iirc 2.2 can do loop peeling
<random-nick> http://wingolog.org/archives/2015/07/28/loop-optimizations-in-guile
<galex-713>random-nick: does that affects looping with recursive tails?
<random-nick>galex-713: that's basically the most used loop technique in scheme
<random-nick>the alternatives being do or continuations
<mark_weaver>'do' is a macro that expands to something that does a recursive call in tail position.
<random-nick>so just continuations
<mark_weaver>in scheme calls in tail position are GOTO with arguments, semantically.
<mark_weaver>hmm, how do you make a loop with continuations?
<mark_weaver>I would go further and say that in scheme, recursive calls in tail position is *the* way to do looping
<mark_weaver>or any state machine, actually
<mark_weaver>well, finite state machine
<galex-713>so, could an (sqrt n) --with n not being affected by a loop-- be moved outside of the loop?
<mark_weaver>of course, we have higher-level procedures like 'fold' that can abstract away those details
<mark_weaver>galex-713: if the compiler makes assumptions about what procedure 'sqrt' is bound to, then it can certainly be done in principle. I don't know off hand if we have that implemented yet.
<mark_weaver>galex-713: but I've told you about the ,optimize REPL command
<galex-713>I tried it, seems it’s not implemented on my version of guile
<galex-713>(well, my versions of guile, I have the debian testing one which is 2.0, and the guile-emacs git one which is either 2.1 or 2.2)
<davexunit>it's in both 2.0 and 2.1
<galex-713>What?
<davexunit>the ,optimize REPL command
<davexunit>or are you saying that the particular optimization you were wondering about is not present?
<davexunit>if so, nvm. :)
<galex-713>what’s nvm ?
<mark_weaver>nvm == nevermind
<galex-713>ah ok
<ArneBab_>justin_smith: I use fortunes for paralellism in Guile, as well as par-map.
<ArneBab_>galex-713: you could easily add a cache to a function
<galex-713>ArneBab_: how? and yet that would be only for a function, not all-mostly-functional-functions
<galex-713>Anyway my case should be solved by loop peeling :)
<justin_smith>ArneBab_: any chance "fortune" was a typo for "future"? I'm not seeing anything in google for fortune that looks relevant
<ArneBab_>justin_smith: argl, yes, it was
<ArneBab_>galex-713: ^
<galex-713>ok
<ArneBab_>galex-713: (define cache '())(define (sqrt/cached number) (let ((cached (assoc-ref number cache))) (if cached cached (let ((res (sqrt number))) (set! cache (acons number res cache)) res))))
<ArneBab_>^ for a cached sqrt, simplest version (not fastest: alists aren’t fast for a huge number of entries)
<ArneBab_>galex-713: also you might want to do some LRU stuff
<ArneBab_>also the global cache is pretty dangerous (you’ll want to hide that)
<ArneBab_>(but I don’t see a way to do it right now without exposing λ)
<justin_smith>ArneBab_: and you might want an thread-safe collection if you plan to parallelize at all
<justin_smith>err, I mean, if you plan to parallelize you will need a thread-safe collection or your code will be incorrect
<ArneBab_>justin_smith: since I don’t change anything here, I don’t think that’s required — it would save some calls.
<ArneBab_>would/might
<ArneBab_>(each thread might have its own cache)
<ArneBab_>I assume that acons should be thread safe, because the old memory location doen’t change, but I’m not sure
<wingo>optimizing sqrt can be hard because it can be difficult to prove that the operand is an inexact positive real
<wingo>or non-negative anyway
<wingo>but optimizing sqrt is definitely a thing that guile should do :) though it doesn't do so right now
<justin_smith>ArneBab_: the issue would be that if thread a tries to put (25 5) into the alist, and at the same time thread b is putting (36 6) - you could lose one of those. So I guess it's not incorrect, just less than optimal.
<galex-713>wingo: you mean the sqrt algorithm or the cache thing?
<galex-713>If cache, what would be useful would “just” to find some efficient way to find everything that caching may make more efficient, and cache it
<galex-713>Like any purely functional function that *may* be reused several times on the same things
<wingo>i mean that guile should optimize the sqrt function
<galex-713>ah, so you mean the algorithm
<galex-713>I thought a such cache would be useful since some informations that you typically get on run time can help you to optimize things (the kind of things that you normally “optimize” just complexifying your code on the base of what you know it will need to do)
<galex-713>Like, in the case of the repetitive simple algorithm that checks primes numbers, you often do modulos and sqrts: if a such cache system existed, divisions/remainder/divisibility/sqrt/etc. would be cached and maybe a such function would be *automatically* get transformed in erathostene sieve (since divisions would be cached), with the dev/user never having to know about it (just
<galex-713>about the definition of a prime number), and if that works for finding primes, maybe that would work for other things (dunno, 3D stuff, etc.)
<wingo>ah if you are doing integer sqrts, by all means implement a cache or do your own thing
<wingo>guile will probably only optimize sqrt on non-negative reals
<wingo>inexact reals
<justin_smith>with the numbers you use for 3d stuff, there would be a real danger of blowing up the cache big enough that the cache lookup would be slower than just doing the math
<galex-713>the idea is not to cache things that aren’t reused enough, so to make the cache small enough to stay efficient
<davexunit>yeah best to not memoize things that accept floats
<davexunit>or rather things that you pass floats to
<galex-713>yes
<galex-713>of course
<random-nick>justin_smith: correct me if I am wrong, don't hashtables work as fast no matter how many elements they have?
<justin_smith>random-nick: not on real hardware
<justin_smith>random-nick: there's a phenomenon where having a large enough / fragmented enough source of data can make operations thousands of times slower
<justin_smith>due to the many layers of CPU cache
<justin_smith>random-nick: a good term to google regarding this is "mechanical affinity"
<justin_smith>sorry, that's "mechanical sympathy"