IRC channel logs



***Server sets mode: +nt
<damo22>youpi: is it a common algorithm to deduplicate objects using an unordered map?
<youpi>I'm not sure what you mean by "unordered map"
<damo22>i have a collection of items and a way to iterate in the same order every time
<damo22>so i implemented a deduplication algorithm by putting them all into a data structure like a python dictionary
<damo22>where the key is the same for the identical items
<youpi>ok, so you don't care about the order, but it does index
<youpi>then yes it's a common algorithm, when the dictionary structure is efficient
<damo22>so i packed a chromosome, start and stop position into a uint64_t and used that as the key
<damo22>so i can deduplicate genetic sequence fragments that have the same position
<damo22>and length
<damo22>but to save memory i dont write the whole fragment into ram, just the index of each fragment i want to keep
<damo22>then i do a second pass to read the correct fragments and write out the results
<damo22>it runs pretty fast on my data
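
[Editor's note] The two-pass approach damo22 describes above could look roughly like the sketch below. This is not damo22's code. The bit widths (8 bits for the chromosome, 28 each for start and stop), the names pack_key, deduplicate and read_fragments, and the choice to keep the first fragment seen for each key are all assumptions made for illustration; the log does not state any of them.

```python
CHROM_BITS = 8   # assumed width, not stated in the log
POS_BITS = 28    # assumed width, used for both start and stop


def pack_key(chrom, start, stop):
    """Pack (chrom, start, stop) into one integer that fits in 64 bits."""
    if not 0 <= chrom < (1 << CHROM_BITS):
        raise ValueError("chromosome id out of range")
    if not 0 <= start < (1 << POS_BITS) or not 0 <= stop < (1 << POS_BITS):
        raise ValueError("position out of range")
    return (chrom << (2 * POS_BITS)) | (start << POS_BITS) | stop


def deduplicate(read_fragments):
    """read_fragments is a hypothetical callable that returns a fresh
    iterator over (chrom, start, stop, sequence) tuples, in the same
    order on every call."""
    # Pass 1: remember only the index of the first fragment per key,
    # so the sequences themselves are never held in the dictionary.
    first_index = {}
    for index, (chrom, start, stop, _seq) in enumerate(read_fragments()):
        first_index.setdefault(pack_key(chrom, start, stop), index)
    keep = set(first_index.values())

    # Pass 2: iterate again in the same order and emit the kept fragments.
    return [frag for index, frag in enumerate(read_fragments()) if index in keep]


fragments = [
    (1, 100, 150, "ACGT"),
    (1, 100, 150, "ACGA"),  # same chromosome, start and stop as the first
    (2, 100, 150, "TTTT"),
    (1, 100, 160, "GGGG"),
]
unique = deduplicate(lambda: iter(fragments))
# unique == [(1, 100, 150, "ACGT"), (2, 100, 150, "TTTT"), (1, 100, 160, "GGGG")]
```

Storing only an integer index per key keeps the first pass small in memory, at the cost of reading the input twice, which only works because the iteration order is the same on both passes.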