Data Structures and Algorithms

A community dedicated to topics related to data structures and algorithms.

founded 8 months ago

MODERATORS

lysdexic

B-Trees: More Than I Thought I'd Want to Know (benjamincongdon.me)

submitted 2 months ago by lysdexic to c/data_structures

[–] arendjr 4 points 2 months ago* (last edited 2 months ago) (4 children)

Apart from all the interesting performance characteristics and their use in databases, the reason I tend to recommend B-Tree maps over hash maps for ordinary programming is consistent iteration order. It is simply too easy to run into a situation where you think iteration order doesn’t matter, but then it turns out it does in some subtle unforeseen way.

Of course it’s the way of our trade that unforeseen things cause bugs. But if there’s one kind of bug that is particularly annoying, it’s the hard-to-reproduce ones: those introduced by timing issues or (semi-)randomness. The moment you start iterating over a hash map you risk falling prey to the second one. So I’ll just prefer to default to a B-Tree map or set instead.
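To make the iteration-order point concrete, here is a minimal Rust sketch using the standard library's `BTreeMap` and `HashMap`. The helper names `btree_keys` and `hash_keys` and the sample data are made up for illustration. The `BTreeMap` always yields its keys in ascending order, while the `HashMap` order is unspecified and, with the default hasher, can differ from one run to the next.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: keys of a BTreeMap in iteration order (ascending by key).
fn btree_keys(entries: &[(&str, u32)]) -> Vec<String> {
    let map: BTreeMap<&str, u32> = entries.iter().copied().collect();
    map.keys().map(|k| k.to_string()).collect()
}

/// Hypothetical helper: keys of a HashMap in iteration order (unspecified).
fn hash_keys(entries: &[(&str, u32)]) -> Vec<String> {
    let map: HashMap<&str, u32> = entries.iter().copied().collect();
    map.keys().map(|k| k.to_string()).collect()
}

fn main() {
    let entries = [("pear", 3), ("apple", 1), ("mango", 2)];
    // Same output on every run: keys sorted ascending.
    println!("BTreeMap: {:?}", btree_keys(&entries));
    // Order is not guaranteed and may change between runs.
    println!("HashMap:  {:?}", hash_keys(&entries));
}
```

Anything that depends on the second line's order (log output, generated files, test snapshots) inherits that nondeterminism.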

[–] lysdexic 3 points 2 months ago (3 children)

the reason I tend to recommend B-Tree maps over hash maps for ordinary programming is consistent iteration order.

Hash maps tend to be used to take advantage of constant-time lookup and insertion, not iteration. Hash maps aren't really suited for that use case.

Programming languages tend to provide two standard dictionary containers: a hash map implementation suited for lookups and insertions, and a tree-based map that keeps elements sorted by key.

[–] arendjr 3 points 2 months ago (2 children)

Oh, I agree, they both have their use cases. But there are plenty of situations where performance is effectively irrelevant, yet people tend to default to a hash map because they heard it’s faster (probably because lookups are indeed O(1)). So that’s where I would say, as long as performance doesn’t matter it’s better to default to B-Tree maps than to hash maps, because the chance of avoiding bugs is more valuable than immeasurable performance benefits (not to mention that for smaller data sets B-Tree maps can often outperform hash maps due to better cache locality, but again that’s hardly relevant since the data set is small anyway).

[–] lysdexic 1 points 2 months ago (1 children)

So that’s where I would say, as long as performance doesn’t matter it’s better to default to B-Tree maps than to hash maps, because the chance of avoiding bugs is more valuable than immeasurable performance benefits (...)

I don't quite follow. What leads you to believe that a B-Tree map implementation would have a lower chance of having a bug when you can simply pick any standard and readily available hash map implementation?

Also, you fail to provide any concrete reasoning for B-Tree maps. It's not performance on any of the dictionary operations, and bugs ain't it either. What's the selling point that you are seeing?

[–] arendjr 2 points 2 months ago

I mentioned it in the first comment:

the reason I tend to recommend B-Tree maps over hash maps for ordinary programming is consistent iteration order. It is simply too easy to run into a situation where you think iteration order doesn’t matter, but then it turns out it does in some subtle unforeseen way.

I’m not talking about bugs in the implementation of the map itself. I’m talking about unforeseen consequences in the user’s code, since they may not properly anticipate the randomness in iteration order.
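As one hedged illustration of such a consequence, consider a tie-break that silently depends on iteration order. The function `top_scorer` and the sample names are hypothetical. `Iterator::max_by_key` returns the last of several equal maxima, so which tied entry wins depends on the order the map yields its entries.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: returns the name with the highest score.
/// On a tie, `max_by_key` keeps the last maximum it sees, so the
/// result depends on the iteration order of the map passed in.
fn top_scorer<'a, I>(scores: I) -> Option<&'a str>
where
    I: IntoIterator<Item = (&'a String, &'a u32)>,
{
    scores
        .into_iter()
        .max_by_key(|(_, score)| **score)
        .map(|(name, _)| name.as_str())
}

fn main() {
    let entries = [("alice", 10u32), ("bob", 10), ("carol", 7)];

    let ordered: BTreeMap<String, u32> =
        entries.iter().map(|(n, s)| (n.to_string(), *s)).collect();
    let hashed: HashMap<String, u32> =
        entries.iter().map(|(n, s)| (n.to_string(), *s)).collect();

    // Deterministic: keys are visited in ascending order, so the tie
    // between alice and bob resolves the same way every time.
    println!("BTreeMap winner: {:?}", top_scorer(&ordered));
    // May print alice or bob, depending on the run.
    println!("HashMap winner:  {:?}", top_scorer(&hashed));
}
```

Neither map is buggy here. The code using the hash map just has an ordering assumption that only shows up when two scores tie.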