this post was submitted on 07 Sep 2023
4 points (100.0% liked)

Data Engineering

377 readers
1 users here now

A community for discussion about data engineering

Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient

founded 1 year ago
MODERATORS
 

cross-posted from: https://programming.dev/post/2656516

What are your real-world applications of this versatile data structure?

They are useful for optimization in databases like sqlite and query engines like apache spark. Application developers can use them as concise representations of user data for filtering previously seen items.

The linked site gives a short introduction to bloom filters along with some links to further reading:

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.

you are viewing a single comment's thread
view the rest of the comments
[–] Reader9 2 points 1 year ago

I can see why this data structure might be abused and/or chosen for an inappropriate use-case since it seems to offer a lot of value for the tiny amount of space required.

if you need to know a key is definitely in that space. You still have to perform the lookup.

This is a good description. I think the name “filter” is appropriate for their best use cases, when you want to remove members of some other set if they are probably members of the bloom filter set, and can accept that you might remove some extras due to false positives.

Problems like that come up from time-to-time.