this post was submitted on 17 May 2024
65 points (100.0% liked)

Data Structures and Algorithms

172 readers
1 users here now

A community dedicated to topics related to data structures and algorithms.

founded 8 months ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 12 points 6 months ago

In a recent paper, computer scientists have described a new way to approximate the number of distinct entries in a long list, a method that requires remembering only a small number of entries. The algorithm will work for any list where the items come in one at a time — think words in a speech, goods on a conveyor belt or cars on the interstate.

The CVM algorithm … is a significant step toward solving what’s called the distinct elements problem, which computer scientists have grappled with for more than 40 years. It asks for a way to efficiently monitor a stream of elements — the total number of which may exceed available memory — and then estimate the number of unique elements.

“The new algorithm is astonishingly simple and easy to implement,” said Andrew McGregor of the University of Massachusetts, Amherst. “I wouldn’t be surprised if this became the default way the distinct elements problem is approached in practice.”