this post was submitted on 19 Mar 2024
For 99% of use cases this string pool is just slower. Whether intentionally or not, the benchmark code is strange and misleading.
`string` and `StringPool` are only slower in the final benchmark because doing 100,000 allocations in a synchronous loop while retaining a reference to each one is the worst-case scenario for a generational GC. It forcibly and artificially breaks the generational hypothesis (the assumption that most objects die young).
Conversely, caching 100,000 samples of the same 16 strings (!!!) is the best possible case for the string pool. It spends zero time in GC because the benchmark code contains this very unrealistic pattern.
Most real code is going to quickly forget intermediate strings and clean them up very cheaply in the nursery generation. If you do need to sample 100,000 substrings in a synchronous loop, you can just use `ReadOnlySpan<char>`.
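A minimal sketch of the span-based approach, assuming comma-separated input. The token data and the helper name `CountToken` are illustrative only and do not come from the article's benchmark. Each token is a `ReadOnlySpan<char>` slice of the original string, so no substring is allocated per token.

```csharp
using System;
using System.Linq;

// Illustrative input (hypothetical data): 100,000 comma-separated tokens
// drawn from just 4 distinct words.
string input = string.Join(",", Enumerable.Repeat("alpha,beta,gamma,delta", 25_000));

Console.WriteLine(CountToken(input, "gamma")); // prints 25000

// Hypothetical helper: counts how many comma-separated tokens of `text`
// are equal to `target`. Tokens are slices (views) into `text`, so the loop
// never calls Substring and allocates no intermediate strings.
static int CountToken(string text, string target)
{
    int count = 0;
    ReadOnlySpan<char> remaining = text.AsSpan();
    while (true)
    {
        int comma = remaining.IndexOf(',');
        // The current token runs up to the next comma, or to the end of the input.
        ReadOnlySpan<char> token = comma < 0 ? remaining : remaining.Slice(0, comma);
        if (token.SequenceEqual(target.AsSpan()))
        {
            count++;
        }
        if (comma < 0)
        {
            break;
        }
        // Advance past the comma; this is another slice, not a copy.
        remaining = remaining.Slice(comma + 1);
    }
    return count;
}
```

The trade-off is that a span is a `ref struct`: it cannot be stored in a field of a heap object or in a collection, so tokens that must outlive the loop eventually have to become strings.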
There are real use-cases for string caches and tries, but they are pretty rare.
I think the focus of the article is on highlighting allocation performance (which is the goal of the `StringPool`) rather than overall performance (i.e. speed), so the benchmark, while artificial, is designed to focus on that specific thing. This is actually pointed out in the article just before the benchmark results are shown.

I agree that an additional benchmark showing it in a more real-world scenario could be helpful, but the existing benchmark does a good job of highlighting the allocation reduction when processing large amounts of `char` data. A more real-world example would be a file upload validation method that first checks the file extension against a `HashSet<string>` of valid extensions. In that scenario we can take the filename as a `Span` and extract the extension from it as a `Span`, but we cannot call `HashSet.Contains()` with a `Span`; we have to use a `string`. So that would require calling `extensionSpan.ToString()`. In this scenario, we could use the `StringPool` to avoid the unnecessary string allocation (while the article does not use this particular example, it does mention other related scenarios).

Overall, as you mention, the real use-cases for string caches (such as `StringPool`) are pretty rare; it is a niche topic, but for those who need to do something like that, I think the article provides an accessible introduction.

Oh ok. Thanks for that extra info. I was wondering why this (apparent) performance tip was getting downvoted, but maybe that's it.