this post was submitted on 08 Dec 2024
9 points (100.0% liked)
Data Engineering
393 readers
1 users here now
A community for discussion about data engineering
Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
How are you planning on querying the data? I would plan your columns and indexes based on that answer. I'm going to assume arbitrary time ranges.
Just indexing a standard datetime column should be sufficient but that's going to depend on a few things. Like you're not accessing date ranges 10,000 times per second (website data source). Or you're not inserting 100,000 records per second. Even then you might be able to get away with it if you want to throw more hardware at it.
The answer to most of database design questions at larger scales depends on the scale. If you need to query this date once for a report you do it in a spreadsheet. If you are building the next Facebook api then you shard the crap out of it into a time series database and add layers of caching.
My suggestion is to build it simple first and test. Don't make assumptions about scale before you have data to back it up..
Thanks! That helps a lot