The implication of a 200-to-1 compression algorithm would be that the data they're collecting is almost entirely noise: specifically, that 99.5% of it is noise. In theory, if they had sufficient processing in the implant, they could filter the data down before transmission, reducing the bandwidth usage by 99.5%. It seems like it would be fairly trivial to prove that any such 200-to-1 compression algorithm would be indistinguishable in function from a noise filter on the raw data.
It's not quite the same situation, but this should show some of the issues with this: https://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/
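To put the 200-to-1 figure in context, here's a back-of-the-envelope sketch. The electrode count, sample rate, and radio budget below are assumed ballpark values for illustration, not figures from this thread:

```python
# Back-of-the-envelope: why a ~200x reduction would be needed.
# All hardware figures below are assumed ballpark values, for illustration only.
electrodes = 1024          # assumed electrode count
sample_rate_hz = 20_000    # assumed samples per second per electrode
bits_per_sample = 10       # assumed ADC resolution

raw_bps = electrodes * sample_rate_hz * bits_per_sample
radio_bps = 1_000_000      # assumed usable wireless budget (~1 Mbps)

print(f"raw data rate:  {raw_bps / 1e6:.0f} Mbps")      # ~205 Mbps
print(f"radio budget:   {radio_bps / 1e6:.0f} Mbps")    # 1 Mbps
print(f"required ratio: {raw_bps / radio_bps:.0f}:1")   # ~205:1
```

If those assumptions are even roughly right, the raw stream exceeds the radio budget by a couple hundred times, which is where the "impossible compression" framing comes from.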
Absolutely, they need a better filter and on-board processing. It's like they're just gathering everything and transmitting it for external processing, instead of cherry-picking the data that matches a previously trained action and sending that as the output.
I'm guessing they kept the processing power low because of heat or power constraints; they wanted that quiet "sleek" puck instead of a brick with a fanned heatsink. Maybe they should consider a jaunty hat to hide the hardware.
Gathering all the available data has future utility, but their transmission bottleneck makes that data-gathering capability worthless. They're trying to leap way too far ahead, with too high a vanity prioritization, and getting bitten for it, which is about par for the course for an Elon project.
There is a way they could make the majority of it noise: reduce their expectations to picking up only a single type of signal, like thinking of pressing a red button, and toss anything that doesn't roughly match that signal. But then they wouldn't have their super-fancy futuristic human-robot mind-meld dream, or their dream of a dystopian nightmare where the government can read your thoughts...
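Roughly the kind of on-implant filtering being described, as a minimal sketch: a simple amplitude-threshold detector that transmits only event times instead of raw samples. The trace, sample rate, and threshold here are made-up illustration values, not anything the actual device does:

```python
import numpy as np

# Minimal sketch: send only threshold-crossing events instead of every raw sample.
rng = np.random.default_rng(0)
fs = 20_000                           # assumed sample rate (Hz)
trace = rng.normal(0, 1, fs)          # one second of simulated background activity
trace[[5_000, 12_345, 18_000]] += 8   # inject three fake "spike" events

threshold = 5 * np.std(trace)         # crude amplitude threshold
events = np.flatnonzero(trace > threshold)

print("raw samples per second:", fs)
print("events transmitted:    ", len(events))
print("event times (s):       ", events / fs)
```

Everything below the threshold gets tossed on-device, so only a handful of timestamps ever hit the radio link.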
The problem isn't "making the majority of it noise";
the problem is tossing out the actual noise and compressing only the signal.
Without knowing what the actual signal is, and just trying to send all the noise and signal together, they're creating their own problem (the need for 200x compression) by framing the question wrongly.
What they actually need to do is put a chip in before the transmitter that does the simplification/filtering.
That is the right problem.
That requires an immense understanding of the signal and noise they're trying to work with, though, and it may take much more processing power than they're willing to allow on that side of the link.
shrug
The universe can't care about one's feelings: make-believing that reality is other than it actually is may, with political stampeding, dent reality a bit temporarily, but the correction is implacable.
In this case, there's nothing they can do to escape the facts.
EITHER they eradicate enough of the noise before transmission,
XOR they transmit the noise, & hit an impossible compression problem.
Tough cookies.
_ /\ _
NAND - one of the 2 you listed, or they give up.
I'm not sure that's accurate.
Take video, for example. Using different algorithms you can get a video down to half the file size of the original, but with another algorithm you can get it down to 1/4, and another can get it down to 1/10. If appropriate quality settings are used, the highly compressed video can look just as good as the original. The algorithm isn't getting rid of noise; it's finding better ways to express the data. Generally, the fancier the algorithm, the more tricks it's using and the smaller you can get the data, but it's also usually harder to unpack.
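For a lossless analogue of that trade-off, zlib will squeeze the same input harder at higher effort levels, at the cost of more work, and the original comes back bit-for-bit either way (the sample text here is just made up for illustration):

```python
import zlib

# Same input, different effort: higher levels spend more time finding redundancy,
# but decompression always reproduces the input exactly.
data = ("timestamp,channel,value\n" + "2024-05-28,ch01,0.0125\n" * 10_000).encode()

for level in (1, 6, 9):
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data   # lossless: exact round trip
    print(f"level {level}: {len(data)} -> {len(packed)} bytes")
```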
It's important to distinguish between lossy and lossless algorithms. What was specifically requested in this case is a lossless algorithm which means that you must be able to perfectly reassemble the original input given only the compressed output. It must be an exact match, not a close match, but absolutely identical.
Lossless algorithms generally rely on two tricks. The first is removing common data. If, for instance, some format always includes the same set of bytes in the same location, you can omit them from the compressed data and rely on the decompression algorithm to know it needs to reinsert them. From a signal-theory perspective those bytes represent noise, since they don't convey meaningful data (they're not signal, in other words).
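A toy version of that first trick, with a made-up format whose files always start with the same magic bytes; the compressor drops them and the decompressor reinserts them from shared knowledge:

```python
# Toy example: a made-up format that always begins with the same 8 magic bytes.
MAGIC = b"NEURDATA"

def strip_boilerplate(blob: bytes) -> bytes:
    assert blob.startswith(MAGIC)
    return blob[len(MAGIC):]          # the magic bytes carry no information

def restore_boilerplate(payload: bytes) -> bytes:
    return MAGIC + payload            # reinserted by the decompressor

original = MAGIC + b"\x01\x02\x03\x04"
assert restore_boilerplate(strip_boilerplate(original)) == original
```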
The second trick is substituting shorter sequences for common longer ones. For instance, if you can identify long sequences of data that occur in multiple places, you can create a lookup index and replace each of those long sequences with a shorter index key. The catch is that you obviously can't do this with every possible sequence of bytes unless the data is highly regular and you can use a standardized index that doesn't need to be included in the compressed data. Depending on how poorly you select the sequences to add to your index, or how unpredictable the data to be compressed is, you can even end up taking more space than the original once you account for the extra storage of the index.
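And a toy version of the second trick: build an index of repeated long sequences and replace each occurrence with a short key. The sequences and keys here are arbitrary, and a real coder would also have to ship (or agree on) the index and make sure keys can't collide with literal data:

```python
# Toy dictionary coder: replace repeated long sequences with short index keys.
index = {b"\x00": b"ABABABABABAB", b"\x01": b"XYZXYZXYZ"}

def compress(data: bytes) -> bytes:
    for key, seq in index.items():
        data = data.replace(seq, key)
    return data

def decompress(data: bytes) -> bytes:
    for key, seq in index.items():
        data = data.replace(key, seq)
    return data

sample = b"ABABABABABAB--XYZXYZXYZ--ABABABABABAB"
packed = compress(sample)
assert decompress(packed) == sample
print(len(sample), "->", len(packed))   # 37 -> 7 bytes in this contrived case
```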
From a theory perspective everything is classified as either signal or noise. Signal has meaning and is highly resistant to compression. Noise does not convey meaning and is typically easy to compress (because you can often just throw it away, either because you can recreate it from nothing as in the case of boilerplate byte sequences, or because it's redundant data that can be reconstructed from compressed signal).
Take, for instance, a worst-case scenario for compression: a long sequence of random, uniformly distributed bytes (perhaps a one-time pad). There's no boilerplate to remove and no redundant data to drop; there is, in effect, no noise in the data, only signal. Your only option for compression would be to construct a lookup index, but if the data is highly uniform it's likely there are no long repeated sequences, so you probably can't build an index that saves any significant amount of space. This is, in effect, nearly impossible to compress.
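To see that worst case concretely, here's zlib run over roughly a megabyte of OS-provided random bytes versus the same amount of a repeated pattern (exact sizes will vary slightly from run to run):

```python
import os
import zlib

repetitive = b"spike train boilerplate " * 43_690   # ~1 MiB of repeated text
random_bytes = os.urandom(len(repetitive))          # ~1 MiB of uniform random bytes

for label, blob in (("repetitive", repetitive), ("random", random_bytes)):
    packed = zlib.compress(blob, 9)
    print(f"{label:>10}: {len(blob):,} -> {len(packed):,} bytes")
```

The repeated text collapses to a tiny fraction of its size; the random bytes come out essentially unchanged (zlib actually adds a few bytes of overhead).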
Modern compression relies on the fact that most data formats are in fact highly predictable, with lots of trimmable noise in the form of redundant boilerplate and common, often-repeated sequences, or, in the case of lossy encodings, even signal that can be discarded in favor of approximations that are largely indistinguishable from the original.
Ugh? That's not what it means at all. Compression saves on redundant data, but it doesn't mean that data is noise. Or are you using some definition of noise I'm not aware of?
I can try to explain, but there are people who know much more about this stuff than I do, so hopefully someone more knowledgeable steps in to check my work.
What does ‘random’ or ‘noise’ mean? In this context, random means that any given bit of information is equally likely to be a 1 or a 0. Noise means a collection of information that is either random or unimportant/non-useful.
So, you say “Compression saves on redundant data”. Well, if we think that through, and consider the definitions I’ve given above, we will reason that ‘random noise’ either doesn’t have redundant information (due to the randomness), or that much of the information is not useful (due to its characteristic as noise).
I think that’s what the person is describing. Does that help?
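One way to make the randomness point concrete is Shannon entropy: uniformly random bytes sit at the maximum of 8 bits per byte, so a lossless compressor has no redundancy to exploit, while heavily repeated data sits far below that. A small sketch, with arbitrary sample inputs:

```python
import math
import os
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

print("uniform random:", bits_per_byte(os.urandom(1_000_000)))   # ~8.0
print("repetitive:    ", bits_per_byte(b"ABAB" * 250_000))       # 1.0
```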
I agree with your point, but you're arguing that noise can be redundant data. I am arguing that redundant data is not necessarily noise.
In other words, a signal can never be filtered losslessly. You can slap a low pass filter in front of the signal and call it a day, but there's loss, and if lossless is a hard requirement then there's absolutely nothing you can do but work on compressing redundant data through e.g. patterns, interpolation, what have you (I don't know much about compression algos).
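For what it's worth, here's a small NumPy sketch of that point: a crude moving-average low-pass filter applied to a made-up signal, showing that the filtered output no longer matches the original exactly, so the step is inherently lossy:

```python
import numpy as np

# A crude moving-average low-pass filter on a made-up signal: the filtered
# version is smoother (and more compressible), but it no longer matches the
# original exactly.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 20 * np.pi, 2_000)) + 0.3 * rng.normal(size=2_000)

kernel = np.ones(8) / 8                      # 8-sample moving average
filtered = np.convolve(signal, kernel, mode="same")

print("identical after filtering:", np.array_equal(signal, filtered))  # False
print("max deviation:            ", np.abs(signal - filtered).max())
```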
A perfectly noise-free signal is arguably easier to compress, actually, since the signal is more predictable.