Ask Lemmy
A Fediverse community for open-ended, thought provoking questions
Please don't post about US Politics. If you need to do this, try [email protected]
Rules:
1) Be nice and have fun
Doxxing, trolling, sealioning, racism, and toxicity are not welcome in AskLemmy. Remember what your mother said: if you can't say something nice, don't say anything at all. In addition, the site-wide Lemmy.world terms of service also apply here. Please familiarize yourself with them.
2) All posts must end with a '?'
This is sort of like Jeopardy. Please phrase all post titles in the form of a proper question ending with a '?'.
3) No spam
Please do not flood the community with nonsense. Actual suspected spammers will be banned on sight. No astroturfing.
4) NSFW is okay, within reason
Just remember to tag posts with either a content warning or a [NSFW] tag. Overtly sexual posts are not allowed, please direct them to either [email protected] or [email protected].
NSFW comments should be restricted to posts tagged [NSFW].
5) This is not a support community.
It is not a place for 'how do I?' type questions.
If you have any questions regarding the site itself or would like to report a community, please direct them to Lemmy.world Support or email [email protected]. For other questions check our partnered communities list, or use the search function.
Reminder: The terms of service apply here too.
Partnered Communities:
Logo design credit goes to: tubbadu
Short answer: to record a sound, take samples of the sound "really really often" and store them as a sequence of numbers. Then to play the sound, create an electrical signal by converting those digital numbers to a voltage "really really often", then smooth it, and send it to a speaker.
Slightly longer answer: you can actually take a class on this, typically called Digital Signal Processing, so I'm skipping over a lot of details. Like a lot a lot. Like hundreds of pages of dense mathematics a lot.
First, you need something to convert the sound (pressure variation) into an electrical signal. You want the electrical signal to have the same shape as the pressure variation, just bigger and in units of voltage. In short, you need a microphone.
So as humans, the range of pitches of sounds we can hear is limited. We typically classify sounds by frequency, or how often the sound wave "goes back and forth". For simplicity, we can think only about sine waves, because any wave can be broken up into sine waves of different frequencies, amplitudes, and offsets. (This is not a trivial assertion, and there are some caveats. Honestly, this warrants its own class.)
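To make the "any wave is a sum of sine waves" idea a bit more concrete, here's a minimal Python sketch (my own illustration, not the only way to do it). It builds a wave out of two sine waves and then uses a plain discrete Fourier transform (DFT) to pull their amplitudes back out. The helper name `sine_amplitudes` is hypothetical, and it assumes the samples span exactly one second so that entry k corresponds to k Hz.

```python
import cmath
import math


def sine_amplitudes(samples):
    """Return the peak amplitude of each whole-number frequency in `samples`.

    Plain DFT. Assumes `samples` covers exactly one second, so index k is k Hz.
    (The k = 0 entry, a constant offset, is not scaled correctly by this simple version.)
    """
    n = len(samples)
    amps = []
    for k in range(n // 2):  # only frequencies below half the sample rate are meaningful
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        amps.append(2 * abs(total) / n)  # rescale to the sine's peak amplitude
    return amps


# A "complicated" wave: a 3 Hz sine plus a quieter 7 Hz sine, 64 samples in 1 second.
N = 64
combined = [math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 7 * i / N)
            for i in range(N)]

amps = sine_amplitudes(combined)
print(round(amps[3], 6), round(amps[7], 6), round(amps[5], 6))  # 1.0 0.5 0.0
```

Real code would use an FFT library (e.g. `numpy.fft`) instead of this slow loop; the slow version just makes the math visible.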
So each sine wave has a frequency, i.e. how many times per second the wave oscillates ("goes back and forth"), measured in hertz (Hz).
I can guarantee that you as a human cannot hear any pitch with a frequency higher than 20000 Hz. It's not important to memorize that number if you don't intend to do technical audio stuff, it's just important to know that number exists.
So if I recorded any information above that frequency, it would be a waste of storage. So let's cap the recorded frequency at around that limit (in practice, a low-pass filter removes everything above it before recording). The listener literally cannot tell the difference.
Then, since we have a maximum frequency, it turns out that, once you do the math, you only need to sample at a rate of more than twice the maximum frequency you expect to find (this is the Nyquist-Shannon sampling theorem). So for an audio track, 2 × 20000 Hz = 40000 samples per second is the bare minimum. In practice the rate is a bit higher for various technical reasons (for example, real filters can't cut off perfectly sharply at 20000 Hz), which is why 44100 Hz and 48000 Hz sample rates are common.
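Here's a small Python sketch of why that limit matters, assuming a 44100 Hz sample rate. A tone above half the sample rate doesn't just disappear: its samples come out identical to those of a lower tone (this is called aliasing). Sampling at exactly twice a tone's frequency is also the borderline case, because a sine wave can get sampled right at its zero crossings every time. `sample_cosine` is a hypothetical helper for illustration.

```python
import math

FS = 44100  # assumed sample rate, in samples per second


def sample_cosine(freq_hz, count, fs=FS):
    """Sample a cosine of the given frequency `count` times at rate `fs`."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(count)]


# A 30 kHz tone is above FS / 2 = 22.05 kHz, so its samples are indistinguishable
# from those of a 44.1 - 30 = 14.1 kHz tone (aliasing).
high = sample_cosine(30000, 100)
alias = sample_cosine(FS - 30000, 100)
print(max(abs(a - b) for a, b in zip(high, alias)) < 1e-9)  # True

# A sine sampled at exactly twice its frequency (22.05 kHz at 44.1 kHz)
# lands on a zero crossing every time, so the samples are (numerically) all zero.
edge = [math.sin(2 * math.pi * (FS / 2) * n / FS) for n in range(100)]
print(max(abs(x) for x in edge) < 1e-9)  # True
```

This is why the microphone signal gets filtered before sampling: once two tones produce the same samples, nothing downstream can tell them apart.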
So if you want to record exactly 69 seconds of audio, you need 69 seconds × 44100 [samples / second] = 3,042,900 samples. Assuming space is not at a premium and you store the file with zero compression, each sample is stored as a number in your computer's memory. The samples need to be stored in order.
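Here's roughly what "storing samples as numbers, in order" could look like in Python with only the standard library. Assumptions: one channel (mono) and 16-bit signed integers per sample, which is what audio CDs use per channel. Each number is simply how far the signal (the microphone voltage, i.e. the air pressure) was from its resting value at that instant, scaled to fit the integer range. The 440 Hz tone and the in-memory WAV file are only for illustration.

```python
import io
import math
import struct
import wave

FS = 44100      # sample rate, samples per second
SECONDS = 69
print(SECONDS * FS)      # 3042900 samples

# Assumption: mono, 16-bit signed integers (-32768..32767), i.e. 2 bytes per sample.
print(SECONDS * FS * 2)  # 6085800 bytes, uncompressed

# One second of a 440 Hz tone at half of full scale, as a list of 16-bit sample values.
samples = [round(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / FS))
           for n in range(FS)]
print(samples[:5])  # small whole numbers, starting at 0

# Store the numbers, in order, as an uncompressed WAV file (kept in memory here).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes = 16 bits per sample
    w.setframerate(FS)
    w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Stereo would double the storage, since each channel gets its own number per sample.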
To reproduce the sound in the real world, we feed the numbers, in order and at the same rate we recorded them (the sample frequency), into a device that works as follows: for each number it receives, the device outputs a voltage proportional to that number and holds it until the next number comes in. This device is called a Digital-to-Analog Converter (DAC).
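A DAC is hardware, but its idealized behavior (called a zero-order hold) is easy to sketch in Python. The scale factor and the "steps per sample" resolution below are hypothetical, chosen only so the output is easy to read.

```python
def zero_order_hold(samples, steps_per_sample, volts_per_unit):
    """Idealized DAC: each number becomes a proportional voltage,
    held constant until the next number arrives."""
    output = []
    for value in samples:
        voltage = value * volts_per_unit             # output proportional to the number
        output.extend([voltage] * steps_per_sample)  # hold it for one sample period
    return output


# Hypothetical 16-bit samples; 1 unit = 1/32768 V, so full scale is about ±1 V.
stair = zero_order_hold([0, 16384, 32767, 16384, 0], steps_per_sample=4,
                        volts_per_unit=1 / 32768)
print(stair[:8])  # [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
```

The result is a staircase: flat stretches with sudden jumps at each new sample.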
Now at this point you do have a sound, but the stair-step shape of that voltage contains wasteful high-frequency content that can disrupt other devices. So it needs to be smoothed out with a filter. Send this voltage to your speakers (which convert it to pressure variations that vibrate your eardrums; your ears then convert that vibration into electrical signals sent to your brain) and you've got sound.
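As a rough illustration of the smoothing step, this Python sketch holds each sample the way the DAC described above does, then runs a moving average over the staircase. Assumption: a moving average is only a crude stand-in; a real DAC uses a properly designed analog low-pass (reconstruction) filter. The size of the jumps between neighbouring points serves here as a rough proxy for high-frequency content, and all helper names are hypothetical.

```python
import math


def zero_order_hold(samples, steps):
    """Hold each sample value for `steps` points (the DAC's staircase output)."""
    return [v for v in samples for _ in range(steps)]


def moving_average(signal, width):
    """Crude low-pass filter: each point becomes the mean of the last `width` points."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def largest_jump(signal):
    """Biggest change between neighbouring points (sharp jumps = high-frequency content)."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# 40 samples of one slow sine cycle, each held for 8 steps.
samples = [math.sin(2 * math.pi * n / 40) for n in range(40)]
stair = zero_order_hold(samples, 8)
smooth = moving_average(stair, 8)

print(largest_jump(stair) > largest_jump(smooth))  # True
```

With the averaging width equal to the hold length, each jump gets spread over 8 smaller steps, so the output ramps between sample values instead of leaping.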
Easy peasy, hundreds of pages of calculus squeezy!
Yes, but it is astronomically unlikely to happen before you or the monkeys die.
If you have any further questions about audio signal processing, I would be literally thrilled to answer them.
When you talk about a sample, what does that actually mean? Like I recognize that the frequency of oscillations will tell me the pitch of something, but how does that actually translate to a chunk of data that is useful?
You mention a sample being stored as a number, which makes sense, but how is that number utilized? Again assuming uncompressed, if my sample "value" comes up as 420, does that include all of the necessary components of that sound bite in 1/44100th of a second? How would a sample with value 421 compare? Is this an RGB-type situation where you'd have multiple values corresponding to different attributes of the sample (amplitude, frequencies, and I'm sure other things)? Is a single sample actually intelligible in isolation?