this post was submitted on 07 Feb 2024
40 points (91.7% liked)

No Stupid Questions


No such thing. Ask away!

!nostupidquestions is a community dedicated to being helpful and answering each other's questions on various topics.

The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:

Rules


Rule 1- All posts must be legitimate questions. All post titles must include a question.

All posts must be legitimate questions, and all post titles must include a question. Questions that are joke or trolling questions, memes, song lyrics as title, etc. are not allowed here. See Rule 6 for all exceptions.



Rule 2- Your question subject cannot be illegal or NSFW material.

Your question subject cannot be illegal or NSFW material. You will be warned first, banned second.



Rule 3- Do not seek mental, medical, or professional help here.

Do not seek mental, medical, or professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.



Rule 4- No self promotion or upvote-farming of any kind.

That's it.



Rule 5- No baiting or sealioning or promoting an agenda.

Questions which, instead of being of an innocuous nature, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed and the authors warned - or banned - depending on severity.



Rule 6- Regarding META posts and joke questions.

Provided it is about the community itself, you may post non-question posts using the [META] tag on your post title.

On Fridays, you are allowed to post meme and troll questions, on the condition that they are text-only and conform to our other rules. These posts MUST include the [NSQ Friday] tag in their title.

If you post a serious question on a Friday and are looking only for legitimate answers, then please include the [Serious] tag on your post. Irrelevant replies will then be removed by moderators.



Rule 7- You can't intentionally annoy, mock, or harass other members.

If you intentionally annoy, mock, harass, or discriminate against any individual member, you will be removed.

Likewise, if you are a member or sympathiser of a movement (or of something resembling one) that is known to largely hate, mock, discriminate against, and/or want to take the lives of a group of people, and you were provably vocal about your hate, then you will be banned on sight.



Rule 8- All comments should try to stay relevant to their parent content.



Rule 9- Reposts from other platforms are not allowed.

Let everyone have their own content.



Rule 10- The majority of bots aren't allowed to participate here.



Credits

Our breathtaking icon was bestowed upon us by @Cevilia!

The greatest banner of all time: by @TheOneWithTheHair!

founded 1 year ago

Doesn’t using mode just make a lot more sense? You are much more likely to be in the mode class than in the ~~average~~ mean class.

[Originally said average in the title, fixed thanks to jbrains]

[–] [email protected] 48 points 9 months ago* (last edited 9 months ago) (1 children)

Mode is easily skewed by how the data are distributed; it's used for almost nothing.

1, 2, 3, 4, 5, 45, 98, 100, 100

Mode = 100, Mean ≈ 39.8, Median = 5
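The three figures can be checked with Python's standard `statistics` module (a minimal sketch; using Python here is our choice, not the commenter's). Note that the median is the middle (5th) of the nine sorted values, which is 5:

```python
import statistics

data = [1, 2, 3, 4, 5, 45, 98, 100, 100]

mode = statistics.mode(data)      # most frequent value: 100 appears twice
mean = statistics.mean(data)      # sum (358) divided by count (9)
median = statistics.median(data)  # middle (5th) value of the sorted list

print(f"Mode = {mode}, Mean = {mean:.1f}, Median = {median}")
# Mode = 100, Mean = 39.8, Median = 5
```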

Mean is used because it handles most data sets well; it just gets badly distorted when there are big outliers. That's when we use the median instead.

For example, we use the median a lot for household incomes: there are a lot of data points, but some people are ridiculously wealthy and some make absolutely nothing. Mode doesn't really work in this situation, because it would likely just be 0.
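A small sketch of the income case. The incomes below are invented for illustration (not real data): a few zero-income households, a typical middle, and one extreme earner.

```python
import statistics

# Hypothetical annual household incomes, invented for illustration only
incomes = [0, 0, 0, 25_000, 40_000, 55_000, 60_000, 75_000, 90_000, 50_000_000]

print("mode:  ", statistics.mode(incomes))    # 0, the most common value
print("median:", statistics.median(incomes))  # mean of the two middle values
print("mean:  ", statistics.mean(incomes))    # dragged far up by one outlier
```

The single outlier moves the mean into the millions, while the median stays near the middle households and the mode just reports "nothing".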

[–] [email protected] 6 points 9 months ago

1, 2, 3, 4, 5, 45, 98, 100, 100 => 100

1, 2, 3, 3, 5, 45, 98, 99, 100 => 3

1, 2.999, 3, 4, 5, 45, 98, 99.999, 100 => ?

It only works on data sets with few distinct values but lots of observations.

I guess we also don't like it because shitty headlines would have a field day if people even knew "mode" existed in statistics (or am I thinking too French? In French, "mode" also means "fashion"). Thank god for that naming, BTW.
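The three lists can be run through `statistics.multimode`, which returns every value tied for the highest count (a sketch; Python 3.8+ assumed). In the third list every value occurs once, so all nine are "modes", which is the honest answer to the "?":

```python
import statistics

datasets = {
    "a": [1, 2, 3, 4, 5, 45, 98, 100, 100],
    "b": [1, 2, 3, 3, 5, 45, 98, 99, 100],
    "c": [1, 2.999, 3, 4, 5, 45, 98, 99.999, 100],
}

for name, values in datasets.items():
    # multimode lists every value tied for the highest frequency
    print(name, statistics.multimode(values))
```

`statistics.mode` on list `c` would silently return the first value (1), which is why `multimode` is the more revealing call here.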

[–] [email protected] 27 points 9 months ago* (last edited 9 months ago) (4 children)

Because they mean very different things. Imagine you tallied the spending of 5 people in your restaurant:

10 15 30 100 150

First of all, that distribution has no mode, so let's check the next 2 customers.

10 15 20 30 100 150 150

Cool, now with 7 people checked, we can say the following.

The mode is 150. Almost no one spends this much, but it is the mode regardless.

The median is 30. This tells you that half the people spend more than this and half spend less. However, it doesn't give you an accurate picture, because the people who spend less spend close to it, while the people who spend more spend way more. So if a guy spends 35, he would look like a high spender, but in fact he probably belongs in the low-spending category.

The average is 67.86 (475 / 7). No one spent this amount, but it tells you that a person who spends more than that is a high spender, so if someone came in and spent 35, you would know he's not one of your high-spending customers.

Now let's see how good each of those numbers is at predicting how much 7 customers would spend in total. Using the same values, the customers actually spent 475. The mode says they'll spend 7 × 150 = 1050, which is absolutely wrong. The median says 7 × 30 = 210, also very wrong. The average, however, says 7 × 67.86 ≈ 475, which is the exact number.
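A minimal sketch of that comparison: each measure of center is multiplied by the number of customers to predict the group's total, and compared with the actual total.

```python
import statistics

spending = [10, 15, 20, 30, 100, 150, 150]
n = len(spending)
actual_total = sum(spending)  # 475

for name, func in [("mode", statistics.mode),
                   ("median", statistics.median),
                   ("mean", statistics.mean)]:
    center = func(spending)
    # Predict the total by assuming every customer spends `center`
    predicted = center * n
    print(f"{name:>6}: {center:.2f} per customer -> predicted total {predicted:.2f}")
```

The mean always reproduces the total exactly, because mean × n is just the sum by definition; that is the property the comment is relying on.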

The same holds for other statistics. Even if it doesn't make sense to say that people have an average of 2.3 kids, if you were planning on receiving 10 random families, they would probably have about 23 kids in total. The average is good at predicting large groups, and that's the information we usually care about when we're trying to express a large group in a single number.

If you want a second number, the obvious choice is the standard deviation. In the example above, the (sample) standard deviation is 63.76, which gives you an idea of how accurate your average is at predicting; in this case, not very accurate at all. But if we imagine that the total number of kids across the 10 families had a standard deviation of 2, then, assuming that total is roughly normally distributed, you can be 68% certain that the 10 families will have between 21 and 25 kids, 95% certain that they will have between 19 and 27, or 99.7% certain that they will have between 17 and 29. Working with the level of confidence in a prediction lets you judge how certain you can be when acting on it. If you only knew that the median was 2 kids or that the mode was 1 kid, you couldn't predict things with any accuracy.
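Both numbers can be sketched with the standard library. The first part computes the sample standard deviation of the restaurant data. The second part assumes (hypothetically) that the total number of kids for 10 families is normally distributed with mean 23 and standard deviation 2, which is what the 21-25 / 19-27 / 17-29 ranges imply, and checks the 68-95-99.7 rule with `statistics.NormalDist`:

```python
import statistics
from statistics import NormalDist

spending = [10, 15, 20, 30, 100, 150, 150]
# Sample standard deviation (divides by n - 1)
sd = statistics.stdev(spending)
print(f"sample standard deviation: {sd:.2f}")

# Assumed model, not measured data: total kids for 10 families ~ Normal(23, 2)
total_kids = NormalDist(mu=23, sigma=2)
for k in (1, 2, 3):
    low, high = 23 - k * 2, 23 + k * 2
    prob = total_kids.cdf(high) - total_kids.cdf(low)
    print(f"{low}-{high} kids: {prob:.1%}")
```

Treating kid counts as normal is only an approximation (counts are whole numbers and can't go below zero), so the percentages are a rough guide rather than exact.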

[–] [email protected] 7 points 9 months ago* (last edited 9 months ago) (2 children)

Mean is average

Median is the middle value

[–] my_hat_stinks 2 points 9 months ago* (last edited 9 months ago)

They're all averages. Mean is the sum divided by how many numbers there are.

[–] [email protected] 2 points 9 months ago

Oops, sorry, English is not my first language. You're correct, I'll edit my post.

[–] Tramort 4 points 9 months ago

Outstanding response

[–] derpgon 3 points 9 months ago (1 children)

Mode would probably work great for the # of kids statistic.

Just to add, mode works best for data sets with a small number of distinct values (number of kids is usually 1-3). It completely breaks down with a large number of distinct values (like $ spent).

[–] [email protected] 2 points 9 months ago

Yes, it works best for small integers, but it doesn't provide any meaningful degree of confidence about the number of kids, because 0,1,2,2,2,3,5 and 1,2,2,2,3,5,6 have the same mode but describe very different groups.
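A quick sketch showing that the two groups share a mode while their mean and spread differ:

```python
import statistics

group_a = [0, 1, 2, 2, 2, 3, 5]
group_b = [1, 2, 2, 2, 3, 5, 6]

for name, kids in (("A", group_a), ("B", group_b)):
    print(name,
          "mode:", statistics.mode(kids),
          "mean:", round(statistics.mean(kids), 2),
          "stdev:", round(statistics.stdev(kids), 2))
```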

[–] [email protected] 1 points 9 months ago (1 children)

This makes the most sense! I’ll add, though, that over a large data set I still think mode gives you a better idea of what you should expect; mean makes more sense if you are talking solely about stats and numbers and want to make a decision based on a ‘trend’.

[–] [email protected] 5 points 9 months ago

Not really; it depends on the extremes. Imagine you have 1001 couples: 400 have 0 kids, 201 have 1 kid, 100 have 2 kids, 100 have 3 kids, 50 have 4, 50 have 5, 30 have 6, 30 have 7, 20 have 8, and 20 have 9. The mode is 0, the median is 1, and the average is 1.88 (1881 kids / 1001 couples).

In this case you get two extremes, a lot of people with 0 kids, and people with lots of kids that move the average up.
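The figures for the 1001 couples can be reproduced by expanding the frequency table into one entry per couple (a sketch using the standard `statistics` module):

```python
import statistics

# (number of kids, number of couples) from the example above
counts = [(0, 400), (1, 201), (2, 100), (3, 100), (4, 50),
          (5, 50), (6, 30), (7, 30), (8, 20), (9, 20)]

# One entry per couple: 1001 values in total
kids = [k for k, couples in counts for _ in range(couples)]

print(len(kids))                        # 1001
print(statistics.mode(kids))            # 0
print(statistics.median(kids))          # 1
print(round(statistics.mean(kids), 2))  # 1.88
```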

[–] [email protected] 23 points 9 months ago* (last edited 9 months ago)

If you generate 1000 random numbers from 0 to 100 with a uniform distribution, the mean is going to be roughly 50, whereas the mode could be anything. Which one is more representative?
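A sketch of that experiment, assuming whole numbers from 0 to 100 (with continuous values almost every number would be unique and there would be no meaningful mode at all). The seed is fixed only to make the run reproducible:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
samples = [rng.randint(0, 100) for _ in range(1000)]  # integers 0..100 inclusive

print("mean:", round(statistics.mean(samples), 1))  # close to 50
print("modes:", statistics.multimode(samples))      # depends entirely on the seed
```

Changing the seed moves the mode(s) around arbitrarily, while the mean stays close to 50.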

[–] [email protected] 18 points 9 months ago* (last edited 9 months ago) (4 children)

Mode is a kind of average. I infer that you mean "mean" when you say "average" here.

The mean takes into account outliers in a way that the mode doesn't.

The joke about the average number of legs among humans being less than 2 describes a situation where the mode provides more meaning than the mean. Mode makes less sense for scattered values, such as the average net worth of the people in a country.
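The legs joke in a few lines. The population below is hypothetical (990 people with two legs, a handful with fewer), chosen only to show the effect:

```python
import statistics

# Hypothetical population of 1000 people; a few have fewer than 2 legs
legs = [2] * 990 + [1] * 8 + [0] * 2

print(statistics.mode(legs))  # 2
print(statistics.mean(legs))  # slightly below 2
```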

I don't know why the mean is the "default" average. In many situations, the mode or median makes more sense.

[–] [email protected] 9 points 9 months ago (1 children)

Mean is the default "average" because it's easy to calculate and we're taught it first in school. Like, years before median or mode. That's it, that's the whole reason.

[–] [email protected] 3 points 9 months ago (1 children)

That makes perfect sense to me. I should have known it was that boring.

[–] [email protected] 1 points 9 months ago (1 children)

Poor old Chebychev is not getting much love here, is he?

I guess people might think he was a fan of inequality. (sorry - maths joke, at least I didn't do the one about the LOL numbers).
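The inequality behind the joke is presumably Chebyshev's inequality: for any data set, at least 1 − 1/k² of the values lie within k (population) standard deviations of the mean, whatever the shape of the distribution. Unlike the 68-95-99.7 rule, it needs no normality assumption. A sketch checking it on the 1001-couple data from earlier in the thread:

```python
import statistics

# Kids-per-couple data from the 1001-couple example
counts = [(0, 400), (1, 201), (2, 100), (3, 100), (4, 50),
          (5, 50), (6, 30), (7, 30), (8, 20), (9, 20)]
kids = [k for k, couples in counts for _ in range(couples)]

mu = statistics.mean(kids)
sigma = statistics.pstdev(kids)  # population standard deviation

for k in (1.5, 2, 3):
    within = sum(abs(x - mu) < k * sigma for x in kids) / len(kids)
    bound = 1 - 1 / k**2  # Chebyshev's guaranteed minimum fraction
    print(f"k={k}: {within:.3f} within k*sigma, bound {bound:.3f}")
```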

[–] [email protected] 1 points 9 months ago

I didn't get that far in probability and statistics. I know of his polynomials, but that was a long time ago.

[–] [email protected] 7 points 9 months ago

Also, the mode is only guaranteed to match the median in a symmetric, unimodal distribution such as the normal distribution. In skewed distributions, the mode and median can be very different.

[–] [email protected] 5 points 9 months ago

> such as the average net worth of the people in a country.

And the mean makes no sense here either, which is why incomes and wealth are almost always quoted as medians instead of means, unless they're reported as totals, which is a roundabout way of reporting the mean (if you know the population size).

The more unequal a country, the larger the difference between mean and median incomes.
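A toy illustration of that tendency, using invented incomes: everyone else is held fixed while the top earner pulls further away, and the gap between mean and median widens because only the mean responds.

```python
import statistics

# Hypothetical incomes for a small population; only the top earner changes
base = [20_000, 30_000, 40_000, 50_000, 60_000, 70_000, 80_000]

gaps = []
for top in (100_000, 1_000_000, 10_000_000):
    incomes = base + [top]
    gap = statistics.mean(incomes) - statistics.median(incomes)
    gaps.append(gap)
    print(f"top income {top:>10,}: mean - median = {gap:,.0f}")
```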

[–] [email protected] 3 points 9 months ago* (last edited 9 months ago) (2 children)

Sorry, I meant mean, but yeah, mode definitely seems better to me. You are very unlikely to find a person or thing that has exactly average qualities or stats.

Edit: the mean only seems good for drawing a kind of middle line.

[–] [email protected] 5 points 9 months ago (1 children)

Now I understand better how you're thinking. Indeed, the notion of "what the average person has" is answered better by the median, but the notion of "what's most typical" is answered by the mode.

[–] [email protected] 3 points 9 months ago (1 children)

Actually, I think that the notion ‘what the average person has’ is better answered by mode. I feel like mean is better for plotting data points to find a ‘trend’ or something like that. I am hella confused now, actually.

[–] [email protected] 6 points 9 months ago* (last edited 9 months ago)

The mode can't hope to answer how much money the average person has, because there are far too many possible values.

The mean answers how much money people have on average, but the outliers exert too much influence to answer how much money the average person has.

The median moderates the influence of both the very rich and the very poor, so it better approximates the amount of money that those in the middle of the population have, which is what "the average person" tends to be.

For populations where the number of possible values is much lower, the mode and the median tend to be closer together. Emphasis on "tend".

[–] [email protected] 3 points 9 months ago

Well, the middle line is the median. 😉

[–] [email protected] 13 points 9 months ago

It all depends on how the data are distributed, discretely or continuously, and mode tends to fail most often, which is why it's rarely used.

If you asked a random group of people what their annual salary was, you might get many answers of "zero" or something round like "100k" or "six figures", which would become the mode even though it doesn't represent a middle value.

If you ask a group of working people (to remove the zeros) for last year's post-tax salary to the nearest cent, there is a good chance there is no mode, as everyone has a slightly different exact salary due to different benefits, hours, tax obligations, and so on.

[–] [email protected] 9 points 9 months ago

It's not that we don't use mode; there are definitely times it's used. It's just that the mean (and the median as well) contain a lot more useful information about the distributions we often care about. For a normal distribution, the mean, median, and mode are all identical. So why do we use the mean? Because mathematically, the mean is what appears in the formula for the normal distribution, not the median or mode, and when you're doing math with normal distributions, the mean is the thing to talk about (along with the standard deviation).

We use the median a lot too; you probably just don't hear it called the median very often. The median is useful for non-normal distributions, and it defines the 50th percentile, so together with the 25th and 75th percentiles you've got your quartiles. We use these all the time to talk about grades in schools, home price distributions in a given area, or salaries within a given field.
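Quartiles are available as `statistics.quantiles` (Python 3.8+). A sketch on invented exam scores; the middle cut point coincides with the median:

```python
import statistics

# Hypothetical exam scores for one class
scores = [52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 85, 91]

# Three cut points splitting the data into four groups (quartiles)
q1, q2, q3 = statistics.quantiles(scores, n=4)
print("25th percentile:", q1)
print(f"50th percentile: {q2} (median: {statistics.median(scores)})")
print("75th percentile:", q3)
```

`quantiles` defaults to the "exclusive" interpolation method; spreadsheets and other libraries may interpolate the 25th and 75th percentiles slightly differently.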

We use mode too, again just by a different name most of the time. Any time you've asked "what's the most common blank" you're basically asking for a mode. When we talk about "average" income in a country, we're usually actually talking about median or mode. Favorite animal? Answered as a mode.
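The "most common" question works even on categories, where a mean or median doesn't exist. A sketch with made-up survey answers:

```python
from collections import Counter
import statistics

# Hypothetical survey answers to "What's your favorite animal?"
answers = ["dog", "cat", "dog", "parrot", "cat", "dog", "horse"]

print(statistics.mode(answers))         # dog
print(Counter(answers).most_common(2))  # [('dog', 3), ('cat', 2)]
```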

You have to use the right statistical tool for your question: unfortunately English doesn't do a good job of conveying this without math jargon.

[–] [email protected] 5 points 9 months ago (1 children)

When you have a quantitative measure on an axis, mode can get really weird. Let's say we have a dance class with three 13-year-olds, four 14-year-olds, five 15-year-olds, four 16-year-olds, five 17-year-olds, three 18-year-olds, and a group of six retirees who signed up as a lark for someone's birthday and are all 72 years old.

That dance class is mostly composed of teenagers, but the mode is 72 years old. I think mode is a terrible measure, but it's very close to what we'd like to phrase as "mostly". The issue is that it's very strict about bucket boundaries, while we humans like to group things arbitrarily.
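A sketch of the dance class. The `bucket` helper and its age cut-offs are hypothetical, added only to show how a coarser, human-style grouping changes the mode:

```python
import statistics
from collections import Counter

# Ages from the dance-class example: (age, number of people)
groups = [(13, 3), (14, 4), (15, 5), (16, 4), (17, 5), (18, 3), (72, 6)]
ages = [age for age, count in groups for _ in range(count)]

print("mode:", statistics.mode(ages))      # 72: six retirees beat any single teen age
print("median:", statistics.median(ages))  # 16.0
print("mean:", round(statistics.mean(ages), 1))

def bucket(age):
    # Hypothetical grouping; the thresholds are illustrative assumptions
    return "teen" if 13 <= age <= 19 else "retiree" if age >= 65 else "other"

# Mode of the coarser buckets
print("bucketed mode:", Counter(bucket(a) for a in ages).most_common(1)[0][0])  # teen
```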

[–] [email protected] 3 points 9 months ago* (last edited 9 months ago)

This sample also shows why the median is often more representative than the mean.

[–] [email protected] 2 points 9 months ago

In addition to the other answers here, the mode doesn't work well with continuous variables, where almost every observed value is distinct.

[–] [email protected] 1 points 9 months ago

I came here to say that I highly enjoyed your title, OP.

titlegold +10

[–] [email protected] -2 points 9 months ago* (last edited 9 months ago)

Because we can't have nice things.

If you downvote this post, you are a humorless pedant