this post was submitted on 11 Feb 2024
42 points (100.0% liked)


Almas Heshmati, a professor of economics at Jönköping University in Sweden, used Excel’s autofill function to mend the data for one of his studies. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out.
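
To make the procedure described above concrete, here is a rough sketch of it. It assumes that dragging a selection of numeric cells extends a least-squares straight line through them, which is how Excel's linear fill is generally understood to behave. The function names and the sample numbers are hypothetical and not taken from the study.

```python
def linear_trend_fill(observed, n_missing):
    """Extend `observed` by `n_missing` values along a least-squares line.

    Assumption: this mimics a linear autofill over the selected cells.
    """
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Continue the fitted line past the last observed position.
    return [intercept + slope * x for x in range(n, n + n_missing)]


def replace_negatives(filled, last_positive):
    """Swap each negative value for the most recent positive one."""
    out = []
    for value in filled:
        if value < 0:
            value = last_positive
        elif value > 0:
            last_positive = value
        out.append(value)
    return out


# Hypothetical example: three observations, four missing cells.
observed = [10.0, 7.0, 4.0]
filled = linear_trend_fill(observed, 4)
patched = replace_negatives(filled, observed[-1])
print(filled)
print(patched)
```

With these made-up inputs the fitted line drops below zero after one step, so every remaining cell ends up holding the same repeated value. Three real observations have turned into seven "data points", four of which are artifacts of the fill.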

But Heshmati’s data also showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet. New Zealand’s data had been copied from the Netherlands, for example, and the United States’ data from the United Kingdom.

Replacing missing observations with substitute values – an operation known in statistics as imputation – is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted.

There is no evidence that Excel’s autofill function is among these methods, especially not when applied in a haphazard way without clear justification.

top 5 comments
[–] [email protected] 11 points 9 months ago* (last edited 9 months ago) (1 children)

Just think of all the cases where people are not faking stuff in such an obvious way: when they know to just add a bit of noise, or to not outright reuse the same picture but modify it here and there, etc. Fuck, it is so widespread, and we still do not value ~~copying~~ reproducing results nearly as much as new results.

[–] [email protected] 5 points 9 months ago

Read up on Alzheimer's research, a case where a fake study determined the direction of research for years.

[–] [email protected] 10 points 9 months ago (1 children)

Autofill is a bad way to interpolate data. If you're going to do it, you gotta have an idea of how to do it more realistically and obviously comment on the choice.

I can imagine him doing this without even noticing how much data he made up. When a spreadsheet is big enough that the filtered parameters take up more than a screen, you don't really notice if you autofill 100 or 1000 or 100000 lines. It's just "top to bottom" anyway.
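
A minimal sketch of what "more realistic, and commented" could look like, under some assumptions: gaps are only filled when there is a real observation on both sides, and every filled value carries a flag so the amount of made-up data stays visible. The function name and the sample series are hypothetical.

```python
def interpolate_with_flags(values):
    """Linearly interpolate interior gaps (None) between known neighbours.

    Returns (filled, imputed_flags). Leading and trailing gaps are left
    as None, because filling them would be extrapolation, not interpolation.
    """
    filled = list(values)
    flags = [False] * len(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span
            filled[i] = values[left] + t * (values[right] - values[left])
            flags[i] = True  # record that this value is imputed
    return filled, flags


# Hypothetical series with one interior gap and gaps at both ends.
series = [None, 2.0, None, None, 5.0, None]
filled, flags = interpolate_with_flags(series)
print(filled)
print(f"imputed {sum(flags)} of {len(series)} values")
```

The flag column is the important part: whether it's 100 or 100000 lines, the share of imputed values is one `sum` away instead of hidden somewhere below the bottom of the screen.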

[–] [email protected] 0 points 9 months ago

@bstix

This is one reason why I haven't been using Excel for years. I encourage everyone to use Python or R for analysing data.

[–] [email protected] 1 points 9 months ago

From the immortal Journal of Irreproducible Results, "The Data Enrichment Method": ". . .its principal shortcoming is that before the enrichment process can be started, some data must be collected. It is quite true that a great deal is done with very little information, but this should not blind one to the fact that the method still embodies the 'raw-data flaw'. The ultimate objective, complete freedom from the inconvenience and embarrassment of experimental results, still lies unattained before us."