Almas Heshmati, a professor of economics at Jönköping University in Sweden, used Excel’s autofill function to mend the data for one of his studies. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out.
But Heshmati’s data also showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet. New Zealand’s data had been copied from the Netherlands, for example, and the United States’ data from the United Kingdom.
Replacing missing observations with substitute values – an operation known in statistics as imputation – is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted.
There is no evidence that Excel’s autofill function is among these methods, especially not when applied in a haphazard way without clear justification.
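To make the procedure described in the article concrete, here is a minimal Python sketch of it. It assumes that dragging the fill handle over a few selected numeric cells extends a linear best-fit trend, which is how Excel's fill series is generally documented to behave; the article does not say which options were actually used. The names `linear_fit` and `autofill_down` and the parameter `k` are made up for illustration, and the sketch only handles a gap at the end of a series.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def autofill_down(series, k=3):
    """Fill trailing None values by extending a line fitted to the last k observations.

    Assumption: the last observed value is positive. Negative results are
    replaced by the last positive value produced, as the article describes.
    """
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    xs, ys = zip(*observed[-k:])
    slope, intercept = linear_fit(xs, ys)
    filled = list(series)
    last_positive = ys[-1]
    for i in range(xs[-1] + 1, len(series)):
        value = slope * i + intercept
        if value < 0:
            value = last_positive  # the manual "fix" for negative numbers
        elif value > 0:
            last_positive = value
        filled[i] = value
    return filled


print(autofill_down([10.0, 7.0, 4.0, None, None, None]))
# prints [10.0, 7.0, 4.0, 1.0, 1.0, 1.0]
```

With three observations falling by 3 per step, the trend goes negative after one step, so the rest of the column just repeats the last positive value. The result looks like a plateau, and none of it was ever measured.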
Autofill is a bad way to interpolate data. If you’re going to do it, you gotta have an idea of how to do it more realistically and obviously comment on the choice.
I can imagine him doing this without even noticing how much data he made up. When a spreadsheet is big enough that the filtered parameters take up more than a screen, you don’t really notice if you autofill 100 or 1000 or 100000 lines. It’s just “top to bottom” anyway.
This is one reason why I haven’t been using Excel for years. I encourage everyone to use Python or R for analysing data.
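For what it's worth, here is a rough sketch of what that could look like in plain Python. It is not a recommendation of any particular imputation method: linear interpolation between neighbouring observations is just one simple choice, picked here for illustration. The point is that the script records which values were made up and how many. The function name `interpolate_with_flags` and the sample numbers are hypothetical.

```python
def interpolate_with_flags(series):
    """Linearly interpolate interior gaps (None) and record which entries were imputed.

    Gaps at the start or end are left as None rather than extrapolated.
    """
    filled = list(series)
    imputed = [False] * len(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
            imputed[i] = True
    return filled, imputed


values, flags = interpolate_with_flags([None, 2.0, None, None, 8.0, None])
share = sum(flags) / len(flags)
print(f"imputed {sum(flags)} of {len(flags)} values ({share:.0%})")
# prints: imputed 2 of 6 values (33%)
```

Because the flags travel with the data, anyone reading the results can see how much of the series is invented, which is exactly what gets lost when you drag a fill handle down a long column.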
Just think of all the cases where people are not faking stuff in such an obvious way, where they know to just add a bit of noise, or to not reuse the same picture outright but modify it here and there, etc. Fuck, it is so widespread, and we still do not value reproducing results nearly as much as new results. Read up on Alzheimer's research, a case where a fake study determined the direction of research for years.
From the immortal Journal of Irreproducible Results, “The Data Enrichment Method”: “. . .its principal shortcoming is that before the enrichment process can be started, some data must be collected. It is quite true that a great deal is done with very little information, but this should not blind one to the fact that the method still embodies the ‘raw-data flaw’. The ultimate objective, complete freedom from the inconvenience and embarrassment of experimental results, still lies unattained before us.”