Almas Heshmati, a professor of economics at Jönköping University in Sweden, used Excel’s autofill function to mend the data for one of his studies. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out.
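
To make the procedure concrete, here is a rough Python sketch of what it amounts to. It assumes that dragging a selection of numbers extends a straight line fitted to them by least squares, which is how Excel's fill handle is generally understood to treat a numeric selection. The function names and the sample figures are invented for illustration and are not taken from Heshmati's spreadsheet.

```python
def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def autofill_down(seed, n_missing):
    """Extend `seed` by n_missing cells along its fitted line.

    Mimics the reported fix-up: a negative result is replaced by the
    last positive value seen so far (assumption: observed or filled).
    """
    slope, intercept = linear_trend(seed)
    last_positive = next((v for v in reversed(seed) if v > 0), None)
    filled = []
    for step in range(len(seed), len(seed) + n_missing):
        value = slope * step + intercept
        if value < 0 and last_positive is not None:
            value = last_positive
        elif value > 0:
            last_positive = value
        filled.append(value)
    return filled


# Hypothetical declining series: three observations, then four gaps.
print(autofill_down([40.0, 28.0, 16.0], 4))  # [4.0, 4.0, 4.0, 4.0]
```

In this made-up case only the first filled value comes from the trend line; the other three are copies of it, and none of the four is distinguishable from a real observation once it sits in the sheet.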

But Heshmati’s data also showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet. New Zealand’s data had been copied from the Netherlands, for example, and the United States’ data from the United Kingdom.

Replacing missing observations with substitute values – an operation known in statistics as imputation – is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted.

There is no evidence that Excel’s autofill function is among these methods, especially not when applied in a haphazard way without clear justification.
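
For contrast, here is a minimal sketch of one simple, documented way to fill gaps: linear interpolation between the nearest observed neighbours, with every filled value flagged so it can be reported or excluded later. This is only an illustration of a transparent approach, not a claim about what the study should have used, and more rigorous techniques such as multiple imputation go well beyond it. The function name and sample values are hypothetical.

```python
def interpolate_gaps(series):
    """Fill interior None gaps by linear interpolation between observed neighbours.

    Returns (filled, imputed), where imputed[i] is True if filled[i] was
    computed rather than observed. Leading and trailing gaps stay None:
    nothing is extrapolated and nothing is borrowed from another series.
    """
    filled = list(series)
    imputed = [False] * len(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
            imputed[i] = True
    return filled, imputed


# Hypothetical series with one interior gap and gaps at both ends.
observed = [None, 10.0, None, 14.0, None]
filled, imputed = interpolate_gaps(observed)
print(filled)   # [None, 10.0, 12.0, 14.0, None]
print(imputed)  # [False, False, True, False, False]
```

Keeping the flags next to the data means a reader can see how many values were filled in and check whether the results hold without them.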

  • bstix@feddit.dk
    11 months ago

    Autofill is a bad way to interpolate data. If you’re going to do it, you gotta have an idea of how to do it more realistically and obviously comment on the choice.

    I can imagine him doing this without even noticing how much data he made up. When a spreadsheet is big enough that the filtered parameters take up more than a screen, you don’t really notice if you autofill 100 or 1000 or 100000 lines. It’s just “top to bottom” anyway.

    • 0x815@feddit.de (OP)
      11 months ago

      @bstix

      This is one reason why I haven’t been using Excel for years. I encourage everyone to use Python or R for analysing data.