• xyzzy@lemm.ee
    8 months ago

    tl;dr for article and comments:

    Microsoft converted files containing extended ASCII characters (in arrays and code comments) to UTF-8, which mangled those characters and makes building many of these files impossible without a lot of extra work. This was mistakenly attributed to Git.
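
    To illustrate why that kind of conversion can break a build, here is a minimal Python sketch. It assumes the original files used an IBM PC code page such as CP437, which is my assumption and not something stated in the article; the sample bytes and the helper name `transcode_to_utf8` are made up for the demonstration.

```python
# Assumption: the source files were in a single-byte IBM PC code page
# (CP437 is used here purely for illustration).
# Three extended (non-ASCII) bytes, as they might appear in a data array.
original = bytes([0xC4, 0xCD, 0xB3])


def transcode_to_utf8(data: bytes, source_encoding: str = "cp437") -> bytes:
    """Decode single-byte text and re-encode it as UTF-8 (hypothetical helper)."""
    return data.decode(source_encoding).encode("utf-8")


converted = transcode_to_utf8(original)

# Each extended byte becomes a multi-byte UTF-8 sequence, so anything that
# relied on "one character = one byte" no longer holds.
print(len(original), len(converted))  # prints: 3 9
```

    An assembler or compiler that expects one byte per character would see nine bytes where the table originally had three.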

    The timestamps for each file are also not preserved, which is arguably a valid criticism of Git. Original file timestamps can technically be preserved in an archive like this, but it takes a lot of work to programmatically line those times up with the correct commit times.
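
    As a rough sketch of what that could look like: Git reads the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables when creating a commit, so a script could set them from a file's modification time. The helper name `commit_env_for_timestamp` is hypothetical, and this is only one possible approach, not what Microsoft or the article's author did. Note that a commit carries one date for the whole commit, not one per file.

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime


def commit_env_for_timestamp(mtime: float) -> dict:
    """Build an environment that makes `git commit` use the given time.

    `mtime` is seconds since the Unix epoch, e.g. os.path.getmtime(path).
    The date is formatted as RFC 2822, one of the formats Git accepts.
    """
    stamp = format_datetime(datetime.fromtimestamp(mtime, tz=timezone.utc))
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


# Example: the environment for a commit dated at the Unix epoch.
env = commit_env_for_timestamp(0)
print(env["GIT_AUTHOR_DATE"])
```

    The resulting dictionary could then be passed as `env=` to `subprocess.run(["git", "commit", ...])`, one commit per group of files that share a timestamp.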

    Several Microsoft employees involved in this project appeared in the comments and offered to work directly with the author to correct the character encoding issues. One Microsoft employee indicated that historical timestamps likely could not be included because of Microsoft corporate policy on personally identifiable information.