Reading the article, the issue was that there was some difference in the cells in the sample. This difference was not in the training data, and so the AI fell over at what appears to be a cellular anomaly it had never seen before. Because it’s looking for any deviation, to catch things a human would be totally oblivious to, it tried to analyze a pattern it had no matching training data for.
After the fact, the anomaly was analyzed by humans with broader experience and reasoning, who determined it to be a red herring correlated with race, not cancer. The model has no access to racial/ethnicity data; it was simply influenced by a feature it had inadequate data on. For all it knew, that feature could have been a novel consequence of that sample’s cancer, which rightfully should keep it from labeling the sample as something lacking the observed phenomenon. The article said it failed to identify the subtype of cancer, but didn’t say if it would, for example, declare it benign. If the result was “unknown cancer, human review required”, that would be a good outcome given the data. If the outcome was “no worries, no dangerous cancer detected”, that would be bad, and the article doesn’t clarify which case we are talking about.
As something akin to machine vision, it’s even “dumber” than an LLM; the only way to fix this is to secure a large volume of labeled training data so the statistical model learns to ignore those cellular differences. If it flagged an unrecognized phenomenon as “safe”, that would more likely be a matter of traditional programming: assigning a different value statement to low-confidence output from the model.
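To show what that “traditional programming” layer could look like, here’s a minimal sketch in Python. It’s entirely hypothetical (the threshold, names, and class labels are made up, not anything from the article or the actual product); it just routes low-confidence model output to human review instead of letting it pass as “safe”:

```python
# Hypothetical sketch: wrap the model's raw probabilities in a triage step so
# low-confidence output becomes "human review required" rather than "safe".
import numpy as np

REVIEW_THRESHOLD = 0.80  # made-up cutoff; a real system would tune and validate this


def triage(class_probs: np.ndarray, class_names: list[str]) -> str:
    """Turn raw model probabilities into an action, not just a label."""
    best = int(np.argmax(class_probs))
    confidence = float(class_probs[best])
    if confidence < REVIEW_THRESHOLD:
        # The model is effectively saying "I don't recognize this pattern".
        return "unknown cancer, human review required"
    return class_names[best]


# A flat, low-confidence distribution gets routed to a human:
print(triage(np.array([0.40, 0.35, 0.25]), ["subtype A", "subtype B", "benign"]))
# -> unknown cancer, human review required
print(triage(np.array([0.92, 0.05, 0.03]), ["subtype A", "subtype B", "benign"]))
# -> subtype A
```

The point being: whether the failure shows up as “unknown, check it” or as “nothing dangerous here” is a design decision around the model, not something the model decides for itself.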
So the problem was less that the model was “racist” and more that the training data was missing a cellular phenomenon.
Non-LLM AI models are actually useful and relatively cheap on the inference end. It’s the same stuff that has run for years on your phone to find faces while you’re getting ready to take a picture. The bad news may be that an LLM “acts” smarter (while still being fundamentally stupid in risky ways), but the good news is that these “old fashioned” machine learning models are more straightforward, and you know what you get or don’t get.
Thanks for the quick breakdown. I checked in here because the headline is sus AF. The phrase “race data” is suspect enough and reeks of shitty journalism. But the idea that a program for statistical analysis of images can “be racist” is so, so dumb and anthropomorphizing. I’m the kind of person who thinks everything is racist, but this article is just more grifter bullshit.
I actually think this is kind of a case of overfitting. The AI is factoring extra data into the analysis that isn’t among the important variables.
This difference was not in the training data, and so the AI fell over at what appears to be a cellular anomaly it had never seen before.
The difference was in the training data; it was just less common.
This is like someone who knows about swans, and knows about black birds, seeing a black swan for the first time and saying “I don’t know what bird that is”. The model is assuming the whiteness is important when it isn’t.
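To make the analogy concrete, here’s a tiny made-up sketch in Python/scikit-learn (nothing to do with the actual pathology model): when a spurious feature separates the training data just as cleanly as the real one, the classifier is free to lean on it.

```python
# Toy version of the black swan analogy, invented for illustration only.
from sklearn.tree import DecisionTreeClassifier

# Features: [is_swan_shaped, is_black]
X_train = [
    [1, 0], [1, 0], [1, 0],   # white swans
    [0, 1], [0, 1], [0, 1],   # black birds that aren't swans
]
y_train = ["swan", "swan", "swan", "not swan", "not swan", "not swan"]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# In this training set colour and shape are perfectly confounded, so the tree
# may well split on colour. If it does, a black swan ([1, 1]) gets called
# "not swan" even though its shape is right.
root_feature = ["is_swan_shaped", "is_black"][model.tree_.feature[0]]
print("root split on:", root_feature)
print("black swan classified as:", model.predict([[1, 1]])[0])
```

The fix described upthread, more labeled examples where the confound is broken (black swans, white non-swans), is what would force the model to stop leaning on the spurious feature.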