• queermunist she/her@lemmy.ml
    7 days ago

    An LLM might just lie and say that the link is malicious, or not malicious, and you’d never know. That’s kind of a problem.

    • sylver_dragon@lemmy.world
      7 days ago

      Actually, that’s the start of a solution.

      I’ve personally implemented something similar in the past. At one site we had an issue with people browsing porn on their office PCs. Some folks got pretty creative in getting around the blocks we had in place. However, we had full packet capture at the firewall, so all of the evidence was there. I set up a system which pulled images above a certain size out of those packet captures and passed them through an open-source image classifier built on a machine-learning model. Anything above a certain threshold was flagged for human review; everything else was ignored. It wasn’t perfect, and I looked at quite a few images of sand dunes, but it did 90% of the work. And sure, some false negatives likely got through. But it let us run down the worst offenders.
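
      Roughly, that triage flow looks like this (a minimal sketch; all names are hypothetical, and `score_image` is a stub standing in for whatever open-source classifier you actually plug in):

```python
from dataclasses import dataclass

# Assumed cutoff; in practice you'd tune this against your
# tolerance for false positives (the sand dunes).
REVIEW_THRESHOLD = 0.8

@dataclass
class ExtractedImage:
    """An image pulled out of the firewall's packet captures."""
    source_ip: str
    data: bytes

def score_image(image: ExtractedImage) -> float:
    """Stub returning P(explicit). A real system would call an
    ML image classifier here; this stand-in just pattern-matches
    so the pipeline is runnable end to end."""
    return 0.95 if b"explicit" in image.data else 0.1

def triage(images: list[ExtractedImage]) -> list[ExtractedImage]:
    """Flag only high-scoring images for human review; ignore the rest."""
    return [img for img in images if score_image(img) >= REVIEW_THRESHOLD]

captured = [
    ExtractedImage("10.0.0.5", b"explicit-looking bytes"),
    ExtractedImage("10.0.0.9", b"sand dunes"),
]
flagged = triage(captured)
# Only the first capture crosses the threshold and reaches a human reviewer.
```

      The point of the design is that the model never makes the final call: it only shrinks the pile a human has to look at.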

      Right now, Google seems to be ignoring the problem and has no incentive to do anything about it. Google profits directly from those malvertising links and so should bear some responsibility for ensuring that it is not serving malware to users. We can certainly work out the fine details of their duty of care and how they can meet it (e.g. LLM scanning with human review), but holding our collective dicks with both hands and claiming “nothing can be done” because it would cost Google money is a bad answer.

      • queermunist she/her@lemmy.ml
        6 days ago

        flagged for human review,

        And there we go. Google processes over 5.9 trillion searches per year. If even .01% of those were flagged for human review, that’s roughly 590 million reviews a year, and the cost burden would be so huge the system would collapse.

        A small-scale internal solution for a single office does not scale to the entire internet.

        • sylver_dragon@lemmy.world
          6 days ago

          Google processes over 5.9 trillion searches per year

          That number has nothing to do with the problem. They don’t need to review every search; they need to review every advertising link they have been paid to place (not every link indexed). Presumably, they already have the infrastructure in place to track those links and verify that they comply with laws covering CSAM, copyright, and other areas where they actually have some accountability. The number of paid advertisement links will be far smaller than that 5.9 trillion.

          • queermunist she/her@lemmy.ml
            5 days ago

            So they need to review every website? That’s not as daunting: there are only 1.1 billion websites, with only about 17% (roughly 193 million) being actively maintained and updated. Compared to the number of searches it’s certainly much smaller, but that’s still a huge dataset to review.

            Face it, this is not a simple thing that can just be solved by throwing AI at it. The only way search could exist in this environment is if it was subscription based or a public utility.

            For the record, I favor search being a public utility. Nationalize Google.

            • sylver_dragon@lemmy.world
              5 days ago

              So they need to review every website?

              I’m going to assume you’re just trolling now. I refuse to believe that someone can be this stupid, without actually doing it intentionally. Well done, you got me for a few comments. But, I’m done feeding the troll.

              • queermunist she/her@lemmy.ml
                5 days ago

                You have to review every website the search engine can access, or else you can’t actually stop the problem. How else could you do it? A chatbot can’t reliably flag everything on its own; humans are going to have to look for the false positives and false negatives, and with a billion websites that’s a lot of labor.

                Also, thanks for taking a sledgehammer to my self-esteem for literally no fucking reason, as I expect from reddit.world