Stubsack: weekly thread for sneers not worth an entire post, week ending 5th January 2025
https://awful.systems/post/3174449
=> More information about this toot | More toots from BlueMonday1984@awful.systems
An interesting thing came through the arXiv-o-tube this evening: “The Illusion-Illusion: Vision Language Models See Illusions Where There are None”.
Illusions are entertaining, but they are also a useful diagnostic tool in cognitive science, philosophy, and neuroscience. A typical illusion shows a gap between how something “really is” and how something “appears to be”, and this gap helps us understand the mental processing that leads to how something appears to be. Illusions are also useful for investigating artificial systems, and much research has examined whether computational models of perception fall prey to the same illusions as people. Here, I invert the standard use of perceptual illusions to examine basic processing errors in current vision language models. I present these models with illusion-illusions, neighbors of common illusions that should not elicit processing errors. These include such things as perfectly reasonable ducks, crooked lines that truly are crooked, circles that seem to have different sizes because they are, in fact, of different sizes, and so on. I show that many current vision language systems mistakenly see these illusion-illusions as illusions. I suggest that such failures are part of broader failures already discussed in the literature.
=> More information about this toot | More toots from blakestacey@awful.systems
It’s definitely linked in with the problem we have with LLMs where they detect the context surrounding a common puzzle rather than actually doing any logical analysis. In the image case I’d be very curious to see the control experiment where you ask “which of these two lines is bigger?” and then feed it a photograph of a dog rather than two lines of any length (a rough sketch of that probe follows below). I’m reminded of how it was (is?) easy to trick ChatGPT into nonsensical solutions to any situation involving crossing a river because it pattern-matched to the chicken/fox/grain puzzle rather than considering the actual facts being presented.
Also now that I type it out I think there’s a framing issue with that entire illusion since the question presumes that one of the two is bigger. But that’s neither here nor there.
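A rough sketch of that control probe, assuming the OpenAI Python client and some vision-capable model; the model name, prompt wording, and image URL are placeholders for illustration, not anything from the paper:

```python
# Control probe: ask the "two lines" question about an image that contains no lines at all.
# Assumes the openai Python package and an API key in OPENAI_API_KEY; the model name and
# image URL below are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Which of these two lines is longer?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo-of-a-dog.jpg"},
                },
            ],
        }
    ],
)

# A model with the pattern-matching failure described above may still answer
# "the top line" instead of pointing out that the image contains no lines.
print(response.choices[0].message.content)
```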
=> More information about this toot | More toots from YourNetworkIsHaunted@awful.systems
I think there’s a framing issue with that entire illusion since the question presumes that one of the two is bigger
I disagree, or rather I think that’s actually a feature; “neither” is a perfectly reasonable answer to that question that a human being would give, and one that LLMs would be fucked by, since they basically never go against the prompt.
=> More information about this toot | More toots from V0ldek@awful.systems