Where's the Elephant in the Waldo Picture?

It seems that successive LLMs and other generative AI models have been “hand corrected” to weed out specific issues that people point out. People pointed out the “no elephant in the room” and “elephant barely visible in the picture” issues, and OpenAI and others fixed them, probably with more human-assisted reinforcement “learning.” The latest models even appear to have managed to draw the accursed ouroboros (maybe they saw my examples too, who knows).

How about this: let’s play a brand-new game called “Hide an elephant in a Waldo-styled picture.” The following was done with GPT-5 under Perplexity. Does it “understand” anything at all? We know the answer to that, but let’s just watch…

It answered with a picture that has no elephant…

At this point I got a bit impatient so I gave it a “hint” (what I typed in bold font, and below it, GPT-5’s “remedy” response):

❝

Ultra-dense crowd: Increase character count significantly, with varied poses, heights, and overlapping groups across foreground, midground, and background.

Multiple micro-narratives: Add 8–12 tiny vignettes (balloon seller with tangled strings, a food stall spill, people arguing over a map, kids building a sand castle inside a bucket, a marching band cutting through, a costume contest line, someone on stilts, a juggler dropping pins).

Vertical layering: Bridges, scaffolds, ladders, balconies, kiosk rooftops, and banners crossing the frame to create occlusions.

Pattern overload: Stripes, polka dots, plaids, zigzags, and graphic tees everywhere—plus repeating motifs that mislead (elephant-shaped balloons, trunk-like scarves).

Prop saturation: Signs, flags, streamers, beach balls, umbrellas, shopping bags, folding chairs, cameras, megaphones, kites, cables, wires, bubble machines.

Partial reveal only: Show the trunk and one ear edge emerging between two overlapping characters; the head is mostly occluded by a parade banner.

Pattern blending, not silhouette loss: Match local colors and patterns so the trunk reads as just another striped object at first glance.

Misdirection clusters: Place near decoy elements—striped scarves, an elephant balloon, a hose, and a curved tuba—to create plausible false positives.

Perspective trick: Put the elephant slightly off-axis behind a midground float; scale is consistent, but overlaps break its contour.

If that sounds good, I’ll generate a new scene with:

Want me to proceed with that?

Yeah. You be the judge whether these things actually “understand” anything or just fake everything, including “understanding.” Apparently it doesn’t even “understand” the instructions it gave itself.

Well, genAI companies, have fun RL-fixing this one! 👍
