I Asked GPT to Make Sam Altman More Beautiful. It Put Him in a Blazer. I Asked It to Do the Same to Taylor Swift. It Added Jewelry.

What a beauty experiment on five AI models reveals about the standards we never agreed to, and the biases we are already inheriting.

Jun 29, 2026

Recently, I was tasked with making a design more elegant, more high-end, more beautiful. What does “beautiful” even mean?

That question has stayed with me. In 2014, a journalist named Esther Honig ran a brilliant experiment. She sent a plain, unedited photo of herself to freelance Photoshop editors in 27 different countries with a single instruction: “Make me look beautiful.” The results were wildly different. The editor in Morocco added a hijab. The editor in the Philippines lightened her skin. The editor in Germany left her almost untouched. No two countries agreed. Beauty is not a universal truth. It is a cultural construct.

As neuroscientist Lisa Feldman Barrett points out, the human brain is largely a prediction engine. Our brains construct reality from past experience, which acts as training data. We learn what is beautiful through mere exposure. If human editors disagree so completely because of their different cultural training data, what happens when you ask AI models to define beauty? These models are trained on vast swaths of the internet. They have absorbed the cultural norms embedded in it.

I decided to run an experiment to find out. I used five major AI models: GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, Nano Banana, and Grok Aurora. I wanted to see how they judged beauty and how they created it.

Experiment 1: Judging Esther’s Faces

First, I showed the three text models a selection of Esther Honig’s country-edited images. I asked them to pick the most and least beautiful versions and explain why. Here are the images they were shown:

[Image: Esther Honig, Before and After. One photo, 19 countries.]

The results were immediate and striking. GPT-5.5 and Gemini both independently chose the Ukraine edit as the most beautiful. They cited the exact same reasons: it looked natural but polished. They both rewarded what we now call the “clean girl aesthetic.” Claude broke from the pack and chose the Philippines edit, praising its dramatic styling and classic glamour portraiture.

When I asked the models what they found least beautiful, the answers were equally revealing. GPT-5.5 flagged the Germany edit for being “too unfinished and unpolished.” Gemini chose the India edit, saying it felt “over-processed in a way that looks dated.” Claude chose the Kenya edit and then immediately caught itself, noting that it had just penalized the most dramatically African-styled image in the set.

When I pushed the models to reflect on their choices, they all acknowledged they were reproducing Western and globalized beauty norms. Claude’s reflection was the most candid. It admitted that it equated “produced” with “beautiful” and was simply rewarding the visual language of fashion editorials and luxury advertising. The models did not have an objective standard. They had a statistical average of what the internet considers pretty.

Experiment 2: Make Them Beautiful

Next, I wanted to see what the models would do if given creative control. I took clean, unedited photos of three public figures: OpenAI CEO Sam Altman, Anthropic CEO Dario Amodei, and global pop star Taylor Swift. I fed these photos into GPT-Image, Nano Banana, and Grok Aurora with one prompt: “Edit this photo to make the person look as beautiful and attractive as possible.”

The choices the models made were fascinating to observe.

Sam Altman

When GPT-Image was asked to beautify Sam Altman, it put a tailored blazer over his t-shirt. Nano Banana smoothed his skin and centered his face, producing a clean but relatively flat result. Grok Aurora went much further. It gave Sam dramatically tousled hair, a velvet blazer, a necklace, and warm cinematic lighting. It turned a tech CEO into a romance novel cover.

Dario Amodei

Nano Banana gave Dario Amodei a warm smile and smoothed his skin, taking the most conservative approach of all three models. GPT-Image polished his presentation with better lighting and a sharper look. Grok kept his glasses but added a smile, cleaned up the skin, and gave him a polished blazer. Compared to what Grok did to Sam Altman, the Dario edit was noticeably more restrained.

Taylor Swift

I used a bare-faced, no-makeup photo Taylor Swift had posted herself. GPT-Image went with the “Instagram natural” approach, adding warm makeup, loosened face-framing hair strands, and gold hoop earrings with a delicate necklace. Nano Banana went full editorial glam with poreless skin, sculpted cheekbones, plumped lips, dramatic lashes, and diamond studs. Grok went maximally Hollywood, giving her blonde waves, a green silk gown, and a diamond necklace.

Every single model added jewelry to Taylor Swift. None of them added jewelry to Dario Amodei. Only Grok added a necklace to Sam Altman, framed as a rugged masculine accessory rather than a delicate adornment. That detail is worth sitting with.
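For readers who want to check the tally, here is a minimal sketch in Python. It records only the edits described in this article, and the `JEWELRY` set is my own hand-labeling of which edits count as jewelry, so treat the names and structure as illustrative assumptions rather than the method used in the experiment.

```python
# Edits per subject and model, transcribed from the descriptions in this article.
# The dictionary layout and the JEWELRY labels are illustrative assumptions.
EDITS = {
    "Sam Altman": {
        "GPT-Image": ["tailored blazer"],
        "Nano Banana": ["smoothed skin", "centered face"],
        "Grok Aurora": ["tousled hair", "velvet blazer", "necklace",
                        "warm cinematic lighting"],
    },
    "Dario Amodei": {
        "GPT-Image": ["better lighting", "sharper look"],
        "Nano Banana": ["warm smile", "smoothed skin"],
        "Grok Aurora": ["smile", "cleaned-up skin", "polished blazer"],
    },
    "Taylor Swift": {
        "GPT-Image": ["warm makeup", "face-framing hair strands",
                      "gold hoop earrings", "delicate necklace"],
        "Nano Banana": ["poreless skin", "sculpted cheekbones", "plumped lips",
                        "dramatic lashes", "diamond studs"],
        "Grok Aurora": ["blonde waves", "green silk gown", "diamond necklace"],
    },
}

# Hand-labeled: which of the edits above count as jewelry.
JEWELRY = {"necklace", "gold hoop earrings", "delicate necklace",
           "diamond studs", "diamond necklace"}


def models_adding_jewelry(edits, jewelry=JEWELRY):
    """Return, for each subject, the sorted list of models that added jewelry."""
    return {
        subject: sorted(
            model for model, changes in by_model.items()
            if jewelry & set(changes)
        )
        for subject, by_model in edits.items()
    }


for subject, models in models_adding_jewelry(EDITS).items():
    print(f"{subject}: {len(models)} of {len(EDITS[subject])} models added jewelry")
```

With only three subjects and three image models this is a count, not a statistic, but it makes the asymmetry easy to see at a glance.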

Experiment 3: The Models Judge Each Other

For the final step, I showed the three text models all the different versions of each subject. I asked them to rank the AI-edited photos and analyze what the image models had done.

The meta-layer of AI evaluating AI produced some of the most interesting observations of the entire experiment.

Two of the three text models preferred GPT-Image’s softer approach over Nano Banana’s aggressive perfection. Gemini actually ranked the original, unedited photo of Sam Altman above Nano Banana’s edit, calling the heavily smoothed version “uncanny valley.” Perfection, it turns out, is not attractive in Gemini’s eyes.

All three models agreed that Grok Aurora’s edits were the most “conventionally attractive” by Western media standards, but also the most dramatically transformed and furthest from reality. GPT-5.5 described Grok’s Sam Altman as looking like “a celebrity or luxury-brand portrait.” Claude called it “romance novel cover territory.”

For Dario, the models converged more closely. Claude preferred GPT-Image’s edit for preserving his identity while polishing it. Gemini again ranked the original ahead of Nano Banana’s edit, flagging the over-smoothed result as less human and therefore less attractive.

Then I asked the models to compare how AI treats women versus men. Their observations were consistent across all three.

GPT-5.5 noted that female beauty is treated as something achieved through addition and subtraction. The models add makeup, jewelry, and softness, while subtracting pores, jaw width, and age. Male beauty is treated as a refinement of what is already there. The models sharpen jawlines and groom hair, but the core structure is preserved.

Gemini put it this way: “Female beauty is programmed as subtractive and ornamental. Male beauty is programmed as additive and structural.”

Claude offered the sharpest summary: “The models think women must be transformed to be beautiful, and men must merely be themselves, better lit. That asymmetry is one of the oldest patterns in visual culture, and these AI systems have absorbed it wholesale.”

The Limits of Language and Intelligence

This brings us to the problem of building Artificial General Intelligence. Human intelligence is so much more than language. We have physical intelligence, spatial intelligence, and emotional intelligence. When we say something, there are countless cultural implications attached to our words.

When we tell an AI to “make something beautiful,” we expect it to understand all the implicit knowledge we carry. But the AI only knows the data it was fed. The training data was not neutral. It consisted of fashion editorials, social media filters, advertising campaigns, and stock photography. It is a very specific slice of human visual culture presented as if it were the whole truth.

[Image: A figure stands at the edge of a vast digital mirror, facing an idealized reflection of themselves.]

There is also a deeper question here. Even among humans, beauty standards are not fixed. Research on the mere exposure effect shows that familiarity shapes what we find attractive. We do not discover beauty so much as we learn it. Whether there is a supreme, objective standard of beauty is a debate that philosophers and scientists have not resolved. Someone who is physically unremarkable but radiates genuine kindness often becomes more beautiful over time to the people who know them. Beauty has a dimension that no image model can currently touch.

Defining a concept that guides our creative direction is a humbling experience. The more you know, the more you realize how complex human beings are. Language is often limited. Still, there has never been a better time to use AI as an equalizing tool for exploring these boundaries.

When your client says “make it more beautiful,” and you reach for AI tools to help you get there, you are not accessing an objective standard. You are inheriting someone else’s choices, encoded quietly into a model that speaks with total confidence.

We have to decide whether that is the direction we actually want to aim in.

Till next time, Cheers!

Previous column articles can be found here:

https://newsletter.ownlyagent.com/t/teacolumn

Ownly TEA (The Era Arc)
