Dubbed “Project Ellmann,” after biographer and literary critic Richard David Ellmann, the idea would be to use LLMs like Gemini to ingest search results, spot patterns in a user’s photos, create a chatbot, and “answer previously impossible questions,” according to a copy of a presentation viewed by CNBC. Ellmann’s aim, it states, is to be “Your Life Story Teller.”
It’s unclear if the company has plans to produce these capabilities within Google Photos, or any other product. Google Photos has more than one billion users and four trillion photos and videos, according to a company blog post.
Project Ellmann is just one of many ways Google is proposing to create or improve its products with AI technology. On Wednesday, Google launched Gemini, which it calls its “most capable” and advanced AI model yet and which in some cases outperformed OpenAI’s GPT-4. The company is planning to license Gemini to a wide range of customers through Google Cloud for them to use in their own applications. One of Gemini’s standout features is that it’s multimodal, meaning it can process and understand information beyond text, including images, video and audio.
A product manager for Google Photos presented Project Ellmann alongside Gemini teams at a recent internal summit, according to documents viewed by CNBC. They wrote that the teams spent the past few months determining that large language models are the ideal tech to make this bird’s-eye approach to one’s life story a reality.
Ellmann could pull in context using biographies, previous moments, and subsequent photos to describe a user’s photos more deeply than “just pixels with labels and metadata,” the presentation states. It proposes to be able to identify a series of moments like university years, Bay Area years, and years as a parent.
“We can’t answer tough questions or tell good stories without a bird’s-eye view of your life,” one description reads alongside a photo of a small boy playing with a dog in the dirt.
“We trawl through your photos, looking at their tags and locations to identify a meaningful moment,” a presentation slide reads. “When we step back and understand your life in its entirety, your overarching story becomes clear.”
The presentation said large language models could infer moments like a user’s child’s birth. “This LLM can use knowledge from higher in the tree to infer that this is Jack’s birth, and that he’s James and Gemma’s first and only child.”
“One of the reasons that an LLM is so powerful for this bird’s-eye approach, is that it’s able to take unstructured context from all different elevations across this tree, and use it to improve how it understands other regions of the tree,” a slide reads, alongside an illustration of a user’s various life “moments” and “chapters.”
Presenters gave another example of determining one user had recently been to a class reunion. “It’s exactly 10 years since he graduated and is full of faces not seen in 10 years so it’s probably a reunion,” the team inferred in its presentation.
The team also demonstrated “Ellmann Chat,” with the description: “Imagine opening ChatGPT but it already knows everything about your life. What would you ask it?”
It displayed a sample chat in which a user asks, “Do I have a pet?” It answers that yes, the user has a dog that wore a red raincoat, then offers the dog’s name and the names of the two family members it’s most often seen with.
Another example for the chat was a user asking when their siblings last visited. Another asked it to list towns similar to where they live because they are thinking of moving. Ellmann offered answers to both.
Ellmann also presented a summary of the user’s eating habits, other slides showed. “You seem to enjoy Italian food. There are several photos of pasta dishes, as well as a photo of a pizza.” It also said that the user seemed to enjoy new food because one of their photos had a menu with a dish it didn’t recognize.
The technology also determined what products the user was considering purchasing, their interests, work, and travel plans based on the user’s screenshots, the presentation stated. It also suggested it would be able to know their favorite websites and apps, giving as examples Google Docs, Reddit and Instagram.
A Google spokesperson told CNBC, “Google Photos has always used AI to help people search their photos and videos, and we’re excited about the potential of LLMs to unlock even more helpful experiences. This is a brainstorming concept a team is at the early stages of exploring. As always, we’ll take the time needed to ensure we do it responsibly, protecting users’ privacy as our top priority.”
Big Tech’s race to create AI-driven ‘Memories’
The proposed Project Ellmann could help Google in the arms race among tech giants to create more personalized life memories.
Google Photos and Apple Photos have for years served “memories” and generated albums based on trends in photos.
In November, Google announced that with the help of AI, Google Photos can now group together similar photos and organize screenshots into easy-to-find albums.
Apple announced in June that its latest software update will include the ability for its photo app to recognize people, dogs, and cats in users’ photos. It already sorts out faces and allows users to search for them by name.
Apple also announced an upcoming Journal App, which will use on-device AI to create personalized suggestions to prompt users to write passages that describe their memories and experiences based on recent photos, locations, music and workouts.
But Apple, Google and other tech giants are still grappling with the complexities of displaying and identifying images appropriately.
For instance, Apple and Google still avoid labeling gorillas after reports in 2015 found Google mislabeling Black people as gorillas. A New York Times investigation this year found that Apple’s software and Google’s Android, which underpins most of the world’s smartphones, had turned off the ability to visually search for primates for fear of labeling a person as an animal.
Companies including Google, Facebook and Apple have over time added controls to minimize unwanted memories, but users have reported that unwanted memories sometimes still surface and that minimizing them requires toggling through several settings.