Ask a Techspert: How does AI understand my visual searches?

The Power of Visual Search: How AI Understands Your Visual Searches

Imagine you're browsing through social media and come across a beautifully styled outfit that catches your eye. You want to know where everything came from, but searching for each item individually can be time-consuming and frustrating. Until recently, visual search was a one-item-at-a-time process, but a major update to Google's Circle to Search and Lens has changed the game. Now, you can search multiple objects within a single image simultaneously, making it easier to find exactly what you want and learn more about the world around you.

Behind the Scenes: How Google's AI Understands Visual Searches

To better understand these breakthroughs, we talked to Search Senior Engineering Director Dounia Berrada, who focuses on multimodal search, aka Google Lens. Visual search is redefining how we interact with information, and Lens should be intelligent enough to understand the "why" behind your search, making it effortless to get help with what you see on your screen or in the world around you.

How Does It Work?

Imagine you're redesigning a room and upload a photo of a mid-century modern space for inspiration. You probably aren't just looking for the side table; you want to recreate the entire vibe. Previously, you'd have to search for the lamp, then the rug, then the chair individually. Now, AI Mode can break down that complex image, identify each individual piece, and issue multiple visual searches simultaneously. You can see this in action right now using Circle to Search.

The Advanced Gemini Models

Our advanced Gemini models make AI Mode possible, and AI Mode's multimodal capabilities benefit from the visual expertise we've built into Lens over the years. When you search with an image, Gemini analyzes the image alongside your question to decide which tools to use. Let's say you're scrolling on your phone and see an outfit on social media that you love. When you search for it, the model knows to use Lens to retrieve image results for the hat, shoes, and jacket simultaneously. It then weaves those individual results into one easy-to-read response.
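To make the flow above concrete, here is a minimal Python sketch of the general pattern: split an image into objects, search for each one concurrently, then merge the results. It is an illustration only, not Google's implementation. The `Region` type and the `visual_search`, `search_all`, and `weave` functions are hypothetical names invented for this example, and the search itself is a stand-in that returns canned text.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical output of an object detector: one item found in the image."""
    label: str


async def visual_search(region: Region) -> str:
    """Stand-in for an image-retrieval call (assumption: returns canned text)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"top match for {region.label}"


async def search_all(regions: list[Region]) -> dict[str, str]:
    # Issue every per-object search concurrently instead of one at a time.
    # asyncio.gather returns results in the same order as its inputs.
    results = await asyncio.gather(*(visual_search(r) for r in regions))
    return {r.label: res for r, res in zip(regions, results)}


def weave(results: dict[str, str]) -> str:
    # Combine the individual results into one readable response.
    return "\n".join(f"{label}: {match}" for label, match in results.items())


outfit = [Region("hat"), Region("shoes"), Region("jacket")]
response = weave(asyncio.run(search_all(outfit)))
print(response)
```

Because the three lookups run concurrently, the total wait is roughly that of the slowest single search rather than the sum of all three.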

The Fan-Out Technique

AI Mode is basically doing a dozen searches for you in the time it takes to do one. If you upload a photo of a garden you admire, you might have several questions: Will these plants survive in the shade? Are they right for my climate? How much maintenance do they need? Before, you'd ask those one by one. Now, AI Mode identifies all of those necessary "fan-out" searches and runs them for you. This way, it gathers care requirements for every plant in the photo using helpful web results, breaks down the info, and even suggests next steps you might want to take.
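One way to picture the fan-out step is as query expansion: every plant in the photo is paired with every question you might ask. The Python sketch below shows that idea under stated assumptions. The plant names, the question templates, and the `fan_out`, `lookup`, and `gather_by_subject` helpers are all hypothetical, and `lookup` is a placeholder rather than a real web search.

```python
from itertools import product


def fan_out(subjects, questions):
    """Expand one request into one query per (subject, question) pair."""
    return [(s, q.format(plant=s)) for s, q in product(subjects, questions)]


def lookup(query):
    # Placeholder for a web search (assumption: returns canned text).
    return f"result for '{query}'"


def gather_by_subject(subjects, questions):
    # Group the answers so each plant ends up with its own care summary.
    grouped = {s: [] for s in subjects}
    for subject, query in fan_out(subjects, questions):
        grouped[subject].append(lookup(query))
    return grouped


# Illustrative inputs, not taken from the article.
plants = ["hosta", "fern"]
questions = [
    "does {plant} survive in shade",
    "how much maintenance does {plant} need",
]

queries = fan_out(plants, questions)
care = gather_by_subject(plants, questions)
print(len(queries), "queries issued")
```

Two plants and two questions already produce four searches, which is why doing this by hand gets tedious quickly as a scene grows.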

Practical Implications

The practical uses are broad. You could take a photo of a wall at a museum and ask for explanations of each painting. Or take a photo of a bakery window and ask what all the different pastries are. It's about moving from "What is this one thing?" to "Explain this entire scene to me." One photo and one question can now stand in for a whole series of separate searches.

Forward-Looking Thoughts

As we continue to push the boundaries of visual search and AI, we can expect to see even more innovative applications in the future. Imagine being able to take a photo of a piece of art and having the AI provide you with a detailed analysis of its history, significance, and cultural context. Or being able to take a photo of a product and having the AI provide you with a list of similar products, their prices, and where to buy them. The possibilities are endless, and we can't wait to see what the future holds.

Conclusion

The power of visual search is changing the way we interact with information, and Google's AI is at the forefront of this shift. With the ability to search multiple objects within a single image simultaneously, one photo can answer questions that used to take many separate searches. As visual search and AI continue to advance, information should become more accessible, user-friendly, and tailored to our individual needs.

Source: https://blog.google/company-news/inside-google/googlers/how-google-ai-visual-search-works/