Soon Your Google Searches Can Combine Text and Images
In May, Google executives unveiled an experimental artificial intelligence system, trained on text and images, that they said would make internet searches more intuitive. On Wednesday, Google offered a glimpse of how the technology will change the way people search the web.
Starting next year, the Multitask Unified Model, or MUM, will enable Google users to combine text and image searches using Lens, a smartphone app that’s also incorporated into Google search and other products. So you could, for example, take a picture of a shirt with Lens, then search for “socks with this pattern.” Searching “how to fix” on an image of a bike part will surface instructional videos or blog posts.
Google will incorporate MUM into search results to suggest additional avenues for users to explore. If you ask Google how to paint, for example, MUM can detail step-by-step instructions, style tutorials, or how to use homemade materials. Google also plans in coming weeks to bring MUM to YouTube videos in search, where the AI will surface search suggestions below videos based on video transcripts.
MUM is trained to make inferences about text and imagery. Integrating MUM into Google search results also represents a continued march toward the use of language models that rely on vast amounts of text scraped from the web and a kind of neural network architecture called the Transformer. One of the first such efforts came in 2019, when Google injected a language model named BERT into search results to change web rankings and summarize the text below results.
Google vice president Pandu Nayak said BERT represented the greatest change to search results in the better part of a decade but that MUM takes the language understanding AI applied to Google search results to the next level.
For example, MUM uses data from 75 languages instead of English alone, and it is trained on imagery and text instead of text alone. It’s 1,000 times larger than BERT when measured by the number of parameters, the connections between artificial neurons in a deep learning system.
While Nayak calls MUM a major milestone in language understanding, he also acknowledges that large language models come with known challenges and risks.
BERT and other Transformer-based models have been shown to absorb bias found in the data used to train them. In some instances, researchers have found that the larger the language model, the worse the amplification of bias and toxic text. People working to detect and change the racist, sexist, and otherwise problematic output of large language models say scrutinizing text used to train these models is critical to reducing harm and that the way that data is filtered can have a negative impact. In April, the Allen Institute for AI reported that block lists used in a popular data set Google used to train its T5 language model can lead to the exclusion of entire groups, like people who identify as queer, making it difficult for language models to understand text by or about those groups.
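To see how a block list can erase a group rather than just offensive text, consider a minimal Python sketch. It is an illustration under stated assumptions, not the filter used for any real data set: the one-term block list, the sample documents, and the function names `tokens` and `keep_document` are all made up here. The sketch drops any document that contains a listed word, which is roughly how word-based filters of this kind are often described.

```python
import re

# Hypothetical one-term block list, for illustration only; not a real list.
BLOCKLIST = {"queer"}


def tokens(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keep_document(text, blocklist=BLOCKLIST):
    """Return True if no token in the document appears on the block list."""
    return not any(tok in blocklist for tok in tokens(text))


# Made-up sample documents standing in for scraped web pages.
documents = [
    "A guide to repairing a bike chain.",
    "Our queer community center hosts a book club on Fridays.",
    "Recipes for homemade paint.",
]

kept = [doc for doc in documents if keep_document(doc)]
dropped = [doc for doc in documents if not keep_document(doc)]

print("kept:", kept)
print("dropped:", dropped)
```

In this toy case the only document removed is a harmless one written by and about the community the listed word describes, so a model trained on what remains would see no text from that group at all.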
In the past year, multiple AI researchers at Google, including former Ethical AI team coleads Timnit Gebru and Margaret Mitchell, have said they faced opposition from executives to their work showing that large language models can harm people. Among Google employees, the ousting of Gebru following a dispute over a paper critical of the environmental and social costs of large language models led to allegations of racism and to calls for unionization and stronger whistleblower protections for AI ethics researchers.
In June, five US senators cited multiple incidents of algorithmic bias at Alphabet and the ousting of Gebru among reasons to question whether Google products like search or Google’s workplace are safe for Black people. In a letter to executives, the senators wrote, “We are concerned algorithms will rely on data that reinforces negative stereotypes and either exclude people from seeing ads for housing, employment, credit, and education or show only predatory opportunities.”