On Tuesday, Meta AI unveiled a demo of Galactica, a large-scale language model designed to “store, combine, and justify scientific knowledge.” While writing scientific literature was intended to be accelerated, opposing users running tests found that this could also be done produce realistic nonsense. After several days of ethical criticismMeta took the demo offline, reports MIT Technology Review.
Large language models (LLMs) like OpenAI’s GPT-3 learn to write text by studying millions of examples and understanding the statistical relationships between words. As a result, they can produce compelling-sounding documents, but these works can also be full of untruths and potentially harmful stereotypes. Some critics call LLMs “stochastic parrots” for their ability to persuasively spit out text without understanding its meaning.
Enter Galactica, an LLM aimed at writing scholarly literature. Its authors schooled Galactica on “a large and curated corpus of mankind’s scientific knowledge,” including over 48 million articles, textbooks and lecture notes, scholarly websites, and encyclopedias. According to Galactica’s paper, meta-AI researchers believed that this supposedly high-quality data would result in high-quality output.
Beginning Tuesday, visitors to the Galactica website could enter prompts to create documents such as literature reviews, wiki articles, lecture notes, and answers to questions according to examples provided by the site. The website presented the model as “a new interface to access and manipulate what we know about the universe”.
While some people found the demo promising and usefulothers soon discovered that anyone could type racist or potentially offensive promptsto just as easily create authoritative content on these topics. For example, someone used it author a wiki entry about a fictional research paper titled “The Benefits of Eating Crushed Glass”.
Even if the production of Galactica didn’t violate social norms, the model could attack and spit out well-understood scientific facts inaccuracies such as incorrect dates or animal names that require a thorough knowledge of the subject to be caught.
I asked #Galactica about some things that I know and am concerned about. In all cases it was wrong or biased, but sounded right and authoritative. I think it’s dangerous. Here are some of my experiments and my analysis of my concerns. (1/9)
— Michael Black (@Michael_J_Black) November 17, 2022
As a result meta drawn the Galactica demo on Thursday. Then Meta’s lead AI scientist Yann LeCun tweeted“The Galactica demo is offline for now. It is no longer possible to have fun by casually abusing them. Happy?”
The episode evokes a shared ethical dilemma with AI: when it comes to potentially harmful generative models, is it up to the public to use them responsibly, or the publishers of the models to prevent abuse?
Where industry practice falls between these two extremes will likely vary by culture and by the maturity of deep learning models. Ultimately, government regulation can play a large role in shaping the response.