Naver trained a “GPT-3-like” Korean language model
Naver, the Seongnam, South Korea-based company that operates the eponymous Naver search engine, announced this week that it has trained one of the largest AI language models of its kind, called HyperCLOVA. Naver says the system has learned 6,500 times more Korean data than OpenAI’s GPT-3 and contains 204 billion parameters, the parts of a machine learning model that are learned from historical training data. (GPT-3 has 175 billion parameters.)
For nearly a year, OpenAI’s GPT-3 has remained among the largest AI language models ever created. Via an API, people have used it to write emails and articles, summarize text, compose poetry and recipes, create website layouts, and generate deep learning code in Python. But GPT-3 has key limitations, chief among them that it is only available in English.
According to Naver, HyperCLOVA was trained on 560 billion tokens of data, 97% of it in Korean, compared with the 499 billion tokens on which GPT-3 was trained. Tokens, a way of separating pieces of text into smaller units in natural language, can be words, characters, or parts of words.
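To make the three kinds of token concrete, here is a minimal sketch in Python. It is only an illustration: the function names are made up for this example, and the fixed-width chunking is an assumed stand-in for "parts of words", not the learned subword vocabulary that a model like GPT-3 or HyperCLOVA would actually use.

```python
# Illustrative only: three simple ways to split the same text into tokens.
# The chunking scheme is a toy assumption, not a real model's tokenizer.

def word_tokens(text):
    """Split on whitespace: one token per word."""
    return text.split()

def char_tokens(text):
    """One token per character, with whitespace excluded."""
    return [ch for ch in text if not ch.isspace()]

def chunk_tokens(text, size=3):
    """Toy 'parts of words': cut each word into pieces of at most `size` characters."""
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + size] for i in range(0, len(word), size))
    return pieces

sample = "language models"
print(word_tokens(sample))   # word-level tokens
print(char_tokens(sample))   # character-level tokens
print(chunk_tokens(sample))  # word-piece-style tokens
```

The same text yields very different token counts depending on the scheme, which is one reason token totals such as 560 billion and 499 billion are only roughly comparable across models.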
In a translated press release, Naver said it will use HyperCLOVA to deliver “differentiated” experiences across its services, including the Naver search engine’s autocorrect feature. “Naver plans to support HyperCLOVA [for] small and medium businesses, creators, and startups,” the company said. “Since AI can be harnessed with a few-shot learning method that provides straightforward explanations and examples, anyone who is not an AI expert can easily create AI services.”
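The approach the press release describes, a short explanation of the task followed by a few examples, is usually expressed as a plain-text prompt. The sketch below shows one way such a prompt might be assembled. The function name, the "Input:"/"Output:" layout, and the spelling-correction task are all assumptions made for illustration; Naver has not published HyperCLOVA's prompt format, and no model is called here.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt: a task description, worked examples, then the new input."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final entry is left open so the model supplies the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Correct the spelling of each search query.",
    [
        ("wether forecast", "weather forecast"),
        ("resturant near me", "restaurant near me"),
    ],
    "korean langauge model",
)
print(prompt)
```

The resulting string would be sent to a language model, which is expected to continue the pattern after the final "Output:". No retraining is involved, which is why the method is accessible to non-experts.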
Jack Clark, policy director of OpenAI, called HyperCLOVA a “notable” achievement both because of the scale of the model and because it fits the trend of generative model diffusion, in which multiple actors are developing “GPT-3 style” models. In April, a research team at Chinese company Huawei quietly detailed PanGu-Alpha (stylized PanGu-α), a 750-gigabyte model with up to 200 billion parameters that was trained on 1.1 terabytes of Chinese e-books, encyclopedias, news, social media, and web pages.
“Generative models ultimately reflect and amplify the data they are trained on – so different nations care a lot about how their own culture is represented in these models. Therefore, Naver’s announcement is part of a general trend for different nations to assert their own AI capacity [and] capability via training frontier models like GPT-3,” Clark wrote in his weekly Import AI newsletter. “[We’ll] wait for more technical details to see if [it’s] really comparable to GPT-3.”
Some experts believe that while HyperCLOVA, GPT-3, PanGu-α, and similarly large models are impressive in terms of performance, they don’t move the ball forward on the research side of the equation. Rather, they are prestige projects that demonstrate the scalability of existing techniques or serve as a showcase for a company’s products.
Naver does not claim that HyperCLOVA overcomes other blockers in natural language processing, like answering math problems correctly or responding to questions without paraphrasing the training data. More problematically, it is also possible that HyperCLOVA contains the types of bias and toxicity found in models like GPT-3. Among others, AI researcher Timnit Gebru has questioned the wisdom of building large language models, examining who benefits from them and who is disadvantaged. In particular, the environmental effects of training AI and machine learning models have been highlighted.
The co-authors of a paper from OpenAI and Stanford suggest ways to address the negative consequences of large language models, such as enacting laws that require companies to acknowledge when text is generated by AI, perhaps along the lines of California’s bot law. Other recommendations include:
- Training a separate model that acts as a filter for content generated by a language model
- Deploying a suite of bias tests to run models through before allowing people to use them
- Avoiding certain specific use cases
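The first recommendation, a separate model that filters a language model's output, amounts to a two-stage pipeline: generate, then check before releasing. The sketch below shows only that structure. Both stages are hypothetical stand-ins (canned text in place of a language model, a keyword blocklist in place of a trained filter model), so it should not be read as how the paper's authors or any vendor implement filtering.

```python
def toy_generator(prompt):
    """Hypothetical stand-in for a language model: returns canned text."""
    canned = {
        "greet": "Hello, nice to meet you.",
        "insult": "You are a worthless idiot.",
    }
    return canned.get(prompt, "")

def toy_filter(text, blocklist=("idiot", "worthless")):
    """Hypothetical stand-in for a trained filter model: True if the text looks acceptable."""
    lowered = text.lower()
    return not any(term in lowered for term in blocklist)

def filtered_generate(prompt, generator=toy_generator, content_filter=toy_filter):
    """Generate text, then release it only if the separate filter accepts it."""
    text = generator(prompt)
    return text if content_filter(text) else None

print(filtered_generate("greet"))   # released unchanged
print(filtered_generate("insult"))  # withheld by the filter
```

Keeping the filter separate from the generator means it can be retrained or replaced without touching the underlying language model.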
The consequences of failing to take any of these steps could be catastrophic in the long run. In recent research, the Center on Terrorism, Extremism, and Counterterrorism at the Middlebury Institute of International Studies found that GPT-3 could reliably generate “informative” and “influential” text that might radicalize people into violent far-right ideologies and behaviors. And toxic language models deployed in production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to “white-aligned English,” for example, to make sure the models work better for them, which could discourage minority speakers from engaging with the models in the first place.