General Purpose AI (GPAI) regulation is currently being debated by European Union legislative bodies as they work on the Artificial Intelligence Act (AIA). A change proposed by the Council of the EU (the Council) would take the unusual and harmful step of regulating open source GPAI. Although intended to enable safer use of these tools, the proposal would create legal liability for open source GPAI models, undermining their development. This could further concentrate power over the future of AI in big tech companies and prevent research essential to the public's understanding of AI.
What is GPAI?
The Council’s approach is to define a subset of general-purpose AI systems and then require GPAI developers to meet requirements for risk management, data governance, technical documentation, and transparency, as well as standards of accuracy and cybersecurity. The Council defines GPAI as AI that performs “generally applicable functions” and can be used in “multiple contexts,” but this definition remains quite vague. Although there is no widely accepted definition of GPAI, the current generation of GPAI is characterized by training deep learning models on large datasets, at relatively high computational expense, to perform many, even hundreds of, tasks. These tasks can include generating images, translating languages, moving a robotic arm, playing video games, or all of the above.
The Council has reason to consider regulating GPAI models. The capabilities of these models are increasing rapidly, and as a result they are being used in new applications, such as writing assistants and photo-editing tools. There are also concerns about their use to generate misinformation and deepfakes, although this is less common.
The Council also seems concerned about the opacity of these models – training deep learning models on huge datasets has led to more complex and difficult-to-understand behaviors. Additionally, some companies make GPAI available only through application programming interfaces, or APIs. This means that users can only send data to the GPAI system and then get a response; they cannot directly query or evaluate the model itself, which poses real challenges for developing downstream AI systems that would meet AIA requirements. These are some of the reasons the Council is considering requirements on GPAI models.
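The difference between API-only access and open source access can be made concrete with a small illustrative sketch. This is a toy stand-in, not a real GPAI system; all class and attribute names here are hypothetical, invented only to show why a model behind an API cannot be audited the way an open model can.

```python
# Illustrative sketch (hypothetical names): contrasting API-only access
# with open source access to a model. The "model" here is a trivial
# word-scoring stand-in, not a real GPAI system.

class HostedModelAPI:
    """API-only access: callers send inputs and receive outputs,
    but cannot inspect the model's internals."""
    def __init__(self):
        self._weights = {"positive": 1.0, "negative": -1.0}  # hidden from users

    def predict(self, text: str) -> str:
        score = sum(self._weights.get(word, 0.0) for word in text.split())
        return "positive" if score >= 0 else "negative"

class OpenSourceModel:
    """Open source access: the same model, but its internals are public,
    so researchers can audit the weights and probe its behavior directly."""
    def __init__(self):
        self.weights = {"positive": 1.0, "negative": -1.0}  # fully inspectable

    def predict(self, text: str) -> str:
        score = sum(self.weights.get(word, 0.0) for word in text.split())
        return "positive" if score >= 0 else "negative"

# With the API, a researcher can only observe input/output pairs:
api = HostedModelAPI()
print(api.predict("a positive review"))

# With the open model, the researcher can also examine the model itself:
open_model = OpenSourceModel()
print(open_model.predict("a positive review"))
print(open_model.weights)
```

An auditor limited to the API can only probe the system one query at a time from the outside; an auditor with the open model can inspect every parameter, which is the kind of evaluation the Council's opacity concerns point to.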
Open source GPAI contributes to the responsible development of GPAI
While the objectives of the Council’s approach to GPAI are understandable, the explicit inclusion of open source GPAI undermines those very ambitions. Open source GPAI models are freely available for anyone to use, rather than being sold or otherwise commercialized. The proposed AIA would create legal liability, and therefore a chilling effect, on the development of open source GPAI. Open source GPAI projects play two key roles in the future of GPAI: first, they disperse power over the direction of AI from well-resourced tech companies to a more diverse group of stakeholders; second, they enable critical research, and thus public knowledge, on the function and limitations of GPAI models.
Very few institutions have the resources to train state-of-the-art GPAI models; a reasonable estimate is that an individual GPAI model can cost several million dollars to develop, although each additional model an institution trains is likely to cost far less. While some big tech companies have made their models open source, such as Google’s BERT or OpenAI’s GPT-2, the incentive for companies to release these models will diminish as they become more commercialized.
There are only a few open source models from non-profit initiatives, leaving the field dependent on big tech companies. The Allen Institute for AI released ELMo in 2019, though the organization announced this July that it would be refocusing its work on language models. Since mid-2020, a collaborative group of researchers called EleutherAI has successfully created open source versions of large language models and scientific AI models. Most promising is the recent release of BLOOM, a large language model developed through an open-science collaboration of more than 900 researchers, organized by the company Hugging Face. These efforts enable a much more diverse set of stakeholders to shape the future of GPAI, perhaps best exemplified by BLOOM’s support of 46 human languages. Notably, BLOOM was developed using a French government supercomputer, which makes it especially exposed to the new regulations.
Beyond the broad direction of GPAI research, access to specific open source GPAI models contributes significantly to the public interest. In a previous Brookings article, I analyzed how open source AI software is accelerating AI adoption, enabling fairer and more reliable AI, and advancing the sciences that use AI; this is also largely true for GPAI.
Moreover, the public availability of GPAI models helps to identify problems and propose solutions for the benefit of society. For example, large open source language models have shown how biases manifest in a model’s associations with specific words, and how those associations can be intentionally manipulated. Other papers use open source GPAI models to compare their reliability in code generation, to build new benchmarks assessing their understanding of language, or to measure the carbon cost of AI development. Especially as GPAI models become more common in impactful applications such as search engines and news feeds, as well as in factories and utilities, understanding their limitations will be paramount.
This research leads not only to scientific advances, but also to more informed criticism of how large technology companies use these models. For example, understanding the general operation of GPAI models can facilitate crowdsourced algorithmic audits, in which groups of individuals collaborate to test the behavior of a corporate algorithmic system from the outside. A group of content creators recently used this approach to demonstrate that YouTube was unfairly demonetizing LGBTQ content.
More open source GPAI development means more transparency into how these models are built. Without open source GPAI, the public will know less and big tech companies will have more influence over the design and execution of these models. Notably, researchers at these companies do not have a completely free hand; recall that criticism of Google’s large language models was at the center of the dispute that led to the firing of one of the company’s star researchers, Dr. Timnit Gebru.
Additionally, discouraging open source GPAI could lead to greater reliance on enterprise GPAI models hidden behind APIs. Since APIs limit how a user can interact with a GPAI model, even a well-documented GPAI model that is only available through an API can be much more difficult to use safely than an open source GPAI model.
Regulate risky and harmful applications, not open source AI models
On net, open source AI models offer considerable societal value, but the Council’s treatment of GPAI (open source and otherwise) is also a notable departure from the AIA’s broader perspective, called its “risk-based” approach. In the initial European Commission proposal, regulatory requirements applied only to certain risky applications of AI (such as hiring, facial recognition, or chatbots), rather than to the mere existence of a model. Thus, GPAI models would have been exempt until they were used in an application covered by the risk-based approach.
The Council’s draft AIA includes two exemptions that apply circumstantially to open source GPAI models, but both pose serious problems. The first exempts from the full AIA all AI models that are used only for research and development purposes. Yet open source developers are largely motivated by building things that people actually use, which means this restriction diminishes the incentive to contribute open source AI. The second exemption applies to GPAI models whose developers prohibit and successfully prevent misuse of the model. However, it is unrealistic for open source developers to monitor and prevent abuse once they release a model. These exemptions would not sufficiently relieve open source AI developers of their regulatory responsibilities or legal liability.
As a result, open source developers would be right to worry about how regulators in different EU member states will interpret the AIA. Moreover, it is not difficult to imagine that, following a disastrous outcome of an application built on a GPAI model, the company responsible might try to deflect blame and legal liability by suing the open source developers on whose work it built. These two sources of potential liability would create a significant disincentive to release open source GPAI models, or possibly any software containing a GPAI model.
Ultimately, the Council’s attempt to regulate open source could create a convoluted set of requirements that endangers open source AI contributors, likely without improving the use of GPAI. Open source AI models offer tremendous societal value by challenging big tech companies’ dominance of GPAI and by increasing public understanding of how AI functions. The original European Commission approach, which exempted open source AI until it is used in a high-risk application, would lead to much better outcomes for the future of AI.
Google is a general, unrestricted donor to the Brookings Institution. The findings, interpretations and conclusions published in this article are solely those of the author and are not influenced by any donation.