OpenAI, in a bid to further strengthen the resilience of its AI systems, has launched the OpenAI Red Teaming Network, a selected group of experts tasked with informing the company's strategies for AI model risk assessment and mitigation.
Red teaming is emerging as a crucial step in the AI model development process, especially as generative technologies gain widespread adoption. It serves as a mechanism to identify, though not necessarily rectify, biases in models such as OpenAI's DALL-E 2, which has been shown to amplify stereotypes around race and gender, and to surface prompts that cause text-generating models such as ChatGPT and GPT-4 to ignore safety filters.
Although OpenAI has previously engaged external experts to benchmark and scrutinize its models through initiatives such as its bug bounty program and researcher access program, the Red Teaming Network formalizes those efforts. The objective, as stated in a company blog post, is to deepen and broaden OpenAI's collaboration with scientists, research institutions, and civil society organizations.
OpenAI envisions this initiative as a supplement to externally-specified governance practices, like third-party audits. Members of the network, based on their expertise, will be invited to participate in red teaming at various stages of the model and product development lifecycle.
In addition to red teaming campaigns commissioned by OpenAI, members of the Red Teaming Network will have the opportunity to discuss general red teaming practices and findings with one another. OpenAI clarifies that not every member will be involved with every new model or product, and individual time commitments, which could be as little as 5 to 10 hours a year, will be determined on a case-by-case basis.
OpenAI is inviting a diverse range of domain experts to join, including those with backgrounds in linguistics, biometrics, finance, and healthcare. Prior experience with AI systems or language models is not a prerequisite for eligibility. However, OpenAI notes that participation in the Red Teaming Network may be subject to non-disclosure and confidentiality agreements that could affect other research.
The company values a willingness to engage and contribute unique perspectives on the impact assessment of AI systems. OpenAI is welcoming applications from experts across the globe, prioritizing domain diversity as well as geographic diversity in its selection process.
However, doubts remain as to whether red teaming alone is sufficient; some critics argue that it is not.
Aviv Ovadya, a contributor to Wired and an affiliate of Harvard's Berkman Klein Center and the Centre for the Governance of AI, advocates for "violet teaming": identifying how a system like GPT-4 might harm an institution or public good, and then supporting the development of tools, using that same system, to defend that institution or public good. While this seems like a prudent approach, as Ovadya points out, there are few incentives to undertake violet teaming, or even to slow down AI releases enough to give it time to work.
For the time being, red teaming networks such as OpenAI’s appear to be the best available solution.