AI ‘harm models’ have the potential to create a new internet hell

Representative image. Photo: Nikk/Flickr CC BY 2.0


  • A YouTuber in the AI community trained an AI language model called “GPT-4chan” that gives misogynistic answers to questions.
  • This means many more people will be able to use and scale AI that generates hate speech – which has drawn the attention of the AI ethics community.
  • If AI developers are to be held accountable for the abuse of their tools, the code they publish must have sufficient safeguards to prevent abuse.

“How do you get a girlfriend?”

“Withdraw women’s rights.”

This exchange may be all too familiar from the darker corners of the internet, but many readers will be surprised to learn that the misogynistic response here was written by an artificial intelligence.

Recently, a YouTuber in the AI community posted a video explaining how he trained an AI language model called “GPT-4chan” on 4chan’s /pol/ board, a forum filled with hate speech – racism, sexism, antisemitism, and just about every other kind of abusive content imaginable. The model was made by fine-tuning the open-source language model GPT-J (not to be confused with OpenAI’s more popular GPT-3). After training his language model on the vilest of teachers, the creator unleashed the AI on the forum itself, where it interacted with users and made more than 30,000 posts (about 15,000 of them in a single day – roughly 10% of all posts on the forum that day). “Withdraw women’s rights” was just one example of GPT-4chan’s responses to other posters’ questions.
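To give a sense of how little code this kind of project requires, here is a minimal, hypothetical sketch of fine-tuning a small open-source causal language model with the widely used Hugging Face transformers library. It is not the actual GPT-4chan code and does not reproduce it: the model name (distilgpt2, a small stand-in for GPT-J), the corpus file “corpus.txt”, and the hyperparameters are placeholders chosen purely for illustration.

```python
# Illustrative sketch only: generic fine-tuning of a small open-source causal
# language model. Model name, data file, and hyperparameters are placeholders,
# not details of GPT-4chan.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "distilgpt2"                      # small stand-in for GPT-J
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token      # GPT-style models lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any plain-text corpus works; "corpus.txt" is a hypothetical local file.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                                # fine-tune on the corpus
trainer.save_model("finetuned-model")          # weights anyone can now reuse
```

The point of the sketch is not the specific calls but the scale: a pre-trained model plus a corpus and a few dozen lines of standard library code is enough to produce a new model that mirrors whatever text it was fed.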

The model’s open-source code was downloaded more than 1,500 times before Hugging Face, the site that hosted it, took it down. This means many more people will be able to use and scale AI that generates hate speech – a prospect that has attracted the attention of the AI ethics community.

Condemning an AI that spews hate speech was a bit of a no-brainer for AI ethicists, and many AI experts did so, including in a formal letter drafted by Stanford faculty. But one element of the whole ordeal seemed more troubling. Yannic Kilcher, the creator of GPT-4chan, responded to these criticisms by mocking AI ethicists on Twitter, joking that the people upset about AI ethics had been “rickrolled”. His social media accounts display a similar attitude toward the very concept of AI ethics, much like the 4chan users his AI sought to replicate. He described releasing the bot as a joke and a bit of light-hearted trolling. This framing of the project as “trolling” is just one example of a growing phenomenon: online trolling and provocation supercharged by the powerful capabilities of artificial intelligence.

Much of the AI community has come to embrace open-source development, in which source code is made publicly available and can be used, modified, and analyzed by anyone. This contrasts with closed-source software, the more traditional model in which companies maintain control and confidentiality over their code. Open-source tools are released to increase collaboration and accelerate development by crowdsourcing code improvements from other engineers. In the case of open-source AI, companies can reap the rewards of having more people inspecting and refining the algorithms or models they create. It also democratizes the development of powerful AI applications by not restricting access to a handful of large technology companies.

Representative image. Photo: Pixabay

This ethos of sharing code sounds warm and fuzzy, doesn’t it? But if anyone can access the code, then anyone can use or modify it for their own ends – including bad actors. Free access to AI models means that most of the initial work of building a model has already been done, and anyone can now adapt it to serve a malicious purpose. Lowering the barriers to AI has plenty of benefits, but it also makes it very easy to turn AI to offensive and malicious ends.
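To make that low barrier concrete, here is a minimal, hypothetical sketch: with the Hugging Face transformers library installed, reusing a published open-source model takes only a few lines. The model named here, distilgpt2, is a generic example chosen for illustration, not any particular harmful release.

```python
# Minimal sketch of reusing a published open-source model (hypothetical example).
# "distilgpt2" is a small, generic model; any hosted model name could be swapped in.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("The internet is", max_new_tokens=40)
print(result[0]["generated_text"])  # prints the prompt plus the generated continuation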

The term trolling is positively mainstream now – as are its hallmarks and effects – but it grew out of online forums like 4chan. These bleak forums hosted a mix of people posting anonymously from all over the world, and attracted plenty of computer geeks and hackers. This led to the founding of hacking groups like Anonymous, which began as a coordinated effort by 4chan users to troll and prank organizations, such as defacing the Church of Scientology’s website. That behavior evolved into more elaborate cyberattacks with wider ramifications, such as Anonymous launching distributed denial-of-service (DDoS) attacks against government agencies including the Department of Justice and the FBI. The group even recently claimed to have shut down Russian government websites and state media in retaliation for the Russian invasion of Ukraine. What started as unorganized, ungoverned groups of online trolls (which Fox News infamously dubbed the “internet hate machine”) has grown into a genuine social and political force.

Just as online trolling culture fueled hacking groups like Anonymous, something similar will happen with AI applications as more people gain the education and the open-source tools to develop them. But this will be far more dangerous: building and using AI models for the specific purpose of provoking or manipulating people goes beyond the traditional bounds of online trolling, enabling a new degree of provocation and harassment. AI can create disturbingly realistic content, and it can amplify and propagate that content to a degree human users cannot. These are the AI models I call “harm models”, and we are already seeing glimpses of how they are used.

Harm models often underpin the rapidly developing world of deepfake technology. Websites like 4chan have become hubs for deepfake porn: AI-generated sexually explicit content created for harassment, for money, or simply because people can. There are also AI applications used to create new images for no reason other than to provoke reactions and post offensive content, such as an AI that generates images of genitals. But intentionally built harm models are not the only threat: ordinarily benign AI applications can easily be turned to nefarious uses. The recent open-source release of DALL-E Mini, an AI model that creates original images from text prompts, led to a viral trend of using the AI to make all kinds of weird pictures – including many generated from abusive, racist, and deliberately sexist prompts. Another example comes from Microsoft, which in 2016 released the Twitter chatbot Tay to conduct research on “conversational understanding”. Users from – where else? – 4chan’s /pol/ board manipulated the AI into unleashing a barrage of horrific tweets, leading Microsoft to shut the bot down within 24 hours of it coming online. AI is largely a neutral tool that becomes dangerous only when it is built or used improperly – but that scenario keeps playing out in turbulent online communities.

As a preteen, I spent a lot of time lurking in miserable online spaces filled with trolling and irreverence, curiously sorting out what I thought of the people and posts I saw. Every interaction on forums like 4chan dripped with nihilism and sarcasm. Shock value was the users’ preferred currency; they dared fellow forum participants to prove they knew “how the internet works”. Are you willing to say something rotten to prove you belong here? Do you “get” what we’re doing here? Are you one of us?


Whitney Phillips and Ryan Milner address this kind of phenomenon in their book You Are Here: A Field Guide for Navigating Polarized Speech, Conspiracy Theories, and Our Polluted Media Landscape. They trace the rise of an “internet culture” that emphasized negative freedoms: the liberty to post whatever offensive or irreverent material one wanted. Members of this subculture saw themselves as defenders of “free speech” while forming an in-group that prized the ability to decode what particular language and concepts really meant. Phillips and Milner argue that the detached, deeply ironic rhetoric that became standard in this online subculture laid the groundwork for violent white supremacy and other societal problems years later. This is how online subcultures that insist, above all, that nothing said there should be taken seriously contribute to appalling outcomes for real-world society. Nothing good will come of arming these irreverent online spaces with the capabilities of artificial intelligence.

Learning how to build harm models is becoming more feasible as resources that teach AI development continue to proliferate and become publicly available. Moreover, bad actors looking to create malicious models can get a head start by using or modifying code from available open-source AI tools, or simply by misusing existing AI. There is already a worrying lack of concern for ethics and responsibility among many AI developers. If they do not consider how their tools can be abused, the code they publish will not have sufficient safeguards to prevent abuse.

Many experts advocate integrating ethical reasoning and standards into any AI training. But against the depraved crowds we have seen do anything to troll and harass online, we will need stronger safeguards. When it comes to open-source AI, there is not much organizations can do to prevent code from being abused once it has been publicly released. But companies can make wiser decisions about which code to open-source in the first place, creating standards and governance processes to evaluate which models could become problematic if released publicly. Researchers have emphasized that AI developers may need to consider both “upstream” and “downstream” harms to make this assessment, distinguishing harms that can be addressed with fixes to the code from harms that no amount of code can fix (and that may require developers to rethink releasing their AI at all). If sober, reflective processes for evaluating AI in this way cannot be implemented at scale, then harm models have the potential to create a new hell on an unfettered internet.

This article was originally published on Future Tense, a partnership between Slate, New America, and Arizona State University that studies emerging technologies, public policy, and society.


