A video or a voice clip may not have been shot or recorded by a real person at all. Behind a mobile app, a payment interface or an access-control terminal, someone you do not know may be "swiping" your face. As artificial intelligence (AI) deep synthesis technology grows ever more sophisticated, forged audio, video and other synthetic content is becoming harder and harder to distinguish from the real thing. There is no doubt that the world we live in faces the risks and challenges of technology abuse.
Faking a face or forging a voice is no trouble at all
Over the past two years, public security departments in Zhejiang, Anhui, Jiangsu and other provinces have arrested a number of suspects who stole personal information. Their methods were strikingly similar: first they illegally obtained other people's photos, or paid for recordings of other people's voices, as raw "material"; then they used artificial intelligence to "activate" the photos and synthesize dynamic videos. With these they either fooled the facial verification mechanisms of social platforms and Alipay accounts outright to make illegal profits, or tricked the manual review step in mobile SIM card registration and used phone numbers registered in other people's names to commit telecom and online fraud, run online gambling operations and so on, leaving the people whose information had been harvested exposed to security threats and property losses.
How can a photo of a stranger be "activated" into a video?
At a demonstration computer in the laboratory of the Institute for Artificial Intelligence at Tsinghua University, a Banyuetan reporter watched as a static photo of a stranger's face, freshly downloaded from WeChat Moments, was imported into the computer. Under the technician's control, the person in the picture instantly came "alive", blinking, opening the mouth, frowning and making other fine movements and expression changes on command; a smooth video was generated in little more than ten seconds.
"The technology to complete the driving operation from static to dynamic is called deep synthesis technology, which is a kind of artificial intelligence content synthesis technology." Xiaozihao, an engineer at the Institute of artificial intelligence of Tsinghua University, said that the deep synthesis technology has derived a variety of technologies, including image synthesis, video synthesis, sound synthesis, text generation and so on.
With this kind of technology, spoofing face verification is no longer difficult. In SIM card registration, bank card applications, payment app logins and other steps that require dynamic face recognition, such forged synthetic videos can help criminals pass back-end review and verification.
The technician also demonstrated voice synthesis for the Banyuetan reporter. With deep synthesis technology and a sample of a stranger's voice of around 60 seconds, sentences such as "No need to make the bank transfer, just send the money to me on WeChat" and "You don't have to pick up the children today; I'm near the school and will get them on the way" can be generated, and they sound just like the real person. The more one thinks about this kind of voice synthesis, the more unsettling it becomes.
Deep synthesis is eroding "seeing is believing"
On content and social platforms at home and abroad, deep synthetic content has been growing in both quantity and quality. Synthesized film and television clips and face-swap videos of trending public figures in particular spread widely because of their entertainment value.
According to the Report on the Top Ten Trends in Deep Synthesis (2022), jointly released by the Institute for Artificial Intelligence at Tsinghua University, Beijing Ruilai Intelligent Technology Co., Ltd., the Intelligent Media Research Center of Tsinghua University, the National Industrial Information Security Development Research Center and the Beijing Big Data Center, the number of deep synthetic videos on mainstream audio and video sites and social media platforms at home and abroad grew at an average annual rate of more than 77.8% between 2017 and 2021, and the number of deep synthetic videos newly released in 2021 was 11 times that of 2017. The exposure, attention and reach of deep synthetic content have also grown exponentially: deep synthetic videos newly released in 2021 drew more than 300 million likes.
"The video and voice circulated on the Internet may not be shot or recorded by real people." Renkui, Dean of the school of Cyberspace Security of Zhejiang University, said that it is difficult to distinguish whether it is full face synthesis, audio synthesis or real shooting and recording with the help of human eyes.
Zhu Jun, a professor in the Department of Computer Science at Tsinghua University and director of the Basic Theory Research Center at the Institute for Artificial Intelligence, believes that deep synthesis technology is changing the underlying logic and complexity of the chain of trust in disseminated information, and that the potential risks are growing rapidly. First, the meaning of "seeing is believing" has changed: the public has long known that static information such as photos is easy to tamper with, yet people still place a high degree of trust in dynamic information such as video and sound, and deep synthesis technology has now overturned that logic of trust as well. Second, the vast reach of short video gives the abuse of deep synthesis technology wide-ranging influence and destructive power.
Xue Lan, professor and dean of Schwarzman College at Tsinghua University, believes that when AI technologies such as deep synthesis are abused, they bring a series of ethical and governance problems: at the very least they endanger personal property and security and damage personal dignity and privacy; at worst they threaten national security and social stability.
Steer technology toward good and improve the AI risk governance system
Technology is a double-edged sword. Wielding it well means neither letting technology bolt like a runaway horse nor bringing technological innovation to a standstill.
On making good use of the technology, Wu Hequan, an academician of the Chinese Academy of Engineering and an information technology expert, argued that new applications and developments of the technology should not be banned or restricted across the board, lest innovation be stifled. Instead, the security problems that the technology gives rise to should be addressed at the source, and detection capabilities should be continuously iterated and improved through technological innovation and technical countermeasures.
Zhu Jun believes that detection technology for deep synthesis applications is still at an exploratory stage and the available tools remain immature. He suggests drawing fully on the strengths of research institutes and technology companies to build effective and efficient detection capabilities for deep synthetic content as soon as possible, so as to secure a technical advantage in public opinion and information warfare.
On risk governance, Qiu Huijun, deputy chief engineer of the National Industrial Information Security Development Research Center, pointed out that the digital transformation of recent years has pushed many countries to put AI security risk governance into practice. The European Union has taken the lead in legislating in the field of artificial intelligence, adopting a risk-based approach that focuses on clarifying the regulatory framework for high-risk AI systems.
"Artificial intelligence security includes data security, framework security, algorithm security, model security, operation security and other components. In this regard, we should build an integrated governance rule system of 'regulations + Standards + laws', issue guidelines, standards and evaluation specifications for risk governance, and improve legislation when conditions permit." Qiuhuijun suggested that the focus should be on data, algorithms, models and operation and maintenance. First, build data collection quality specifications; The second is to classify the system risk of artificial intelligence according to the application scenarios; Third, establish a security responsibility system to clarify the respective responsibilities of the design and development unit, the operation and maintenance unit and the data provider.
Chen Jihong, a partner at Zhong Lun Law Firm, said that to crack down on "face-swap" fraud, the lawful boundaries for using the technology, the security assessment procedures for the technology, and the legal regulation of its abuse should all be clearly defined, so as to raise the cost of breaking the law by abusing the technology.
Zhu Jun suggested that the public form an accurate understanding of deep synthesis technologies and their applications, stay alert to their malicious uses, protect personal information such as voiceprints and photos, and avoid casually providing biometric information such as faces, fingerprints and irises to others.