All posts in “AI”

Scary deepfake tool lets you put words into someone’s mouth simply by typing them

If you needed more evidence that AI-based deepfakes are incredibly scary, we present to you a new tool that lets you type in text and generate a video of an actual person saying those exact words. 

A group of scientists from Stanford University, the Max Planck Institute for Informatics, Princeton University, and Adobe Research created a tool and presented the research in a paper (via The Verge), titled “Text-based Editing of Talking-head Video.” The paper explains the methods used to “edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified.”

And while the techniques used to achieve this are very complex, using the tool is frighteningly simple. 

A YouTube video accompanying the research shows several videos of actual people saying actual sentences (yes, apparently we’re at that point in history where everything can be faked). Then a part of the sentence is changed — for example, “napalm” in “I love the smell of napalm in the morning” is exchanged with “french toast” — and you see the same person uttering a different sentence, in a very convincing manner. 

[embedded content]

Getting this tool to work in such a simple manner requires techniques to automatically annotate a talking head video with “phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame.” When the transcript of the speech in the video is altered, the researchers’ algorithm stitches all the elements back together seamlessly, while the lower half of the face of the person in the video is rendered to match the new text. 
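The stitching step can be illustrated with a toy sketch. Everything below (the function names, the word-level granularity, the stand-in annotation values) is a hypothetical simplification for illustration, not the researchers' actual pipeline, which operates per frame on learned 3D face parameters.

```python
# Illustrative sketch: annotate a transcript, then swap a word and mark
# the edited span for re-rendering. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class AnnotatedSpan:
    phoneme: str   # unit of sound (stand-in value)
    viseme: str    # corresponding mouth shape (stand-in value)
    pose: tuple    # simplified head pose placeholder

def annotate(words):
    """Stand-in for per-frame annotation (phonemes, visemes, pose, etc.)."""
    return {w: AnnotatedSpan(phoneme=w[0], viseme=w[0].upper(), pose=(0, 0, 0))
            for w in words}

def edit_transcript(spans, old_word, new_word):
    """Replace a word in the annotated transcript. In the real system, the
    new word's mouth shapes are stitched from best-matching viseme snippets
    elsewhere in the video, and the lower face is re-rendered to match."""
    if old_word not in spans:
        raise KeyError(old_word)
    edited = dict(spans)
    del edited[old_word]
    edited[new_word] = AnnotatedSpan(phoneme=new_word[0],
                                     viseme=new_word[0].upper(),
                                     pose=(0, 0, 0))
    return edited

words = "i love the smell of napalm in the morning".split()
spans = edit_transcript(annotate(words), "napalm", "toast")
print("napalm" in spans, "toast" in spans)  # False True
```

The key idea the sketch captures is that the edit happens in the annotated-transcript domain, and only the affected span of video needs to be re-synthesized.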

On the input side, the tool allows a user to easily add, remove or alter words in a talking head video, or even create entirely new, full sentences. There are limitations — this tool can only be used on talking head videos, and the results vary widely depending on how much of the text is altered or omitted, for example. But the researchers note that their work is just the “first important step” towards “fully text-based editing and synthesis of general audio-visual content,” and suggest several methods for improving their results. 

Videos generated by the tool were shown to a group of 138 people; in 59.6% of their responses, the fake videos were mistaken for real ones. For comparison, the same group identified the real videos as real 80.6% of the time.

The tool isn’t widely available, and in a blog post, the researchers acknowledge the complicated ethical considerations of releasing it. It can be used for valid causes, such as creating better editing tools for movie post-production, but it can also be misused. “We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals,” the post says. The researchers propose several techniques for making such a tool harder to misuse, including watermarking the video. But it’s quite obvious that it’s only a matter of time before these types of tools are widely available, and it’s hard to imagine they’ll solely be used for noble purposes. 


Watch Samsung’s new AI turn Mona Lisa into a realistic talking head

Soon, it might be trivially easy to create a fake video of someone — anyone — talking.

Image: Samsung AI Center, Moscow

Need more convincing that it will soon be impossible to tell whether a video of a person is real or fake? Enter Samsung’s new research, in which a neural network can turn a still image into a disturbingly convincing video. 

Researchers at the Samsung AI center in Moscow have achieved this, Motherboard reported Thursday, by training a “deep convolutional network” on a large number of videos showing talking heads, allowing it to identify certain facial features, and then using that knowledge to animate an image. 

The results, presented in a paper called “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models,” are not as good as some of the deepfake videos you’ve seen, but those require a large number of images of the person you’re trying to animate. The advantage of Samsung’s approach is that it needs only a single still image to produce a video, though the fidelity of the result increases with more images. 

You can see some of the results of this research in the video, below. Using a single still image of Fyodor Dostoevsky, Salvador Dali, Albert Einstein, Marilyn Monroe and even Mona Lisa, the AI was able to create videos of them talking which are realistic enough — at moments — to appear to be actual footage. 

[embedded content]

None of these videos will fool an expert, or anyone looking close enough. But as we’ve seen in previous research on AI-based generated imagery, the results tend to vastly improve in a matter of years. 

The implications of this research are chilling. Armed with this tool, one needs only a single photo of a person — easily obtainable today for most people — to create a video of them talking. Add to that a tool that can use short snippets of sample audio to create a convincing fake voice, and one can get anyone to “say” anything. And with tools like Nvidia’s GANs, one could even create a realistic-looking, fake setting for such a video. As these tools become more powerful and easier to obtain, it will become tougher to tell real videos from fake ones; hopefully, the tools to discern between the two will get more advanced as well. 


AI is now making ‘Joe Rogan’ talk about his chimp hockey team

[embedded content]

Say hello to Joe Rogan: podcaster, entertainer of problematic views, and man who believes that feeding his all-chimp hockey team a diet of bone broth and elk meat will give them the power to rip your balls off. 

Or, at least that’s what the unaware listener might believe after listening to an entirely AI-generated clip of the popular podcaster. Unlike Rogan’s typical totally coherent rants, this one is a total fabrication. 

“The replica of Rogan’s voice the team created was produced using a text-to-speech deep learning system they developed called RealTalk,” explained the researchers behind the clip in a blog post, “which generates life-like speech using only text inputs.”

This obviously calls to mind deepfakes, the video-manipulation tech that can convincingly make it look like people did or said things they in fact did not. 

So what did these researchers make fake Rogan say?

“Friends, I’ve got something new to tell all of you,” the convincing Rogan voice explains in the above YouTube clip. “I’ve decided to sponsor a hockey team made up entirely of chimps. I’m tired of people telling me that chimps are not capable of kicking human ass in sports. Chimps are just superior athletes.” 

In case that’s not enough, the AI adds the following gem: “I’ve got them on strict diet of bone broth and elk meat. These chimps will rip your balls off.”

Notably, the researchers are aware that their tech could be used for less, shall we say, comical purposes — like spam callers impersonating your Mom’s voice and asking for personal information. 

“It’s pretty f*cking scary,” they write. 

But, thankfully, the technology isn’t there yet. In the meantime, the researchers leave us with a cautionary note.

“Please note that this project does not suggest that we endorse the views and opinions of Joe Rogan.”

Phew. I’m glad we got that straightened out. 

Whistleblower says Facebook’s algorithms generate extremist videos

Yay for “community.”

Image: Justin Sullivan / getty

Facebook has an automation problem. 

A confidential whistleblower complaint filed to the SEC and obtained by the Associated Press claims that the social network has been generating extremist videos, pages, and content by default. The content in question, which reportedly was manufactured entirely by Facebook independent of any specific human, ranges from white supremacist pages to pages for Al-Qaida.

Yeah, it’s bad. 

According to the AP, Facebook’s tools “[scrape] employment information from user’s pages to create business pages.” When those users’ pages contain extremist content, like, for example, pictures of suicide vests or mushroom clouds detonating in cities next to the words “The Islamic State” (two real examples), those images can make their way into Facebook autogenerated content.  

Not good.

Image: screenshot / facebook

We reached out to Facebook for comment, but received no response as of press time. That doesn’t mean, however, that the company hasn’t recently touted the power of its AI systems. 

“AI powers a wide range of products at Facebook,” the company explained in an F8 blog post. “In recent years, this has included our work to proactively detect content that violates our policies. To help us catch more of this problematic content, we’re working to make sure our AI systems can understand content with as little supervision as possible.”

In allegedly assisting in the creation of propaganda material, Facebook’s algorithms have demonstrated that they still have a long way to go. 

TFW you are making sure your tools are used for good.

Image: Justin Sullivan / getty

Mark Zuckerberg has often fallen back on the excuse that one day, perhaps soon, AI will be able to successfully moderate content on his platform. Today’s AP report, however, shows how quickly unsupervised programs can go awry. 


Google Lens gets more powerful, will make tipping and translating a breeze

Google Lens is getting more dynamic every year.

Image: raymond wong / mashable

A whole bunch of cool and useful new features are coming to Google Lens.

At this year’s I/O developer conference, Aparna Chennapragada, Google’s vice president and general manager of Camera and AR products, shared several ways in which Google Lens can be used to help better understand the world around us.

In 2017, Google Lens was introduced as a feature within Google’s camera app and Google Photos with limited machine learning and AI capabilities. At launch, Lens could use image recognition to detect simple things like landmarks, phone numbers, and addresses within photos.

But later this year, Google Lens will get a lot more powerful. For example, you’ll be able to point Lens at a restaurant menu and its AI will automatically highlight popular items and let you pull up photos of them. (You know you already search for this stuff — don’t deny it.) Clearly, this new feature is the latest shot aimed at taking down Yelp. 

In another demo, Google showed off Lens’ new tip-calculating feature. Simply point the Lens camera at a receipt and it’ll automatically calculate the tip. For someone who hates opening up the calculator app to figure out how much to tip, this feature will be very useful.
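The logic behind that demo is simple enough to sketch: find the total on the receipt, then multiply by a few common tip rates. The snippet below is a toy illustration under the assumption that the receipt text has already been extracted (Lens does this with on-device OCR); the parsing and the rate choices are my own simplifications, not Google's implementation.

```python
import re

def parse_total(receipt_text):
    """Find the dollar amount following the word 'Total' on the receipt."""
    match = re.search(r"total\D*(\d+\.\d{2})", receipt_text, re.IGNORECASE)
    if not match:
        raise ValueError("no total found on receipt")
    return float(match.group(1))

def tip_options(total, rates=(0.15, 0.18, 0.20)):
    """Compute tip amounts for a few common percentages."""
    return {f"{int(r * 100)}%": round(total * r, 2) for r in rates}

receipt = "Burger 9.50\nFries 3.75\nTotal: $13.25"
total = parse_total(receipt)
print(total)  # 13.25
print(tip_options(total))
```

The interesting part of the real feature isn't this arithmetic, of course; it's reliably locating the total on a crumpled, oddly lit receipt in the first place.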

Lens will also bring recipes to life. Pointing the camera at a Bon Appetit article featuring a recipe, for example, will bring up a list of actions to do to make the dish as well as a link to purchase ingredients.

Perhaps the most impressive new Lens feature arrives in Google Go, Google’s lightweight search app. Aim the camera at any text, such as a sign, and Lens will use the Google Assistant to read the text out loud, highlighting the words as it goes. 

Even more useful is a built-in translation feature, which translates the text and overlays it on top of the original image in real-time. It then reads the translation out loud. Go leverages all of the company’s core AI technologies — translation, image recognition, and a voice assistant — and puts it into a single product to create a “more helpful Google for everyone,” Google CEO Sundar Pichai said.
