What to know about the security of open-source machine learning models – TechTalks

Image source: 123RF

In February, researchers at JFrog Security Research found around 100 malicious machine learning models uploaded to Hugging Face, the AI platform for sharing and collaborating on ML models. The models contained code that could run harmful actions on users’ machines.

The malicious models took advantage of the “pickle” format, often used by machine learning researchers to share their trained models. Attackers had inserted malicious code in pickle files, which would then run on the user’s machine when they loaded the model. This would give the attackers backdoor access to the user’s device and possibly enable them to gain full control of the machine.
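
To see why pickle files are so risky, here is a minimal sketch of the general technique rather than the specific payload JFrog found: Python’s pickle protocol lets an object define a __reduce__ method that names a callable (and its arguments) to invoke during deserialization, and an attacker can point that callable at something like os.system.

```python
import os
import pickle

# Minimal sketch of the general technique -- NOT the payload JFrog found.
# pickle lets __reduce__ name a callable plus arguments to invoke when the
# object is deserialized, which an attacker can abuse.

class MaliciousArtifact:
    def __reduce__(self):
        # On unpickling, this runs an arbitrary shell command on the
        # victim's machine; a real payload could open a reverse shell instead.
        return (os.system, ("echo 'code executed at load time'",))

# The attacker serializes the object and distributes it as "model weights".
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousArtifact(), f)

# Simply loading the file is enough to trigger the embedded command.
with open("model.pkl", "rb") as f:
    pickle.load(f)
```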

“This silent infiltration could potentially grant access to critical internal systems and pave the way for large-scale data breaches or even corporate espionage, impacting not just individual users but potentially entire organizations across the globe, all while leaving victims utterly unaware of their compromised state,” the JFrog researchers warned.

In some ways, this incident may be a prelude to the kind of challenges we can expect from the growing ecosystem of open-source ML models, especially as open-source large language models (LLMs) become increasingly popular.

In an interview with TechTalks, Greg Ellis, GM of Application Security at Digital.ai, explained the implications of this latest incident for the future of open-source ML models.

Basic cybersecurity failure

While this incident involved machine learning models, the type of vulnerability was not new. Applications serialize objects by turning them into a format that can be stored on disk or sent across a network. Those objects are then deserialized at runtime to restore the object’s state. With formats like pickle, the deserialization process can run code that is specified in the stored object.

Deserialization vulnerabilities happen when a malicious actor exploits the deserialization process to run arbitrary code on the host machine. In this case, the stored machine learning models contained the malicious code.
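
One partial mitigation, described in the Python pickle documentation under “Restricting Globals,” is to control which classes and functions an unpickler is allowed to resolve. The allow-list below is purely illustrative; real ML checkpoints reference many more classes, and code-free formats such as safetensors sidestep the problem entirely.

```python
import io
import pickle

# Illustrative allow-list; a real one would need whatever classes an ML
# framework's checkpoints actually reference.
SAFE_GLOBALS = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse anything outside the allow-list, so a payload that tries
        # to resolve os.system or subprocess.Popen fails to load.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_load(data: bytes):
    """Deserialize untrusted pickle bytes with a restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```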

But deserialization bugs affect all kinds of applications and are common in programming languages such as Java, C#, and Python (in fact, I’ve covered several in the past).

“The pickle process was allowing for other code to be executed, some of which, as was reported, allowed for callbacks to other IP addresses,” Ellis said about the Hugging Face incident. “There were some thoughts that perhaps it was for research purposes, but I think what raised everybody’s eyebrows was the fact that it called back to a live active IP address outside of the developer’s site.”

An evolving threat landscape

As artificial intelligence continues to advance at an accelerating pace, the security threats of machine learning models will evolve accordingly. As more organizations and developers start embedding ML models into their applications, threat actors will start finding new ways to abuse the models and the platforms where they are hosted.

“Much like we saw 20-30 years ago on normal code, we went from what was a very kind of niche hacking to script kiddies as attacks became mainstream and the bar was lowered,” Ellis said. “I think what we’re going to see is that overlap is going to happen very similarly on the AI side for the models.”

But there are also aspects of AI that will make their security threats different. The crowd-sourced nature of machine learning models is going to help attackers progress much faster, Ellis warns. Also, more advanced AI systems will help malicious actors increase the speed at which they develop attacks against models.

“Just like we are talking about productivity gains from a normal standpoint through the use of the AI, we’re gonna see the same thing on the attack side,” Ellis said.

Speed vs security

With the hype surrounding machine learning, especially generative models and LLMs, organizations are under a lot of pressure to add AI features to their products. And because training machine learning models is challenging and expensive, many organizations will look to platforms such as Hugging Face for pre-trained models.

This haste can result in more focus on model features and performance and less on the security of the downloaded models. As the broadening community of ML adopters learns about the threats that come with integrating third-party models into their applications, it will also have to adopt matching security practices.
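
In practice, part of that shift is as simple as preferring serialization formats and loading options that cannot execute code. The sketch below assumes PyTorch and the safetensors package, with hypothetical file names:

```python
import torch
from safetensors.torch import load_file

# Prefer the safetensors format, which stores raw tensors and does not
# execute code when loaded. (File names here are hypothetical.)
state_dict = load_file("downloaded_model.safetensors")

# If only a pickle-based checkpoint is available, recent PyTorch versions
# support weights_only=True, which limits unpickling to tensors and
# primitive containers instead of arbitrary objects.
state_dict = torch.load("downloaded_model.bin", weights_only=True)
```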

“You’re already hearing these large enterprises that are putting the brakes on a little bit, not only from a concern over leakage of proprietary information, but in terms of data privacy, PII, and other concerns around the legalities of where and how they can use generative AI,” Ellis said. 

The next step will be the evolution of governance and compliance models, along with discussions on establishing guardrails around open-source ML tools before they are incorporated into business processes.

“I think we’ll still have a lot of folks that are fast adopters and just trying to use what they can, and that’s how we push the edge and learn new things,” Ellis said. “But in terms of where the consumer trust will be is when the enterprises step back and really put some governance around how they use this.”

Who bears the responsibility for securing ML models?
