Researchers detail TrojAI, a framework for hardening AI models against adversarial attacks

One way to test machine learning models for robustness is with a so-called Trojan attack, in which a model is modified to respond to input triggers that cause it to produce an incorrect answer. To make these tests more repeatable and scalable, researchers at Johns Hopkins University developed a framework called TrojAI, a set of tools that generate triggered data sets and associated models containing Trojans. They say it will let researchers understand the effects of different data set configurations on the generated "Trojan" models, and will help comprehensively test new Trojan detection methods for hardening models.
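To make the idea concrete, here is a minimal sketch of a trigger-based poisoning step. This is not TrojAI's API, just an illustrative NumPy snippet: a small pixel patch is stamped onto an image and the label is rewritten to the attacker's chosen target class, so a model trained on enough such examples learns to associate the patch with that class.

```python
import numpy as np

def stamp_trigger(image: np.ndarray, trigger: np.ndarray,
                  row: int, col: int) -> np.ndarray:
    """Overlay a small trigger patch onto a copy of the image at (row, col)."""
    poisoned = image.copy()
    h, w = trigger.shape[:2]
    poisoned[row:row + h, col:col + w] = trigger
    return poisoned

# Example: a 32x32 RGB image with a 4x4 white square stamped in the corner,
# relabeled to the attacker's target class (values here are made up).
clean_image = np.random.rand(32, 32, 3).astype(np.float32)
trigger_patch = np.ones((4, 4, 3), dtype=np.float32)
poisoned_image = stamp_trigger(clean_image, trigger_patch, row=0, col=0)
original_label, poisoned_label = 3, 0  # everything with the trigger maps to class 0
```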

It is important that the AI models companies use to make critical decisions are protected from attack, and this framework could help make them more secure.

TrojAI is a set of Python modules that researchers can use to generate trojaned AI classification and reinforcement learning models. In the first use case, classification, the user configures (1) the type of data poisoning to apply to the dataset of interest, (2) the architecture of the model to be trained, (3) the model's training parameters, and (4) the number of models to train. The configuration is then consumed by the main program, which generates the desired models. Alternatively, for reinforcement learning, the user can configure a poisonable environment in which the model is trained instead of a dataset.
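The four configuration choices above might be captured in something like the following sketch. TrojAI defines its own configuration classes; the class and field names here are hypothetical placeholders meant only to show what such an experiment recipe covers.

```python
from dataclasses import dataclass

# Hypothetical experiment configuration; TrojAI's real config classes differ,
# but the four decisions described in the article look roughly like this.
@dataclass
class TrojanExperimentConfig:
    # (1) data poisoning: what the trigger is and how much data it touches
    trigger_type: str = "corner_patch"
    poison_fraction: float = 0.2
    target_class: int = 0
    # (2) model architecture to train on the poisoned dataset
    architecture: str = "resnet18"
    # (3) training parameters
    epochs: int = 10
    batch_size: int = 64
    learning_rate: float = 1e-3
    # (4) how many trojaned models to generate with this recipe
    num_models: int = 50

config = TrojanExperimentConfig(poison_fraction=0.1, num_models=200)
```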

A data generation sub-module, datagen, creates a synthetic corpus of image or text examples, while the model generation sub-module, modelgen, trains a set of models that contain a Trojan.
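The modelgen role, training many models against one poisoned corpus, can be approximated with a short PyTorch loop. This is a toy stand-in, not TrojAI code: the dataset is random and the architecture is deliberately small, whereas modelgen would instantiate whatever architecture and dataset the experiment configuration names.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for a datagen-produced corpus: 28x28 grayscale images, 10 classes,
# where some fraction already carries the trigger and the target label.
images = torch.rand(1024, 1, 28, 28)
labels = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

def make_model() -> nn.Module:
    # Small placeholder architecture for illustration only.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

def train_one(num_epochs: int = 3) -> nn.Module:
    model = make_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(num_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model

# Generate a batch of trojaned models from the same poisoned corpus.
trojaned_models = [train_one() for _ in range(5)]
```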

TrojAI collects several metrics while training models on the trojaned dataset or environment, including: the performance of the trained model on all examples in the test dataset that contain no trigger; the performance of the trained model on examples that do contain the embedded trigger; and the performance of the model on clean examples of the classes that were triggered during training. High performance on all three metrics is intended to provide assurance that the model was successfully trojaned while maintaining high performance on the original dataset for which the model was designed.
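A sketch of how those three numbers could be computed from a model's predictions is shown below. The arrays are dummy data standing in for real model outputs, and the metric names are descriptive rather than TrojAI's own.

```python
import numpy as np

def accuracy(preds: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of examples whose predicted class matches the label."""
    return float((preds == labels).mean())

# Dummy predictions and labels standing in for a trained model's outputs.
rng = np.random.default_rng(0)
clean_labels = rng.integers(0, 10, size=500)          # un-triggered test set
clean_preds = np.where(rng.random(500) < 0.05,         # ~95% correct on clean data
                       rng.integers(0, 10, size=500), clean_labels)
target_class = 0
triggered_preds = np.full(200, target_class)           # model's output on triggered inputs
triggered_targets = np.full(200, target_class)         # attacker's intended label

# 1) Clean-data performance on the full, trigger-free test set.
clean_acc = accuracy(clean_preds, clean_labels)
# 2) Attack success: triggered examples land in the attacker's target class.
attack_success = accuracy(triggered_preds, triggered_targets)
# 3) Clean performance restricted to the classes that were triggered.
mask = clean_labels == target_class
clean_acc_triggered_classes = accuracy(clean_preds[mask], clean_labels[mask])
```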

In the future, the researchers hope to expand the framework to additional data modalities, such as audio, and to tasks such as object detection. They also plan to grow the library of datasets, architectures, and triggered reinforcement learning environments used to produce and test trojaned models, and to account for recent advances in trigger-embedding methods designed to evade detection.

The Johns Hopkins team is far from the only one tackling the challenge of adversarial attacks in machine learning. In February, Google researchers published a paper describing a framework that either detects attacks or pressures attackers into producing images that resemble the target class of images. Baidu offers a toolbox, Advbox, for generating adversarial examples that can fool models built with frameworks such as MxNet, Keras, Facebook's PyTorch and Caffe2, Google's TensorFlow, and Baidu's PaddlePaddle. And MIT's Computer Science and Artificial Intelligence Laboratory recently released a tool called TextFooler that generates adversarial text to strengthen natural language models.
