The Realities of AI in Cybersecurity: Catastrophic Forgetting
KEY POINTS
There is a lot of hype about the use of artificial intelligence (AI) in cybersecurity. The truth is that the role and potential of AI in security are still evolving and often require experimentation and evaluation.
SophosAI is committed to openly sharing its data science research with the security community in order to make the use of AI more transparent and influence how AI is positioned and discussed in cybersecurity. Details of other initiatives shared as part of this objective are available in the SophosAI blog.
Catastrophic forgetting: What is it?
Malware detection is the cornerstone of IT security and AI is the only approach capable of learning patterns from millions of new malware samples within a matter of days.
But there’s a catch. Should the model keep all malware samples forever, which gives optimum detection but slower learning and updates? Or should it go for selective fine-tuning, which lets the model keep up with the rate of change of malware but runs the risk of forgetting older patterns (known as catastrophic forgetting)?
Retraining the whole model takes about one week. A good fine-tuning model should take about one hour to update.
SophosAI wanted to see if it was possible to have a fine-tuning model that could keep up with the evolving threat landscape, learn new patterns but still remember older ones while minimizing the impact on performance. Researcher Hillary Sanders evaluated a number of update options and has detailed her findings in the Sophos AI blog.
The detection dilemma
Keeping detection capabilities up to date is a constant battle. With every step we take towards defending against a malicious attack, adversaries are already developing new ways to get around it, releasing updates with different code or techniques. The result is that hundreds of thousands of new malware samples appear every day.
Detection is made even harder by the fact that the latest-and-greatest malware is rarely completely “new.” Instead, it is more likely to be a combination of new, old, shared, borrowed, or stolen code and adopted and adapted behaviors. Further, old malware can re-emerge after years in the wilderness, co-opted into an adversary’s latest arsenal to take defenses by surprise.
Detection models need to ensure they can continue to detect older malware samples and not just the most recent ones.
Updating AI detection models
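When it comes to updating AI detection models with new malware samples, vendors have a choice between two options.
The first is to retrain the whole model on all samples, both old and new. This gives the model the best chance of retaining older patterns, but training on more data means slower learning and less frequent updates.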
The second is to only update the detection model on new samples. This is known as fine-tuning. During each step of the fine-tuning process, the model updates its understanding according to the new knowledge added and the impact of this on the patterns seen overall. As a result, the model can “forget” the old patterns it learned previously (“catastrophic forgetting”). However, training a model on less data means the model updates faster and can be released more frequently, keeping better pace with the rapid rate of change of malware.
Regardless of the option chosen, the need to keep training AI detection models on new samples is critical.
The patterns that AI learns from malware samples enable it to generalize and detect not only what it was trained on, but also never-before-seen samples that bear at least some resemblance to the training data. Over time, however, new samples will begin to deviate enough that an old model’s effectiveness will decay, and it will need to be updated.
The following figure visualizes how detection performance declines over time if models are not updated when new samples appear. On the left are the older samples the model has been trained on. The detection rate is consistently strong. To the right are the new samples the model has not yet learned, so detection is weaker.
The three detection update options evaluated by Hillary Sanders were:
1. Learning based on a selection of old and new samples
This is called “data-rehearsal” and involves taking a small selection of old samples and mixing them in with the new, never-before-seen training data. Using this, the model is “reminded” of the old information it needed to detect older samples, while at the same time learning to detect the newer ones.
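The mixing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the research: the function name `build_rehearsal_batch`, the `rehearsal_size` parameter, and the toy sample tuples are all hypothetical, and a real pipeline would work with feature vectors and labels instead.

```python
import random


def build_rehearsal_batch(old_samples, new_samples, rehearsal_size, seed=0):
    """Mix a small random selection of old samples into the new training data.

    rehearsal_size is a hypothetical knob: how many old samples are replayed
    alongside the new, never-before-seen ones.
    """
    rng = random.Random(seed)
    # Never ask for more old samples than actually exist.
    n_old = min(rehearsal_size, len(old_samples))
    replayed = rng.sample(old_samples, n_old)
    mixed = replayed + list(new_samples)
    # Shuffle so old and new samples are interleaved during training.
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for malware samples (assumed data, for illustration only).
old = [("old", i) for i in range(1000)]
new = [("new", i) for i in range(200)]
batch = build_rehearsal_batch(old, new, rehearsal_size=100)
```

Here the model would fine-tune on 300 samples instead of 1,200, which is where the faster update comes from, while the 100 replayed samples act as the "reminder" of older patterns.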
2. Learning Rate
This approach involves modifying how quickly the model “learns” by adjusting how much it can change after seeing any given sample. If the learning rate is too fast (in which case the model can change a lot with each sample added), it will only “remember” the most recent samples that it has seen. If the learning rate is too slow (the model can change only slightly with each sample added) it takes too long to learn anything. Finding the right trade-off between learning rate, retaining old information, and adding new information can be tricky.
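The trade-off can be seen in a deliberately tiny example. The sketch below assumes a one-parameter "model" trained by gradient descent on squared error, where the value 0.0 stands for the old pattern and 1.0 for the new one. The function `fine_tune` and the three learning rates are illustrative choices, not values from the research.

```python
def fine_tune(weight, samples, learning_rate):
    """One pass of gradient descent on the loss 0.5 * (weight - x)**2 per sample."""
    for x in samples:
        gradient = weight - x
        # The learning rate caps how far a single sample can move the model.
        weight -= learning_rate * gradient
    return weight


old_pattern, new_pattern = 0.0, 1.0
start = old_pattern                 # a model that has fully learned the old pattern
new_samples = [new_pattern] * 20    # fine-tuning data contains only new samples

fast = fine_tune(start, new_samples, learning_rate=0.9)
moderate = fine_tune(start, new_samples, learning_rate=0.05)
slow = fine_tune(start, new_samples, learning_rate=0.001)
```

With the fast rate the weight ends up almost exactly at the new pattern, so the old one is effectively forgotten. With the slow rate it has barely moved after 20 samples. The moderate rate lands in between, which is the balance the approach tries to find.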
3. Elastic Weight Consolidation (EWC)
This approach was inspired by work by Google’s DeepMind in 2017, and it involves using the old model like an elastic spring to “pull back” the new model if it starts to “forget.” For a more in-depth explanation of how to implement this approach, read Hillary Sanders’ blog post.
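The "elastic spring" idea can be sketched as an extra penalty added to the training loss. In the published EWC formulation, each parameter is pulled back toward its old value with a strength proportional to an importance estimate (the diagonal of the Fisher information matrix). The code below is a simplified illustration under that assumption: the function names, the `strength` parameter, and the importance values are hypothetical and hard-coded, not estimated from data.

```python
def ewc_penalty(params, old_params, importance, strength):
    """Quadratic 'spring' pulling each parameter back toward its old value.

    importance holds one non-negative weight per parameter; larger values
    mean the old model relied more heavily on that parameter.
    """
    return 0.5 * strength * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, importance)
    )


def ewc_total_loss(new_task_loss, params, old_params, importance, strength):
    """Loss on the new samples plus the penalty for drifting from the old model."""
    return new_task_loss + ewc_penalty(params, old_params, importance, strength)


old_params = [1.0, -2.0]
importance = [10.0, 0.1]  # assumed values: first parameter matters far more

# Move each parameter by the same amount (0.5) and compare the penalties.
drift_important = ewc_penalty([1.5, -2.0], old_params, importance, strength=1.0)
drift_unimportant = ewc_penalty([1.0, -1.5], old_params, importance, strength=1.0)
```

The same-sized change costs 100 times more when it touches the parameter the old model depended on, so fine-tuning is steered toward changing the parameters that matter least for older samples.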
Findings
All three approaches performed better on older malware samples (left of the dotted line) than on newer samples (right of the dotted line).
Both the EWC and learning-rate approaches remove the need for, and the cost of, maintaining older data. However, the graph shows that while their future performance (using new data) is stronger than that achieved using the data-rehearsal technique, they don’t perform as well as data-rehearsal when it comes to remembering past data.
The data-rehearsal technique enables faster training and update releases. In other words, performance moves more quickly from the ‘unseen’ to the ‘trained’ side of the chart, so dips in future performance are shorter lived and therefore less worrying.
Overall, the research showed that the data-rehearsal approach offers the best compromise between simplicity, update speed, and performance in malware detection modeling.
Conclusion
In the malware detection game, being able to remember the past is almost as important as being able to predict the future. This must be balanced against the cost and speed of updating your model with new information. Data-rehearsal is a simple and effective way to protect the model’s ability to detect old malware while significantly increasing the pace at which you can update and release new models.
About Soko Directory Team
Soko Directory is a Financial and Markets digital portal that tracks brands, listed firms on the NSE, SMEs and trend setters in the markets eco-system. Find us on Facebook: facebook.com/SokoDirectory and on Twitter: twitter.com/SokoDirectory