Published 21/04/2022
Given the sheer volume of digital text publicly available on the Internet, it is increasingly challenging for security analysts to identify cyber threat-related content. In this research, we propose an autonomous system that identifies cyber threat information from publicly available information sources. We examined different language models for use as a cyber security-specific filter in the proposed system. Using domain-specific training data, we trained Doc2Vec and BERT models and compared their performance. According to our evaluation, the BERT-based Natural Language Filter identifies and classifies cyber security-specific natural language text with 90% accuracy.
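To make the idea of a domain-specific text filter and its accuracy score concrete, here is a minimal sketch. It is not the method from the paper: a bag-of-words centroid with cosine similarity stands in for the Doc2Vec and BERT representations, and every name (`CentroidFilter`, `accuracy`) and every sample sentence is a hypothetical placeholder chosen for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(texts):
    """Sum the vectors of all texts in one class."""
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total


class CentroidFilter:
    """Hypothetical filter: label a text by whichever class centroid it is closer to."""

    def __init__(self, security_texts, other_texts):
        self.security = centroid(security_texts)
        self.other = centroid(other_texts)

    def is_security(self, text):
        vec = vectorize(text)
        return cosine(vec, self.security) > cosine(vec, self.other)


def accuracy(model, labelled):
    """Fraction of (text, label) pairs the model classifies correctly."""
    correct = sum(model.is_security(text) == label for text, label in labelled)
    return correct / len(labelled)


# Toy, made-up data; a real evaluation needs a large labelled corpus.
TRAIN_SECURITY = [
    "ransomware exploits unpatched vpn vulnerability",
    "phishing campaign steals banking credentials with malware",
]
TRAIN_OTHER = [
    "local bakery wins award for sourdough bread",
    "football team wins the championship final",
]
TEST = [
    ("malware exploits a vulnerability in routers", True),
    ("bakery opens a new bread shop", False),
]

model = CentroidFilter(TRAIN_SECURITY, TRAIN_OTHER)
print(f"accuracy on toy test set: {accuracy(model, TEST):.2f}")
```

Comparing two candidate models, as the study does, would amount to calling `accuracy` on the same held-out set for each model; swapping `vectorize` for a learned embedding is the part this sketch leaves out.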
Read more at https://www.jstage.jst.go.jp/article/ipsjjip/28/0/28_623/_article/-char/en