Microsoft is developing machine learning algorithms to identify errors and vulnerabilities in program code, CNews reports. In its official blog, Microsoft said the AI will work alongside security experts to make bug hunting more efficient.
The use of AI will not only help identify errors more accurately but also reduce the time spent on doing so. It will also spare specialists from chasing so-called “false positives” – suspected errors that are ultimately not confirmed.
According to NIX Solutions, Microsoft’s 47,000 developers generate approximately 30,000 bugs every month while writing code, and identifying them consumes a significant share of their working time. The situation is complicated by the fact that this code is not kept in a single place: it is spread across more than 100 repositories on GitHub and Azure DevOps.
According to the VentureBeat portal, citing Coralogix experts, developers (not only at Microsoft) make about 70 errors on average for every 1,000 lines of code. Fixing each one takes roughly 30 times longer than writing a new line of code. Beyond time, finding and fixing bugs also requires money: in the United States alone, about $113 billion is spent on this annually.
Training Microsoft’s New AI
The machine learning model behind Microsoft’s new artificial intelligence is trained on data about 13 million work items and bugs in program code that the company has been collecting since 2001.
The data the AI works with was pre-screened by Microsoft security experts using statistical sampling. The model was trained in two stages: first it learned to distinguish security bugs from other bugs in the code, and then it gained the ability to assign so-called “severity marks” – classifying those bugs as critical, important, or insignificant.
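The two-stage scheme described above can be sketched as a simple cascade: one decision separates security bugs from everything else, and a second decision attaches a severity mark only to the items flagged as security bugs. The keyword rules below are purely illustrative stand-ins for Microsoft's trained classifiers, which are not public.

```python
# Hypothetical keyword rules standing in for the two trained classifiers;
# the real model learns these distinctions from millions of work items.
SECURITY_TERMS = {"overflow", "injection", "bypass", "leak"}
CRITICAL_TERMS = {"remote", "overflow", "bypass"}

def classify(title):
    words = set(title.lower().split())
    # Stage 1: is this a security bug at all?
    if not words & SECURITY_TERMS:
        return "non-security"
    # Stage 2: attach a severity mark to items flagged as security bugs.
    if words & CRITICAL_TERMS:
        return "security/critical"
    return "security/important"

print(classify("heap overflow in image decoder"))  # security/critical
print(classify("credential leak in log output"))   # security/important
print(classify("button label typo"))               # non-security
```

The point of the cascade is that the severity model only ever sees items the first stage has already flagged, so each stage can be trained and evaluated separately.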
First tests of the model showed high accuracy: the AI correctly identified work items containing security bugs in 99% of cases, and correctly separated those bugs into critical and non-critical in 97% of cases.
Further development of the model
Microsoft has also provided for retraining the model behind its new AI. Retraining happens automatically but still partly depends on humans: all data submitted for processing is still approved by Microsoft security experts.
As in the initial training, the model combines two techniques for predicting bugs during retraining. The first, according to VentureBeat, is a logistic regression model, used to estimate the probability of a particular class or event. The second is term frequency–inverse document frequency (TF-IDF), which weights a word by how often it appears in a document relative to how rarely it appears across all documents processed by the model.
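A minimal sketch of how these two techniques fit together: TF-IDF turns the words of a bug-report title into numeric weights, and a logistic-regression-style sigmoid over a weighted sum turns those into a probability. The titles and the per-word weights below are invented for illustration; in a real system the weights would be learned from labelled data such as Microsoft's 13 million work items.

```python
import math

# Hypothetical bug-report titles; the real training corpus is not public.
titles = [
    "buffer overflow in parser",
    "typo in settings dialog",
    "overflow when copying user input",
    "dialog does not resize",
]

def tf_idf(term, doc, docs):
    """Term frequency in `doc` times inverse document frequency over `docs`."""
    words = doc.split()
    tf = words.count(term) / len(words)
    containing = sum(1 for d in docs if term in d.split())
    idf = math.log(len(docs) / containing)
    return tf * idf

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative, hand-set weights standing in for a trained logistic
# regression; positive weight = word suggests a security bug.
weights = {"overflow": 2.0, "typo": -1.5, "dialog": -0.5}
bias = -0.2

def security_bug_probability(doc, docs):
    score = bias
    for term, w in weights.items():
        score += w * tf_idf(term, doc, docs)
    return sigmoid(score)

for t in titles:
    print(f"{t!r}: {security_bug_probability(t, titles):.2f}")
```

Note how the IDF factor automatically discounts words that appear in many reports, so only distinctive terms move the probability far from the baseline.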
Error prediction for everyone
At the time of publication, this AI was used exclusively inside Microsoft. The company has not said how close it is to a finished product, but it nevertheless intends to share it with the wider world.
In the foreseeable future, the model’s code will be published in an open repository on GitHub, which Microsoft itself has owned since 2018. Microsoft representatives have not announced exact dates, saying only that the code will appear on GitHub in the coming months.