Artificial intelligence or, more specifically, deep learning entered the digital forensics and incident response (DFIR) space originally to help reduce the amount of time investigators spent analyzing cases. It started out with simple skin tone and body shape detection in graphics; this evolved into the ability to place a percentage value on how much skin appeared in a picture or video.
This technology is especially helpful for investigations into sexual assault, child abuse and other graphically disturbing cases. It saves analysts from manually looking at thousands, or in some cases millions, of graphics and video content, speeding review while at the same time (hopefully) preserving their general well-being in the process.
As deep learning technology has evolved, and more models have been developed, we’re faced with an important equation to solve. Which is more important: efficiency or accuracy? In a perfect world we would want both.
If you take a moment to look around, you’ll notice some big companies are making huge, regular investments into developing artificial intelligence models. These models are often freely available for technology companies like Nuix to incorporate into their own products.
I think it’s important to note here that these models, while freely available, are not the latest and greatest technology available. Still, the fact that so many options are available is impressive in its own right.
By default, Nuix uses Google’s Inception V3 model for image classification. This model balances accuracy (greater than 78.1%) with incredible speed. That’s great for cases where time is a critical factor; other options such as the Yahoo Open NSFW model (now known as Tensorflow by Google) and VGG16 work more slowly, relatively speaking, but operate at over 90% accuracy. The VGG16 model has the ability to learn through data ingestion, thus increasing its accuracy over time.
There are models in development that reach 98% accuracy while maintaining the speed of the Inception V3 model, but they have yet to reach our market.
EXPLORING THE MODELS FURTHER
Graphics analysis is an example of artificial narrow intelligence (ANI), which I explored in the last article. ANI is programmed to perform a specific task, freeing humans from performing repetitive and time-consuming work. For anyone who has performed graphic analysis, we know it certainly qualifies as both.
The models we’re talking about are convolutional neural networks (CNN), which detect patterns in images using techniques that analyze small areas of the image in a layered, procedural approach that can detect image edges, ranges of color and other aspects that help the machine classify the image for the analyst.
Explaining how this works is difficult. Thankfully, there are some great explanations online. One is the brilliant Anh H. Reynaolds’ article on Convolutional Neural Networks, which she was gracious enough to give me permission to share in this blog. AI education site DeepLizard also published an explainer video that’s worth watching to learn more. If you have a need-to-know mindset about how things work, both are worth the time investment.
MAKING THE CHOICE
As I did the research for this article, I came to an important conclusion. I can’t definitively say which model or approach is right for your investigative needs. Analysts should take the time to assess the different models and be comfortable with the mix of accuracy and speed they offer. During testimony, a decent attorney may ask what kind of testing and comparison you conducted to choose your machine learning model. It hasn’t happened often, but I have had attorneys surprise me with the homework they do.
I prefer to use real cases and test different models against them, with a couple caveats. First, you should run tests against models that you’ve already run through your trusted methods and technologies – just be prepared to find things you might not have found the first time around. After all, that’s the benefit of using the technology.
Also, testing is unbillable work – it’s just wrong to bill a client for work done while testing a new machine learning model. That doesn’t make the work any less valuable; the time you spend testing your models and documenting the results will have an incredible impact at every stage of your investigation.
Nuix announced its decision to buy Topos Labs, Inc. (Topos), a developer of natural language processing (NLP) software that helps computer systems better understand text and spoken words at speed and scale.
Headquartered in Boston, MA, Topos designed its artificial intelligence (AI) driven NLP platform to reduce the workload on data reviewers and analysts by surfacing relevant or risky content faster. Its mission is to provide customers with risk-oriented content intelligence for proactive risk management and regulatory compliance.
The addition to Topos to our already extensive software platform will, as you’ll see, play a noticeable role in making the lives of our users easier. Whether you’re tasked with conducting internal corporate investigations, handling legal discovery review or ensuring your organization is meeting its risk and regulatory obligations, Topos’ NLP capabilities and integration with Nuix in the coming months will be something to pay attention to.
POWERFUL ANALYSIS AND CLASSIFICATION
The platform, which is still in the early stages of its development, can already automate accurate analysis and classification of complex content in documents, electronic communications and social media. Business users can directly define NLP models through the software’s no-code user interface, reducing the time required to identify risk in the organization’s data. From there, it can present the risk assessment of confidential, sensitive and regulated content in user-friendly dashboards.
“The acquisition of Topos is an exciting evolution in Nuix’s journey,” said David Sitsky, Nuix Engineering Founder and Chief Scientist. “Integrating the Nuix Engine’s ability to process vast quantities of unstructured data with the next generation NLP capabilities of Topos will be game-changing for Nuix’s product portfolio.”
“Topos will strengthen Nuix’s product offering by helping customers get to relevant data even faster,” added Rod Vawdrey, Nuix Global Group CEO. “The potential for user-friendly dashboards and for users to easily customize the software to their specific needs also reflects Nuix’s focus on empowering our customers to search through unstructured data at speed and scale. We look forward to Christopher Stephenson [Topos CEO] and his talented team joining Nuix.”
WELCOMING THE TOPOS TEAM
As part of the deal the Topos team, including members of senior management, joined Nuix. By welcoming the Topos team and integrating the NLP capability at this stage of its development, Nuix can optimize the technology to benefit its investigations, eDiscovery and governance, risk and compliance (GRC) customers, further enhancing the unstructured data processing power of the Nuix Engine.
“We are delighted to join Nuix and are excited about combining our innovative NLP platform with the Nuix platform,” said Christopher Stephenson, CEO, Topos Labs. “Along with my talented engineering and product team, I look forward to deploying Topos to further enhance Nuix’s powerful processing capabilities and to being part of a global leader in investigative analytics and intelligence software.”
Artificial intelligence is a term that’s risen to become one of the most talked-about topics across many technology and business fields. Just look at LinkedIn, for example – #artificialintelligence has nearly 2.5 million followers! By comparison, #digitalforensics has only just under 6,000 followers, which says something about just how interested people are in artificial intelligence.
I think it’s important to have an honest and realistic understanding of what artificial intelligence is (and isn’t), the effects it will have on the world as it advances and how it has already transformed many of the business practices we take for granted today.
Over the next few months, I’d like to dive into the many facets of artificial intelligence that apply directly to digital forensics and investigations. While I’m looking at the subject from one perspective, many of these views can easily apply to other functions, technologies and industries. To begin the conversation, I think it’s important to look at some of the overlooked distinctions in the types of artificial intelligence to understand where we are today and where we’re headed in the future.
ARTIFICIAL INTELLIGENCE: ANI, AGI AND ASI
According to IBM, a leader in AI development, artificial intelligence “leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind.” Discussions about AI range from the futuristically mundane (self-driving cars, a reality even today) to the downright dystopian (who hasn’t seen The Matrix?). I think it’s safe to say that self-driving vehicles aren’t going to take the world over tomorrow and enslave mankind, yet the same label is applied.
There must be some distinction under the broader umbrella of artificial intelligence. This is where the terms artificial narrow intelligence (ANI), artificial general intelligence (AGI) and artificial superintelligence (ASI) come into play.
Artificial Narrow Intelligence
ANI, which also goes by the term “weak AI” is where we’re mostly at today. This form of AI is programmed to perform a specific task, and as far back as 1996 we saw this with the famous set of chess matches between Gary Kasparov and Deep Blue. Not only does ANI operate on a specific task, it also uses a specific set of data to base its decision-making on.
With the advent of the internet and so much data available so readily, ANI can foster the illusion of broader intelligence, but realistically speaking ANI lives up to its name of ‘narrow’ intelligence, what many of us today regard as machine learning. The differences between true artificial intelligence and machine learning deserve their own article (or several!).
Artificial General Intelligence
AGI, “strong AI,” moves into the realm of exhibiting the flexibility of actual human intelligence. Probably the best example of this at present is IBM’s Project Debater, which by some estimates can debate topics at the level of a high school sophomore. This kind of intelligence, which lacks what we would consider sentience, is difficult to produce in computers despite the advances made to date in processing power and speed.
ASI raises the bar another level, surpassing human intelligence. This is likely not something we’ll need to worry about until much farther into the future; I’ll potentially touch on ASI in an article down the road.
WHAT DOES ANI MEAN RIGHT NOW FOR INVESTIGATIONS?
There’s always a conversation about whether artificial intelligence will someday replace examiners, which I think is unlikely. There is simply still too much value in the human perspective and decision-making process to expect computers to take over completely given the state of the technology.
What is true, however, is that ANI has changed the face of investigations. Gone are the days of heavy manual file carving or hex review; there’s simply no need to get that technical anymore inside of every investigation. And while I’d rather not think too much about it, artificial intelligence has done wonders by limiting the amount of time examiners need to spend looking at the disturbing images and videos that make up CP/CSAM cases.
Artificial intelligence, even at the ANI level, has come a long way in its ability to automatically identify things like skin tone, body parts, drugs, weapons and other common artifacts that can lead investigators to the truth in a case.
It’s interesting, as I considered this topic, just how far technology has progressed. It’s possible to do so much more in an accelerated window of time as an examiner. I’m not a computer ‘nerd’ in the traditional sense – a fact that I’m sure many IT departments I’ve worked with can attest to – but I get genuinely excited as a forensic examiner thinking about the possibilities that exist by combining the Nuix Engine with existing artificial intelligence capabilities.
And I’m looking forward to exploring the topic of artificial intelligence, along with other investigations subjects, in the articles to come!