Nuix announced its decision to buy Topos Labs, Inc. (Topos), a developer of natural language processing (NLP) software that helps computer systems better understand text and spoken words at speed and scale.
Headquartered in Boston, MA, Topos designed its artificial intelligence (AI) driven NLP platform to reduce the workload on data reviewers and analysts by surfacing relevant or risky content faster. Its mission is to provide customers with risk-oriented content intelligence for proactive risk management and regulatory compliance.
The addition of Topos to our already extensive software platform will, as you’ll see, play a noticeable role in making the lives of our users easier. Whether you’re tasked with conducting internal corporate investigations, handling legal discovery review or ensuring your organization is meeting its risk and regulatory obligations, Topos’ NLP capabilities and integration with Nuix in the coming months will be something to pay attention to.
POWERFUL ANALYSIS AND CLASSIFICATION
The platform, which is still in the early stages of its development, can already automate accurate analysis and classification of complex content in documents, electronic communications and social media. Business users can directly define NLP models through the software’s no-code user interface, reducing the time required to identify risk in the organization’s data. From there, it can present the risk assessment of confidential, sensitive and regulated content in user-friendly dashboards.
“The acquisition of Topos is an exciting evolution in Nuix’s journey,” said David Sitsky, Nuix Engineering Founder and Chief Scientist. “Integrating the Nuix Engine’s ability to process vast quantities of unstructured data with the next generation NLP capabilities of Topos will be game-changing for Nuix’s product portfolio.”
“Topos will strengthen Nuix’s product offering by helping customers get to relevant data even faster,” added Rod Vawdrey, Nuix Global Group CEO. “The potential for user-friendly dashboards and for users to easily customize the software to their specific needs also reflects Nuix’s focus on empowering our customers to search through unstructured data at speed and scale. We look forward to Christopher Stephenson [Topos CEO] and his talented team joining Nuix.”
WELCOMING THE TOPOS TEAM
As part of the deal the Topos team, including members of senior management, joined Nuix. By welcoming the Topos team and integrating the NLP capability at this stage of its development, Nuix can optimize the technology to benefit its investigations, eDiscovery and governance, risk and compliance (GRC) customers, further enhancing the unstructured data processing power of the Nuix Engine.
“We are delighted to join Nuix and are excited about combining our innovative NLP platform with the Nuix platform,” said Christopher Stephenson, CEO, Topos Labs. “Along with my talented engineering and product team, I look forward to deploying Topos to further enhance Nuix’s powerful processing capabilities and to being part of a global leader in investigative analytics and intelligence software.”
The sea of cubicles is quieter than normal. All eyes seem to be turned toward the conference rooms at the far end of the room, where strangers in suits approach carrying cases of computer equipment. They enter the appointed spaces and close the door, where a sign printed on plain white paper is taped.
“This room is reserved indefinitely.”
This isn’t fiction; it’s a scene I witnessed firsthand working inside the financial services industry. While the silence and anxiety were more centered around the fact that one of our most precious resources – a 10-person conference room – was likely out of circulation for months, there was definitely a sense of trepidation as the regulators went to work.
I recalled that scene several times as we worked on the 2021 Nuix Global Regulator Report alongside Ari Kaplan Advisors. How valuable would the insights in the report have been for our business unit during those months of meeting our obligations to the regulators? How much anxiety would have been put to rest? Most importantly, how quickly would we have gotten that conference room back?
RESPONDING TO REGULATORS MORE EFFECTIVELY
During a Q&A webinar about the report, chief report author Ari Kaplan and Stu Clarke, Regional Director – Northern Europe at Nuix, addressed the topic of corporations working more effectively with regulators.
Based on their conversations with regulators, it became clear that regulated corporations should take control of their environment. “Holistically, it makes life much easier when an inquiry kicks off,” Stu said. “They have a much better understanding of where risks lie and where employees are working inside the organization,” making it that much easier to respond to inquiries.
It also helps to look at regulators as guides who are there to advise the company, not just punish it when it goes astray. Summarizing some of the comments during the webinar, regulators have a role to inform and guide the organizations they are responsible for. There’s a desire amongst the regulators to work more collaboratively and build an ongoing relationship, not just swoop in during a one-time event.
It also helps to understand where the regulators are coming from. “The regulators are incredibly savvy and have experience in private industry,” Ari said. “They are well-versed in the various tools and they talk to each other.”
HANDLING A CONSTANTLY CHANGING ENVIRONMENT
The regulatory environment adapts as the realities of day-to-day business change. “Things change rapidly,” Stu said. For example, “we weren’t talking about Microsoft Teams two years ago, and we can’t stop talking about it or using it now.”
Those changes are just another set of reasons to better understand what the regulators are looking for. Download the 2021 Nuix Global Regulator Report to learn more about regulators’ approaches to their respective industries, preferred technology and enforcement practices, all of which can help you work more efficiently during a regulatory inquiry.
Since my early days in forensics, data transfer cables, like data storage and the devices themselves, have been a growth area. To stock a competent digital forensics laboratory, you needed to have the cables and adapters to read all the devices you might find in the wild. These included IDE, the occasional RLL and about 100 different configurations of SCSI cables. Along with these cables, it was important to have the appropriate write blocking technology to enable proper preservation of digital evidence while duplicating it.
Times have naturally changed, as I discussed in part 1 of this series. As storage interfaces grew and changed, the type and number of these write blockers grew at the same time. The investigator needed to show up in the field, confident that no matter the size and configuration of a storage device, they had the equipment to properly interface with it and conduct analysis.
While the need to be prepared and competent has not diminished in the slightest, the sheer volume of digital data found at a given crime scene or under a search warrant has exploded, from a bunch of floppy disks and maybe a hard drive or two in the late 90s to multiple tens of terabytes or more in the 2020s. This dramatic increase in raw data has required the high-tech investigator to learn additional strategies to find key data on-site, possibly before performing full forensic analysis in a lab. Tools like Nuix Data Finder and Automatic Classification can be deployed in the field to find crucial items of digital evidence now, not 6-12 months from now when the laboratory backlog gets to your case.
THE DIFFERENCE IN DECADES
I mention ‘prepared and competent’ because it can’t be overstated that what was required in the 90s is darn near trivial when compared to the massive scope of the digital investigations field today.
In a nutshell, investigators in the 90s required knowledge of:
To a very minor extent, Macintosh/Apple.
The knowledge included how their file systems worked and the technical ability to analyze floppy disks and hard drives using:
While networking could be a factor in business investigations, most people using their computers at home dialed up to their service provider and the records were fairly easy to understand.
Fast forward to today and what investigators need to know dwarfs all past generations:
Windows (multiple flavors)
SATA/SAS spinning disk
SATA/SAS solid state disk
USB 2/3/C hard drives
Wireless hard drives
Home cloud drives
A variety of smaller/foreign cloud services
Digital cameras with and without network connectivity
Internet of Things (IOT)
Encryption – So many impacts on file storage and networking that it deserves its own novel
This list goes on and on. It’s almost impossible to recognize the field of high technology investigations when comparing the decades of development and advancement. It’s hard to imagine how a modern investigator can even be moderately competent given the breadth of knowledge required.
After all this history, I’m sure many readers will have some of the same questions. I’ll try to answer what I know I’d be asking, but I encourage you to reach out if you have others that I don’t cover here!
How Can Our Team Cover The Breadth Of Knowledge You’ve Outlined Here?
Having the properly trained and experienced personnel assigned to the cases involving the skills they are most experienced in is vitally important. Given the amount of available information out there, it is inconceivable that there is a single person in any organization who is best able to handle every type of case.
It’s also important to have the appropriate technical and hardware resources on hand to address the challenge of each type of data (and the platform it lives on).
What’s The Key To Ensuring We Are Focusing On The Right Pieces Of Evidence?
The one constant in my high-tech investigations tenure is the ability to interact competently with all types of people. Learning to interview and interrogate where appropriate and paying close attention to the facts of a case, including environment, are crucial components to locating all the data types required in each scenario to perform a thorough examination.
Secondary to the staff’s personal competence and their ability to ask pertinent questions about the environment they are investigating, is having a deep bench in terms of hardware, software and intelligence that will guide them to all available sources of digital evidence. Further, by having the knowledge and experience to learn all about the environment under investigation, the entire staff will be deeply steeped in the art of triage. This enables them to focus on most-likely-important evidence first and widen the scope needed to obtain all the facts without crushing themselves under the weight of trying to analyze ALL.
Which Tools Do You Recommend As Imperative For An Investigative Team?
This is a slam dunk. Nuix Workstation gives me the single pane of glass to all the evidence types I’m interested in, while Nuix Investigate® allows me to present all the evidence I’ve collected and processed to support staff and case agents, who will perform the detailed review of documents and communications to determine their relevance to the case.
How Do We Fill In The Gaps?
Again, I’ve got the core of most of my needs in the Nuix suite of tools. Where Nuix does not have a solution, like threat intelligence feeds or cooperative intelligence like the ISACs, I can incorporate information from those feeds directly into my Nuix cases and correlate across all the available data to solve the questions posed by the investigation.
EMPOWERING THE MODERN-DAY INVESTIGATOR
We know investigations take on many different forms depending on where you work. While criminal investigations will differ in some ways from, for example, a corporate environment, many of the details remain the same.
I encourage you to visit the Solutions section of our website and see for yourself how Nuix helps investigators in government, corporations, law enforcement, and more.
Digital investigations have undergone a geometric progression of complexity since my first fledgling technology investigations during the 90s. In those early years, a competent digital forensics professional only needed to know how to secure, acquire and analyze the floppy disks and minuscule hard drives that represented 99% of data sources at the time.
Since those halcyon days of Norton Disk Edit for deleted file recovery and text searching, there has been a veritable explosion of methods and places to store data. The initial challenges were focused mainly on training the investigators in a new field and the progression in size of available storage for consumers (and therefore investigative targets). While seizing thousands of floppy disks required immense effort to secure, duplicate and analyze, it was still the same data we were used to, just inconveniently stored and frequently requiring assistance from outside resources (thank you Pocatello, Idaho lab).
Information evolution and explosion has a direct impact on the field of investigations. To set the stage for the second half of this two-part investigations blog, in this article I’d like to look back on some of what I feel are the major changes that have occurred over the past 30-odd years.
LET’S CONTINUE OUR TOUR
By the turn of the century, hard drives, initially as small as 10-20 MB, grew to a ‘staggering’ 10 GB in a high-end computer. Flash media in the form of thumb drives and compact flash cards began to hit the market around the same time, becoming quickly adopted as the preferred storage medium for the newly minted digital cameras and tablet computers. Some of this media was small enough to be hidden in books, envelopes and change jars.
Cellular telephones, originally used only for voice communications, quickly advanced to transmit and store data in the form of messages, pictures and even email. As data became more portable, and therefore easier to lose or have stolen, encryption schemes arose that enabled normal consumers to adopt data security strategies that had previously only been used by governments and their spy agencies.
As data speeds increased, so too did the volume of data created and transmitted, necessitating even more novel methods of storage. At about this time, remote computing moved quickly from dial-up network services like AOL and CompuServe, to using those services as an entrance ramp of sorts to the internet, to faster direct internet connections. Those connections eliminated the need for the AOLs of the world in their original role; they became instead content destinations for users connecting to the internet over rapidly growing broadband access.
FOLLOW THE DATA
Each step in this transformation required investigators to learn the new ways that data moved, where it was stored and by whom. Just learning who an AOL screen name belonged to required numerous acquisitions and legal action. Compelling service and content providers alike to divulge these small pieces of data was required to determine where connections were being made from and sometimes by whom. High-tech investigators became one of many pieces of the dot com phenomenon.
Data protection services sprang up with the various dot com enterprises; securing data frequently involved transmitting backup data to remote servers. These servers were rented or given away to anyone who wanted them, adding to the complexity of identifying where in the world a given user’s data resided. After determining where the data resided, there were at least another two layers of complexity for the investigator – namely knowing what legal process was required to acquire the remote data and proving who placed the data on the remote servers.
As data quantity exploded, the need for more advanced software to analyze this data became pressing. Several software offerings sprang up in the early days that, unlike Disk Edit, were created for the express purpose of reviewing quantities of digital evidence in a manner that was forensically sound. Most early digital forensic tools were expensive, complicated and slow, but they represented an important step in the growing field of digital forensics. The early offerings of both corporate and open-source digital forensic software were anemic compared to today’s digital processing giants.
In some instances, the introduction of 100,000 files was sufficient to bring some tools to their knees, necessitating that forensic cases be analyzed in batches of evidence to avoid taxing the software. Thankfully, this is largely a thing of the past, as products like Nuix Workstation will chew through ten million items without a hiccup, much less a major crash.
Before we knew it, we weren’t just analyzing static data sitting on a local storage device. Network data investigation had to be added to the investigator’s arsenal to determine how data moved across networks, from where and by whom. Along with remote storage services, online communication services exploded across the internet, and suddenly the high-tech criminal had acquired ready access to victims from the very young to the very old for a variety of crimes.
This drastic shift to remote, anonymous communication represented a very new and very real threat that had the added complexity of making not only the criminals difficult to identify, but their victims as well. The traditional transaction involving a citizen walking through the entrance of a police station to report a crime still happened, but new internet crimes meant that when criminals were caught, it was no longer the conclusion of a long investigation. Frequently, it represented the beginning of trying to identify and locate the many victims who either didn’t know where or how to report the crime. This is all because the crimes were facilitated by, or the evidence recorded on, the growing catalog of digital storage.
As digital communication grew, so did the devices used to facilitate it. Cellular phones made the steady shift from plain telephones to a new category referred to commonly as ‘feature phones.’ These phones incorporated digital messaging utilities, including instant messaging, mobile email and access to portions of the internet through basic web browsers.
With the proliferation of feature phones, the real need for mobile device analysis sprang into existence almost overnight. Text messages on a flip phone were easy to photograph and catalog, but feature phones had a much more unique interface, requiring investigators to seek out technical solutions to the problem of megabytes of evidence locked in a device that was as non-standard as you could get.
For each manufacturer of cellular devices, there was a different operating system, storage capability and feature set. None of the existing computer forensic tools could acquire or analyze the wide assortment of available handsets. The cherry on the top of these early ‘smart’ phones was the seemingly random shape, size, placement and pin structure of the cables used to charge them. Many phone models came with dedicated companion software for the home computer that enabled backup or access from the computer.
Those same unique charging cables became unique data transfer cables connected to unique software on the host computer system. It was at this time that the first cellular forensic tools appeared. These systems didn’t appear at all like modern cellular forensic tools. They required extra software, hardware devices called ‘twister boxes’ and a literal suitcase of data transfer cables. Much like the early days of digital disk forensics, cellular forensics was a laborious and highly technical enterprise that required a great deal of training and experience to pull off.
Everything changed again in June 2007 with the release of what many consider to be the first true smartphone: the iPhone. Not long after, the beta Android device was introduced in November 2007 and the cellular arms race was on. If data quantity and location was an issue before, it was soon to become immensely more serious as the public rapidly adopted the smartphone and began carrying essentially an always connected, powerful computer in their pockets and purses.
If the high-tech investigation world was difficult before, it was about to become immensely more so. About the only beneficial thing that smartphones did for investigators was that, over a 6-8 year period, they killed the feature phone and with it the suitcase of unique cables. A top shelf cellular forensic professional can safely carry five cables with them to handle the vast majority of phones in use: the original iPhone plug, which is still found in the wild; the newer Apple Lightning cable; and each of the USB flavors (mini, micro, and USB-C).
But, as you’ll see in part two of this series, that’s about the only positive for investigators. Things have continued to get much more complicated.
A couple of years back, when the GDPR was about to come into force, there was a great deal of talk about Data Subject Access Requests (DSARs). While European residents had long held the right to request their data, the fact that it was now free, and that there were potentially significant penalties for non-compliance meant that many organizations expected a tsunami of DSARs. There was an increase but perhaps not a tidal wave. Recently there has been speculation (in the wake of the COVID-19 pandemic and the associated job redundancies) that we are likely to see another surge.
It is important to understand that DSARs are about the rights of a data subject. A data controller must not only confirm whether it is processing the data requested and provide a copy, but also document:
The purposes of processing
The categories of personal data concerned
The recipients or categories of recipient to whom the data has been disclosed
The retention period for storing the personal data or, where this is not possible, the criteria for determining how long it will be stored
Notice of the existence of the right to request rectification, erasure, or restriction or to object to such processing and the right to lodge a complaint with the supervisory authority
The existence of automated decision-making (including profiling)
The safeguards provided if the data is transferred to a third country or international organization.
So, you can see that the exercise is as much about data governance and organization as it is about eDiscovery. Many DSARs are from disgruntled consumers, so managing the requests is mainly about good customer relations. Fix a person’s mobile phone, for example, and they may drop the DSAR.
However, there is one scenario where DSARs take on some of the characteristics of eDiscovery. A DSAR can be a quick and inexpensive way to get evidence to support a claim, without having to start on an expensive formal lawsuit (a kind of shortcut to pre-action disclosure). It can also be a negotiating ploy for an executive wanting to negotiate a decent exit package. “I know your data is a mess, and it will cost you £50,000 to respond to this, so I’ll settle for £20,000.” Or it might just be a disgruntled ex-employee who wants to cause annoyance.
An organization needs to respond to a DSAR within 30 days, but typically they don’t send the data to their supporting law firm until day 20—and I’ve heard stories of day 28. Further, the lawyers don’t necessarily know whether the DSAR is a torpedo about to explode into a larger legal action, or a legitimate request that needs to be answered as efficiently and cheaply as possible.
This is the great advantage of Nuix Discover®: It has the flexibility to support a self-service model designed to maximize efficiency and minimize cost while being able to pivot and become a full-function deep investigation and review tool. Panoram’s vision is to combine the two: Get lawyers used to the technology in day-to-day cases so they’re comfortable using the tools for more challenging ones.
THE MANTRA IS SPEED TO REVIEW AND SPEED OF REVIEW
Of course, that starts with fast and comprehensive data processing. Nuix has long been the benchmark here, and the ongoing enhancements in areas such as Microsoft Teams processing will be crucial going forwards.
Then it is all about the parallel early case assessment workflows of discounting redundant information and finding what is important. Nuix Discover’s analytics tools such as Mines and Clusters might allow you to exclude large amounts of non-personal communication from a review. If there is a parallel complaint going on (say into bullying) then communication network analysis will quickly allow you to see if team members are talking to each other about a person, and the concept cloud will allow you to understand what they are saying and whether it includes anything untoward.
As ever, the key route to controlling costs though is in review; accelerators such as quick coding, code as previous, and macros all help speed up review and so reduce cost. Threading, near dupe, and concepts allow you to streamline review workflows so reviewers get batches of similar data types to look at and make faster, more consistent review decisions.
The DSAR rules allow lawyers to exclude some documents from production, most notably for legal professional privilege and for confidentiality. Most complicated is the scenario of mixed data, where there may be a conflict between the need to provide data to a data subject and not to harm a third party’s rights—known as a tie breaker. Here the ability to note why a decision has been made is crucial, and so too is a consistency of approach. Back to the design of the right workflow.
Then there is redaction. The ability to use search term families to find and redact on individual documents is already useful. Regular expression searches make it possible to identify patterns of personal information (such as credit card numbers, national identity numbers, and passports). Once Nuix Discover has highly awaited case-wide redaction and native redaction for Microsoft Excel, it will have a significant advantage (for a while) over other products. Fast redaction is key to DSARs.
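As a rough illustration of the pattern-matching idea, the sketch below finds candidate payment card numbers with a regular expression and uses a Luhn checksum to discard digit runs that only look like card numbers. The pattern, function names and mask text are assumptions made for this example; they are not Nuix Discover's search syntax or API.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, digits) for each likely card number in the text."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append((match.start(), match.end(), digits))
    return hits


def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Replace each likely card number with a mask, working right to left
    so that earlier offsets stay valid while the text is rewritten."""
    for start, end, _ in reversed(find_card_numbers(text)):
        text = text[:start] + mask + text[end:]
    return text
```

The checksum step matters in practice: a bare digit pattern will also match order numbers and phone numbers, and over-redaction creates its own review work.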
Finally, we have reporting. Law firms may be supporting multiple DSARs and need to make sure they are on track to meet the 30-day deadline, but also to measure accuracy and cost. Ideally this will reveal whether certain approaches are more efficient and make sure they are not losing money. A recent survey by Guardum says it costs £4,900 to answer the average DSAR, which does not leave a lot of fat. In Deer v Oxford University, the court ordered further searches, causing the university to review 500,000 documents at a cost of £116,000 (for the disclosure of a further 33 documents).
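Deadline tracking of this kind is simple date arithmetic. The sketch below assumes the 30-day window described in this article, counted in calendar days from the date of receipt; the function names, matter references and dates are illustrative only.

```python
from datetime import date, timedelta

RESPONSE_WINDOW_DAYS = 30  # the response window cited in this article


def dsar_deadline(received: date) -> date:
    """Date by which the response is due."""
    return received + timedelta(days=RESPONSE_WINDOW_DAYS)


def days_remaining(received: date, today: date) -> int:
    """Calendar days left before the deadline (negative if overdue)."""
    return (dsar_deadline(received) - today).days


def at_risk(open_requests: dict[str, date], today: date, threshold: int = 10) -> list[str]:
    """Return the references of requests with `threshold` days or fewer remaining."""
    return sorted(
        ref for ref, received in open_requests.items()
        if days_remaining(received, today) <= threshold
    )
```

A request received on 1 March and handed to the law firm on day 20 leaves 10 days for processing, review, redaction and production.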
The world does not stand still. You will notice I have consistently talked about data, not documents. Most kinds of data can be personal data (IP addresses, for instance). As we move to 5G and the internet of things, there is likely to be a coming together of the cybersecurity and forensics end of things and traditional legal review. Finding ways to show and illustrate this will be key and it is our hope that by being a Nuix partner we can both be at the forefront of building compelling solutions.
In this post I’ll describe how a security team could use Nuix Adaptive Security to detect, respond to, and recover from a situation like the SolarWinds compromise. The attack on SolarWinds’ code and the information published about it means security teams face a series of questions, including:
Do I have any of the affected SolarWinds software in my enterprise?
Are all affected systems either disconnected or successfully upgraded?
Is there any communication between my network and the known attacker infrastructure?
If another attack with the same characteristics occurs, will I detect it?
Did unauthorized actors gain access to my network through this attack? If so, what data did they access, and how do I get rid of them?
Nuix Adaptive Security relies on an agent installed on enterprise endpoints. Once the agent is in place, search capabilities and real-time visibility are available to the security team. The agent logs activities on endpoints; passes the log events through onboard processing to decide whether to alert, block, quarantine, or take other actions; and sends the log events back to a central server. About a dozen categories of endpoint activities are logged, including file, process, registry, DLL, session, and media events, as well as a range of insider threat-related behaviors. If a security operator finds a threat, Nuix Adaptive Security gives him or her a set of immediate response tools such as killing processes, deleting files, quarantining the host, and initiating a forensic investigation.
In the SolarWinds scenario, this combination of historical event data, real-time detection, and response tools gives security teams the ability to respond quickly and efficiently by determining which systems may be affected by the initial compromise, enhancing detections with newly discovered IOCs, and initiating a comprehensive threat hunt.
ASSUMING NO ENDPOINT AGENT IS IN PLACE AT THE TIME OF INITIAL COMPROMISE
A team responding to SolarWinds with Nuix Adaptive Security post-event could take the following steps after deploying the agent.
Search For The Presence Of The Compromised Update Package And DLLs On Disk
Threat intelligence provides the names and MD5 hashes of these files, so the first step is simply to identify any extant instance of them. If desired, any discovered files can be deleted from the target system by the operator. Affected systems could also be quarantined from the rest of the network or from the internet.
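To make the matching logic concrete, here is a minimal standalone sketch of a search by file name and MD5 hash. It is not how the Nuix Adaptive Security agent performs the search; the function names are hypothetical, and the indicator sets must be filled in from your own threat intelligence feed.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_ioc_files(root: Path, ioc_names: set[str], ioc_md5s: set[str]) -> list[tuple[Path, str]]:
    """Walk `root` and return (path, reason) for files matching a known
    MD5 hash ("md5") or, failing that, a known file name ("name")."""
    names = {name.lower() for name in ioc_names}
    hashes = {value.lower() for value in ioc_md5s}
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            file_hash = md5_of_file(path)
        except OSError:
            continue  # unreadable or locked file; skipped in this sketch
        if file_hash in hashes:
            hits.append((path, "md5"))
        elif path.name.lower() in names:
            hits.append((path, "name"))
    return hits
```

A hash match is the stronger signal, since a file can be renamed; a name-only match is still worth reviewing because it may be a different build of the same package. Hashing every file is slow on a large volume, so a real sweep would usually narrow the candidates by name, size or location first.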
Set Up Alerts For Future Arrival Or Execution Of The Compromised Files
Since the delivery and persistence mechanisms for the compromised applications are likely still only partially understood, alerts should be put in place to detect any instance where the compromised DLL is called, or the update package is detected on disk.
Set Up Detection For Future Known Command And Control
We’ve now put in place new detections for the IOCs known to be related to this threat. And we have a framework in place to update detections as new IOCs come out. This is valuable due diligence to protect against similar future threats. But we still must deal with the possibility that attackers have already compromised the network.
Begin A Threat Hunt For Active Attackers
Even if the compromised DLL was not discovered, it is possible there are or were compromises in other updates or through other vectors. The security team therefore needs to begin examining systems across the network for evidence of attempts to move laterally, establish persistence, discover information, stage it, and so forth.
Nuix Adaptive Security gives operators baseline detections for these activities based on the MITRE ATT&CK framework. From that starting point they can customize and build detections based on their unique environment. The existing threat intelligence suggests several things to look for, including known IOCs, behavior of the malware such as changes to specified registry keys, and behaviors of the attacker such as suspicious use of RDP.
One initial step would be to examine the user session events recorded by Nuix Adaptive Security for unusual RDP sessions, such as those originating from the SolarWinds boxes. Suspicious use of administrative tools such as PowerShell should also be examined.
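The first of those checks amounts to a filter over session records. The sketch below assumes a simplified, hypothetical record layout; the `SessionEvent` fields are invented for illustration and do not reflect the event schema Nuix Adaptive Security actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionEvent:
    """Hypothetical session record; real field names will differ."""
    source_host: str
    dest_host: str
    protocol: str
    user: str


def suspicious_rdp_sessions(events, watched_sources):
    """Return RDP sessions that originate from any host in `watched_sources`,
    for example the servers running the affected SolarWinds software."""
    watched = {host.lower() for host in watched_sources}
    return [
        event for event in events
        if event.protocol.lower() == "rdp" and event.source_host.lower() in watched
    ]
```

Monitoring servers rarely have a reason to open interactive sessions to other machines, so even a short list of hits is worth triaging by destination and user account.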
ASSUMING ENDPOINT AGENT IS IN PLACE AT THE TIME OF INITIAL COMPROMISE
Now let’s look at how our team would respond with Nuix Adaptive Security in place when the compromise occurred.
Detect The Initial Compromise Or Subsequent Tactics Employed By The Attacker
Nuix Adaptive Security contains customizable rules to detect malicious code and attacker TTPs. These give network defenders a powerful tool to detect indicators on the host that would otherwise be missed.
Identify Current And Historical Instances Of The Compromised Software
Nuix Adaptive Security logs all file writes, process starts, and DLL loads. A search of file events using the known file names and MD5 hashes would quickly reveal whether the update package or DLLs had been written to disk. A search of process and DLL activity would reveal whether compromised binaries ever executed.
Identify Current And Historical Instances Of Communication With The Known C2
The endpoint agent logs all DNS queries and network connections. A search of the historical events would reveal systems that had made contact with the attacker’s infrastructure. Alerts could be set up on any future communication.
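A historical search like this comes down to comparing each logged query name against the published domain list, including subdomains. The sketch below assumes DNS events are available as simple (host, query name) pairs; that layout and the function names are assumptions for illustration, and the domain list must come from your threat intelligence.

```python
def normalize(domain: str) -> str:
    """Lower-case a domain name and drop any trailing root dot."""
    return domain.strip().rstrip(".").lower()


def matches_c2(query: str, c2_domains: set[str]) -> bool:
    """True if the query equals a listed domain or is a subdomain of one.
    `c2_domains` is expected to be normalized already."""
    labels = normalize(query).split(".")
    return any(".".join(labels[i:]) in c2_domains for i in range(len(labels)))


def hosts_contacting_c2(dns_events, c2_domains):
    """Group matching query names by the endpoint that issued them.
    `dns_events` is an iterable of (host, query_name) pairs."""
    listed = {normalize(domain) for domain in c2_domains}
    hits = {}
    for host, query in dns_events:
        if matches_c2(query, listed):
            hits.setdefault(host, []).append(normalize(query))
    return hits
```

Matching on whole labels rather than substrings avoids false positives such as `notbad.example` matching a listed `bad.example`, while still catching attacker-generated subdomains.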
Begin A Threat Hunt For Active Attackers
From here, you’d begin the threat hunt as described above.