News flash: The amount of data in the world is growing. Your business data is exploding. But here’s the real news flash: Not all data is created equal. The data contained in Word docs and PowerPoints is vastly different from point-of-sale data or a phone number directory. Data is classified as either structured or unstructured, and each classification affects how the data is collected, processed and analyzed.
Structured data vs. unstructured data
Structured data — or quantitative data — is the type of data that fits nicely into a relational database. It’s highly organized and easily analyzed. Most IT staff are used to working with structured data.
When you think of structured data, think of things that would sit nicely in a spreadsheet. Examples include:
- Phone numbers
- ZIP codes
- Customer names
- Product inventories
- Point-of-sale (POS) transaction information
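To make "easily analyzed" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns and sample rows are invented for illustration; the point is only that once data sits in fixed, typed columns, a one-line query can summarize it.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pos_transactions ("
    "id INTEGER PRIMARY KEY, zip_code TEXT, product TEXT, amount REAL)"
)

rows = [
    ("30301", "coffee", 3.50),
    ("30301", "bagel", 2.25),
    ("10001", "coffee", 3.75),
]
conn.executemany(
    "INSERT INTO pos_transactions (zip_code, product, amount) VALUES (?, ?, ?)",
    rows,
)

# Because every row has the same fields, aggregation is a single query.
totals = conn.execute(
    "SELECT zip_code, SUM(amount) FROM pos_transactions "
    "GROUP BY zip_code ORDER BY zip_code"
).fetchall()

print(totals)
```

There is no equivalent one-liner for asking the same kind of question of a folder full of Word docs, which is the gap the rest of this article is about.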
Unstructured data — or qualitative data — is just the opposite. It doesn’t fit nicely into a spreadsheet or database. It can be textual or non-textual. It can be human- or machine-generated.
Examples of unstructured data include:
- Media: Audio and video files, images
- Text files: Word docs, PowerPoint presentations
- Email: There’s some internal metadata structure (sender, recipient, date, subject), so it’s sometimes called semi-structured, but the message body is unstructured and difficult to analyze with traditional tools.
- Social Media: Data from social networking sites like Facebook, Twitter and LinkedIn
- Mobile data: Text messages, locations
- Communications: Chat, call recordings
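The email example above is worth a closer look, since it shows both kinds of data in one file. The sketch below uses Python's standard email module to split a message into its header fields and its body. The sample message and the split_email helper are made up for illustration.

```python
from email import message_from_string

# Hypothetical sample message, invented for this example.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 proposal
Date: Mon, 01 Jul 2024 09:30:00 +0000

Hi Bob, the client liked the proposal but wants a lower price.
"""


def split_email(raw):
    """Return (structured header fields, unstructured body text)."""
    msg = message_from_string(raw)
    # Headers are named fields, so they map cleanly onto columns.
    headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    # The body is free text; for a non-multipart message it is a plain string.
    body = msg.get_payload().strip()
    return headers, body


headers, body = split_email(RAW_EMAIL)
print(headers["Subject"])
print(body)
```

The headers could go straight into a database table. The body, where the customer's actual feedback lives, is just a string that traditional tools cannot query.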
The problem with unstructured data
Here’s the two-fold, compounding problem: Unstructured data is important, and its volume is growing at an accelerating rate. Right now, experts suggest that 80 to 90 percent of data is unstructured. Check out this graph from IDC:
If unstructured data were of minimal importance, it wouldn’t matter how much of it there was. But there is value in unstructured data. There’s intelligence contained in those sales proposals and interesting facts and figures in those PowerPoint presentations. There’s a dollar amount attached to those geospatial images for oil and gas companies.
With an intelligent information management (IIM) platform like M-Files, unstructured data becomes accessible, searchable and available. By applying structure in the form of metadata, companies render that information relevant. Metadata is the key to the castle. It describes what the data is, how it relates to other data, which key data points a document contains and where in a particular business process that data fits.
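As a rough illustration of the idea, and not a depiction of the M-Files API, the sketch below attaches metadata to a few files and then searches by that metadata instead of by file name or folder. The Document class, the find function and every file name and field are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """An unstructured file plus the metadata that describes it (hypothetical)."""
    path: str
    metadata: dict = field(default_factory=dict)


def find(documents, **criteria):
    """Return documents whose metadata matches every given key/value pair."""
    return [
        doc
        for doc in documents
        if all(doc.metadata.get(key) == value for key, value in criteria.items())
    ]


# Sample documents, invented for illustration.
docs = [
    Document("proposal_acme.docx",
             {"type": "proposal", "customer": "Acme", "stage": "sent"}),
    Document("kickoff_slides.pptx",
             {"type": "presentation", "customer": "Acme"}),
    Document("proposal_globex.docx",
             {"type": "proposal", "customer": "Globex", "stage": "draft"}),
]

acme_proposals = find(docs, type="proposal", customer="Acme")
print([doc.path for doc in acme_proposals])
```

The files themselves are untouched. What changed is that each one now carries a handful of structured fields saying what it is, who it relates to and where it sits in a process, and those fields are what make it findable.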
When unstructured data is accessible, searchable, available and relevant, it is converted into information that an enterprise can use to make better decisions. Organizations can essentially exploit the power of unstructured data with an IIM platform.
Add to that artificial intelligence and machine learning and you have a powerful transformation happening with unstructured data. Several artificial intelligence (AI) technologies are arriving just in time to help companies automatically add structure to their data. For example:
- Natural language processing to extract key data points and ultimately assign meaning to business documents, emails, journal articles, and social media posts
- Pattern recognition algorithms to identify people, animals, or other objects in digital images and videos
- Speech-to-text conversion to convert audio speech and audio extracted from video into searchable text
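To show what "extracting key data points" means in practice, here is a deliberately simple sketch. It uses regular expressions rather than real natural language processing, which is a far cruder technique, but the outcome has the same shape: free text goes in and named fields come out. The sample sentence, the patterns and the field names are all assumptions made for this example.

```python
import re

# Hypothetical sentence of the kind found in an email or document body.
TEXT = (
    "Invoice 1042 is due on 2024-07-15. "
    "Please send questions to billing@example.com."
)

# Each pattern captures one data point in group 1.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+(\d+)"),
    "due_date": re.compile(r"(\d{4}-\d{2}-\d{2})"),
    "contact_email": re.compile(r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
}


def extract_fields(text):
    """Return a dict of the data points found in the text."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


record = extract_fields(TEXT)
print(record)
```

Hand-written patterns like these break as soon as the wording changes, which is why AI-based approaches that can handle varied language are attractive for this job.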
So when it comes to structured data vs. unstructured data, M-Files delivers on these capabilities with an intelligent information management platform enabled by artificial intelligence. You’ll have to address the problem of unstructured data eventually, and sooner is better than later.