Text Analytics: The Art of the Possible
Dr. Anne Hunt, August 22
Textbooks, novels, newspapers. Blogs, webpages, email, text messages. Poetry, rap, sarcasm, jokes… journals, symposia.
If there’s one thing that distinguishes human beings from all other beings on this planet, it’s that we like to talk. And write. And read.
Technology has only accelerated our ability to generate content, most of it words. Smartphones. Talk-to-text applications. Digital news and information platforms that churn out content continuously.
It seems impossible to even think without words. We think in language. Human knowledge has historically been recorded in texts.
By one commonly cited estimate, more than 80 percent of all enterprise information is locked in natural language text: Word files, emails, (the dreaded) PowerPoint decks.
But language is slippery. The philosopher Ludwig Wittgenstein described language in terms of "language-games." It's a game where words are the pieces, and the rules are written by the cultures of the speakers.
So how can we make language computable? How can we unlock the knowledge contained in text? What we'd like is for computers to read thousands of documents, articles, books, or research papers (millions, tens of millions) in seconds, and then to reason and draw conclusions from the information they contain.
I’d go so far as to say we have to do that. Because the pace of data creation will never be this slow again.
And if we can't understand it, if we can't build the technologies that let us connect dots, see patterns, and identify new insights, then we and the entire world ecosystem will suffer as a result.
Think I’m being dramatic? Consider diseases cured more slowly. Fraud, crime, security threats detected too late. Scientific advances delayed. Inventions arising years later than they otherwise would. Human potential unrealized.
This is the challenge and the motivation behind modern-day text analytics.
It's a challenge many are taking on. It starts with using natural language processing technologies to isolate entities in text (a task known as named entity recognition): people, places, companies, topics, and more. Doing that is the first step. Doing it with incredible accuracy is imperative.
The ability to do it in real time on huge volumes of streaming text opens up a whole new realm of possibilities. Applying complex predictive models to prevent fraud, anticipate opportunities, and get ahead of risk: that's where it gets really interesting.
Most modern enterprises are just beginning to grasp what's possible when it comes to understanding and leveraging their information assets. And often it's up to vendors, those with expertise in language, to educate them on what's achievable, how, why, and on what timetable.
This is, without question, an incredibly exciting time to be in this field. For this reason and for others.
Among them: the rise of NoSQL database vendors and the falling cost of memory. On top of that, there's the growing popularity and acceptance of cloud computing. There's the proliferation of research and open source tools that spur development and innovation. And there's a greater willingness among enterprises to engage in proof-of-concept projects with early-stage technology partners, co-creating purpose-driven solutions to challenges that were previously believed to be unsolvable.
There are huge opportunities in natural language processing, for vendors and buyers alike. The next few years will be very telling, both in terms of technology providers' ability to rise to the occasion and deliver solutions with proven ROI, and in terms of enterprise customers pursuing new projects that tackle big challenges in big (and new) ways.
We’ll know we’ve been successful—regardless of which side of the coin we’re on—when we can point to text analytics as an enabler of good. In all its forms.