I have founded several podcasts, one of which is Compliance into the Weeds, where Matt Kelly joins me each week to take a deep dive into the weeds of a compliance-related topic. As many of you know, Kelly is the former Editor-in-Chief of Compliance Week and now has his own consulting company, Radical Compliance. The topic of this week’s podcast is big data, and it is based upon an eBook entitled “Planning for Big Data – A CIO’s Handbook to the Changing Data Landscape,” by the O’Reilly Radar Team, with a series of authors each contributing one or more chapters.

I wanted to explore these ideas in greater depth, so over the next few blog posts I will look at some of the key concepts from the eBook and what they might mean for the compliance practitioner. The eBook is available as a free download; click here for your copy. Kelly also wrote about the eBook in a blog post, available here.

Why is the use of big data important for your Foreign Corrupt Practices Act (FCPA) anti-corruption compliance program? In the January 2014 issue of its online publication, Board Matters Quarterly, in a piece entitled “Anti-corruption compliance and big data analytics”, the firm EY said, “Integrating advanced techniques as part of a robust anti-corruption program or investigation enables today’s chief compliance officers, general counsels and chief audit executives to be more proactive in their queries.” EY believed it was a must in 2014, and I can only say that the government’s appetite for more and better use of a company’s own data to prevent, detect and remediate FCPA violations has grown since then.

What precisely is big data? I once put that question to Joe Oringel, a co-founder of Visual Risk IQ, who defined it as “unstructured data”, generally meaning data spread across multiple database systems. In the eBook chapter entitled “What is Big Data?” Edd Dumbill expanded on Oringel’s definition when he wrote, “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture.”

What are some of the key characteristics of big data? Clearly it is big, but it is not simply a storage problem. This ‘bigness’ also means such data can be hard to transport. Of course, this brings up the issue of how and where you are going to store all of this data: in the cloud or on dedicated servers. Big data is also messy, and Dumbill notes, “Big data practitioners consistently report that 80% of the effort involved in dealing with data is cleaning it up in the first place.”

The usefulness of this data, structured or unstructured, is that deeply embedded within it are patterns and information which, if extracted, can provide insight into a wide variety of issues that were heretofore hidden from view. The value of this data falls into two categories. The first is analytical use, as “big data analytics can reveal insights hidden previously by data too costly to process.” The second is enabling new products, as “Being able to process every item of data in reasonable time removes the need for troublesome sampling and promotes an investigative approach to the data.”

Dumbill explored the question of what big data looks like by considering three terms commonly used to characterize its different aspects. He calls them the “three Vs of volume, velocity and variety.” Volume is simply size, and if there is one defining characteristic of big data, it is that there is a great deal of it. But once you get past the simple issue of storage, you must figure out a way to make some sense of this volume, which requires “scalable storage, and a distributed approach to querying.” In other words, the volume of big data can be so large that you might not have the capacity to process it through your existing ERP system.

Next is velocity, which, Dumbill says, is “the increasing rate at which data flows into an organization.” He further breaks this velocity issue down into two components: (1) streaming data and (2) complex event processing. Both of these categories bear on the crucial element of velocity: how fast you can use the data or, in techno-speak, get it into a feedback loop. This leads to the next insight, that velocity is more than the speed at which data is input into your system; it also speaks to the speed of the output from your data analysis system. Dumbill notes, “The tighter the feedback loop, the greater the competitive advantage.” For the Chief Compliance Officer (CCO) or compliance practitioner, the clear import is that the faster you can get the data in and analyzed, the more useful it will be in detecting any untoward activity and moving to prevent it from becoming a full-blown FCPA violation.

The third of Dumbill’s three Vs is variety. He relates that big data rarely comes in a perfectly ordered format. Moreover, “A common theme in big data systems is that the source data is diverse, and doesn’t fall into neat relational structures” and that in almost all cases, “the reality of [big] data is messy.” This will be true in any international organization, as I am not aware of a single company that has a common ERP platform across the globe. It will be an even greater problem for any company that grew inorganically through acquisitions. There will always be a wide variety of data sources, and many times these sources simply will not talk to each other, so you will have to impose some type of order to extract meaning from the data.

One of the more interesting insights Dumbill offers about big data is in the area of culture. He believes that to properly understand and use big data, a company needs to move toward the embrace of data science, “a discipline which embraces math, programming and scientific instinct.” To benefit from big data, a company must invest in the mindset needed both to understand big data and to use it to its advantage.

Dumbill cites DJ Patil for four qualities required in data scientists. They include the technical expertise to understand what is being presented; a curiosity to look into other areas to “discover and distill a problem into a clear set of hypotheses that can be tested”; the ability to use big data to tell a story; and the cleverness to look at problems in different ways, with an openness to creative ways of solving them.

For the CCO and compliance practitioner, it means there must be a willingness to move into areas that many lawyers are not trained for or even comfortable working in. It requires discarding the myopic view that doing compliance is simply putting in place a written set of policies and procedures, then sitting back and expecting all employees to follow the rules. Unfortunately, the real world is not that neat and ordered, nor does it function as such.


This publication contains general information only and is based on the experiences and research of the author. The author is not, by means of this publication, rendering business, legal advice, or other professional advice or services. This publication is not a substitute for such legal advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified legal advisor. The author, his affiliates, and related entities shall not be responsible for any loss sustained by any person or entity that relies on this publication. The Author gives his permission to link, post, distribute, or reference this article for any lawful purpose, provided attribution is made to the author. The author can be reached at

© Thomas R. Fox, 2016