Data Science Isn't Just Business Intelligence: Here's How to Cross the Gap
By Yohan Chin, VP Data Science, Tapjoy
Large companies have had both business intelligence (BI) and data science for some time. Quite a few companies now advertise roles on a “data science” team, yet arguably these teams do little more than a traditional BI unit.
The confusion stems from the most talked-about aspect of Big Data: massive scale. Database sizes have progressed from thousands of entries to billions or more, stored in the cloud. The immense, ever-flowing river of data gives the impression that the challenge is solely in processing.
Both data science and BI teams are similarly equipped to handle processing and scale, so this is not where the difference is found. Rather, I will argue that the key differentiator between data science and BI is action. Great data science teams are equipped to act on data. While BI has made great advances in the “decision sciences” (that is, the art of strategic decision making based on data), the decision making itself is done by executives, so this term does not describe a cutting-edge data science team either.
What Full-Fledged Data Science Teams Do
Simply put, a fully functional data science team bridges the gap between analysis and software development by using its own production resources. This is true of cutting-edge giants such as Google, LinkedIn and Amazon. It is also true of mid-sized companies like Tapjoy, the mobile app monetization platform where I grew the data science team from a single data scientist (myself).
Tapjoy's data science team has become powerful enough to build features with no involvement from other teams, including large projects like real-time bidding engines and personalized ad selection algorithms. These projects, built in areas core to Tapjoy's business, illustrate the “bridge” mentioned above: crossing the chasm between analysis and development. Similarly, at a company like LinkedIn, the data science team performs self-directed work on core features such as People You May Know, Skill Endorsements and more.
So that's what a data science team can do. How about building it? In this case there are two vital challenges to address: company organization, and staffing additions and changes.
For companies with an existing business intelligence unit, the challenge of transitioning to a data science team should not be underestimated. A large part of this challenge is setting expectations with the executive team, and setting new organizational expectations of the role.
The problem: BI has traditionally existed to serve other units—and a data science team must still fulfill this objective. In a modern company, giving transparency and access to data across the entire organization is a key goal. However, this service ingrains the expectation that data science exists solely to provide data to others. That expectation is in conflict with the idea that data science should also have its own goals, resources and timelines.
Data science teams without their own internal resources must regularly fight for time from the product team, a fight they will usually lose. Data science teams simply don't produce changes that are as impressive, on a per-release basis, as those created by, say, a front-end software team. The world of data science is one of incremental change and experimentation.
For executives with an MBA, the concept of percentage yield may be useful: while a single change may produce only a single-digit percentage improvement, many of these changes compound over time to yield massive benefits, and show the true power of data science in an organization.
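To make the arithmetic concrete, here is a minimal sketch of how small improvements compound. The figures are hypothetical (20 changes, each lifting a metric by 3%) and the function name `cumulative_lift` is invented for illustration; it assumes each change multiplies the result of the previous ones, which real experiments will only approximate.

```python
import math


def cumulative_lift(lifts):
    """Return the overall fractional improvement when each lift
    is applied on top of the previous ones (multiplicative compounding)."""
    return math.prod(1.0 + lift for lift in lifts) - 1.0


# Hypothetical scenario: 20 incremental changes, each worth a 3% improvement.
lifts = [0.03] * 20

simple_sum = sum(lifts)              # naive view: just add the percentages
compounded = cumulative_lift(lifts)  # each change builds on the last

print(f"Sum of individual lifts: {simple_sum:.1%}")
print(f"Compounded lift:         {compounded:.1%}")
```

Under these assumed numbers, twenty 3% gains add up to 60% but compound to roughly 81%, which is the kind of cumulative effect a per-release comparison tends to hide.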
Once the imperative exists for a data science team to work directly on products, the staff of the data science team needs to both grow and change. There is an entirely new role to hire for—the data science engineer—and upgrades to the existing position of data scientist.
The data science engineer is the embodiment of the “bridge” that allows the data science team to cross into product development. Think of this person as a regular engineer who might otherwise work in product development—not to be confused with the data platform engineer, who generally needs a broad knowledge of data products and languages. Due to the nature of the work being done, it's helpful to find a data science engineer with an interest in math and statistics. The position may also require skills like Java, Python, Amazon Web Services or Hadoop, but these can often be learned on the job.
As for the data scientist, this is where some additional expectations have been loaded on top of the already wide skill set expected of an analyst. The skills of a top data scientist can seem intimidating: coding in R or Python, or using platforms like Apache Pig or Spark. However, the goal is not to be a leading expert in these engineering skills, but rather to be capable of performing experiments alone. Also, as technology advances, many big data products are offering APIs that make the data scientist's work ever easier.
Even with these additions, a data science team can operate without the full complement of roles seen on a product team, which may include product managers, designers, developers and so forth. Since the role of the data science team is to create constant incremental change in existing features, the optimal structure has data scientists interfacing directly with company leadership and working side by side with data science engineers.