Launch a Data Science Practice with These Five Questions
By Seth Dobrin, VP & Chief Data Officer, IBM Analytics
For Chief Data Officers (CDOs), starting a data science transformation can be daunting. Even figuring out where to begin can feel like trying to boil the ocean. To prioritize efforts within the organization, the CDO should first identify key data assets and establish long-term goals for the data science program.
A successful data practice has two keys: creating value from data and ensuring compliance with data governance. While the insights gleaned from data are what really matter in the end, how a data science team achieves those findings is equally important.
To achieve this, every CDO needs to address five key questions for their enterprise:
What are our Core Data Assets?
Every organization has a series of assets that have data associated with them and have significant value in and of themselves. Conducting a thorough audit of all data assets is the best place to begin as it helps to identify an enterprise’s data opportunity.
There is a finite number of ways data can be organized within an organization, and it usually boils down to three to five key assets. The most common groupings of data are Customer, Product and Company. The Customer asset is all about having a quantitative understanding of your customer. The Product asset ranges from lifecycle management to supply chain inventory to sales data. While the Company asset is a catch-all for data that does not fit into a specific bucket, the two most important components of the Company asset are talent and finance. Without finance data, the Product asset would be incomplete because financial performance of any given product is crucial in deriving value from that asset, driving the success of any organization. Once the assets have been defined conceptually, logical representations of the data that will make up these assets need to be constructed.
How do we unify our Approach to Governance?
Historically, data governance has been seen as an obstacle, but it can actually be an impactful enabler. After all, you can’t democratize data until you gain control over it. Having a unified approach to governance (across private and public clouds) is the key. And ideally, governance will be managed from a single place, though this is a challenge for today’s complex organizations.
The key to governance is that it’s more than just a policy. Governance requires a cultural shift that starts with one question: who should have access to data within your organization?
How do we build Deployable Data Science Assets?
The key to cognitive business is putting data to work. In layman’s terms, this means that the intrinsic value of data isn’t in the data itself, but in the application of data to business decisions. Making sure your data assets are prepared to be deployed is key to extracting valuable insights.
Building deployable data science assets means a few things. First, businesses should build analytics models the way they build software. Data scientists are known for their creativity and their ability to apply scientific rigor to business problems; they should also use software engineering tools such as code repositories to share and edit code.
Second, how models are integrated into applications or workflows is an important consideration. The models need to be deployed in real software and delivered via an application or as a part of a workflow to provide the next best action in a process.
Third, and perhaps most challenging, maintenance is often an afterthought for most businesses. Once you build, train and deploy a model, much work remains, including checking that the model is still performing and retraining it on new data at defined intervals, triggered by either elapsed time or a drop in performance.
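As a rough illustration of the retraining rule described above, the check below flags a model for retraining when it is either too old or has lost too much accuracy. This is only a sketch: the names RetrainPolicy and should_retrain, the 30-day age limit and the 0.05 allowed drop are all hypothetical values chosen for the example, not recommendations from any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrainPolicy:
    # Hypothetical thresholds; every team would tune these per model.
    max_age: timedelta = timedelta(days=30)  # time-based trigger
    max_metric_drop: float = 0.05            # performance-based trigger


def should_retrain(last_trained, now, baseline_metric, current_metric,
                   policy=RetrainPolicy()):
    """Return the reason a retrain is due ("age" or "performance"), else None."""
    # Time trigger: the model has not been retrained within the allowed window.
    if now - last_trained >= policy.max_age:
        return "age"
    # Performance trigger: the monitored metric fell too far below its baseline.
    if baseline_metric - current_metric > policy.max_metric_drop:
        return "performance"
    return None


# Example: a model trained nine days ago whose accuracy slid from 0.90 to 0.82.
reason = should_retrain(datetime(2024, 1, 1), datetime(2024, 1, 10), 0.90, 0.82)
print(reason)
```

A scheduled job could run a check like this for each deployed model and queue a retraining run whenever it returns a reason, which keeps maintenance from depending on someone remembering to look.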
What is our Integrated Cloud Strategy?
The notion that an enterprise will “pause” during a data science transformation is misguided, which is why planning an integrated cloud strategy is crucial. Many enterprises feel the need to choose between public and private cloud options, but hybrid cloud is actually the most cohesive solution.
There is a lot of pressure in the industry to move to the cloud, but in reality, most enterprises are not ready or able to move everything to the cloud. This creates a dilemma for data science strategy, because a lot of effort has gone into building the public cloud and there are many tools for extracting data and shipping it to these public environments. Hybrid cloud is the only approach that supports both realities, and it is really the holy grail for enterprises moving to the cloud.
CDOs, CIOs and CISOs often cite data security as a reason to postpone transition to the cloud. However, moving to the cloud can present an opportunity for enterprises to re-architect security from the ground up and implement new, more effective security practices and technologies.
How do we build a Sustainable Talent Pipeline?
A seamless data science transformation isn’t possible without the right team. Looking beyond the data assets and technological tools, what are the human resources your data science program needs to be successful? The key members of any data science team (the data scientist, the data engineer, the developer and the business analyst) must excel at their individual parts of the data process, but must also work together as a team. Understanding each role on the team and how they all collaborate is the key to finding the right people to build out your team.
Somewhere in your organization, there is a group of data professionals who are excited and willing to take risks. They are likely talking about Agile, DevOps, microservices, APIs and cloud, but have met some hesitation about implementing these technologies alongside legacy models.
It’s important to note that a data science transformation isn’t simply a technological shift; it’s a cultural one. Employees at every level should treat each aspect of the business as a source of data, and CDOs should be the drivers who instill this mindset. Investing in data helps business decisions be more informed, consistent and impactful. When data permeates a business, real change can be made.