Data Science: The Three Lessons Learnt
By David Elges, Chief Information Officer, DC Government
Lesson 1: General data integration usually causes more impact than the localized use of super-algorithms and machine learning
The most common data problem in an organization or agency is not understanding the data but integrating it across multiple sources and platforms. Even in a small startup, information is spread across countless systems, environments, and spreadsheets, such as Google Analytics, Salesforce, Oracle, Google Spreadsheets, Microsoft Excel, and many others. For a large organization, and for a small one too, integrating all of this data into a single, centralized environment, with proper access rights and permissions control, is more valuable than applying a machine learning algorithm to optimize only historical data. There is a lot of inefficiency in the flow of information within a large organization, and that inefficiency translates into time, resources, and money being burned. Improving the general flow of information within a large organization creates more value than optimizing only a small sector.
Lesson 2: Data Science is about people
In a large organization, you deal primarily with people, not machines and systems. It is people who make decisions based on what they see and how they interpret the data. When you apply a new super-sophisticated algorithm to a table with one billion records, the output must be simple enough to be understood by all and valuable enough to change the habits of the person who will consume that information. In this sense, my experience with user interface and user experience design has proven highly valuable: presenting well-constructed graphs, easy-to-understand control panels, and efficient applications directly impacts how a person understands, evaluates, and repeatedly uses a data science project.
Lesson 3: Changing habits requires a huge energy investment
I learned much more about the success of data science projects by making software for the masses and reading about psychology and the cognitive sciences than by studying data science itself. Implementing a new data ecosystem in an organization (for example, by integrating 28 data sources and one trillion records containing ten years of historical information) is a big lift for anyone. But I would say it is just as big an effort to make a small group of people with different professional backgrounds, even if it is just a dozen of them, quit their old habits, spreadsheets, and emails and move to a new workflow in a tool they have never seen. There is always the "but it was working" and the "that's the way we have always done it." The challenge is to show that the new way will bring more productivity and precision to the team, freeing staff time for other activities that an automated system still cannot do. The current market is less sci-fi and more business.
All of the human challenges involved in a used car sale are also involved in implementing a data science project. People have habits, preferences, power plays, sympathies and antipathies, and everything else that has defined human relationships since we decided to get down from the trees and organize ourselves into groups. Understanding this fundamental aspect of business development can help you be much more successful in your next projects, whether they are data science or lemonade sales.