In my last post Executive Summary: Strategic Data Science, I summarized what Data Science is and what it consists of. I also argued that you need a strategy that helps you manage the transformation to a data-driven business.
Today, you will see that a strategy for data science can be handled just like any data strategy. And if you already have a data strategy deployed, e.g. as part of your governance or architecture initiative, then you will see why and where it is affected.
As written in Executive Summary on EA Maturity, having a map that shows where you are and where you want to go helps a lot in finding the way.
Maturity
If you work with maturity models, you typically run the assessment once a year. For the chosen capabilities, you identify current versus target maturity, e.g. ranked from level 1 to 5.
The first thing you need to understand is that introducing data science for the first time reduces your overall maturity at once. Why is that?
Maturity is measured in terms of capabilities. If you take a closer look at those capabilities, you will find that you need to adapt them: each one now has to cover data science as well, so a capability that was rated mature for its old scope falls short of the new one. There typically are a dozen or so, like vision, objectives, people, processes, policies, master data management, business intelligence, big data analytics, data quality, data modeling, data asset planning, data integration, and metadata management.
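To make the drop in maturity tangible, here is a minimal sketch in Python. All capability names, levels, and the re-assessed values are hypothetical numbers chosen for illustration, and averaging the levels is just one simple way to express overall maturity, not a prescribed method.

```python
# Hypothetical maturity levels (1 = lowest, 5 = highest); all numbers are illustrative.
current = {
    "vision": 4,
    "people": 3,
    "data quality": 3,
    "data integration": 4,
}
target = {
    "vision": 4,
    "people": 4,
    "data quality": 4,
    "data integration": 4,
}


def overall(levels):
    """Average maturity across all assessed capabilities."""
    return sum(levels.values()) / len(levels)


def gaps(current_levels, target_levels):
    """Capabilities below target, mapped to the size of the gap."""
    return {
        name: target_levels[name] - level
        for name, level in current_levels.items()
        if level < target_levels[name]
    }


before = overall(current)

# Assumption: once data science is in scope, some capabilities are
# re-assessed at a lower level because they now have to cover more.
with_data_science = dict(current, vision=3, people=2)
after = overall(with_data_science)

print(before, after)
print(gaps(with_data_science, target))
```

With these made-up numbers the overall level falls from 3.5 to 3.0, and the gap list shows where the yearly assessment would point you next.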
I will pick only a few as examples to make things clear. Let’s pick vision, people, and technology.
Selected Capabilities for Explaining Maturity of Data Strategy
Vision
Say you have a vision like: “Providing customer care that is so satisfying that every customer comes back to us with a smile”. That’s a very strong statement, but how about: “Keeping every customer satisfied by solving all problems before they have reason to complain”. Wow, even stronger. It is possible because Data Science allows you to predict what others can’t.
People
You probably already have a data architect. But the classic data architect focuses on architecture, technology, and governance issues. This is OK, but you also need a data advisor who focuses on solutions the business has not yet seen: someone who tells you to combine customer data with product usage data in order to increase your sales, and perhaps even tells you which of your precious data can be turned into completely new data-driven products you can sell.
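The kind of advice meant here can be sketched in a few lines of Python. The records, field names, the "basic" plan, and the usage threshold below are all hypothetical; the point is only to show what combining customer data with product usage data can look like in its simplest form.

```python
from collections import defaultdict

# Hypothetical records; field names and values are made up for illustration.
customers = [
    {"customer_id": 1, "name": "Acme", "plan": "basic"},
    {"customer_id": 2, "name": "Globex", "plan": "premium"},
]
usage = [
    {"customer_id": 1, "feature": "export", "events": 120},
    {"customer_id": 1, "feature": "reports", "events": 45},
    {"customer_id": 2, "feature": "export", "events": 3},
]


def usage_per_customer(usage_rows):
    """Sum usage events per customer id."""
    totals = defaultdict(int)
    for row in usage_rows:
        totals[row["customer_id"]] += row["events"]
    return totals


def upsell_candidates(customer_rows, usage_rows, threshold=100):
    """Basic-plan customers whose total usage reaches the assumed threshold."""
    totals = usage_per_customer(usage_rows)
    return [
        c["name"]
        for c in customer_rows
        if c["plan"] == "basic" and totals[c["customer_id"]] >= threshold
    ]


print(upsell_candidates(customers, usage))
```

Neither data set alone answers the sales question; only the combination shows which customers use the product heavily on a small plan.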
Technology
You probably also have an inventory telling you which data sources are used in your applications. Once you add Data Science to the equation, you may find that you have to revise your technology portfolio. The discipline is growing and changing rapidly and, therefore, needs to be governed to a certain degree (freedom vs standardization).
The following list shows selected technologies that are most often used in Data Science (ranked from left to right).
Programming Languages: SQL, Python, R
Relational Databases: MySQL, MS SQL Server, PostgreSQL
Big data platforms: Spark, Hive, MongoDB
Spreadsheets, BI, Reporting: Excel, Power BI, QlikView
Moreover, there is a shift in who actually uses these technologies: increasingly it is Leadership, Finance, Sales, and Marketing. And more often they work without dedicated enterprise applications, because data analysis is very dynamic and involves a lot of trial and error.
Conclusion
From these few capabilities out of a dozen or more, it has become clear that a Data Science Strategy easily fits into an overall Data Strategy. There is no need to reinvent the wheel. Instead, adapt your existing or favorite Data Strategy to incorporate Data Science.
If you, as a C-level executive, already use or plan to use data science, you probably aim to increase your market share by making predictions that others can’t. You might think that there is no need for strategic management of data science. Actually, that’s as far from the truth as it can get. But why is that? It is because there can be a lot of complexity, as indicated by the figure below and discussed in the following.
The Flower of Complexity
Definition
First, let’s take a look at the definition:
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data.
source: wikipedia
There are a lot of keywords in this rather short definition that should raise your eyebrows: inter-disciplinary, methods, processes, algorithms, systems, many.
Basic Method
Now, let’s pick a keyword from above and dig deeper. For example, recall the basic method:
Find a question
Collect data
Prepare data for analysis
Create model
Evaluate model
Deploy model
That doesn’t sound overly complex, but let’s finally take a deep dive. Which of those phases do you think is responsible for most of the effort spent? It is the step that is often said to amount to roughly 80% of the overall process! There are even several synonyms for it, like data munging, data wrangling, and data cleaning or cleansing. You guessed right, it is phase three. Its complexity is mainly driven by the number of different data sources, the number and complexity of the data structures involved, and sometimes also by unstructured data mixed in.
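A small Python sketch gives a feeling for why phase three eats so much effort. It assumes two hypothetical sources (a CRM export and a shop export) that describe the same customers with different field names, formats, and gaps; every name and value is made up for illustration.

```python
# Two hypothetical sources describing the same customers with different structures.
crm_rows = [
    {"id": "001", "email": " Anna@Example.com ", "country": "DE"},
    {"id": "002", "email": "bob@example.com", "country": "de"},
]
shop_rows = [
    {"customer": 1, "mail": "anna@example.com", "revenue": "120.50"},
    {"customer": 3, "mail": None, "revenue": "n/a"},
]


def clean_crm(row):
    """Normalize a CRM row: numeric id, trimmed lowercase email, uppercase country."""
    return {
        "customer_id": int(row["id"]),
        "email": row["email"].strip().lower(),
        "country": row["country"].upper(),
    }


def clean_shop(row):
    """Normalize a shop row: unparsable revenue and missing mail become None."""
    try:
        revenue = float(row["revenue"])
    except ValueError:
        revenue = None
    email = row["mail"].strip().lower() if row["mail"] else None
    return {"customer_id": int(row["customer"]), "email": email, "revenue": revenue}


def prepare(crm, shop):
    """Merge both cleaned sources into one record per customer id."""
    merged = {}
    for row in map(clean_crm, crm):
        merged[row["customer_id"]] = {**row, "revenue": None}
    for row in map(clean_shop, shop):
        empty = {"customer_id": row["customer_id"], "email": None,
                 "country": None, "revenue": None}
        record = merged.setdefault(row["customer_id"], empty)
        record["revenue"] = row["revenue"]
        record["email"] = record["email"] or row["email"]
    return [merged[key] for key in sorted(merged)]


for record in prepare(crm_rows, shop_rows):
    print(record)
```

Even with two tiny sources there are already several decisions to make: which identifier wins, how to treat missing values, and how to normalize formats. Each additional source or structure multiplies these decisions.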
Conclusion
We can go on like this for a while, but I do not want to bore you with the details. So, let’s summarize first and I will deliver a compressed list of further aspects afterward which you may take note of or skip altogether.
Forecast: If you do not strategically manage data science in your enterprise you may expect another area of proliferation which you should urgently avoid!
Solution: I can help you with that. My approach is to combine data science with an architecture development cycle. Proven methods and tools will help you to master the inherent complexity and get the most out of data science for your business. You can leave the details to me.
The Details
Data science as a discipline delivers methods like the one we have discussed above. Yet, it also
combines subjects like
computer science
math & statistics
business domain knowledge
involves interdisciplinary roles like
Data Engineer
Data Scientist
Business Analyst
Product Owner / Project Manager
Developer
User Interface Specialist
implies many skills like
programming
working with data
descriptive statistics
data visualization
statistical modeling
handling Big Data
machine learning
deploying to production
is done with many tools like (only top 3-4 in each category named here)
programming languages
SQL
Python
R
…
databases
MySQL
MS SQL Server
PostgreSQL
Oracle
…
Big data platforms
Spark
Hive
MongoDB
Amazon Redshift
…
Spreadsheets, BI, Reporting
Excel
Power BI
QlikView
…
And the list is growing steadily. A little exhausting, isn’t it? By this point at the latest, you should be convinced that data science needs strategic attention.
The Ivory Tower Syndrome describes a frequently seen drift of EA initiatives: they deal mostly with themselves and focus solely on strategic management, while they have already lost traction with, and therefore acceptance by, the ground force.
Why does it happen?
Some EA initiatives tend to focus more on strategic reporting to upper levels and try to govern by code of law only. But the ground force, meaning the actual projects and product development, needs support for the huge amount of concrete work that has to be done within the granted budget and milestones. In a law-only approach, they feel punished rather than supported (the carrot in “carrot and stick” is missing).
A common misconception of EA initiatives in companies is that they can work like political government and urban planning. But compared with how e.g. power grids are managed (or water grids, gas grids, metro systems, and so on), companies often provide only a fraction of the services needed.
How to avoid and improve?
An adequate EA authority shall be balanced with a compact code of law.
The EA authority shall collaborate with other authorities such as internal audit and portfolio management.
Do not be jurisdictional because companies have no jurisdiction compared to politics and urban planning.
Align objectives of managers with your EA strategy or vice versa.
Implement cost-saving services for each of your laws (give the tiger some teeth).
Include projects and product development in a community. Communicate outstanding achievements. Recognized employees drive acceptance for you!
Imagine that you, as CIO, need or want to establish or improve Enterprise Architecture in your company.
No matter where you start and go, it’s necessary to know where you start and go – just like in Google Maps routing e.g. from your home to a client. You know exactly where your home is and so does Google Maps. And you had better know where your client is too – again, so does Google Maps. Moreover, you or Google Maps know possible paths from your home to your client. This is the foundation for being able to do the routing.
Of course, your situation is more complex, since you need to move in time from the present situation to a target situation in the future. On the other hand, this gives you a lot of options. You can construct new, efficient paths and get rid of old, slow, costly ones.
From Home to Target
So when starting this Enterprise Architecture initiative of yours, you should start building or updating your EA map. In consequence, you capture what you know about your starting point, your strategy, your target, and which paths there are or could be.
(This summary is an extract of my earlier post Hello Mr EA, which covers what you should expect when starting a new project to establish or improve your Enterprise Architecture. Both posts together are also a very good example of how to present one aspect to different stakeholders: the CIO expects decision-oriented information, while the Head of IT Governance or the Enterprise Architect zooms in and expects deliverables, methods, and tools.)