Categories
architecture data executive

Executive Summary: Data Strategy 2.0

#architecture #clarity #velocity #direction 

In my last post, Executive Summary: Strategic Data Science, I summarized what data science is and what it consists of. I also argued that you need to deploy a strategy that helps you manage the transformation to a data-driven business.

Today, you will see that a data science strategy can be handled just like any other data strategy. If you already have a data strategy in place, e.g. as part of your governance or architecture initiative, you will also see why and where it is affected.

As I wrote in Executive Summary on EA Maturity, a map showing where you are and where you want to go helps a lot in finding your way.

Maturity

If you work with maturity models, you typically assess maturity once a year. For selected capabilities, you identify current vs. target maturity, e.g. ranked on levels 1 to 5.

The first thing you need to understand is that introducing data science for the first time immediately reduces your overall maturity. Why is that?

Maturity is measured in terms of capabilities. If you look at those capabilities, you will find that you need to adapt them to cover data science. Rated against this broader scope, your existing practices score lower than before, so your overall maturity drops. There are typically a dozen or so capabilities, such as vision, objectives, people, processes, policies, master data management, business intelligence, big data analytics, data quality, data modeling, data asset planning, data integration, and metadata management.
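To make the drop tangible, here is a minimal sketch. It assumes that overall maturity is simply the mean of the current levels of the assessed capabilities; that aggregation is an assumption for illustration, and real maturity models may weight or combine capabilities differently. The capability names come from the list above, while all levels are invented.

```python
from statistics import mean

# Invented assessment: capability -> (current level, target level) on a 1..5 scale.
# Capability names are taken from the list above; the numbers are made up.
before = {
    "vision": (3, 4),
    "people": (3, 4),
    "data quality": (4, 4),
}

# The same capabilities after adapting them to cover data science: the practices
# have not changed, but they are now rated against a broader scope.
after = {
    "vision": (2, 4),
    "people": (2, 4),
    "data quality": (3, 5),
}


def overall_maturity(assessment):
    """Assumed aggregation: plain mean of the current levels."""
    return mean(current for current, _target in assessment.values())


def gaps(assessment):
    """Levels still to climb per capability (target minus current)."""
    return {name: target - current for name, (current, target) in assessment.items()}


print(round(overall_maturity(before), 2))  # 3.33
print(round(overall_maturity(after), 2))   # 2.33
print(gaps(after))  # {'vision': 2, 'people': 2, 'data quality': 2}
```

Nothing in the organization got worse between the two assessments; only the yardstick changed. That is exactly why the first assessment after introducing data science tends to show lower maturity.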

To make things clear, I will pick only three as examples: vision, people, and technology.

Selected Capabilities for Explaining Maturity of Data Strategy

Vision

Say you have a vision like: “Providing customer care that is so satisfying that every customer comes back to us with a smile.” That’s a very strong statement, but how about: “Keeping every customer satisfied by solving all problems before they complain.” Wow, even stronger. This is possible because data science allows you to predict what others can’t.

People

You probably already have a data architect. However, the classic data architect focuses on architecture, technology, and governance issues. That is fine, but you also need a data advisor who focuses on previously unseen solutions for the business: someone who tells you to combine customer data with product usage data to increase your sales, and perhaps even which of your valuable data you can turn into completely new data-driven products to sell.
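What does combining customer data with product usage data involve in practice? Here is a minimal sketch using only the Python standard library and invented sample data. The two hypothetical sources use different key formats and contain a duplicate and missing values, so they must be harmonized before they can be joined. All names (CRM_CSV, USAGE_RECORDS, load_customers, join_usage) are made up for illustration.

```python
import csv
import io

# Hypothetical CRM export (CSV) with inconsistent ids, stray spaces,
# a missing country, and a duplicated customer.
CRM_CSV = """customer_id,name,country
C001, Alice ,DE
c002,Bob,
C003,Carol,FR
C001, Alice ,DE
"""

# Hypothetical product usage records from another system.
USAGE_RECORDS = [
    {"cust": "C001", "hours": "12.5"},
    {"cust": "C002", "hours": "n/a"},
    {"cust": "C003", "hours": "7"},
    {"cust": "C004", "hours": "3"},
]


def normalize_id(raw):
    """Harmonize key formats across sources: trim whitespace, upper-case."""
    return raw.strip().upper()


def parse_hours(raw):
    """Convert usage hours to float; unparseable values become None."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def load_customers(text):
    """Read the CRM export, clean fields, and keep the first copy of duplicates."""
    customers = {}
    for row in csv.DictReader(io.StringIO(text)):
        cid = normalize_id(row["customer_id"])
        if cid in customers:
            continue  # duplicate customer: skip
        customers[cid] = {
            "name": row["name"].strip(),
            "country": row["country"].strip() or None,  # empty string -> missing
        }
    return customers


def join_usage(customers, usage_records):
    """Inner join on the normalized id; usage without a CRM record is dropped."""
    joined = []
    for rec in usage_records:
        cid = normalize_id(rec["cust"])
        if cid in customers:
            joined.append({"customer_id": cid, **customers[cid],
                           "hours": parse_hours(rec["hours"])})
    return joined


for row in join_usage(load_customers(CRM_CSV), USAGE_RECORDS):
    print(row)
```

Even in this toy case, most of the code deals with cleaning and harmonizing rather than with the combination itself.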

Technology

You probably also have an inventory telling you which data sources your applications use. Adding data science, a rapidly growing discipline, to the equation, you may find that you have to revise your technology portfolio. The data science technology landscape grows and changes quickly and therefore needs to be governed to a certain degree (freedom vs. standardization).

The following list shows selected technologies most often used in data science, ranked by usage from left to right.

  • Programming Languages: SQL, Python, R
  • Relational Databases: MySQL, MS SQL Server, PostgreSQL
  • Big data platforms: Spark, Hive, MongoDB
  • Spreadsheets, BI, Reporting: Excel, Power BI, QlikView

Moreover, there is a shift in who actually uses these technologies: increasingly, it is leadership, finance, sales, and marketing. More and more often they do so without dedicated enterprise applications, because data analysis is very dynamic and involves a lot of trial and error.
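One lightweight way to balance freedom and standardization is to compare what teams actually use against an approved portfolio and flag the rest for review instead of forbidding it. The sketch below is hypothetical: the approved set is seeded from the technology list above, and the team inventory is invented.

```python
# Hypothetical approved portfolio, seeded from the technology list above.
# Your organization's actual standards will differ.
APPROVED = {
    "programming languages": {"SQL", "Python", "R"},
    "relational databases": {"MySQL", "MS SQL Server", "PostgreSQL"},
    "big data platforms": {"Spark", "Hive", "MongoDB"},
    "spreadsheets, BI, reporting": {"Excel", "Power BI", "QlikView"},
}

# Flatten the categories into one set of approved tool names.
APPROVED_TOOLS = set().union(*APPROVED.values())


def review_inventory(inventory):
    """Split each team's tools into approved ones and ones needing a governance decision."""
    report = {}
    for team, tools in inventory.items():
        report[team] = {
            "approved": sorted(t for t in tools if t in APPROVED_TOOLS),
            "to_review": sorted(t for t in tools if t not in APPROVED_TOOLS),
        }
    return report


# Invented inventory: which tools each business team reports using.
inventory = {
    "Finance": {"Excel", "Power BI"},
    "Marketing": {"Python", "Tableau"},
}
for team, result in review_inventory(inventory).items():
    print(team, result)
```

Labeling unknown tools "to_review" rather than "forbidden" keeps room for experimentation while still making the portfolio visible.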

Conclusion

From these few capabilities out of a dozen or more, it has become clear that a data science strategy easily fits into an overall data strategy. There is no need to reinvent the wheel. Instead, adapt your existing or favorite data strategy to incorporate data science.


Executive Summary: Strategic Data Science

#architecture #clarity #velocity #direction #data

If you, as a C-level executive, are already using or planning to use data science, you probably pursue the goal of increasing your market share by making predictions that others can’t. You might think that there is no need for strategic management of data science. Actually, that could not be further from the truth. Why? Because data science can involve a lot of complexity, as indicated by the figure below and discussed in the following sections.

The Flower of Complexity

Definition

First, let’s take a look at the definition:

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data.

Source: Wikipedia

There are a lot of keywords in this rather short definition that should raise your eyebrows: inter-disciplinary, methods, processes, algorithms, systems, many.

Basic Method

Now, let’s pick a keyword from above, e.g. methods, and dig deeper. Recall the basic scientific method:

  1. Find a question
  2. Collect data
  3. Prepare data for analysis
  4. Create model
  5. Evaluate model
  6. Deploy model

That doesn’t sound overly complex, but let’s dive deeper. Which of those phases do you think accounts for most of the effort? It is the step that takes roughly 80% of the overall effort! There are even several synonyms for it, like data munging, data wrangling, and data cleaning or cleansing. You guessed right: it is phase three. Its complexity is mainly driven by the number of different data sources, the number and complexity of the data structures involved, and sometimes the addition of unstructured data.
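To see how the six phases fit together, here is a minimal end-to-end sketch in Python. Everything in it is hypothetical: the question, the invented records, and the choice of ordinary least squares as the model are stand-ins picked for brevity, not a recommended method. Note how small the preparation step looks here; with real, heterogeneous sources it is the part that grows.

```python
from statistics import mean


def find_question():
    """Step 1: the (hypothetical) business question."""
    return "Does monthly product usage (hours) predict monthly spend?"


def collect_data():
    """Step 2: collect data. Invented records; None marks a missing value."""
    return [
        {"usage": 1.0, "spend": 12.0},
        {"usage": 2.0, "spend": 19.0},
        {"usage": None, "spend": 25.0},
        {"usage": 3.0, "spend": 31.0},
        {"usage": 4.0, "spend": 41.0},
    ]


def prepare(records):
    """Step 3: prepare data: drop incomplete records, split into inputs and outputs."""
    clean = [r for r in records if r["usage"] is not None and r["spend"] is not None]
    return [r["usage"] for r in clean], [r["spend"] for r in clean]


def create_model(xs, ys):
    """Step 4: fit spend = a + b * usage by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def evaluate(model, xs, ys):
    """Step 5: mean absolute error (on the training data, for brevity)."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in zip(xs, ys))


def deploy(model):
    """Step 6: 'deploy' by handing out a prediction function."""
    a, b = model
    return lambda usage: a + b * usage


print(find_question())
xs, ys = prepare(collect_data())
model = create_model(xs, ys)
print(evaluate(model, xs, ys))  # about 0.9
predict = deploy(model)
print(predict(5.0))  # about 50.5
```

A real project would at least evaluate on held-out data and deploy behind a service or batch job; the sketch only shows how each phase hands its result to the next.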

Conclusion

We could go on like this for a while, but I do not want to bore you with the details. So let’s summarize first; afterward, I will deliver a compressed list of further aspects, which you may take note of or skip altogether.

Forecast:
If you do not manage data science strategically in your enterprise, expect another area of uncontrolled proliferation, which you should urgently avoid!

Solution:
I can help you with that. My approach is to combine data science with an architecture development cycle. Proven methods and tools will help you to master the inherent complexity and get the most out of data science for your business. You can leave the details to me.

The Details

Data science as a discipline delivers methods like the one we have discussed above. Yet, it also

  • combines subjects like
    • computer science
    • math & statistics
    • business domain knowledge
  • involves interdisciplinary roles like
    • Data Engineer
    • Data Scientist
    • Business Analyst
    • Product Owner / Project Manager
    • Developer
    • User Interface Specialist
  • implies many skills like
    • programming
    • working with data
    • descriptive statistics
    • data visualization
    • statistical modeling
    • handling Big Data
    • machine learning
    • deploying to production
  • is done with many tools like
    (only top 3-4 in each category named here)
    • programming languages
      • SQL
      • Python
      • R
    • databases
      • MySQL
      • MS SQL Server
      • PostgreSQL
      • Oracle
    • Big data platforms
      • Spark
      • Hive
      • MongoDB
      • Amazon Redshift
    • Spreadsheets, BI, Reporting
      • Excel
      • Power BI
      • QlikView

And the list keeps growing. A little exhausting, isn’t it? By now at the latest, you should be convinced that data science needs strategic attention.