Executive Summary: Strategic Data Science


#architecture #clarity #velocity #direction #data

If you, as a C-level executive, already use data science or plan to, your goal is probably to grow your market share by making predictions that others can’t. You might think that data science needs no strategic management. Nothing could be further from the truth. Why? Because data science involves a lot of complexity, as the figure below indicates and the following sections discuss.

The Flower of Complexity

Definition

First, let’s take a look at the definition:

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data.

Source: Wikipedia

There are a lot of keywords in this rather short definition that should raise your eyebrows: inter-disciplinary, methods, processes, algorithms, systems, many.

Basic Method

Now, let’s pick one of the keywords above, methods, and dig deeper. Recall the basic scientific method:

  1. Find a question
  2. Collect data
  3. Prepare data for analysis
  4. Create model
  5. Evaluate model
  6. Deploy model

This doesn’t sound overly complex, but let’s dive deeper. Which of these phases do you think takes the most effort? You guessed right, it is phase three. Data preparation is commonly estimated to account for roughly 80% of the overall effort. It even goes by several names: data munging, data wrangling, and data cleaning or cleansing. Its complexity is mainly driven by the number of different data sources and by the number and complexity of the data structures involved, and it grows further when unstructured data is mixed in.
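To make phase three a little more tangible, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two data sources, their column names, and the cleaning rules are invented for this example and do not come from a real system. It shows how even two small customer exports with different delimiters, headers, and number formats need mapping onto one shared schema before any model can use them.

```python
import csv
import io

# Hypothetical exports from two systems that describe the same customers.
# Note the different delimiters, column names, ID formats, and a missing value.
CRM_CSV = """customer_id,name,revenue
001,Acme GmbH,"1,200.50"
002,Globex,
"""

SHOP_CSV = """CustomerID;Name;Revenue_EUR
2;globex ;300
3;Initech;99.90
"""


def clean_revenue(raw):
    """Turn '1,200.50' or '' into a float.

    Assumption: a comma is a thousands separator and a missing value means 0.0.
    """
    raw = raw.strip().replace(",", "")
    return float(raw) if raw else 0.0


def load(text, delimiter, id_col, name_col, revenue_col):
    """Read one source and map its columns onto a shared schema."""
    rows = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {
            "customer_id": int(row[id_col]),  # '001' and '1' become the same key
            "name": row[name_col].strip(),
            "revenue": clean_revenue(row[revenue_col]),
        }
        for row in rows
    ]


def combine(*sources):
    """Merge records by customer_id and sum the revenue across sources."""
    merged = {}
    for source in sources:
        for rec in source:
            # The first source that mentions a customer decides the name.
            entry = merged.setdefault(
                rec["customer_id"], {"name": rec["name"], "revenue": 0.0}
            )
            entry["revenue"] += rec["revenue"]
    return merged


crm = load(CRM_CSV, ",", "customer_id", "name", "revenue")
shop = load(SHOP_CSV, ";", "CustomerID", "Name", "Revenue_EUR")
customers = combine(crm, shop)
```

Every rule in this sketch (what a missing value means, which source wins on a name conflict) is a business decision hiding in code. With dozens of sources instead of two, such decisions multiply quickly, which is where much of the effort goes.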

Conclusion

We could go on like this for a while, but I do not want to bore you with the details. So let’s summarize first. Afterward, I provide a condensed list of further aspects, which you may take note of or skip altogether.

Forecast:
If you do not manage data science strategically in your enterprise, expect it to become yet another area of uncontrolled proliferation, which you should urgently avoid!

Solution:
I can help you with that. My approach is to combine data science with an architecture development cycle. Proven methods and tools will help you to master the inherent complexity and get the most out of data science for your business. You can leave the details to me.

The Details

Data science as a discipline delivers methods like the one we have discussed above. Yet it also

  • combines subjects like
    • computer science
    • math & statistics
    • business domain knowledge
  • involves interdisciplinary roles like
    • Data Engineer
    • Data Scientist
    • Business Analyst
    • Product Owner / Project Manager
    • Developer
    • User Interface Specialist
  • implies many skills like
    • programming
    • working with data
    • descriptive statistics
    • data visualization
    • statistical modeling
    • handling Big Data
    • machine learning
    • deploying to production
  • is done with many tools like
    (only the top three or four in each category are named here)
    • programming languages
      • SQL
      • Python
      • R
    • databases
      • MySQL
      • MS SQL Server
      • PostgreSQL
      • Oracle
    • Big data platforms
      • Spark
      • Hive
      • MongoDB
      • Amazon Redshift
    • Spreadsheets, BI, Reporting
      • Excel
      • Power BI
      • QlikView

And the list is growing steadily. A little exhausting, isn’t it? By now at the latest, you should be convinced that data science needs strategic attention.

