By Koo Ping Shung - Data Scientist/Analytics Instructor at Singapore Management University
Companies often know the potential value that Analytics/Data Science can bring. After reading several articles and realizing that they have data, they get the impression that, "Yes! I want to do Analytics (or Data Science)! I have lots of data collected!" I wish it were that simple, but that is not the case.
Management needs to see Analytics/Data Science as a manufacturing process in which the raw data collected is the 'raw material' and the insights are 'manufactured' according to the specifications of the 'products'. The 'products' are the business questions or challenges that need to be answered or overcome. That means we need the right data as 'raw materials' so that we can 'manufacture' the insights needed and put them together to answer those business questions and challenges. To start the Analytics/Data Science journey, businesses have to go through the following stages.
1) Getting Data in Order
For this part, there are two tracks to take note of. The first is data management. This involves getting the data to sufficient quantity and quality to be meaningfully analyzed. Characteristics of data to pay attention to include accuracy, timeliness, and the handling of missing data. Besides these, supporting processes such as data validation, backups, data update policies, and the assignment of roles and responsibilities have to be worked on as well.
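To make the data management track concrete, here is a minimal sketch of a data quality check covering two of the characteristics mentioned above: missing data and timeliness. The function name `profile_quality`, the field names, the sample records, and the 90-day staleness threshold are all assumptions made up for illustration, not a prescribed method.

```python
from datetime import date

def profile_quality(records, required_fields, date_field, as_of, max_age_days):
    """Summarise missing values and stale rows in a list of dict records."""
    total = len(records)
    missing = {field: 0 for field in required_fields}
    stale = 0
    for row in records:
        for field in required_fields:
            # Treat None and empty strings as missing.
            if row.get(field) in (None, ""):
                missing[field] += 1
        updated = row.get(date_field)
        # A row with no update date, or one older than the limit, counts as stale.
        if updated is None or (as_of - updated).days > max_age_days:
            stale += 1
    return {
        "rows": total,
        "missing_rate": {f: (n / total if total else 0.0) for f, n in missing.items()},
        "stale_rows": stale,
    }

# Hypothetical customer records, invented for this example.
customers = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": "", "updated": date(2024, 4, 20)},
    {"id": 3, "email": None, "updated": date(2023, 1, 15)},
    {"id": 4, "email": "d@example.com", "updated": None},
]

report = profile_quality(customers, ["id", "email"], "updated", date(2024, 5, 31), 90)
print(report)
```

A summary like this could be run on a schedule, with the acceptable missing rate and staleness limit written into the data update policies.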
The second track is collecting the 'right' data. Start by identifying the 'low-hanging fruit': find out what currently matters most to the business, and plan out the data that would be required to gain insights on these matters. While planning, always look at existing data first before venturing into data that is not yet collected. After exploring the existing data, look for data that is cheap and easy to acquire. Characteristics to take note of at this point include the time period the data should cover (e.g. 1 or 2 years' worth of data) and the granularity of the data (e.g. whether monthly or weekly data should be collected).
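The granularity question can be illustrated with a small sketch. It assumes the data is collected at daily level (an assumption, not something stated above) and shows that finer-grained data can always be rolled up to weekly or monthly totals afterwards, while the reverse is not possible. The `roll_up` helper and the sales figures are hypothetical.

```python
from collections import defaultdict
from datetime import date

def roll_up(daily_sales, granularity):
    """Sum (date, amount) pairs into monthly or ISO-weekly buckets."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        if granularity == "month":
            key = f"{day.year}-{day.month:02d}"
        elif granularity == "week":
            iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
            key = f"{iso[0]}-W{iso[1]:02d}"
        else:
            raise ValueError(f"unsupported granularity: {granularity}")
        totals[key] += amount
    return dict(totals)

# Hypothetical daily sales figures, invented for this example.
daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 3), 50.0),
    (date(2024, 1, 8), 70.0),
    (date(2024, 2, 1), 30.0),
]

monthly = roll_up(daily_sales, "month")
weekly = roll_up(daily_sales, "week")
print(monthly)
print(weekly)
```

The trade-off is cost: finer granularity means more rows to store and manage, so the choice should follow from the business questions being asked.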
2) Getting the BI/Reporting Process in Order
After figuring out the data, the next step is to plan out the reporting process. These should be the essential reports needed to answer common day-to-day operational questions, for instance: how much stock have I sold, how much inventory do I have left, how many new customers have I acquired, and so on. A business that wants to start working on Data Science cannot avoid setting up a solid reporting process to support it.
By planning out the reporting process, the business is also planning out the ETL (Extract, Transform and Load) process. It need not be complicated, since there are only a few reports to prepare at this stage, but the whole ETL process can become a big ball of spaghetti later on, so documentation is needed. There may be a perception that a 'sophisticated' data warehouse has to be built, but that need not be the case, depending on the amount of data needed for the reporting process. It does not make sense to spend thousands of dollars on a data warehouse only to generate a few reports. Scalability, however, is something the business needs to consider when planning the ETL process together with the reporting process.
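As a rough illustration of how small an early ETL process can be, the sketch below uses only Python's standard library: it extracts rows from CSV text, transforms them (dropping rows with no quantity and computing revenue), loads them into an in-memory SQLite database, and answers the "how much stock have I sold" question with one query. The table layout, column names, and sample data are assumptions invented for this example; a real setup would read from the business's own sources.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the last row has a missing quantity.
RAW_CSV = """product,quantity,unit_price
widget,3,2.50
gadget,1,10.00
widget,2,2.50
gadget,,10.00
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: skip rows without a quantity, cast types, compute revenue."""
    clean = []
    for row in rows:
        if not row["quantity"]:
            continue
        qty = int(row["quantity"])
        price = float(row["unit_price"])
        clean.append((row["product"], qty, qty * price))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into a sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (product TEXT, quantity INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def sales_report(conn):
    """Report: units sold and revenue per product."""
    cur = conn.execute(
        "SELECT product, SUM(quantity), SUM(revenue) "
        "FROM sales GROUP BY product ORDER BY product"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
report_rows = sales_report(conn)
print(report_rows)
```

Keeping extract, transform, and load as separate, documented steps is one way to keep the process from turning into spaghetti as more reports are added.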
3) Let's do Data Science
After planning and executing the first two stages, the next step is to look at doing Data Science. Start exploring current processes that can benefit from Data Science. Processes that are capital and/or labor intensive and that collect huge amounts of data are prime candidates to examine. Processes that need to be scaled up or need to produce consistent results can benefit as well.
Again, always go for the lowest-hanging fruit to gain experience first before moving on to more complicated projects. Go for the quick wins so that there is constant momentum in the business to move forward with Analytics.
I would like to stress that many managers have the impression that going from stage 1 to stage 3 needs a lot of investment, but that need not be the case. What is more important is that investment goes hand-in-hand with the actual value gained, so that momentum is preserved and Data Science/Analytics can continue to benefit the business. Just keep in mind the need to scale up later, when the business has learned more about the benefits and implementation of Data Science/Analytics.
For some processes that can benefit from Data Science, it is likely that the decision (statistical) models will need to be embedded into the IT systems, so proper planning is needed.
Things do not stop here. With each model that is embedded into a process, the business has to go back to the reporting process and add model validation reports. This ensures that models are monitored for their 'predictive' power, that this power stays at an acceptable level, and that it is clear what should be done if it does not. Again, proper policies should be drawn up for this part to ensure the right steps are taken, allowing the business to continue benefiting from Data Science/Analytics.
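A model validation report can start out very simple. The sketch below assumes a model that makes yes/no predictions and uses plain accuracy as the measure of 'predictive' power. Both are assumptions for illustration, and other measures may suit a given model better. The 0.8 threshold, the function name, and the sample values are likewise hypothetical.

```python
def validation_report(predictions, outcomes, min_accuracy):
    """Compare predictions with actual outcomes and flag a model that falls below the bar."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    accuracy = correct / len(predictions)
    return {
        "n": len(predictions),
        "accuracy": accuracy,
        # "review" signals that the agreed policy for a weak model should kick in.
        "status": "ok" if accuracy >= min_accuracy else "review",
    }

# Hypothetical predictions and what actually happened, invented for this example.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 0]

result = validation_report(predicted, actual, min_accuracy=0.8)
print(result)
```

What happens on a "review" status (retraining, recalibration, or retiring the model) is the kind of decision the policies should spell out in advance.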
In conclusion, as the saying goes, "Rome was not built in a day." For businesses to benefit from Analytics/Data Science, proper planning is needed to ensure that a strong foundation is created. With Data Science/Analytics, there is a constant need to go back to the previous stages so that value can continue to be delivered. Doing Analytics/Data Science in a business is like building a pyramid, where the base (Data in Order) and the middle section (Reporting Process) need to be built up strongly before the pyramid can grow taller. The earlier you start climbing the learning curve, the more likely you are to build a higher pyramid. So start building!