Machine Learning for Pharmaceuticals | Blog Series | Part 3

Preparing for Your Digital Transformation

Where the previous blog post focused on how we took our first steps in developing a predictive quality solution with the help of a proof of concept, this final part is all about guiding you through your own first steps in developing a predictive component within your business. We will focus on the basic ingredients required for a predictive solution, how a proof of concept can help you get started and what system components are required from an architectural point of view. The blog will also explain how to optimize your concept, including a continuous improvement strategy, so you can ultimately transform this concept into an effective predictive solution within your company.


Look at What You Have

As explained in the first part of this blog series, a machine learning solution consists of input and outcome variables. With the help of various statistical modelling techniques, models can be generated that capture the relationship between these input and outcome variables. By learning how inputs influence outcomes, the model can eventually predict what the outcome will be on the basis of the input. As a rule, the bigger the dataset of inputs and outcomes gets, the better the predictions become (machine learning in a nutshell).
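To make the idea of inputs and outcomes concrete, here is a minimal sketch of the simplest possible model: a straight line fitted with ordinary least squares. The variable names and the numbers (drying temperature versus residual moisture) are invented for illustration only; they do not come from our project.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical history: drying temperature (input) vs. residual moisture in % (outcome).
temperature = [60, 65, 70, 75, 80]
moisture = [5.0, 4.5, 4.0, 3.5, 3.0]

model = fit_line(temperature, moisture)
estimate = predict(model, 72)  # predicted moisture for a temperature not yet observed
```

A real solution would use more inputs and a more capable model, but the principle stays the same: learn the relationship from historical pairs, then apply it to new inputs.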

Logically, data is key. If you have a business domain that already collects input and outcome data, such as our quality example in the previous blog, this can be a potential area to investigate. Good examples are sensor information compared with breakdown data from your EAM system (as used in predictive maintenance), or sales order information and customer-specific data versus sales revenue outcomes from the ERP system (for marketing purposes). Looking at internal data sources is therefore the ideal way to start.

Additionally, this data can be combined with data from external sources, such as market information, social media or web-based data sources (weather forecasts, stock markets, etc.), to enrich the input or outcome data. When all data is collected, you can start your analysis. But keep it small: for a proof of concept you should focus only on a specific subdomain of the area you are investigating before scaling up to a full-blown application. You can take a single production line, customer area or sampling domain as a start, to see if relationships between variables can be established.


Optimizing Your Dataset

When potentially interesting data sources for setting up a model have been identified, the next step is analyzing and optimizing this data. First of all, the gathered data has to be cleansed by taking out the outliers and untrustworthy values that are inherent to larger datasets; otherwise your interpretations can deviate from what is actually happening. Also, especially with time series, it is important to have usable data on both the input and the outcome side over the same period, while taking the delay between cause and effect into account.
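The two steps described above, cleansing and aligning, can be sketched in a few lines. This is only one possible approach: it assumes a median-based outlier rule (values more than `k` median absolute deviations from the median are dropped) and a fixed, known delay between input and outcome. The function names, the readings and the delay of two time steps are all hypothetical.

```python
from statistics import median

def remove_outliers(readings, k=3.0):
    """Keep (time, value) pairs whose value lies within k MADs of the median."""
    values = [value for _, value in readings]
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        return list(readings)  # no spread to judge against, keep everything
    return [(t, v) for t, v in readings if abs(v - center) <= k * mad]

def align_with_delay(inputs, outcomes, delay):
    """Pair the input at time t with the outcome at time t + delay.

    Time steps that lack either side are skipped.
    """
    outcome_at = dict(outcomes)
    return [(v, outcome_at[t + delay]) for t, v in inputs if t + delay in outcome_at]

# Hypothetical hourly data: a sensor value and a quality score measured later.
sensor = [(0, 20.1), (1, 20.4), (2, 95.0), (3, 20.2), (4, 20.6)]
quality = [(2, 0.91), (3, 0.93), (4, 0.90), (5, 0.92), (6, 0.94)]

clean_sensor = remove_outliers(sensor)                    # the 95.0 spike is dropped
pairs = align_with_delay(clean_sensor, quality, delay=2)  # (input, outcome) pairs
```

In practice the right outlier rule and the right delay depend on your process, so treat both as parameters to discuss with your domain experts.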

Once that is done, the initial statistical analysis can be run. By using techniques such as regression across all variables, you can already identify where relationships are significant. This helps you get started and shows whether any useful cause-effect relationships exist. These can be implemented directly in your proof of concept. Typical tools for this are Python, R and statistical packages such as IBM’s SPSS.
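As a rough illustration of such a first screening, the sketch below ranks candidate inputs by the strength of their Pearson correlation with the outcome. This is a simplification of a full regression analysis and does not test statistical significance; the variable names and values are made up.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def rank_inputs(inputs, outcome):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, outcome) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical measurements per batch.
inputs = {
    "humidity": [40, 42, 39, 41, 40, 43],
    "temperature": [60, 65, 70, 75, 80, 85],
}
reject_rate = [1.0, 1.4, 2.1, 2.4, 3.1, 3.5]

ranking = rank_inputs(inputs, reject_rate)  # temperature comes out on top here
```

The top of the ranking tells you where to look first; whether a relationship is strong enough to use is a question for a proper regression with significance testing.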

However, never take these relationships for granted. You also need theoretical (or logical) reasoning to validate that they make sense in real life. Use your internal experts for this: they either know what is going on or are able to find literature supporting the relationships you find. The best-known statistical example, in which ice cream sales were found to correlate with murder rates, shows that a statistical relationship alone does not prove cause and effect. In the usual telling, the outside temperature is the actual driver, since it influences both ice cream sales and murder rates.
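The ice cream example can be reproduced with a small calculation. The sketch below uses partial correlation, a standard way to check whether two variables are still related once a third one is held constant. The numbers are constructed for illustration so that temperature drives both series; they are not real statistics.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Constructed data: both series follow the temperature plus unrelated noise.
temperature = [10, 15, 20, 25, 30, 35]
ice_cream_sales = [210, 290, 400, 500, 590, 710]
incidents = [21, 28, 41, 49, 62, 69]

raw = pearson(ice_cream_sales, incidents)                               # close to 1
adjusted = partial_correlation(ice_cream_sales, incidents, temperature)  # close to 0
```

A high raw correlation that collapses after adjusting for a third variable is a strong hint that you are looking at a shared driver rather than a cause-effect relationship. Which third variable to test for is exactly where your experts come in.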

The other way round, you can also use domain expertise and studies (literature) to find relationships relevant to your input and outcome variables that involve data you do not have yet. This can lead to the conclusion that viable data sources are missing from your current dataset. Anticipate this and, if possible, start collecting this data as soon as possible. The bigger your dataset, the more reliable your results and the better the chances of a fruitful concept.


Setting Up the System

For machine learning solutions the architecture is crucial. You need to connect all data sources and eventually trigger actions based on the outcomes. Input variables can be supplied from numerous different sources: direct sensor data from SCADA systems, ERP data from your SAP system and sources from the web are just a few examples. Consequently, an open central component is required that can connect these sources and also run your machine learning application. Enter the Cloud Platform.

Multiple big players in the IT domain provide these kinds of cloud platforms, which offer various connectors and related services to ingest your data successfully. SAP has its SAP Cloud Platform (SCP), Google has one and AWS is another key player in this domain. Each has its advantages, such as security and connectivity to other systems. Keep in mind that this should be combined with proper data warehousing, including the required techniques for storing and fetching the data. Software frameworks such as Hadoop can help you deliver the right data at the moment you need it, either as raw input or already transformed (for example aggregated data).

When the input variables are ingested properly, an AI component can be set up. The cloud platforms mentioned above each provide such an AI service: SAP Leonardo, Amazon SageMaker and Google AI Platform are examples of these components. Within these components you set up operating models for the previously identified relationships, running on your cloud platform. Some of these even allow you to identify relationships within their suite of AI applications.


Running Your Solution

When the data is ingested and the models are in place, the prediction can start. Basically, the model returns the likelihood of each possible outcome as a percentage, based on the inputs flowing through it. You can then act on these predicted outcomes manually. However, the outcomes can also serve as input for triggering actions in the systems connected to the cloud platform, such as creating a maintenance notification in your subsystem for predictive maintenance or raising a quality deviation in the Quality Management System, as elaborated upon in our predictive quality solution.
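How a percentage turns into an action can be as simple as a pair of thresholds. The sketch below is a hypothetical illustration: the threshold values and action names are assumptions, and in a real landscape each action would be a call into the connected system rather than a returned string.

```python
def decide_action(probability, notify_at=0.5, block_at=0.8):
    """Map a predicted deviation probability (0.0 to 1.0) to a follow-up action.

    The thresholds are illustrative defaults, not recommended values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= block_at:
        return "raise_quality_deviation"
    if probability >= notify_at:
        return "notify_operator"
    return "no_action"

# Hypothetical model outputs for three batches.
actions = [decide_action(p) for p in (0.12, 0.63, 0.91)]
```

Where to put the thresholds is a business decision: a lower value catches more deviations but also raises more false alarms.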

Please keep in mind that the outcomes of your model can be disappointing. Your predictions may not be as good as hoped for. However, having established proper relationships can be the basis for altering your model or setting up a continuous improvement loop. It can be a starting point to measure extra variables at other places within your company, at other intervals or with different methods. Also look for technological improvements, such as new sensor types or new possibilities delivered as a service by your AI software provider, like AI vision capabilities for cameras.

When your outcomes are successful, your concept is proven. Now you are ready to scale up. On the basis of the proof of concept outcomes you can already make estimations on the costs, implementation time and the eventual business benefit the AI component will deliver. This will give management a better way to grasp the advantages the AI solution will bring to your company.


The Next Step: Prescriptive Analytics

Where predictive analytics focuses on the likelihood that an outcome will occur, prescriptive analytics goes one step further and provides you with possible actions to take. It gives you various options to act upon the predictions made and informs you of their respective consequences, allowing you to select the follow-up activity. This will help you build a truly intelligent enterprise, where responsible managers have all actions at their fingertips. Instead of being reactive, you can exercise proactive influence to steer towards more favorable outcomes.
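One simple way to picture the step from prediction to prescription is to attach an expected cost to every possible action and rank them. The sketch below assumes that each action has a known cost and reduces the predicted failure risk by a known fraction. The option names, costs and percentages are invented for illustration.

```python
def expected_cost(option, failure_probability, failure_cost):
    """Cost of the action plus the expected cost of the remaining failure risk."""
    remaining_risk = failure_probability * (1 - option["risk_reduction"])
    return option["action_cost"] + remaining_risk * failure_cost

def rank_options(options, failure_probability, failure_cost):
    """Return (name, expected cost) pairs, cheapest first."""
    costs = [
        (option["name"], expected_cost(option, failure_probability, failure_cost))
        for option in options
    ]
    return sorted(costs, key=lambda item: item[1])

# Hypothetical follow-up options for a predicted machine failure.
options = [
    {"name": "do_nothing", "action_cost": 0, "risk_reduction": 0.0},
    {"name": "inspect", "action_cost": 500, "risk_reduction": 0.5},
    {"name": "replace_part", "action_cost": 2000, "risk_reduction": 0.9},
]

high_risk = rank_options(options, failure_probability=0.6, failure_cost=10000)
low_risk = rank_options(options, failure_probability=0.05, failure_cost=10000)
```

With a high predicted risk, replacing the part is the cheapest option overall; with a low risk, doing nothing wins. The manager still makes the decision, but sees the consequences of each option side by side.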

With this future perspective, this blog series comes to an end. I have taken you on a journey through the basics of Artificial Intelligence, shown an example of how we apply proofs of concept in this domain to help our customers and, finally, given guidance on taking your first steps in this new technological domain. I hope you’ve enjoyed reading it as much as I enjoyed investigating the possibilities of AI.

Feel free to contact us; we are always open to supporting you in conceptualizing your innovative thoughts in this territory. The combination of our experience in both the SAP and life sciences domains allows us to help you get a jump-start and not miss the bandwagon when it comes to exciting new technological possibilities.



Danny Groothuis
Consultant Life Sciences industry

Danny Groothuis is an all-round functional SAP consultant within the Life Sciences industry. With his roots in the plant maintenance area of expertise and propelled by an entrepreneurial background, his skillset allows Danny to be keen on the opportunities that new technologies can bring to the table. It gives him the ability to translate concepts into real-world business possibilities that provide solutions to the challenges our customers are facing. On the cutting edge of technological opportunities, customer demands and regulatory awareness is where Danny truly flourishes.
