Part 2: Developing a Machine Learning Solution: A Case Study in Predictive Quality

How do you go from concept to a concrete solution? In this part, we illustrate the journey with a real case: using machine learning to predict water quality in a pharmaceutical manufacturing process. This example will show how we applied the principles discussed in Part 1 – identifying a valuable use case, leveraging existing data, and running a proof-of-concept – to solve a practical problem. We’ll cover the initial situation, the approach to building the model, and the results achieved. 

Danny Groothuis | July 25, 2020 | 10 min.

Status Quo: Manual Water Monitoring and Its Limitations

The problem: Pharmaceutical production relies heavily on high-purity water (for formulations, cleaning, etc.), and that water must meet strict quality standards (e.g., limits on microbial count, organic content, conductivity). In our case, the company had a high-purity water (Water for Injection, WFI) system with multiple points-of-use in the plant. The traditional approach to ensuring water quality was entirely reactive: 

  • Operators manually took water samples daily from each point and delivered them to the QC lab. 
  • Lab analysts performed tests (such as microbial colony counts, Total Organic Carbon, conductivity) and reported results, usually the next day. 
  • If any result exceeded the specification, it triggered an investigation and potentially a halt on using that water until the issue was resolved (which could impact production schedules). 

This manual QC regime had several limitations:

  1. Delay in detection – If something went wrong in the water (say, a bacterial spike), you’d only catch it after the sample was tested – perhaps 24 hours later, depending on how long the analysis took. In the meantime, that water might have been used in production, risking contamination of products.
  2. Inefficient and labor-intensive – Collecting and testing dozens of samples daily is time-consuming. It also relies on paper trails and data transcription steps that can introduce errors, such as handwriting the timestamp of when a sample was taken and linking the sample ID to the correct water tap point.
  3. Sparse data – A daily sample is just one data point per day. The water system operates continuously, so a lot can happen between samples. You could miss momentary issues. For example, if microbial levels rose for a few hours and then fell, a daily sample might miss it entirely, giving a false sense of security.
  4. Reactive actions – When an out-of-spec (OOS) result is found, the typical response is to intensify monitoring (take more samples) or perform a system sanitization. But by then, you might already have affected product or wasted time.

Figure 2: Graphical example of measuring Total Organic Carbon. These graphs illustrate the difference between periodic sampling (left), which yields a fragmented graph containing only ‘snapshots’ of the water quality, and continuous monitoring of the water quality (right).
As can be seen, the periodic sampling procedure does not catch fluctuations between sampling moments.

The goal was clear: can we catch water quality issues earlier, or even prevent them, by using the data the system generates?

Notably, the water system was already equipped with online sensors and automation:

  • A continuous loop circulated the water (to ensure recycling of high-quality water). Sensors measured parameters like TOC, temperature, flow rate, pressure, and conductivity in real time.
  • The system had periodic sanitization (pure steam to kill microbes) and flush routines, which were logged.
  • The sensor readings were stored in a process historian, but historically they were looked at only if something went wrong (“let’s see what happened around the time of that OOS”).

So, aside from lab results, there was a wealth of process data not being fully utilized. The hypothesis was that by analyzing the continuous sensor data alongside the daily lab results, we might detect patterns indicating an upcoming quality drift. For example, maybe a drop in water circulation flow at a certain point combined with a slight temperature increase for several hours correlates with a higher microbial count the next day.

Connecting the Data Points

We formed a small cross-functional team: a QC lab specialist, a process engineer for the water system, an IT data engineer, and a data scientist. The first step was gathering and integrating the data:

  • We pulled historical lab results for all water samples over the past couple of years (from the LIMS). This gave us the “ground truth” of water quality (e.g., microbial counts, which were typically very low or zero, with occasional spikes).
  • We extracted the historical sensor data from the water system’s data historian for the same period. This included readings taken every few minutes for temperature, flow, pressure, conductivity, tank levels, valve statuses, etc.
  • We compiled a timeline of maintenance or sanitization events (from maintenance logs and the system’s automation records), since those events can cause temporary disturbances (for example, after a hot sanitization, you often see a short-lived increase in TOC as biofilm gets flushed out).

Now, the key was to make these data sources talk to each other. Each lab sample had a timestamp and a location (which point-of-use it came from). We aligned the sensor data to those: for each sample result, we gathered the sensor readings from the preceding 24-48 hours for that sample’s location in the loop. Essentially, each lab result (good or bad) became a data point labeled with “quality outcome,” and the features were the prior sensor trends.
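To make that alignment concrete, here is a minimal sketch in Python (pandas) of how each lab result can be paired with summary statistics from its preceding sensor window. It assumes a `lab_results` table (the LIMS export) and a `sensor_data` table (the historian export); the column names and limits are illustrative placeholders, not the actual system schema.

```python
import pandas as pd

def build_training_row(sample, sensor_data, lookback_hours=24):
    """Summarize the sensor readings preceding one lab sample.

    `sample` is one row of the LIMS export; `sensor_data` holds the
    historian readings. Column names are illustrative, not the real schema.
    """
    window_start = sample["sampled_at"] - pd.Timedelta(hours=lookback_hours)
    window = sensor_data[
        (sensor_data["location"] == sample["location"])
        & (sensor_data["timestamp"] >= window_start)
        & (sensor_data["timestamp"] < sample["sampled_at"])
    ]
    return {
        "sample_id": sample["sample_id"],
        "sampled_at": sample["sampled_at"],
        # summary features over the preceding window
        "min_flow": window["flow_rate"].min(),
        "mean_temp": window["temperature"].mean(),
        "min_temp": window["temperature"].min(),
        "max_conductivity": window["conductivity"].max(),
        # label: did the lab result exceed its limit?
        "out_of_spec": sample["microbial_count"] > sample["action_limit"],
    }

# one labeled data point per historical lab result
training_set = pd.DataFrame(
    [build_training_row(row, sensor_data) for _, row in lab_results.iterrows()]
)
```

In our POC this was a one-off merge script; a production pipeline would pull the same joins from an integrated data platform.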

Figure 3: All collected variables that were utilized to determine relationships.
The identified relationships can be turned into statistical models that provide predictive capabilities for water quality.

It quickly became apparent that certain sensor signals were indeed correlated with quality changes. For example, we discovered:

  • If the water circulation pump speed dropped (or the flow rate in the loop dropped) below a threshold for an extended period, there was often a slight rise in microbial counts subsequently. This made sense: low circulation can create spots where water does not move, allowing microbes to multiply.
  • Temperature is critical. The system was designed to keep water at ~80°C (hot) to prevent microbial growth. Whenever we saw a deviation where the temperature fell below 70 °C for several hours due to heater issues, the next day’s microbial sample quite often came back elevated (see the sketch after this list).
  • We also saw that right after certain maintenance events (like replacing a filter or opening the system), there were spikes in particle counts or TOC. The team knew this (“after maintenance, the next sample is often a bit off”), but now we had data to quantify it.
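One such check is sketched below, continuing from the merged `training_set` above: samples preceded by a temperature dip below 70 °C are flagged, and the rate of elevated results is compared between the two groups. The threshold and column names are illustrative assumptions.

```python
# flag samples whose preceding window contained a dip below 70 °C
training_set["temp_excursion"] = training_set["min_temp"] < 70.0

# share of out-of-spec results with vs. without a temperature excursion
print(training_set.groupby("temp_excursion")["out_of_spec"].mean())

# a markedly higher rate in the True group supports the suspected relationship,
# which the SMEs then sanity-checked against known cause-effect mechanisms
```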

The moment of insight – our eureka – was realizing that by combining these data, we could create a predictive model. We were essentially turning what was once just tribal knowledge or guesses into a data-driven predictive algorithm.

Importantly, the process engineer on the team validated these correlations. Nothing looked completely off; rather, the model was surfacing known cause-effect relationships and quantifying them. This gave us confidence to proceed – the identified features were validated by the SMEs:

  • Stagnant flow leads to risk of microbes (confirmed by microbiology theory).
  • Lower temperature leads to risk of microbes (makes sense: bacteria thrive at moderate temperatures rather than at very high ones).
  • A maintenance event likely causes a disturbance that increases particle counts and TOC (understood by QA).

Joining data across different systems and timestamps took quite some manual effort. In hindsight, having a unified data platform (like a data fabric) would have saved time. In fact, this project later inspired our company to invest in a more integrated data infrastructure. Today, something like NTT DATA’s Data Fabric Accelerated for Life Sciences could connect the historian and LIMS data in a GxP-compliant way, making such analysis much faster. But for our POC, we did a one-off data merge and it worked.

Building the Model

With historical data in hand, we moved to the model-building phase. Our goal was to create a model that, given recent sensor readings, could predict the likelihood that the next lab sample would be out-of-spec (or above a warning threshold). We approached this step-by-step:

  1. Feature engineering – Instead of feeding raw high-frequency sensor readings into the model, which can be too granular and noisy, we derived summary features. For each sample, we extracted metrics like: “minimum and maximum circulation flow in last 24h,” “average water temperature in last 12h,” “min/max conductivity change in last 6h,” etc. We also included categorical features like “was there a maintenance event in last 12h (yes/no).” This condensed the data into meaningful indicators, which are much easier to handle than millions of individual data points (a rough sketch of steps 1-3 follows this list).
  2. Choosing a modeling technique – We tried a few, including logistic regression for predicting a binary outcome of Pass/Fail, and decision trees. Given the data size and desire for interpretability, a random forest classifier ended up working well. It’s an ensemble of decision trees that can capture nonlinear interactions but also provides feature importance, telling us which factors were most predictive.
  3. Training and validation – We trained the model on two years of data and reserved the most recent 6 months as a test set. The output was a probability that a sample would exceed certain limits. The results were encouraging: the model could predict high microbial counts with about 90% accuracy, and importantly, had a very low false-negative rate (Type II error).
  4. User interface – For the POC, we put together a simple dashboard. It showed the water loop diagram with real-time sensor data and a colored indicator for each point-of-use. We also included trend graphs and the model’s top factors (e.g., “Temperature drop detected” as a note) so users had context. This was our way of making the model’s insights explainable, which is crucial for user trust and for validation purposes.

  5. Testing live – We then ran the model in real time (without yet relying on it for decisions) for a few weeks as a dry run. In that period, no OOS events actually occurred (thankfully), but the model did flag a couple of “yellow” warnings when it noticed conditions deviating from normal. In one instance, it turned out a valve was partially stuck, slowing flow; maintenance fixed it, and indeed no quality issue occurred. The team considered that a success: the model helped catch a process deviation that could have led to an issue if left unchecked. This also led to an upgrade of the SOP and work instruction for maintenance activities on the water loop. Hygiene practices were improved, and the sanitization approach was adapted to reduce the chances of impacting the water quality.
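As a rough sketch of steps 1-3, the snippet below trains a random forest classifier on summary features like those described above, using scikit-learn; the feature list, the chronological cut-off date, and the 0.5 decision threshold are illustrative assumptions rather than the actual validated model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# step 1: summary features engineered per sample (names are illustrative)
feature_cols = ["min_flow", "mean_temp", "min_temp", "max_conductivity"]

# step 3: chronological split - train on older data, test on the most recent months
cutoff = "2019-07-01"  # illustrative cut-off date
train = training_set[training_set["sampled_at"] < cutoff]
test = training_set[training_set["sampled_at"] >= cutoff]

# step 2: a random forest captures nonlinear interactions and exposes
# feature importances, which keeps the model interpretable
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(train[feature_cols], train["out_of_spec"])

# predicted probability that the next lab sample exceeds its limit
prob_oos = model.predict_proba(test[feature_cols])[:, 1]
print(classification_report(test["out_of_spec"], prob_oos > 0.5))

# which factors drive the predictions (used for the dashboard notes in step 4)
for name, importance in zip(feature_cols, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```

The chronological split matters here: shuffling samples randomly would leak future operating conditions into training and overstate performance on the held-out period.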

Figure 4: Overview of live sensor information, mapped along the water loop in a graphical representation. The meters also contain threshold data, to indicate when important parameters are breached. All data in the overview is updated every 5 seconds with fresh sensor information.

Figure 5: The ‘Production Batch Overview’ allows QA departments to retrospectively check the water quality used in batch production. It also includes functionality to generate batch report documentation containing the critical parameters of all water used in producing a specific batch. This gives QA departments direct insight and proof that the relevant parameters were in control.

Figure 6: The holy grail of our solution: the predictive dashboard. The charts on the left contain the historical data of a given dependent variable, while the middle and right sets of charts predict the future water quality. The middle set looks 1 hour ahead, the right set even 3 hours into the future. On the far right of the screen you can see all generated alarms on the future quality of the water. This allows companies to anticipate possible threshold breaches (e.g., deciding to postpone production of a batch until water parameters are in control).

The proof-of-concept demonstrated several benefits:

  • We could now get an early warning of water quality drift. Instead of waiting for the lab, the system would alert if conditions were trending poorly, essentially implementing continuous process verification for the water loop.
  • It reduced uncertainty – if the model stayed green, QA and operations had more confidence to continue as normal. Eventually this could enable real-time release on the water quality parameter with fewer lab tests.
  • The data analysis also led to process knowledge gains. For example, seeing how much a slight temperature change affected microbial risk prompted the team to adjust some control settings to keep the temperature more stable, thereby improving the process.

By the end of the POC, we had a working prototype of a predictive QC monitoring tool. We documented everything (data sources, model version, performance metrics) because we knew that if this were to move to production, we’d need to validate it. The response from stakeholders was enthusiastic – they could clearly see how this would enhance quality assurance. An inspector or auditor could also appreciate that we were using modern tools to augment our quality system, not replace it: we still took daily samples until the model was fully validated, but we now had an extra layer of safety.

In a full implementation, we planned to integrate this with the existing SCADA system and send alerts via the operators’ interface – making it a seamless part of operations. We also discussed linking it to the deviation management process: e.g., if the model predicts a likely OOS, automatically create a notification or even a draft deviation record for QA to review, in line with SOPs. That would be a step toward prescriptive analytics, where the system not only predicts but also triggers actions; a rough sketch of such a hook is shown below.
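As a sketch of what such a hook could look like, the snippet below maps a predicted probability to an operator notification. `create_quality_notification` is a hypothetical stand-in for whatever interface the SCADA or deviation-management system actually exposes, and the thresholds are assumptions that would be fixed during validation.

```python
WARNING_THRESHOLD = 0.5  # assumed values; the real limits would be set during validation
ALERT_THRESHOLD = 0.8

def create_quality_notification(location, severity, message):
    """Stand-in for the real SCADA / deviation-management interface."""
    print(f"[{severity.upper()}] {location}: {message}")

def handle_prediction(point_of_use, prob_oos, top_factors):
    """Turn a model score into an operator notification (illustrative only)."""
    if prob_oos >= ALERT_THRESHOLD:
        create_quality_notification(
            location=point_of_use,
            severity="alert",
            message=f"Predicted OOS risk {prob_oos:.0%}; drivers: {', '.join(top_factors)}",
        )
    elif prob_oos >= WARNING_THRESHOLD:
        create_quality_notification(
            location=point_of_use,
            severity="warning",
            message=f"Water quality trending toward limit ({prob_oos:.0%})",
        )

# example: the model flags a likely issue at a hypothetical point-of-use "POU-03"
handle_prediction("POU-03", 0.83, ["low circulation flow", "temperature dip"])
```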

This case study underscores how a focused ML project can yield tangible benefits in pharma manufacturing. We took a familiar problem in water quality monitoring and solved it in a new way, made possible by the data that was already collected. The project took only a few months from start to finish and required no new hardware or costly infrastructure – just smarter use of existing resources.

Moving forward, the question became: how do we scale and deploy such solutions across other systems and sites? That’s where things like a solid data platform, model lifecycle management, and validation come into play.

In Part 3, we will shift from the nuts-and-bolts of this single project to a broader view of implementing AI solutions in a GxP-regulated organization. We’ll discuss the technical architecture (hint: leverage a data fabric), the validation and compliance considerations, and how to drive organizational change to support AI. Essentially, how to go from a successful pilot to a sustainable, value-driving AI capability across the company.

Stay tuned for Part 3, where we delve into deployment, scale-up, and the strategic aspects of pharma AI implementation.

Continue to part 3