Vernaio BLOG
Robert Meißner

The 10 Biggest Mistakes in Process Data Collection

Key Takeaways
  • Process and quality data should be captured in a way that truly reflects the reality of production.
  • Data collection is not just an IT task, but also requires input from experienced process engineers.
  • Pay special attention to timestamps, as they are one of the biggest sources of errors.
  • Ensure that your data collection infrastructure is aligned with your forecasting requirements (e.g., update interval).

Harnessing data from the Internet of Things (IoT) can transform operations and unlock significant economic value, potentially reaching up to $12.6 trillion by 2025, according to McKinsey & Company. A substantial proportion of this data is process and quality data, such as that supplied by sensors, devices, PID controllers, and lab measurements. But to leverage this data, it must first be captured in a way that truly reflects the reality of production, which is critical to optimizing operations, improving efficiency, and ensuring product quality.

There are several pitfalls in collecting and storing data that can compromise its integrity. The best way to avoid them is to recognize that data collection is not just an IT task. Instead, it requires the combined input of IT and experienced process engineers, as well as attention to detail, to preserve the maximum value of your data. Let's dive into the top ten mistakes that can make efficient data analytics nearly impossible, and explore how to avoid them.

1. Inaccurate Timestamps

A data point's timestamp must always reflect its origin, not the time it's written to the database. Otherwise, you will get a distorted view of your production process. Always choose a date-time format with sufficient resolution, e.g., seconds or, even better, milliseconds, and include the timezone. Ideally, use the same time reference for all of your data, e.g., Unix epoch time (which is UTC-based), to ensure that data points from multiple sources are properly aligned.
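A minimal sketch of capturing the origin timestamp at measurement time (the `record_measurement` helper and the signal name are made up for illustration):

```python
from datetime import datetime, timezone

def record_measurement(signal_id: str, value: float) -> dict:
    """Capture the timestamp at the origin, i.e., the moment of measurement,
    not when the record is later written to the database."""
    return {
        "signal_id": signal_id,
        "value": value,
        # Timezone-aware UTC timestamp with millisecond resolution.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
    }

# The record carries its origin time even if the database write is delayed.
print(record_measurement("reactor_temp_01", 73.42))
```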

2. Inaccurate Lab Value Timestamps

Externally collected quality data, such as lab measurements, should not be mapped to the time the lab results are received. To be usable with process data, each value must be timestamped with the exact time the sample was produced. Accurate timestamping, ideally to the minute or second (depending on process dynamics), is critical for reliable analysis.
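Once lab values carry the sample-production timestamp, they can be aligned with process data. A sketch using pandas' `merge_asof` (column names and values are assumptions):

```python
import pandas as pd

# Process data, one row per minute (synthetic values).
process = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-05-01 10:00", "2024-05-01 10:01", "2024-05-01 10:02"], utc=True),
    "reactor_temp": [73.4, 73.9, 74.1],
})

# Lab result, timestamped with when the sample was PRODUCED,
# not when the result came back from the lab hours later.
lab = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 10:01"], utc=True),
    "viscosity": [412.0],
})

# Attach the process state at sample time to each lab value.
joined = pd.merge_asof(lab, process, on="timestamp",
                       direction="nearest", tolerance=pd.Timedelta("30s"))
print(joined)  # the lab value lines up with reactor_temp 73.9
```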

3. Aggregating and Rounding Raw Data

Both aggregation and rounding obscure the nuances of your production process. Instead, store data in its raw, non-aggregated, non-rounded form (except for high-frequency data, see below). This practice preserves the richness of your data and enables accurate live production simulations using historical data, which is critical for developing efficient predictive models.
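A quick illustration of how rounding destroys information: a slow drift that is clearly visible in the raw values vanishes completely after rounding to one decimal place (synthetic values):

```python
import numpy as np

# A slow drift of +0.0004 per sample around a nominal value of 75.0.
raw = 75.0 + 0.0004 * np.arange(100)

# Rounding to one decimal place, as a careless historian setup might do.
rounded = np.round(raw, 1)

print(raw[-1] - raw[0])          # 0.0396 -> drift visible in raw data
print(rounded[-1] - rounded[0])  # 0.0    -> drift gone after rounding
print(np.unique(rounded))        # [75.]  -> a perfectly flat line
```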

4. Inadequate Frequency for Slowly Varying Signals

Slowly changing signals such as flows, temperatures, and pressures are often recorded at intervals that are too sparse, causing you to miss subtle but important changes. Make sure these signals are recorded at least once per minute to accurately capture important process changes.
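A small synthetic example of what sparse recording costs you: a three-minute process upset that per-minute sampling captures and 15-minute sampling misses entirely:

```python
import numpy as np

# One hour of a temperature signal with a 3-minute upset at minute 20
# (synthetic values for illustration).
t = np.arange(3600)                      # one sample per second
signal = np.full(3600, 75.0)
signal[(t >= 1200) & (t < 1380)] = 68.0  # the upset

per_minute = signal[::60]      # recorded once per minute
per_quarter = signal[::900]    # recorded once every 15 minutes

print(per_minute.min())   # 68.0 -> the upset is captured
print(per_quarter.min())  # 75.0 -> the upset is invisible
```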

5. Ignoring Some PID-Controlled Variables

Storing only some of the proportional–integral–derivative (PID) controlled variables gives you only half the picture. Instead, store all available PID variables: Process Variable (PV), Set Point (SP), and Manipulated Variable (MV)/Control Variable (CV). This allows for more accurate and insightful analysis.
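A sketch of a record type that keeps all three loop variables together (the loop ID and values are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class PidSample:
    """One sample of a PID loop: keep all three variables, not just one."""
    timestamp: str  # origin time of the sample, ideally UTC
    loop_id: str
    pv: float       # Process Variable: what the sensor measures
    sp: float       # Set Point: what the controller aims for
    mv: float       # Manipulated/Control Variable: the controller's output

# SP vs. PV reveals the control error; MV shows how hard the loop is working.
s = PidSample("2024-05-01T10:00:00.000+00:00", "TIC-101",
              pv=73.9, sp=75.0, mv=42.5)
print(s.sp - s.pv)  # control error at this instant
```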

6. Improper Handling of High-Frequency Data

Storing high-frequency data continuously in its raw form may use too much storage and bandwidth. Instead, you can store samples at regular intervals (e.g., 2-second bursts every 15 minutes); just make sure the sampling interval isn't longer than the prediction interval. However, it's important to capture all relevant system states, especially transient ones. In addition, the Short-Time Fourier Transform (STFT) can help reduce storage and bandwidth requirements while preserving essential information.
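A sketch of the STFT approach using SciPy (the signal parameters are made up): instead of keeping 600,000 raw samples per minute, you keep one dominant frequency per short window.

```python
import numpy as np
from scipy.signal import stft

fs = 10_000                      # 10 kHz vibration signal (synthetic)
t = np.arange(0, 60.0, 1 / fs)   # one minute of raw data
x = np.sin(2 * np.pi * 1_500 * t) + 0.1 * np.random.randn(t.size)

# STFT over ~0.1 s windows; keep spectra instead of every raw sample.
f, seg_t, Zxx = stft(x, fs=fs, nperseg=1024, noverlap=0)
mag = np.abs(Zxx)

# Keeping only the dominant frequency per window preserves the machine's
# vibration signature at a small fraction of the storage cost.
dominant = f[np.argmax(mag, axis=0)]

print(x.size)         # 600,000 raw samples
print(dominant.size)  # ~590 dominant-frequency values
print(dominant[0])    # ~1500 Hz
```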

7. Changing Signal IDs

Renaming signal IDs creates confusion and breaks continuity. For example, if signal_1 suddenly becomes signal_2, it appears that signal_1 has died and signal_2 has begun, even though they represent the same measurement. Maintain consistent signal IDs and only change a signal's name in the metadata to ensure that your data remains coherent and reliable.
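A minimal sketch of the idea: the stable ID is the key under which data is stored, while the human-readable name lives in metadata and can change freely:

```python
# Data is stored and queried under the stable ID; the display name lives in
# metadata and may change without breaking the time series.
signal_metadata = {
    "signal_1": {"name": "Reactor 1 inlet temperature", "unit": "degC"},
}

# A rename touches only the metadata; history stays under "signal_1".
signal_metadata["signal_1"]["name"] = "R1 inlet temperature (north sensor)"
print(signal_metadata)
```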

8. Imputation During Data Extraction

Sometimes data needs to be extracted, e.g. in CSV format, for external analysis. Imputation is a practice that simply fills in missing values with a default value, potentially distorting the data and masking real problems. Ensure that data is extracted without imputation. Missing values should remain missing and accurately reflect the state of your data.
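A sketch with pandas (column names are assumptions): by default, `to_csv` leaves missing values empty, which is exactly what you want; filling them in is the mistake to avoid:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 10:01"],
                                utc=True),
    "reactor_temp": [73.4, np.nan],  # the second reading is genuinely missing
})

# Good: the gap stays empty in the CSV and reloads as NaN.
df.to_csv("export.csv", index=False)

# Risky: filling with a default hides the gap and distorts the analysis.
# df.fillna(0).to_csv("export.csv", index=False)
```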

9. Infrequent Live Data Uploads

Uploading live data in large, infrequent chunks is a surefire way to miss critical insights. For instance, if you're predicting minute-by-minute outcomes, hourly data uploads just won't cut it. To keep your predictions relevant and timely, upload live data to the ML platform at least as frequently as your predictions are made, rather than batching it into large chunks, so you're always working with the freshest data.
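A rough sketch of the idea; `read_latest_points` and `upload_batch` are hypothetical stand-ins for your acquisition and platform APIs:

```python
import random
import time

PREDICTION_INTERVAL_S = 60  # the model predicts once per minute

def read_latest_points() -> list:
    """Stand-in for real data acquisition (hypothetical)."""
    return [{"signal_id": "reactor_temp_01", "value": 73.0 + random.random()}]

def upload_batch(points: list) -> None:
    """Stand-in for your ML platform's upload call (hypothetical)."""
    print(f"uploaded {len(points)} points")

# Upload at least as often as the model predicts, never in hourly chunks.
for _ in range(3):  # runs indefinitely in production
    upload_batch(read_latest_points())
    time.sleep(PREDICTION_INTERVAL_S)
```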

10. Deleting Long-Term Data

Deleting data is like burning old books; you lose valuable historical information that could provide critical insights. Always archive data for long-term storage instead of deleting it. Historical data is essential for comprehensive analysis and long-term optimization efforts, and there are many inexpensive long-term storage options.
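A sketch of archiving a cold chunk of data as compressed Parquet (assumes pandas with pyarrow installed; names and values are made up):

```python
import numpy as np
import pandas as pd

# A year-old chunk of process data that is no longer queried daily.
old = pd.DataFrame({
    "timestamp": pd.date_range("2023-05-01", periods=1000, freq="s", tz="UTC"),
    "reactor_temp": 75.0 + 0.1 * np.random.randn(1000),
})

# Compress and archive to cheap long-term storage instead of deleting.
old.to_parquet("2023-05-01_reactor_temp.parquet", compression="zstd")
```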

Avoiding these ten common mistakes in process data collection is critical in realizing the full potential of your data. Accurate, well-managed data is the foundation for insightful analysis, precise optimization, and continuous improvement of production processes. By involving experienced engineers in data collection decisions and adhering to best practices (e.g. Excel is not a database), you can ensure that your data truly reflects production realities and supports your operational goals.

Remember, data is not just a technical asset, it is a strategic one. When properly collected and managed, it can be a powerful tool for increasing efficiency, improving product quality, and driving innovation. Follow these guidelines to get the most out of your process data and stay ahead in the competitive world of production.

Want to know where you stand with your data? Contact us for a free data assessment!
