Leveraging Time Aggregated Data and Advanced Machine Learning Techniques to Improve Semiconductor Demand Forecasting at an Anonymized Semiconductor Company

Semiconductor Demand Forecasting

“Semiconductor shortage still a concern, says Bajaj”; “Chrysler Parent Stellantis Needs Chips. The Shortage Is Still Hurting the Auto Industry”; and “Unfortunately, the global chip shortage will continue”

These are just a few of the headlines about the semiconductor industry from recent years. Forecasting demand in this industry is extremely difficult because of a wide variety of factors. Nonetheless, meeting demand is essential, because many products cannot function without chips. Michiel de Hoop conducted a case study to improve semiconductor demand forecasting using advanced machine learning techniques and time aggregated data.

Where it started

It becomes interesting when a 1 percent improvement in forecasting accuracy can reduce total costs by 1.2 million dollars in inventory on hand and inventory carrying costs, especially when advanced machine learning techniques are not yet in use and the forecasts are based on historic requested orders and distributor resale data.

In the (anonymized) case, twelve subgroups of aggregated data were used. The 6TG level is the lowest level of aggregation above the individual products, and the MAG level is a higher level of aggregation. The company forecasts at the 6TG level, but the evaluation metric is the volume-weighted MAPE computed at the MAG level. After careful consideration, five forecasting techniques were selected:

  • Linear Regression (LR)
  • Extra Trees (ET) Regression
  • Fully connected Recurrent Neural Network (RNN)
  • ARIMAX
  • LSTCN
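
The evaluation setup described above (forecast at the 6TG level, score at the MAG level) can be illustrated with a minimal Python sketch. The article does not give the exact formula, so this assumes the common definition of volume-weighted MAPE: the sum of absolute errors divided by the sum of actual volumes, after forecasts and actuals have been summed per MAG and period. The function name, the row layout, and the item and group labels are hypothetical.

```python
from collections import defaultdict


def volume_weighted_mape(rows, group_of):
    """Volume-weighted MAPE at a higher aggregation level (assumed definition).

    rows: iterable of (item, period, actual, forecast) at the low (6TG-like) level.
    group_of: dict mapping each low-level item to its higher-level (MAG-like) group.
    """
    actual = defaultdict(float)
    forecast = defaultdict(float)
    # Roll low-level actuals and forecasts up to (group, period) totals.
    for item, period, a, f in rows:
        key = (group_of[item], period)
        actual[key] += a
        forecast[key] += f

    total_volume = sum(actual.values())
    if total_volume == 0:
        raise ValueError("total actual volume is zero")

    # Weighting each group's percentage error by its actual volume
    # reduces to: sum of absolute errors / sum of actual volumes.
    abs_error = sum(abs(actual[k] - forecast[k]) for k in actual)
    return abs_error / total_volume


# Hypothetical example: two low-level groups roll up into MAG-1, one into MAG-2.
group_of = {"6TG-a": "MAG-1", "6TG-b": "MAG-1", "6TG-c": "MAG-2"}
rows = [
    ("6TG-a", "2021-01", 100.0, 120.0),  # over-forecast by 20
    ("6TG-b", "2021-01", 100.0, 80.0),   # under-forecast by 20
    ("6TG-c", "2021-01", 200.0, 150.0),  # under-forecast by 50
]
score = volume_weighted_mape(rows, group_of)
print(score)
```

In this made-up example the two errors inside MAG-1 cancel once the data is aggregated, so only the MAG-2 error contributes to the score. This is one reason a model tuned at the 6TG level can look different when it is scored at the MAG level.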

Furthermore, data preparation was essential for the success of this research, as production data was quarterly and demand data monthly. Relevant datasets included in the prediction were, among others, Design Win Data*, Distributor Inventory, and Point of Sales (resale data on the company’s products at the distributors).
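
One way to bring quarterly figures onto a monthly grid is sketched below in Python. The article does not say how the frequencies were aligned in the study, so the even split across the three months of each quarter is purely an assumption for illustration, and the function name and data layout are hypothetical.

```python
def quarterly_to_monthly(quarterly):
    """Spread each quarterly quantity evenly over its three months.

    quarterly: dict mapping (year, quarter) -> quantity, with quarter in 1..4.
    Returns a dict mapping (year, month) -> quantity.
    The even split is an assumption; other disaggregation rules are possible.
    """
    monthly = {}
    for (year, quarter), quantity in quarterly.items():
        if quarter not in (1, 2, 3, 4):
            raise ValueError(f"invalid quarter: {quarter}")
        first_month = 3 * (quarter - 1) + 1  # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
        for month in range(first_month, first_month + 3):
            monthly[(year, month)] = quantity / 3
    return monthly


# Hypothetical quarterly quantities.
quarterly = {(2021, 1): 300.0, (2021, 2): 600.0}
monthly = quarterly_to_monthly(quarterly)
print(monthly[(2021, 2)], monthly[(2021, 5)])
```

The opposite direction, summing monthly demand into quarters, avoids inventing a within-quarter pattern but leaves fewer data points for training, so the choice depends on the level at which the forecast is needed.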


The different prediction techniques and their optimal parameter settings were compared with each other, as well as with the company’s existing techniques. ET regression and LSTCN provided the highest accuracies. The results showed that the ET model outperformed the LR, ARIMAX and RNN models by up to 4.8%, 7.9% and 11.3%, respectively. The LSTCN model outperformed the LR, ARIMAX and RNN models by 7.7%, 6.8% and 17.2%, respectively.

The models provided accuracies similar to those of the company’s model. However, the models were also trained and tested on pre-COVID-19 data. The results showed that, despite considerably fewer data points, the ET and LSTCN models performed distinctly better than the company’s model, with the ET model’s accuracy improving by up to 10.4% and the LSTCN model’s accuracy by 5.2%.

Eventually, the company decided to add the new forecasting techniques to its existing forecasting models, to find the best-performing one for different cases. While ET regression has a higher overall accuracy, its computational cost is also higher than that of the LSTCN. This trade-off is case-specific, and the generalizability of the findings therefore also depends on the circumstances. However, this research shows that the use of time aggregated data and machine learning techniques can tremendously increase accuracy.

*Design Registration/Design Win Data: “This dataset contains Design Registrations and Design Wins. Design Registrations are design opportunities, containing expected future demand quantities with yearly or quarterly intervals, that are approved by the company after being registered in a database by distributors. Design Wins are Design Registrations with a realized revenue of at least 1,000 dollars. Hence, this data is considered the relevant time aggregated data.”