Reinforcement learning for stocking decisions – ASML



Company Name / Department ASML
Contact Person

ASML: Douniel Lamghari-Idrissi

Eindhoven University of Technology: Willem van Jaarsveld

Location Veldhoven, the Netherlands
Study programme(s)

Operations Management and Logistics (OML)

Community High Tech Supply Chain
Start Date

As soon as a qualified candidate is available.

Housing arranged by company



Company Description

ASML is the world’s leading provider of lithography systems for the semiconductor industry. ASML designs, develops, integrates, markets, and services these advanced systems used by customers – the major global semiconductor manufacturers – to create chips that power a wide array of electronic, communications, and information technology products.

ASML’s corporate headquarters is in Veldhoven, the Netherlands. Manufacturing sites and research and development facilities are located in Connecticut, California, Taiwan, and the Netherlands. Technology development centers and training facilities are located in Japan, Korea, the Netherlands, Taiwan, and the United States. Overall, ASML has more than 70 locations in 16 countries. ASML is traded on Euronext Amsterdam and NASDAQ under the symbol ASML.

With over 60 facilities, the ASML organization is extensive, yet it is characterized by close cooperation, both internally and with customers and suppliers worldwide. The structure reflects ASML’s business strategy based on technology leadership, customer focus, and operational excellence. Consequently, innovation tailored to the strategic business segments is at the organization’s heart, backed up by comprehensive logistics and customer support.

Supply Chain Management (SCM) ensures material availability to ASML’s factories and customers. Any break in material availability can have an unacceptable impact on our customers. So SCM must proactively identify and solve potential issues by working closely with our technical departments and customers and actively managing an extensive supplier base. With over 500 highly educated and motivated people, SCM has contributed to ASML being named the “Best Factory” in the Netherlands.

ASML has service contracts with its customers in which performance agreements are made, for example a maximum number of hours of downtime allowed per time period. A service contract typically covers multiple systems that a customer procured from ASML. To satisfy its customers and avoid penalties for exceeding the maximum system downtime, ASML monitors the status of its service contracts. If, for example, ASML has used 90% of the maximum downtime allowed under a particular service contract, an alert is generated indicating that ASML is at risk and that interventions are needed to avoid further downtime.
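The alert rule described above can be sketched in a few lines. This is only an illustration: the function name, the parameter names, and the choice to measure downtime in hours are assumptions, and the 90% threshold is taken from the example in the text rather than from any actual contract.

```python
def downtime_alert(used_hours: float, max_hours: float, threshold: float = 0.9) -> bool:
    """Return True when the used downtime reaches the alert threshold.

    used_hours: downtime accumulated so far in the contract period (assumed unit: hours)
    max_hours:  maximum downtime allowed by the service contract in that period
    threshold:  fraction of the allowance at which an alert is raised (0.9 = 90%)
    """
    if max_hours <= 0:
        raise ValueError("max_hours must be positive")
    return used_hours / max_hours >= threshold


# Hypothetical contract: 100 hours of downtime allowed per period.
print(downtime_alert(95.0, 100.0))  # at risk: interventions needed
print(downtime_alert(50.0, 100.0))  # no alert yet
```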

Project Description

One of the main strengths of Deep Reinforcement Learning (DRL) is its adaptability and flexibility. One can encode different types of information in the state space on which the method bases its decisions, and design various rewards to optimize for the chosen purpose. Therefore, the DRL approach has good prospects for integration with existing systems within the company. For instance, the method works successfully with base stock levels computed by the existing system within ASML (an adapted version was used in this research). Proximal Policy Optimization (PPO), a DRL approach, can be trained flexibly for various periods and demand forecasts. Compared with NORA (the name of the current system at ASML), PPO is more adaptive toward unfavorable situations and can still generate relatively acceptable results with sub-optimal base stock levels and less accurate forecasts. DRL is a general learning framework, and this research shows that it can be applied adaptively to close-to-real-life scenarios on a smaller scale.
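To make the idea of encoding information in the state and designing a reward more concrete, here is a minimal single-SKU sketch. Everything in it is an assumption made for illustration: the fields of `State`, the cost parameters, the zero lead time, and the simple order-up-to rule. It does not describe NORA or the PPO setup used in the previous project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    on_hand: int      # units currently in the local warehouse
    base_stock: int   # base stock level supplied by the existing planning system
    forecast: float   # demand forecast for the coming period


def base_stock_action(state: State) -> int:
    """Heuristic baseline: ship enough units to return to the base stock level."""
    return max(0, state.base_stock - state.on_hand)


def step(state: State, shipment: int, demand: int,
         holding_cost: float = 1.0, transport_cost: float = 5.0,
         stockout_penalty: float = 50.0) -> tuple[State, float]:
    """Advance one period; the shipment arrives immediately (assumed zero lead time).

    The reward is the negative of holding, transport, and stockout costs,
    so maximizing reward means minimizing total cost.
    """
    available = state.on_hand + shipment
    served = min(available, demand)
    unmet = demand - served
    on_hand = available - served
    cost = holding_cost * on_hand + transport_cost * shipment + stockout_penalty * unmet
    return State(on_hand, state.base_stock, state.forecast), -cost


# One period under the heuristic; a learned policy would choose `shipment` instead.
s0 = State(on_hand=1, base_stock=3, forecast=0.5)
s1, reward = step(s0, base_stock_action(s0), demand=1)
print(s1, reward)
```

A learning agent would observe the `State` fields (including the base stock level from the existing system) and choose `shipment` itself; changing the weights in the reward changes what the agent optimizes for.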

In a previous project, PPO achieved about 9% lower holding and transport costs than the version of the current system, NORA, used for comparison. These results primarily relate to instances with 1 or 2 SKUs. While the current DRL approach struggles to scale with the number of SKUs, it is still possible in the as-is state to perform small-scale optimization for a few of the most important SKUs in the system. However, the main recommendation is to continue exploring DRL applications in planning. With sufficient training and tuning, the previous project managed to derive policies that beat the state-of-the-art heuristic. Similar results can therefore potentially be achieved for larger-scale instances of the problem; however, further research is required to either support or dismiss this hypothesis.

For example, Deep Controlled Learning (DCL) is a promising DRL technique for large-scale problems. It requires no value function approximation. When Q-values have potentially high variance, methods that rely on value function approximation (such as PPO) may fail, and they also require extensive tuning. Hence, DCL is safer for domains where good policies are easier to represent and learn than the corresponding value functions. It incorporates ideas to decrease variance (Common Random Numbers) and uses a bandit algorithm for efficient rollout allocation, while being suitable for supercomputing. It has performed well on some tasks: lost-sales inventory problems and 2D online bin packing (Tetris).
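The variance-reduction idea behind Common Random Numbers can be illustrated with a small simulation. The sketch below is not DCL itself and omits the bandit-based rollout allocation; it only compares two base stock levels in a toy lost-sales setting, once with shared demand streams and once with independent ones. The demand distribution, cost parameters, and zero lead time are all assumptions chosen for brevity.

```python
import random
import statistics


def rollout_cost(base_stock: int, seed: int, periods: int = 50,
                 holding_cost: float = 1.0, lost_sale_cost: float = 9.0) -> float:
    """Average per-period cost of one simulated rollout (toy lost-sales model)."""
    rng = random.Random(seed)          # the seed fully determines the demand stream
    total = 0.0
    for _ in range(periods):
        on_hand = base_stock           # assumed zero lead time: restock every period
        demand = rng.randint(0, 4)     # assumed demand: uniform on {0, ..., 4}
        lost = max(0, demand - on_hand)
        left_over = max(0, on_hand - demand)
        total += holding_cost * left_over + lost_sale_cost * lost
    return total / periods


def cost_differences(level_a: int, level_b: int, n_rollouts: int = 200,
                     common: bool = True) -> list[float]:
    """Cost of level_a minus cost of level_b over many rollouts.

    With common=True both levels face the same demand stream in each rollout
    (Common Random Numbers); otherwise each uses its own independent stream.
    """
    diffs = []
    for i in range(n_rollouts):
        seed_b = i if common else 10_000 + i
        diffs.append(rollout_cost(level_a, i) - rollout_cost(level_b, seed_b))
    return diffs


crn = cost_differences(2, 3, common=True)
independent = cost_differences(2, 3, common=False)
print("variance with CRN:        ", statistics.variance(crn))
print("variance with independent:", statistics.variance(independent))
```

Because both alternatives see the same demands under CRN, much of the noise cancels in the difference, so fewer rollouts are needed to tell which action is better.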

Goals of the Project

  • Develop a reinforcement learning (RL) algorithm.
  • Extract the business rules out of the RL.
  • Perform a case study on the ASML service supply chain comparing the current situation vs. RL vs. defined business rules.

Essential Student Knowledge

  • Being an enthusiastic MSc student in Industrial Engineering and Management / Business Information Systems / Econometrics with good communication skills.
  • Being able to work independently and organize and run your project smoothly.


More information: