Innitium combines Reinforcement Learning and Digital Twin simulation to solve the toughest inventory routing problems — optimizing stock decisions and vehicle routing simultaneously, across stochastic, fast-moving environments.
Modern logistics networks are partially observable, multi-agent, stochastic and path-dependent. Classical heuristics leave significant value on the table.
Innitium Analytics Consulting was founded to apply state-of-the-art Reinforcement Learning and Digital Twin modelling to these exact challenges — delivering AI systems that learn optimal long-run policies, not just greedy one-step decisions.
Our RL agent maximises discounted cumulative reward over the full planning horizon, not just the next delivery cycle.
The Digital Twin updates its stochastic demand and traffic models daily from real operations data, keeping the policy current.
Inventory and routing are solved jointly — eliminating the sub-optimality of treating them as separate problems.
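The difference between long-run and one-step optimisation can be illustrated with a minimal sketch. The reward figures and the discount factor below are hypothetical, chosen only to show how a policy that looks best in the first delivery cycle can lose over the full horizon.

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return sum(gamma**t * r_t), accumulated backwards from the last cycle."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical net rewards per delivery cycle for two candidate policies.
GREEDY_POLICY = [10.0, 2.0, 2.0, 2.0]    # strong first cycle, weak afterwards
LONG_RUN_POLICY = [6.0, 6.0, 6.0, 6.0]   # lower first cycle, steady afterwards

greedy_value = discounted_return(GREEDY_POLICY)
long_run_value = discounted_return(LONG_RUN_POLICY)
```

With these illustrative numbers the steady policy has the higher discounted return even though the greedy one wins the first cycle.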
From raw order records to real-time optimised routing, CityOps handles every layer of the Inventory Routing Problem.
Machine learning models capture the stochastic demand behaviour per node and SKU, feeding the Digital Twin with realistic load scenarios.
Depot topology, vehicle capacities, traffic data and distance matrices are integrated into the routing layer that evaluates each candidate policy.
A deep neural network maps observable inventory state to optimal drop-off / pickup decisions per SKU and node, updated as conditions evolve.
Revenue from node servicing, vehicle operating costs and holding costs are combined, under a discount factor, into a single long-run objective.
Daily exports of operational data trigger automatic model and policy re-evaluation — the system improves with every delivery cycle.
Field operations are monitored in real time, with barcode scanning integration and dynamic re-routing as conditions change.
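As a rough illustration of the first two layers, the sketch below samples one demand scenario per node and SKU and measures a candidate route against a distance matrix. The Poisson demand assumption, the mean demand values, the SKU names and the distance matrix are all hypothetical; the actual demand models are learned from order records and are not specified here.

```python
import math
import random

def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed count (Knuth's method, fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_demand_scenario(rates, rng):
    """Sample one scenario: units requested per (node, SKU) pair."""
    return {key: sample_poisson(rate, rng) for key, rate in rates.items()}

def route_length(route, distance):
    """Total distance of a route given as a sequence of node indices."""
    return sum(distance[a][b] for a, b in zip(route, route[1:]))

# Hypothetical mean daily demand per (node, SKU); node 0 is the depot.
MEAN_DEMAND = {(1, "water"): 3.0, (2, "water"): 1.5, (2, "beer"): 0.5}
# Hypothetical symmetric distance matrix in km.
DISTANCE_KM = [[0, 4, 7], [4, 0, 3], [7, 3, 0]]

scenario = sample_demand_scenario(MEAN_DEMAND, random.Random(42))
depot_loop_km = route_length([0, 1, 2, 0], DISTANCE_KM)
```

Drawing many such scenarios and scoring routes against them is one simple way a simulator can compare candidate policies under uncertainty.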
The IRP is inherently systemic, non-linear, episodic and multi-scalar. Standard heuristics break down. Our RL–Digital Twin architecture is purpose-built for exactly this complexity class.
The agent observes inventory levels per SKU and node, pending orders, date, and traffic conditions — a partial view of a complex environment.
From the high-dimensional action space (which nodes to visit, how much of each SKU to transact), the policy network selects the optimal action.
The Digital Twin simulates route execution, SKU transactions, and the stochastic load and traffic variables to produce the next state.
The agent receives a reward (revenues minus logistics and holding costs), and updates its policy to maximise long-run discounted returns.
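The four steps above can be sketched as a toy training loop. This is not the production architecture: a tabular Q-learning update stands in for the deep policy network, the Digital Twin is reduced to a single node with one SKU, and every constant (capacity, prices, costs, learning parameters) is a hypothetical value chosen for illustration.

```python
import random
from collections import defaultdict

CAPACITY = 5                      # max units the node can hold (hypothetical)
ACTIONS = (0, 1, 2, 3)            # units delivered per cycle (hypothetical)
PRICE, TRIP_COST, HOLDING_COST = 4.0, 3.0, 0.5   # illustrative economics
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1           # discount, step size, exploration

def twin_step(inventory, delivered, rng):
    """Toy Digital Twin: apply the delivery, sample demand, return (next_state, reward)."""
    stock = min(CAPACITY, inventory + delivered)
    demand = rng.randint(0, 3)    # stand-in for the stochastic demand model
    sold = min(stock, demand)
    remaining = stock - sold
    reward = PRICE * sold - (TRIP_COST if delivered else 0.0) - HOLDING_COST * remaining
    return remaining, reward

def choose_action(q, state, rng):
    """Epsilon-greedy choice over the current value estimates."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def train(steps=5000, seed=0):
    """Run the observe, act, simulate, reward loop and return the value table."""
    rng = random.Random(seed)
    q = defaultdict(float)
    state = 0
    for _ in range(steps):
        action = choose_action(q, state, rng)                # observe state, select action
        next_state, reward = twin_step(state, action, rng)   # twin simulates the outcome
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Move the estimate towards reward plus discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state
    return q
```

After `train()` returns, the greedy action for a given inventory level is the action with the highest table entry for that state. A deep network replaces the table when the state and action spaces are too large to enumerate.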
The trained policy is deployed to production. Throughout the day it observes the current real-world state — inventory levels, pending orders, traffic — and outputs optimal routing and delivery decisions immediately, with no simulation required.
At any point, real operations data — actual demand fulfilled, routes completed, inventory consumed — is captured to update the Digital Twin's stochastic model. A new policy is then computed, incorporating everything observed since the last training cycle and keeping decisions aligned with evolving conditions.
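One simple way to picture the model update is an exponentially weighted blend of the previous demand estimate and the newly observed demand. This is an assumption made for illustration only; the actual update method of the Digital Twin is not described here, and the function name, keys and smoothing value are hypothetical.

```python
def update_demand_rates(rates, observed, smoothing=0.2):
    """Blend previous mean-demand estimates with newly observed demand.

    rates:    current estimate per (node, SKU)
    observed: demand actually fulfilled per (node, SKU) since the last update
    """
    updated = dict(rates)  # leave the caller's estimates untouched
    for key, demand in observed.items():
        previous = updated.get(key, float(demand))  # new pairs start at the observation
        updated[key] = (1 - smoothing) * previous + smoothing * demand
    return updated

# Hypothetical estimates and one day of observations.
current = {("node_1", "water"): 10.0}
today = {("node_1", "water"): 20, ("node_2", "water"): 4}
refreshed = update_demand_rates(current, today)
```

A new policy would then be trained against the twin using the refreshed estimates, so that decisions track the most recent operating conditions.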
Deep neural network policy. Maps state → optimal action (nodes, volumes). Updated via an RL algorithm.
Simulates route execution and stochastic demand and traffic transitions to produce reward and next state.
Revenue − transport cost − holding cost, discounted over the time horizon. Drives long-run optimisation.
The trained policy observes the current real-world state and outputs optimal routing and inventory decisions immediately — no simulation needed.
Observed operations update the Digital Twin's stochastic model. A new policy is computed, incorporating all data since the last training cycle.
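Written out, the reward described above corresponds to an objective of roughly the following form. The notation is an assumption introduced here for illustration: π is the policy, γ the discount factor, T the planning horizon, R_t the revenue in cycle t, and the two C terms the transport and holding costs in that cycle.

```latex
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{T} \gamma^{t}\,\bigl(R_t - C^{\text{transport}}_t - C^{\text{holding}}_t\bigr)\right],
\qquad 0 < \gamma \le 1
```

Training searches for the policy π that maximises J(π), with the expectation taken over the stochastic demand and traffic simulated by the Digital Twin.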
We focus on sectors where fast stock dynamics, large fleet operations, and highly variable demand make conventional approaches insufficient.
High-volume, low-value products across dense city networks with variable demand patterns.
Retail and oil & gas operators requiring automated replenishment across distributed networks.
Waste collection, industrial gases, recycled containers and domestic butane distribution.
Water, beer and perishable distribution to restaurants, schools and catering outlets.
Real outcomes across field services, pharma distribution, and reverse logistics operations.
Optimisation of day-to-day routing and long-term preventive maintenance scheduling for a Spanish subsidiary with 1.1 million contracts, part of a global group with 15.9 million contracts.
Last-mile logistics optimisation for pharma distribution, coordinating three territories simultaneously with live supervision and barcode scanning integration.
Last-mile reverse logistics for used cooking oil collection from restaurants and food outlets. Modelled on 6+ years of historical pickup and route data.

Over 25 years at the intersection of energy and digital innovation, leading multiple industrial cleantech projects across Europe and Southeast Asia for major corporations, and founding two startups in the cleantech and digital sectors. Served as Programme Director and Data Science Lead in Business Analytics at UFV University, teaching Machine Learning, AI, Algorithms and Decision Support Systems.
Interested in applying AI to your logistics operations? We work with companies facing complex, stochastic inventory and routing challenges.
Send us a message