Reinforcement Learning & Digital Twins

AI-Powered Inventory Routing for Complex Logistics

Innitium combines Reinforcement Learning and Digital Twin simulation to solve the toughest inventory routing problems, optimising stock decisions and vehicle routing simultaneously in stochastic, fast-moving environments.

CityOps — Key Capabilities
Real-time state awareness
Captures vehicle and inventory status continuously, adapting decisions to real-time changes in operations and field conditions.
Deep policy network
Joint routing and inventory decisions modelled by a deep neural network, capturing complex, non-linear, stochastic and path-dependent environment behaviour.
Customised financial model
Reward function built on a full cash flow model (revenue, logistics costs and holding costs), with proper discounting to capture both the short- and long-term impact of decisions.
Continuous retraining
Learns from observed traffic and demand data after each operational cycle, using it to update the Digital Twin model and recompute an improved policy.
−26%
Kilometres between stops
+39%
Increase in daily collections
15 days
Planning horizon visibility
About Innitium

Complexity-ready AI for logistics

Modern logistics networks are partially observable, multi-agent, stochastic and path-dependent. Classical heuristics leave significant value on the table.

Innitium Analytics Consulting was founded to apply state-of-the-art Reinforcement Learning and Digital Twin modelling to these exact challenges — delivering AI systems that learn optimal long-run policies, not just greedy one-step decisions.

Reinforcement Learning · Digital Twin · Route Optimisation · Demand Forecasting · Deep Neural Networks · Inventory Management

Long-run optimisation

Our RL agent maximises discounted cumulative reward over the full planning horizon, not just the next delivery cycle.

Continuous learning

The Digital Twin updates its stochastic demand and traffic models daily from real operations data, keeping the policy current.

Coupled decisions

Inventory and routing are solved jointly — eliminating the sub-optimality of treating them as separate problems.

Our Solution

CityOps — end-to-end IRP platform

From raw order records to real-time optimised routing, CityOps handles every layer of the Inventory Routing Problem.

Demand modelling

Machine learning models capture the stochastic demand behaviour per node and SKU, feeding the Digital Twin with realistic load scenarios.
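To make "stochastic demand per node and SKU" concrete, here is a minimal sketch of how a simulator might draw daily load scenarios. It assumes, purely for illustration, that each (node, SKU) pair follows a Poisson distribution with a known mean; the rates, the names `DEMAND_RATES` and `sample_scenarios`, and the Poisson choice itself are hypothetical stand-ins for whatever machine learning models CityOps actually fits.

```python
import math
import random

# Hypothetical mean daily demand per (node, SKU); illustrative values only.
DEMAND_RATES = {
    ("node_A", "sku_1"): 4.0,
    ("node_A", "sku_2"): 1.5,
    ("node_B", "sku_1"): 7.0,
}


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_scenarios(rates: dict, n_days: int, seed: int = 0) -> list:
    """Return n_days demand scenarios, each mapping (node, SKU) -> units demanded."""
    rng = random.Random(seed)
    return [
        {key: sample_poisson(lam, rng) for key, lam in rates.items()}
        for _ in range(n_days)
    ]


# A 15-day set of load scenarios for the simulator to replay.
scenarios = sample_scenarios(DEMAND_RATES, n_days=15)
print(scenarios[0])
```

Fixing the seed makes a scenario set reproducible, which helps when comparing two candidate policies against identical demand.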

Network & routing engine

Depot topology, vehicle capacities, traffic data and distance matrices are integrated into the routing layer that evaluates each candidate policy.
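As an illustration of how a routing layer can score a candidate route from a distance matrix, the sketch below computes tour length and a capacity check, then brute-forces the best visiting order for a tiny example. The matrix, loads and function names are hypothetical, and brute force is only a toy stand-in: it scales factorially and is not how a production engine would search routes. Dividing route km by the number of stops is likewise just one possible reading of the "km between stops" metric.

```python
from itertools import permutations
from typing import Dict, Sequence

# Hypothetical symmetric distance matrix in km; index 0 is the depot.
DIST_KM = [
    [0.0, 4.0, 6.0, 3.0],
    [4.0, 0.0, 2.5, 5.0],
    [6.0, 2.5, 0.0, 4.5],
    [3.0, 5.0, 4.5, 0.0],
]


def route_km(route: Sequence[int], dist: Sequence[Sequence[float]], depot: int = 0) -> float:
    """Length of the closed tour depot -> stops (in the given order) -> depot."""
    path = [depot, *route, depot]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def fits_capacity(route: Sequence[int], loads: Dict[int, float], capacity: float) -> bool:
    """True if the total quantity to deliver along the route fits the vehicle."""
    return sum(loads[n] for n in route) <= capacity


def best_order(stops: Sequence[int], dist: Sequence[Sequence[float]]) -> tuple:
    """Brute-force the shortest visiting order; only practical for a handful of stops."""
    return min(permutations(stops), key=lambda r: route_km(r, dist))


best = best_order([1, 2, 3], DIST_KM)
total = route_km(best, DIST_KM)
print(best, total, total / len(best))  # order, total km, km per stop
```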

RL policy optimisation

A deep neural network maps observable inventory state to optimal drop-off / pickup decisions per SKU and node, updated as conditions evolve.
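A minimal sketch of one network producing both routing and inventory outputs: a two-layer network (pure Python, no biases, random untrained weights) with one head giving a visit probability per node and another giving a delivery quantity per node. The state encoding, layer sizes, the 0.5 visit threshold and the class name `TinyPolicy` are all assumptions for illustration; a real policy would be far larger and trained with RL.

```python
import math
import random
from typing import List, Tuple


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class TinyPolicy:
    """Hypothetical two-layer network: state -> (visit probability, quantity) per node."""

    def __init__(self, n_inputs: int, n_hidden: int, n_nodes: int, max_qty: float, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_hidden = init(n_hidden, n_inputs)
        self.w_visit = init(n_nodes, n_hidden)  # head 1: visit logits per node
        self.w_qty = init(n_nodes, n_hidden)    # head 2: quantity pre-activations per node
        self.max_qty = max_qty

    def forward(self, state: List[float]) -> Tuple[List[float], List[float]]:
        hidden = [relu(dot(row, state)) for row in self.w_hidden]
        visit_prob = [sigmoid(dot(row, hidden)) for row in self.w_visit]
        quantity = [self.max_qty * sigmoid(dot(row, hidden)) for row in self.w_qty]
        return visit_prob, quantity

    def decide(self, state: List[float], threshold: float = 0.5) -> dict:
        """Map state to {node_index: whole units} for the nodes the policy chooses to visit."""
        visit_prob, quantity = self.forward(state)
        return {i: round(q) for i, (p, q) in enumerate(zip(visit_prob, quantity)) if p > threshold}


# Hypothetical observation: fill level at 3 nodes (0-1), day of week / 6, traffic index.
state = [0.2, 0.9, 0.5, 3 / 6, 1.2]
policy = TinyPolicy(n_inputs=5, n_hidden=8, n_nodes=3, max_qty=40.0)
print(policy.decide(state))
```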

Financial reward function

Revenue from node servicing, vehicle operating costs, holding costs and a discount factor are combined into a single, comprehensive long-run objective.
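The sketch below writes such a reward as a daily cash flow (revenue minus per-km vehicle cost minus per-unit holding cost) and then discounts a sequence of daily rewards with a factor gamma. The cost parameters, the 8% annual rate and its conversion to a daily discount factor are assumptions chosen for the example, not Innitium's calibrated figures.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical unit economics (currency units); an operator would supply its own figures.
COST_PER_KM = 0.5             # vehicle operating cost per km
HOLDING_COST_PER_UNIT = 0.25  # cost of keeping one unit in stock for one day
ANNUAL_DISCOUNT_RATE = 0.08   # assumed cost of capital


@dataclass
class DayOutcome:
    revenue: float     # revenue from nodes serviced that day
    km_driven: float   # total fleet km that day
    units_held: float  # inventory units on hand at end of day


def daily_reward(o: DayOutcome) -> float:
    """Cash flow for one day: revenue minus transport and holding costs."""
    return o.revenue - COST_PER_KM * o.km_driven - HOLDING_COST_PER_UNIT * o.units_held


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over the planning horizon."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Daily discount factor equivalent to the assumed annual rate.
GAMMA_DAILY = (1.0 + ANNUAL_DISCOUNT_RATE) ** (-1.0 / 365.0)

days = [DayOutcome(100.0, 40.0, 80.0), DayOutcome(50.0, 20.0, 40.0)]
print(discounted_return([daily_reward(d) for d in days], GAMMA_DAILY))
```

With a daily factor this close to 1, a cost incurred next week weighs almost as much as one incurred today, which is what discourages greedy one-step savings.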

Real-time policy update

Daily exports of operational data trigger automatic model and policy re-evaluation — the system improves with every delivery cycle.

Live supervision

Field operations are monitored in real time, with barcode scanning integration and dynamic re-routing as conditions change.

How it works

Reinforcement Learning meets Digital Twin

The IRP is inherently systemic, non-linear, episodic and multi-scale. Standard heuristics break down. Our RL–Digital Twin architecture is purpose-built for exactly this complexity class.

01

State observation

The agent observes inventory levels per SKU and node, pending orders, date, and traffic conditions — a partial view of a complex environment.

02

Action selection

From the high-dimensional action space (which nodes to visit, how much of each SKU to transact), the policy network selects the optimal action.

03

Environment transition

The Digital Twin simulates route execution, SKU transactions, and the stochastic load and traffic variables to produce the next state.

04

Reward & learning

The agent receives a reward (revenue minus logistics and holding costs) and updates its policy to maximise long-run discounted return.

05

Operational deployment

The trained policy is deployed to production. Throughout the day it observes the current real-world state — inventory levels, pending orders, traffic — and outputs optimal routing and delivery decisions immediately, with no simulation required.

06

Continuous recalibration

At any point, real operations data — actual demand fulfilled, routes completed, inventory consumed — is captured to update the Digital Twin's stochastic model. A new policy is then computed, incorporating everything observed since the last training cycle and keeping decisions aligned with evolving conditions.
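Steps 01 to 04 form the standard agent-environment loop. The sketch below runs that loop on a deliberately tiny, hypothetical Digital Twin (one node, one SKU, uniform random demand) and trains a linear softmax policy with REINFORCE, the simplest policy-gradient method. Everything here, including the prices, the action set and the learning rate, is an illustrative assumption; the source does not say which RL algorithm or network CityOps uses.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy setting: one node, one SKU, 15-day episodes.
ACTIONS = [0, 5, 10]  # units to deliver if the node is visited (0 = skip)
PRICE, TRIP_COST, HOLDING = 2.0, 6.0, 0.1
GAMMA, HORIZON, MAX_INV = 0.95, 15, 30


def features(inv: int) -> List[float]:
    """Step 01, state observation: bias term plus scaled inventory level."""
    return [1.0, inv / MAX_INV]


def policy_probs(theta: List[List[float]], phi: List[float]) -> List[float]:
    """Softmax policy with linear logits theta[a] . phi."""
    logits = [sum(w * x for w, x in zip(theta_a, phi)) for theta_a in theta]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def twin_step(inv: int, qty: int, rng: random.Random) -> Tuple[int, float]:
    """Step 03, toy Digital Twin: deliver, then face random demand (assumed uniform 0-8)."""
    inv = min(MAX_INV, inv + qty)
    sales = min(inv, rng.randint(0, 8))
    inv -= sales
    # Step 04, reward: revenue minus trip cost minus holding cost.
    reward = PRICE * sales - (TRIP_COST if qty > 0 else 0.0) - HOLDING * inv
    return inv, reward


def run_episode(theta: List[List[float]], rng: random.Random) -> list:
    inv, trajectory = 10, []
    for _ in range(HORIZON):
        phi = features(inv)
        probs = policy_probs(theta, phi)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # step 02, action selection
        inv, r = twin_step(inv, ACTIONS[a], rng)
        trajectory.append((phi, a, probs, r))
    return trajectory


def reinforce(episodes: int = 2000, lr: float = 0.001, seed: int = 0) -> List[List[float]]:
    """REINFORCE: raise log-probabilities of taken actions in proportion to their return-to-go."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in ACTIONS]
    for _ in range(episodes):
        traj = run_episode(theta, rng)
        # Discounted return-to-go G_t for every step, computed backwards.
        returns, g = [], 0.0
        for *_, r in reversed(traj):
            g = r + GAMMA * g
            returns.append(g)
        returns.reverse()
        for t, ((phi, a, probs, _), g_t) in enumerate(zip(traj, returns)):
            for b in range(len(ACTIONS)):
                # d log pi(a|s) / d theta[b] = phi * (1[b == a] - pi(b|s))
                coef = (1.0 if b == a else 0.0) - probs[b]
                for i, x in enumerate(phi):
                    theta[b][i] += lr * GAMMA ** t * g_t * coef * x
    return theta


theta = reinforce()
for inv in (0, 10, 25):
    print(inv, [round(p, 2) for p in policy_probs(theta, features(inv))])
```

In this toy, deployment (step 05) amounts to calling `policy_probs` on the observed state: one forward pass, with no simulation in the loop.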

RL Agent — Full Operational Cycle
Training phase
Agent

Deep neural network policy. Maps state → optimal action (nodes, volumes). Updated via RL algorithm.

⬇ action: nodes to visit, SKU quantities
Digital Twin Environment

Simulates route execution and stochastic demand and traffic transitions to produce reward and next state.

⬇ reward + next state
Reward function

Revenue − transport cost − holding cost, discounted over the planning horizon. Drives long-run optimisation.

⬆ policy gradient update → trained policy
Operational phase
Live deployment

The trained policy observes the current real-world state and outputs optimal routing and inventory decisions immediately — no simulation needed.

⬇ real operations: demand, routes, inventory
Recalibration

Observed operations update the Digital Twin's stochastic model. A new policy is computed, incorporating all data since the last training cycle.

⬆ updated model → retrain policy
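One simple way to fold observed demand back into a stochastic model is a conjugate Gamma-Poisson update, sketched below under the assumption (made here for illustration) that daily demand at each node and SKU is Poisson with an uncertain rate. The priors, the observed counts and the 10% retraining trigger in `needs_retraining` are hypothetical; the source only states that observed operations update the Digital Twin and that a new policy is then computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (node, SKU)


@dataclass(frozen=True)
class DemandBelief:
    """Gamma(shape, rate) belief over a node/SKU's Poisson daily demand rate."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    def update(self, daily_counts: List[int]) -> "DemandBelief":
        # Conjugate update: add observed units to the shape, observed days to the rate.
        return DemandBelief(self.shape + sum(daily_counts), self.rate + len(daily_counts))


def recalibrate(beliefs: Dict[Key, DemandBelief],
                observed: Dict[Key, List[int]]) -> Dict[Key, DemandBelief]:
    """Fold the demand observed since the last training cycle into every belief."""
    return {key: b.update(observed.get(key, [])) for key, b in beliefs.items()}


def needs_retraining(old: Dict[Key, DemandBelief], new: Dict[Key, DemandBelief],
                     tolerance: float = 0.10) -> bool:
    """Hypothetical trigger: retrain if any mean demand moved by more than `tolerance` (relative)."""
    return any(abs(new[k].mean - old[k].mean) > tolerance * old[k].mean for k in old)


beliefs = {("node_A", "sku_1"): DemandBelief(8.0, 2.0),   # prior mean 4 units/day
           ("node_B", "sku_1"): DemandBelief(14.0, 2.0)}  # prior mean 7 units/day
observed = {("node_A", "sku_1"): [6, 5, 7]}               # three days of field data
updated = recalibrate(beliefs, observed)
print(updated[("node_A", "sku_1")].mean, needs_retraining(beliefs, updated))
```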
Target Markets

Logistics-intensive, fast, stochastic

We focus on sectors where fast stock dynamics, large fleet operations, and highly variable demand make conventional approaches insufficient.

Urban last mile

High-volume, low-value products across dense city networks with variable demand patterns.

Vendor managed inventory

Retail and oil & gas operators requiring automated replenishment across distributed networks.

Reverse logistics

Waste collection, industrial gases, recycled containers and domestic butane distribution.

Food & beverages

Water, beer and perishable distribution to restaurants, schools and catering outlets.

Proven Results

Deployed in production

Real outcomes across field services, pharma distribution, and reverse logistics operations.

Field Service · HVAC

Major home services multinational

Optimisation of day-to-day routing and long-term preventive maintenance scheduling for a Spanish subsidiary with 1.1 million contracts, part of a global group with 15.9 million contracts.

−26%
km between stops
+18%
daily operations
15 days
planning horizon visibility
Pharma Logistics · Last Mile

Leading Spanish urgent courier

Last-mile logistics optimisation for pharma distribution, coordinating three territories simultaneously with live supervision and barcode scanning integration.

−15%
km between stops
+12%
daily collections
Reverse Logistics · IBEX 35

Waste oil collection network

Last-mile reverse logistics for used cooking oil collection from restaurants and food outlets. Modelled on 6+ years of historical pickup and route data.

−21%
km between stops
+39%
daily collections
15 days
planning horizon visibility
Founder

The team

Roberto Vázquez Lucerga
Founder & CEO

Over 25 years at the intersection of energy and digital innovation, leading multiple industrial cleantech projects across Europe and Southeast Asia for major corporations and founding two startups in the cleantech and digital sectors. Served as Programme Director and Data Science Lead in Business Analytics at UFV University, teaching Machine Learning, AI, Algorithms and Decision Support Systems.

MIT Sloan Fellow · MEng · MBA · MPhys · MTech Mgmt
Contact

Let's talk

Interested in applying AI to your logistics operations? We work with companies facing complex, stochastic inventory and routing challenges.

Send us a message
Innitium Analytics Consulting SL