IRP / Reinforcement Learning / Digital Twin

AI-Powered Inventory Routing, Beyond Route Optimization

CityOps goes beyond sequencing stops. It learns long-run IRP policies: which nodes to serve, how much to move per SKU, and how vehicles should route as demand, load, and traffic evolve.

CityOps — IRP Policy Engine
Joint inventory-routing actions
Selects nodes, SKU quantities and route execution together, instead of optimizing routes after replenishment decisions are fixed.
Deep policy network
Maps observable inventory, order, depot, vehicle and traffic state to policy decisions across a high-dimensional action space.
Financial reward function
Optimizes long-run return from revenues, vehicle operating costs, holding costs and discounting, not distance alone.
Digital Twin recalibration
Updates stochastic demand, load and traffic transitions from operating data, then recomputes the policy when conditions move.
−26%
Kilometres between stops
+39%
Increase in daily collections
15 days
Planning horizon visibility
Why It Is Different

Not another routing optimizer

CityOps is built for logistics teams whose routing problem cannot be separated from inventory state, SKU-level demand and long-run operating economics.

Decision scope
Typical routing engine: Orders or stops are fixed first; the engine sequences routes afterwards.
CityOps IRP policy engine: Decides which nodes to visit, SKU quantities to transact and vehicle routes jointly.

Planning logic
Typical routing engine: Optimizes a dispatch cycle or short deterministic plan.
CityOps IRP policy engine: Learns long-run policies over replenishment cycles and future state transitions.

System state
Typical routing engine: Focuses on stops, time windows, distances, vehicle capacity and service rules.
CityOps IRP policy engine: Uses inventory by node and SKU, pending orders, depots, vehicles, traffic and financial state.

Uncertainty
Typical routing engine: Re-optimizes when demand, traffic or workload changes.
CityOps IRP policy engine: Trains against stochastic demand, load and traffic behaviour inside a Digital Twin.

Objective
Typical routing engine: Primarily minimizes distance, time or dispatch cost under constraints.
CityOps IRP policy engine: Maximizes discounted financial reward: revenue served minus transport and holding costs.

Learning loop
Typical routing engine: Runs again with updated inputs.
CityOps IRP policy engine: Updates the stochastic environment model and recomputes the policy from observed operations.
About Innitium

Complexity-ready AI for logistics

Modern logistics networks are partially observable, multi-agent, stochastic and path-dependent. Classical heuristics leave significant value on the table.

Innitium Analytics Consulting was founded to apply state-of-the-art Reinforcement Learning and Digital Twin modelling to these exact challenges — delivering AI systems that learn optimal long-run policies, not just greedy one-step decisions.

Reinforcement Learning · Digital Twin · Route Optimisation · Demand Forecasting · Deep Neural Networks · Inventory Management

Long-run optimisation

Our RL agent maximises discounted cumulative reward over the full planning horizon, not just the next delivery cycle.

Continuous learning

The Digital Twin updates its stochastic demand and traffic models daily from real operations data, keeping the policy current.

Coupled decisions

Inventory and routing are solved jointly — eliminating the sub-optimality of treating them as separate problems.

Our Solution

CityOps — end-to-end IRP platform

From raw order records to real-time optimised routing, CityOps handles every layer of the Inventory Routing Problem.

Demand modelling

Machine learning models capture the stochastic demand behaviour per node and SKU, feeding the Digital Twin with realistic load scenarios.
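
As an illustration of what a per-node, per-SKU stochastic demand model can look like, here is a minimal sketch. It assumes Poisson daily demand with rates estimated from historical order records; the record layout, the node and SKU names and the Poisson choice are all hypothetical stand-ins, not a description of the CityOps models.

```python
import random
from collections import defaultdict

# Hypothetical historical order records: (day, node, sku, units).
ORDERS = [
    (1, "node_a", "sku_1", 4),
    (2, "node_a", "sku_1", 6),
    (3, "node_a", "sku_1", 5),
    (1, "node_b", "sku_1", 1),
    (3, "node_b", "sku_1", 3),
]
N_DAYS = 3  # length of the observation window, in days


def estimate_daily_rates(orders, n_days):
    """Mean daily demand per (node, SKU); days with no record count as zero."""
    totals = defaultdict(float)
    for _day, node, sku, units in orders:
        totals[(node, sku)] += units
    return {key: total / n_days for key, total in totals.items()}


def sample_poisson(rate, rng):
    """Draw a Poisson variate by counting unit-rate arrivals before time `rate`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sample_scenario(rates, rng):
    """One simulated day of demand for every (node, SKU) pair."""
    return {key: sample_poisson(rate, rng) for key, rate in rates.items()}


rates = estimate_daily_rates(ORDERS, N_DAYS)
scenario = sample_scenario(rates, random.Random(0))
```

Each call to `sample_scenario` yields one plausible day of demand, which is the kind of load scenario a simulated environment can replay many times during training.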

Network & routing engine

Depot topology, vehicle capacities, traffic data and distance matrices are integrated into the routing layer that evaluates each candidate policy.

RL policy optimisation

A deep neural network maps observable inventory state to optimal drop-off / pickup decisions per SKU and node, updated as conditions evolve.

Financial reward function

Revenue from node servicing, vehicle operating costs, holding costs and a discount factor are combined into a single long-run objective.
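
To make the arithmetic concrete, the sketch below computes a discounted return from per-period revenue, vehicle operating cost and holding cost. The class name, field names, figures and discount value are illustrative assumptions; the actual reward terms used in CityOps may be richer.

```python
from dataclasses import dataclass


@dataclass
class PeriodCashFlow:
    revenue: float        # revenue from nodes served in the period
    vehicle_cost: float   # vehicle operating cost for the routes driven
    holding_cost: float   # cost of inventory held during the period

    @property
    def reward(self) -> float:
        return self.revenue - self.vehicle_cost - self.holding_cost


def discounted_return(periods, discount):
    """Sum of per-period rewards, each weighted by discount**t."""
    return sum(discount ** t * p.reward for t, p in enumerate(periods))


# Hypothetical three-period example with a discount factor of 0.9.
periods = [
    PeriodCashFlow(revenue=100.0, vehicle_cost=40.0, holding_cost=10.0),
    PeriodCashFlow(revenue=100.0, vehicle_cost=40.0, holding_cost=10.0),
    PeriodCashFlow(revenue=120.0, vehicle_cost=60.0, holding_cost=10.0),
]
total = discounted_return(periods, discount=0.9)  # 50 + 45 + 40.5, about 135.5
```

Because later periods are weighted by a power of the discount factor, a plan that saves distance today but pushes holding cost or lost revenue into later periods scores lower than its first-day figures suggest.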

Real-time policy update

Daily exports of operational data trigger automatic model and policy re-evaluation — the system improves with every delivery cycle.

Live supervision

Field operations are monitored in real time, with barcode scanning integration and dynamic re-routing as conditions change.

How it works

Reinforcement Learning meets Digital Twin

The IRP is inherently systemic, non-linear, episodic and multi-scalar. Standard heuristics break down. Our RL–Digital Twin architecture is purpose-built for exactly this complexity class.

01

State observation

The agent observes inventory levels per SKU and node, pending orders, date, and traffic conditions — a partial view of a complex environment.

02

Action selection

From the high-dimensional action space (which nodes to visit, how much of each SKU to transact), the policy network selects the optimal action.

03

Environment transition

The Digital Twin simulates route execution, SKU transactions, and the stochastic load and traffic variables to produce the next state.

04

Reward & learning

The agent receives a reward (revenues minus logistics and holding costs), and updates its policy to maximise long-run discounted returns.

05

Operational deployment

The trained policy is deployed to production. Throughout the day it observes the current real-world state — inventory levels, pending orders, traffic — and outputs optimal routing and delivery decisions immediately, with no simulation required.

06

Continuous recalibration

At any point, real operations data — actual demand fulfilled, routes completed, inventory consumed — is captured to update the Digital Twin's stochastic model. A new policy is then computed, incorporating everything observed since the last training cycle and keeping decisions aligned with evolving conditions.
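
Steps 01 to 04 can be sketched on a deliberately tiny problem: one node, one SKU, and a table of action preferences standing in for the deep policy network. Everything here is an assumption made for illustration, including the toy environment, the economics constants and the use of REINFORCE (a basic policy-gradient rule); it is not the production algorithm.

```python
import math
import random

CAPACITY = 4            # max units the node can hold (hypothetical)
ACTIONS = (0, 1, 2)     # units delivered; 0 means the node is skipped
PRICE, TRIP_COST, HOLDING = 5.0, 3.0, 0.5   # illustrative economics
GAMMA, LEARNING_RATE, HORIZON = 0.9, 0.01, 15


def twin_step(inventory, delivered, rng):
    """Toy Digital Twin: apply the delivery, draw demand, return (next_state, reward)."""
    stock = min(CAPACITY, inventory + delivered)
    demand = rng.randint(0, 2)
    sold = min(stock, demand)
    next_inventory = stock - sold
    reward = PRICE * sold - (TRIP_COST if delivered else 0.0) - HOLDING * next_inventory
    return next_inventory, reward


def action_probs(theta, state):
    """Softmax over the action preferences stored for this state."""
    prefs = theta[state]
    peak = max(prefs)
    weights = [math.exp(p - peak) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]


def run_episode(theta, rng):
    """Steps 01-03 repeated over the horizon; returns [(state, action_index, reward)]."""
    state, trajectory = 0, []
    for _ in range(HORIZON):
        probs = action_probs(theta, state)
        action = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        next_state, reward = twin_step(state, ACTIONS[action], rng)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def reinforce_update(theta, trajectory):
    """Step 04: nudge preferences toward actions followed by high discounted return."""
    ret = 0.0
    for state, action, reward in reversed(trajectory):
        ret = reward + GAMMA * ret          # discounted return from this step onward
        probs = action_probs(theta, state)
        for a in range(len(ACTIONS)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += LEARNING_RATE * ret * grad_log


rng = random.Random(0)
theta = [[0.0] * len(ACTIONS) for _ in range(CAPACITY + 1)]
for _ in range(2000):
    reinforce_update(theta, run_episode(theta, rng))
```

In deployment terms (step 05), only `action_probs` would be needed at decision time: the trained preferences are read for the observed state and no simulation is run.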

RL Agent — Full Operational Cycle
Training phase
Agent

Deep neural network policy. Maps state → optimal action (nodes, volumes). Updated via RL algorithm.

⬇ action: nodes to visit, SKU quantities
Digital Twin Environment

Simulates route execution and stochastic demand and traffic transitions to produce reward and next state.

⬇ reward + next state
Reward function

Revenue − transport cost − holding cost, discounted over time horizon. Drives long-run optimisation.

⬆ policy gradient update → trained policy
Operational phase
Live deployment

The trained policy observes the current real-world state and outputs optimal routing and inventory decisions immediately — no simulation needed.

⬇ real operations: demand, routes, inventory
Recalibration

Observed operations update the Digital Twin's stochastic model. A new policy is computed, incorporating all data since the last training cycle.

⬆ updated model → retrain policy
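
One simple way to picture the recalibration step is shown below: each day's observed demand is blended into the current rate estimate, and a retrain is flagged once the estimate drifts far enough from the value the policy was trained against. The exponential smoothing rule, the smoothing weight and the 15% tolerance are assumptions chosen for the sketch, not CityOps parameters.

```python
def recalibrate_rate(current_rate, observed_units, smoothing=0.2):
    """Blend the latest observed daily demand into the rate estimate."""
    return (1.0 - smoothing) * current_rate + smoothing * observed_units


def needs_retraining(trained_rate, current_rate, tolerance=0.15):
    """Flag a retrain once the estimate drifts beyond a relative tolerance."""
    if trained_rate == 0:
        return current_rate > 0
    return abs(current_rate - trained_rate) / trained_rate > tolerance


trained_rate = 5.0                 # rate the current policy was trained against
rate = trained_rate
for observed in [6, 7, 8, 7]:      # hypothetical daily exports of fulfilled demand
    rate = recalibrate_rate(rate, observed)
retrain = needs_retraining(trained_rate, rate)
```

With these numbers the estimate moves from 5.0 to roughly 6.24, a drift of about 25%, so `retrain` is `True` and a new policy would be computed against the updated model.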
Target Markets

Logistics-intensive, fast, stochastic

We focus on sectors where fast stock dynamics, large fleet operations, and highly variable demand make conventional approaches insufficient.

Urban last mile

High-volume, low-value products across dense city networks with variable demand patterns.

Vendor managed inventory

Retail and oil & gas operators requiring automated replenishment across distributed networks.

Reverse logistics

Waste collection, industrial gases, recycled containers and domestic butane distribution.

Food & beverages

Water, beer and perishable distribution to restaurants, schools and catering outlets.

Proven Results

Deployed in production

Real outcomes across field services, pharma distribution, and reverse logistics operations.

Field Service · HVAC

Major home services multinational

Optimisation of day-to-day routing and long-term preventive maintenance scheduling across a Spanish subsidiary with 1.1 million contracts, part of a global group with 15.9 million contracts.

−26%
km between stops
+18%
daily operations
15 days
planning horizon visibility
Pharma Logistics · Last Mile

Leading Spanish urgent courier

Last-mile logistics optimisation for pharma distribution, coordinating three territories simultaneously with live supervision and barcode scanning integration.

−15%
km between stops
+12%
daily collections
Reverse Logistics · IBEX 35

Waste oil collection network

Last-mile reverse logistics for used cooking oil collection from restaurants and food outlets. Modelled on 6+ years of historical pickup and route data.

−21%
km between stops
+39%
daily collections
15 days
planning horizon visibility
Founder

The team

Roberto Vázquez Lucerga
Founder & CEO

Over 25 years at the intersection of energy and digital innovation, leading multiple industrial cleantech projects across Europe and Southeast Asia for major corporations, as well as founding two startups in cleantech and digital sectors. Served as Programme Director and Data Science Lead in Business Analytics at UFV University, teaching Machine Learning, AI, Algorithms and Decision Support Systems.

MIT Sloan Fellow · MEng · MBA · MPhys · MTech Mgmt
Contact

Let's talk

Interested in applying CityOps to a specialist logistics operation? We work with teams facing coupled inventory and routing decisions under uncertainty.

Good fit: Node/SKU inventories, variable demand or production, depot and vehicle constraints, and routes where replenishment or pickup timing changes future cost.
Typical inputs: Historical orders, inventory records, node geolocation, depot topology, vehicle parameters, distance or traffic matrices, and revenue/cost assumptions.
Operational outputs: Replenishment or pickup quantities, vehicle assignment, dispatch/routing plans, and policy updates as observed operations recalibrate the Digital Twin.
Innitium Analytics Consulting SL