IRP / Reinforcement Learning / Digital Twin

AI-Powered Inventory Routing, Beyond Route Optimization

CityOps goes beyond sequencing stops. It learns long-run IRP policies: which nodes to serve, how much to move per SKU, and how vehicles should route as demand, load, and traffic evolve.


CityOps — IRP Policy Engine

Joint inventory-routing actions

Selects nodes, SKU quantities and route execution together, instead of optimizing routes after replenishment decisions are fixed.

Deep policy network

Maps observable inventory, order, depot, vehicle and traffic state to policy decisions across a high-dimensional action space.

Financial reward function

Optimizes long-run return from revenues, vehicle operating costs, holding costs and discounting, not distance alone.

Digital Twin recalibration

Updates stochastic demand, load and traffic transitions from operating data, then recomputes the policy when conditions move.

Why It Is Different

Not another routing optimizer

CityOps is built for logistics teams whose routing problem cannot be separated from inventory state, SKU-level demand and long-run operating economics.

| Dimension | Typical routing engine | CityOps IRP policy engine |
|---|---|---|
| Decision scope | Orders or stops are fixed first; the engine sequences routes afterwards. | Decides which nodes to visit, SKU quantities to transact and vehicle routes jointly. |
| Planning logic | Optimizes a dispatch cycle or short deterministic plan. | Learns long-run policies over replenishment cycles and future state transitions. |
| System state | Focuses on stops, time windows, distances, vehicle capacity and service rules. | Uses inventory by node and SKU, pending orders, depots, vehicles, traffic and financial state. |
| Uncertainty | Re-optimizes when demand, traffic or workload changes. | Trains against stochastic demand, load and traffic behaviour inside a Digital Twin. |
| Objective | Primarily minimizes distance, time or dispatch cost under constraints. | Maximizes discounted financial reward: revenue served minus transport and holding costs. |
| Learning loop | Runs again with updated inputs. | Updates the stochastic environment model and recomputes the policy from observed operations. |

About Innitium

Complexity-ready AI for logistics

Modern logistics networks are partially observable, multi-agent, stochastic and path-dependent. Classical heuristics leave significant value on the table.

Innitium Analytics Consulting was founded to apply state-of-the-art Reinforcement Learning and Digital Twin modelling to exactly these challenges, delivering AI systems that learn long-run policies rather than greedy one-step decisions.

Reinforcement Learning · Digital Twin · Route Optimisation · Demand Forecasting · Deep Neural Networks · Inventory Management

Long-run optimisation

Our RL agent maximises discounted cumulative reward over the full planning horizon, not just the next delivery cycle.
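In standard reinforcement-learning notation, this long-run objective can be written as below. The symbols are our own shorthand rather than CityOps documentation: $R_t$ is the revenue earned in period $t$, $C^{\mathrm{tr}}_t$ the vehicle operating (transport) cost, $C^{\mathrm{h}}_t$ the holding cost, $\gamma$ the discount factor and $T$ the planning horizon, with the expectation taken over stochastic demand and traffic when operating under policy $\pi$.

```latex
\max_{\pi}\;
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\,\bigl(R_t - C^{\mathrm{tr}}_t - C^{\mathrm{h}}_t\bigr)\right],
\qquad 0 < \gamma \le 1
```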

Continuous learning

The Digital Twin updates its stochastic demand and traffic models daily from real operations data, keeping the policy current.

Coupled decisions

Inventory and routing are solved jointly — eliminating the sub-optimality of treating them as separate problems.

Our Solution

CityOps — end-to-end IRP platform

From raw order records to real-time optimised routing, CityOps handles every layer of the Inventory Routing Problem.

Demand modelling

Machine learning models capture the stochastic demand behaviour per node and SKU, feeding the Digital Twin with realistic load scenarios.
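As a minimal sketch of how demand scenarios can feed a simulator, the example below assumes each (node, SKU) pair has Poisson-distributed daily demand with a fixed rate. The rates, node and SKU names, and function names are hypothetical; the machine learning models described above would replace the fixed rates with learned, context-dependent ones.

```python
import math
import random
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (node, SKU)


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Knuth's method: multiply uniforms until the product drops below e^-rate."""
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def sample_demand_scenarios(
    rates: Dict[Key, float], n_scenarios: int, seed: int = 0
) -> List[Dict[Key, int]]:
    """Draw independent daily demand scenarios for every (node, SKU) pair."""
    rng = random.Random(seed)  # seeded so scenario sets are reproducible
    return [
        {key: sample_poisson(rate, rng) for key, rate in rates.items()}
        for _ in range(n_scenarios)
    ]


# Hypothetical daily demand rates per (node, SKU).
rates = {("node_A", "sku_1"): 4.0, ("node_A", "sku_2"): 1.5, ("node_B", "sku_1"): 7.0}
scenarios = sample_demand_scenarios(rates, n_scenarios=1000)
mean_b1 = sum(s[("node_B", "sku_1")] for s in scenarios) / len(scenarios)
```

Knuth's method is simple but slows down for large rates; at scale, a vectorised sampler such as NumPy's `Generator.poisson` would be the more usual choice.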

Network & routing engine

Depot topology, vehicle capacities, traffic data and distance matrices are integrated into the routing layer that evaluates each candidate policy.
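To illustrate how a routing layer can attach a transport cost to a candidate set of nodes, here is a minimal sketch that builds a tour with the nearest-neighbour heuristic and measures it against a distance matrix. The matrix, the heuristic and the function names are illustrative assumptions, not the CityOps routing engine; a production layer would also handle vehicle capacities, time windows and traffic-dependent travel times.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def route_length(route: Sequence[int], dist: Matrix, depot: int = 0) -> float:
    """Distance of the closed tour depot -> route[0] -> ... -> route[-1] -> depot."""
    stops = [depot, *route, depot]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))


def nearest_neighbour_route(nodes: Sequence[int], dist: Matrix, depot: int = 0) -> List[int]:
    """Greedy construction: always drive to the closest unvisited node."""
    remaining = set(nodes)
    route: List[int] = []
    current = depot
    while remaining:
        # Ties are broken by node index so the result is deterministic.
        nxt = min(remaining, key=lambda n: (dist[current][n], n))
        route.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return route


# Hypothetical symmetric distance matrix (km); index 0 is the depot.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
route = nearest_neighbour_route([1, 2, 3], dist)
print(route, route_length(route, dist))  # [1, 3, 2] 18
```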

RL policy optimisation

A deep neural network maps the observable inventory state to drop-off and pickup decisions per SKU and node, and is updated as conditions evolve.

Financial reward function

Revenue from node servicing, vehicle operating costs, holding costs and a discount factor are combined into a single long-run objective.
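A minimal sketch of such a reward, assuming vehicle operating cost is linear in kilometres driven and holding cost is linear in units held at the end of each period. Both cost models, the field names and the figures in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PeriodOutcome:
    revenue: float     # revenue earned from nodes serviced in the period
    km_driven: float   # total fleet distance in the period
    units_held: float  # inventory units held at the end of the period


def period_reward(o: PeriodOutcome, cost_per_km: float, holding_cost_per_unit: float) -> float:
    """One-period financial reward: revenue minus transport and holding costs."""
    return o.revenue - cost_per_km * o.km_driven - holding_cost_per_unit * o.units_held


def discounted_return(rewards: Iterable[float], gamma: float) -> float:
    """Sum of gamma**t * r_t, accumulated backwards from the last period."""
    total = 0.0
    for r in reversed(list(rewards)):
        total = r + gamma * total
    return total


# Hypothetical two-period example.
outcomes = [PeriodOutcome(1000.0, 120.0, 300.0), PeriodOutcome(800.0, 90.0, 450.0)]
rewards: List[float] = [
    period_reward(o, cost_per_km=1.5, holding_cost_per_unit=0.25) for o in outcomes
]
total_return = discounted_return(rewards, gamma=0.9)  # 745 + 0.9 * 552.5, about 1242.25
```

Accumulating backwards avoids computing explicit powers of the discount factor and mirrors how returns are usually computed from a recorded episode.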

Real-time policy update

Daily exports of operational data trigger automatic model and policy re-evaluation, so the policy is refreshed with every delivery cycle.

Live supervision

Field operations are monitored in real time, with barcode scanning integration and dynamic re-routing as conditions change.

How it works

Reinforcement Learning meets Digital Twin

The IRP is inherently systemic, non-linear, episodic and multi-scale, and standard heuristics break down on it. Our RL–Digital Twin architecture is built for exactly this class of problem.

State observation

The agent observes inventory levels per SKU and node, pending orders, date, and traffic conditions — a partial view of a complex environment.
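The observation can be made concrete as a record that is flattened into a fixed-length vector, which is the form a policy network consumes. This sketch assumes a day-of-week encoding for the date and a single scalar traffic index; both encodings and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (node, SKU)


@dataclass
class Observation:
    inventory: Dict[Key, float]       # units on hand per (node, SKU)
    pending_orders: Dict[Key, float]  # open order quantity per (node, SKU)
    day: date
    traffic_index: float              # hypothetical congestion factor, 1.0 = free flow

    def to_features(self, keys: List[Key]) -> List[float]:
        """Flatten into a fixed-order vector; missing keys default to 0."""
        inv = [self.inventory.get(k, 0.0) for k in keys]
        orders = [self.pending_orders.get(k, 0.0) for k in keys]
        # One-hot day of week (Monday = 0) so weekly demand cycles are visible.
        weekday = [1.0 if self.day.weekday() == d else 0.0 for d in range(7)]
        return inv + orders + weekday + [self.traffic_index]


keys = [("node_A", "sku_1"), ("node_B", "sku_1")]
obs = Observation(
    inventory={("node_A", "sku_1"): 12.0},
    pending_orders={("node_B", "sku_1"): 5.0},
    day=date(2024, 1, 1),
    traffic_index=1.3,
)
features = obs.to_features(keys)  # length 2 + 2 + 7 + 1 = 12
```

Fixing the key order once keeps the vector layout stable between training and deployment, which matters because the network's weights are tied to positions.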

Action selection

From the high-dimensional action space (which nodes to visit, how much of each SKU to transact), the policy network selects the optimal action.
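One practical detail of a high-dimensional action space is that raw network outputs must be turned into a load the vehicle can actually carry. The sketch below shows one simple way to do that, assuming whole-unit deliveries, a per-node free-storage limit and a single vehicle capacity; the largest-first priority rule and all names are illustrative assumptions, not how CityOps resolves conflicts.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (node, SKU)


def make_feasible(
    raw: Dict[Key, float], node_space: Dict[Key, int], vehicle_capacity: int
) -> Dict[Key, int]:
    """Turn raw per-(node, SKU) quantities into a load one vehicle can carry."""
    remaining = vehicle_capacity
    plan: Dict[Key, int] = {}
    # Hypothetical priority rule: serve the largest requested quantities first.
    for key, q in sorted(raw.items(), key=lambda kv: -kv[1]):
        # Whole units only, never negative, never more than the node can store.
        want = max(0, min(int(q), node_space.get(key, 0)))
        load = min(want, remaining)
        if load > 0:
            plan[key] = load
            remaining -= load
    return plan


raw = {("A", "s1"): 30.7, ("B", "s1"): 12.2, ("B", "s2"): -3.0, ("C", "s1"): 25.0}
space = {("A", "s1"): 20, ("B", "s1"): 50, ("B", "s2"): 10, ("C", "s1"): 40}
print(make_feasible(raw, space, vehicle_capacity=50))
# {('A', 's1'): 20, ('C', 's1'): 25, ('B', 's1'): 5}
```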

Environment transition

The Digital Twin simulates route execution, SKU transactions, and the stochastic load and traffic variables to produce the next state.
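A minimal sketch of the inventory part of one transition, assuming deliveries arrive before demand and unmet demand is lost rather than backlogged. Route execution and traffic are left out, and the function names and the pluggable demand sampler are hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (node, SKU)
DemandSampler = Callable[[Key, random.Random], int]


def twin_step(
    inventory: Dict[Key, int],
    deliveries: Dict[Key, int],
    sample_demand: DemandSampler,
    rng: random.Random,
) -> Tuple[Dict[Key, int], Dict[Key, int], Dict[Key, int]]:
    """Advance inventory one period: deliver, draw demand, record sales and lost sales."""
    next_inv: Dict[Key, int] = {}
    sold: Dict[Key, int] = {}
    lost: Dict[Key, int] = {}
    for key, level in inventory.items():
        stock = level + deliveries.get(key, 0)  # drop-offs arrive before demand
        demand = sample_demand(key, rng)
        sold[key] = min(stock, demand)
        lost[key] = demand - sold[key]          # unmet demand is lost, not backlogged
        next_inv[key] = stock - sold[key]
    return next_inv, sold, lost


rng = random.Random(42)
inventory = {("A", "s1"): 3, ("B", "s1"): 10}
deliveries = {("A", "s1"): 4}
next_inv, sold, lost = twin_step(inventory, deliveries, lambda k, r: r.randint(0, 8), rng)
```

The returned sales and lost sales are what a reward function would price as revenue and, if desired, a shortage penalty.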

Reward & learning

The agent receives a reward (revenues minus logistics and holding costs) and updates its policy to maximise long-run discounted returns.
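To show what a policy-gradient update does, here is REINFORCE with a running-mean baseline on a deliberately tiny one-step problem with three candidate actions. This is a toy illustration of the general technique, not the CityOps training algorithm; the rewards, learning rate and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(
    rewards: Sequence[float], steps: int = 2000, lr: float = 0.1, seed: int = 0
) -> List[float]:
    """REINFORCE with a running-mean baseline on a one-step toy problem."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters: one logit per action
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample an action
        advantage = rewards[a] - baseline
        baseline += (rewards[a] - baseline) / t  # running mean of observed rewards
        # d/d theta_i of log softmax(theta)[a] is 1{i == a} - probs[i]
        for i in range(len(theta)):
            theta[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(theta)


# Hypothetical rewards of three candidate actions (e.g. three replenishment plans).
final_probs = reinforce([1.0, 2.0, 0.5])
```

The baseline does not change the expected gradient, but it reduces variance: actions that beat the average become more likely and the others less likely.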

Operational deployment

The trained policy is deployed to production. Throughout the day it observes the current real-world state — inventory levels, pending orders, traffic — and outputs optimal routing and delivery decisions immediately, with no simulation required.

Continuous recalibration

At any point, real operations data — actual demand fulfilled, routes completed, inventory consumed — is captured to update the Digital Twin's stochastic model. A new policy is then computed, incorporating everything observed since the last training cycle and keeping decisions aligned with evolving conditions.
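The page does not specify how the stochastic model is updated, so as one simple stand-in, this sketch recalibrates a Poisson demand rate per (node, SKU) with the conjugate Gamma-Poisson update. The prior values, the observed demand and the class name are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GammaPoissonRate:
    """Gamma(alpha, beta) belief over a Poisson demand rate for one (node, SKU)."""
    alpha: float  # shape: acts like a prior count of demand units
    beta: float   # rate: acts like a prior count of observed periods

    def update(self, observed_demand: Sequence[int]) -> "GammaPoissonRate":
        """Conjugate update: add observed units to alpha and observed periods to beta."""
        return GammaPoissonRate(
            self.alpha + sum(observed_demand), self.beta + len(observed_demand)
        )

    @property
    def mean(self) -> float:
        return self.alpha / self.beta


prior = GammaPoissonRate(alpha=20.0, beta=5.0)  # prior belief: about 4 units per day
posterior = prior.update([6, 7, 5, 8])          # four new days of observed demand
print(round(posterior.mean, 3))  # 5.111
```

The updated rates could then drive the demand sampler used in training before the policy is recomputed.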

RL Agent — Full Operational Cycle

Training phase

Agent

Deep neural network policy. Maps state → optimal action (nodes, volumes). Updated via an RL algorithm.

⬇ action: nodes to visit, SKU quantities

Digital Twin Environment

Simulates route execution and stochastic demand and traffic transitions to produce reward and next state.

⬇ reward + next state

Reward function

Revenue − transport cost − holding cost, discounted over the time horizon. Drives long-run optimisation.

⬆ policy gradient update → trained policy

Operational phase

Live deployment

The trained policy observes the current real-world state and outputs optimal routing and inventory decisions immediately — no simulation needed.

⬇ real operations: demand, routes, inventory

Recalibration

Observed operations update the Digital Twin's stochastic model. A new policy is computed, incorporating all data since the last training cycle.

⬆ updated model → retrain policy

Target Markets

Logistics-intensive, fast, stochastic

We focus on sectors where fast stock dynamics, large fleet operations, and highly variable demand make conventional approaches insufficient.

Urban last mile

High-volume, low-value products across dense city networks with variable demand patterns.

Vendor-managed inventory

Retail and oil & gas operators requiring automated replenishment across distributed networks.

Reverse logistics

Waste collection, industrial gases, recycled containers and domestic butane distribution.

Food & beverages

Water, beer and perishable distribution to restaurants, schools and catering outlets.

Proven Results

Deployed in production

Real outcomes across field services, pharma distribution, and reverse logistics operations.

Field Service · HVAC

Major home services multinational

Optimisation of day-to-day routing and long-term preventive maintenance scheduling for a Spanish subsidiary with 1.1 million contracts, part of a global group with 15.9 million contracts.

−26% km between stops · +18% daily operations · 15-day planning horizon visibility

Pharma Logistics · Last Mile

Leading Spanish urgent courier

Last-mile logistics optimisation for pharma distribution, coordinating three territories simultaneously with live supervision and barcode scanning integration.

−15% km between stops · +12% daily collections

Reverse Logistics · IBEX 35

Waste oil collection network

Last-mile reverse logistics for used cooking oil collection from restaurants and food outlets. Modelled on 6+ years of historical pickup and route data.

−21% km between stops · +39% daily collections · 15-day planning horizon visibility

Contact

Let's talk

Interested in applying CityOps to a specialist logistics operation? We work with teams facing coupled inventory and routing decisions under uncertainty.

Good fit: Node/SKU inventories, variable demand or production, depot and vehicle constraints, and routes where replenishment or pickup timing changes future cost.

Typical inputs: Historical orders, inventory records, node geolocation, depot topology, vehicle parameters, distance or traffic matrices, and revenue/cost assumptions.

Operational outputs: Replenishment or pickup quantities, vehicle assignment, dispatch/routing plans, and policy updates as observed operations recalibrate the Digital Twin.

Send us a message

info@innitium.com

www.innitium.com

Innitium Analytics Consulting SL
