From 50-Person Manual Operation to Fully Automated Pipeline
A major European retailer cut catalog onboarding from 3–4 days of 50-person manual work to under 3 hours, with zero human touchpoints in the standard processing flow.
The Problem
Large European retailers refresh their product catalogs multiple times per year — at season changeovers, on the back of new supplier contracts, and in response to regulatory requirements around pricing and labeling accuracy. For a retailer carrying hundreds of thousands of SKUs across multiple regions, each refresh is a project in its own right, not a routine operation.
Before this engagement, the client's catalog process worked like this: dozens of suppliers would send product data in whatever format each vendor had standardised on — CSV exports, XML files, proprietary flat files, occasionally half-populated spreadsheets. A central operations team would receive these feeds, manually review each one for obvious errors, then mobilise fifty operations staff to key the data into the product information management system. The cycle ran for three to four consecutive days, always timed to a weekend to minimise disruption to live trading.
Product information management platforms are the industry's standard answer to catalog complexity — but they manage structured data that is already clean and inside the system. They solve nothing upstream. The real cost sat in the gap between what vendors sent and what the PIM could ingest: schema variance across suppliers, missing mandatory fields, currency formatting inconsistencies, duplicate SKUs introduced when suppliers recycled identifiers across seasons, and locale-specific pricing that needed mapping to the retailer's internal taxonomy. Manual entry existed because the upstream data was untrustworthy, and trusting it required human judgement applied row by row.
That human judgement added errors of its own. A QA cycle ran after each entry batch — and QA sometimes uncovered problems that required a partial re-run, adding a full day to an already expensive process. The hidden cost was structural: fifty experienced operations staff — people with institutional knowledge of supplier relationships and the product hierarchy — were unavailable for anything else during the cycle, multiple times per year. Across the multi-day cycle, each refresh consumed roughly 3,000 person-hours. Run three times a year, that exceeded 9,000 staff-hours of lost capacity, conservatively measured, before accounting for the QA cycles and error rework.
The business had attempted automation twice before: once with a bespoke ETL pipeline that broke every time a vendor changed their file format, and once with a middleware platform that proved too slow to clear a full refresh inside the weekend trading window. The appetite for a third attempt was cautious. The goal was a system resilient to vendor schema drift, fast enough to complete inside a weekend, and accurate enough to eliminate the QA cycle entirely.
The Constraints
- Fixed weekend processing window. Peak-season refreshes had to complete before trading opened Monday morning — a hard deadline, not a target. Every architectural decision had to account for this constraint explicitly.
- Customer-facing pricing accuracy. A price error that reached the live catalog created immediate consumer-facing and regulatory exposure. Pricing fields required explicit validation, not assumed correctness from the source feed.
- No schema control over vendor feeds. Vendors set their own data formats and changed them without notice. The system had to absorb format variance gracefully and map it to a canonical internal schema, not demand compliance.
- Downstream systems must receive consistent state. Search indexing, inventory management, and the sales platform all consume catalog data. They could not see partial updates or mid-refresh state — every event they received had to represent a complete, valid product record.
- Full audit lineage required. Every SKU in every batch needed a traceable record from vendor feed to catalog record to downstream event, queryable after the fact for regulatory and supplier dispute purposes.
The Architecture
The pipeline follows a fan-in, process, fan-out pattern with two clearly defined processing boundaries: one at ingestion, one at publication to downstream systems.
On the inbound side, each vendor's feed arrives as a batch file dropped into Azure Blob Storage. A lightweight blob-triggered Function detects new arrivals and publishes individual product records as messages onto Azure Service Bus — one topic per feed type. This topic-per-feed-type design was a deliberate early decision: it confines the blast radius of any single vendor's malformed feed to exactly one topic. Its dead-letter queue fills, its consumers retry, and no other vendor's data is affected. Replaying one vendor's batch is a two-click operation; it never touches anyone else's.
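The fan-out step can be illustrated without any Azure dependencies. What follows is a minimal sketch, not the production code: `FeedFanOut`, `RecordMessage`, the `catalog-feed-<type>` topic naming, and the naive comma split (which assumes no quoted or embedded commas) are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape: one product record destined for one topic.
public sealed record RecordMessage(string Topic, IReadOnlyDictionary<string, string> Fields);

public static class FeedFanOut
{
    // Splits a header-plus-rows text feed into one message per data row.
    // Assumption: plain comma-separated values with no quoted fields.
    public static IEnumerable<RecordMessage> Split(string feedType, IEnumerable<string> lines)
    {
        // Blank lines carry no record, so they are dropped before parsing.
        var rows = lines.Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
        if (rows.Count == 0) yield break;

        var header = rows[0].Split(',').Select(h => h.Trim()).ToArray();
        var topic = $"catalog-feed-{feedType}"; // assumed naming scheme

        foreach (var row in rows.Skip(1))
        {
            var cells = row.Split(',');
            var fields = new Dictionary<string, string>();
            for (var i = 0; i < header.Length; i++)
            {
                // Missing trailing cells become empty strings so that a later
                // validation step can flag them instead of the parser throwing.
                fields[header[i]] = i < cells.Length ? cells[i].Trim() : string.Empty;
            }
            yield return new RecordMessage(topic, fields);
        }
    }
}
```

Each yielded record would then be published as its own message, which is what lets a single bad row fail on its own rather than taking the whole file with it.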
The core processing runs in a Function App bound to each Service Bus topic. Before any business logic executes, the handler performs an idempotency check: each inbound message carries a composite dedup key derived from vendor_id + sku + revision. The key is checked against a Redis cache with a 24-hour TTL. Azure Service Bus guarantees at-least-once delivery, meaning the same message can arrive more than once under failure or retry conditions. The dedup check collapses those duplicates to a single write without needing distributed transaction semantics.
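One way to derive such a composite key is sketched below. The length-prefixed encoding, the SHA-256 hash, and the `dedup:` prefix are assumptions for illustration; the source only states that the key combines vendor_id, sku, and revision.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class DedupKey
{
    // Builds a fixed-length key from vendor id, SKU and revision.
    // Each text part is length-prefixed so ("ab", "c") and ("a", "bc")
    // can never produce the same canonical string.
    public static string From(string vendorId, string sku, int revision)
    {
        if (string.IsNullOrWhiteSpace(vendorId))
            throw new ArgumentException("Vendor id is required.", nameof(vendorId));
        if (string.IsNullOrWhiteSpace(sku))
            throw new ArgumentException("SKU is required.", nameof(sku));

        var v = vendorId.Trim();
        var s = sku.Trim();
        var r = revision.ToString(CultureInfo.InvariantCulture);
        var canonical = $"{v.Length}:{v}|{s.Length}:{s}|{r}";

        // Hashing keeps the cache key short and free of awkward characters.
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "dedup:" + Convert.ToHexString(hash);
    }
}
```

Because the revision is part of the key, a genuinely new revision of the same SKU is processed, while a redelivery of the same revision is not.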
Validation runs as a pipeline of small, composable IValidationRule implementations registered in the DI container. Each rule receives the payload and returns either a pass or a structured error record. The first failure short-circuits the chain and routes the message to the dead-letter queue with the error reason attached as a message property, not as a logged exception. The operations team can query the dead-letter queue directly in Service Bus Explorer and see exactly which field failed, for which SKU, from which vendor, at what time.
After validation passes, an enrichment step decorates the canonical product record with data from internal reference sources: category hierarchy, regional VAT rates, locale-specific labels. The enriched record is written to Cosmos DB and an outbound event is published for downstream consumers. Three systems subscribe: search indexing, inventory management, and the pricing platform. Each is independently deployed with its own subscription and dead-letter queue — a failure in search indexing never holds up the inventory update.
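The enrichment step is essentially a lookup-and-decorate operation. The sketch below shows the shape of it under stated assumptions: the record types, the in-memory reference dictionaries, the treatment of the vendor price as net of VAT, and the rounding rule are all hypothetical and not taken from the client's system.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes for the record before and after enrichment.
public sealed record CanonicalProduct(string Sku, string CategoryCode, string Region, decimal NetPrice);

public sealed record EnrichedProduct(
    string Sku, string CategoryPath, string Region, decimal NetPrice, decimal VatRate, decimal GrossPrice);

public sealed class Enricher
{
    private readonly IReadOnlyDictionary<string, string> _categoryPaths;
    private readonly IReadOnlyDictionary<string, decimal> _vatRates;

    public Enricher(
        IReadOnlyDictionary<string, string> categoryPaths,
        IReadOnlyDictionary<string, decimal> vatRates)
    {
        _categoryPaths = categoryPaths;
        _vatRates = vatRates;
    }

    // Decorates a validated record with reference data. Unknown codes throw
    // rather than being defaulted, so bad reference data cannot reach the catalog.
    public EnrichedProduct Enrich(CanonicalProduct product)
    {
        if (!_categoryPaths.TryGetValue(product.CategoryCode, out var path))
            throw new KeyNotFoundException($"Unknown category code '{product.CategoryCode}'.");
        if (!_vatRates.TryGetValue(product.Region, out var rate))
            throw new KeyNotFoundException($"No VAT rate for region '{product.Region}'.");

        // Assumed rounding rule: two decimal places, midpoint away from zero.
        var gross = Math.Round(product.NetPrice * (1 + rate), 2, MidpointRounding.AwayFromZero);
        return new EnrichedProduct(product.Sku, path, product.Region, product.NetPrice, rate, gross);
    }
}
```

In a real deployment the reference data would come from the internal sources named above rather than from dictionaries passed to the constructor.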
The entire pipeline runs on consumption-plan Azure Functions, so cost aligns directly with volume: near-zero during off-peak months and a predictable burst cost during refresh cycles, with no pre-provisioned capacity sitting idle. The design we rejected was a traditional batch ETL job. One of the client's earlier failed attempts used exactly that model, and its fundamental weakness was sequential processing: one large vendor feed would hold up all subsequent feeds behind it in the queue. An event-driven architecture removes that head-of-line blocking, because every product record becomes an independent unit of work, processed in parallel up to the Function App's concurrency limit.
Implementation Highlights
Idempotent Message Handler
Azure Service Bus guarantees at-least-once delivery. Under normal conditions, each message arrives exactly once. Under failure — a Function host restarting mid-processing, a network partition during the acknowledgement — the same message can be redelivered. Without an idempotency guard, this produces duplicate catalog writes that corrupt the state downstream systems depend on. The guard uses a composite key checked against a Redis entry with a 24-hour TTL, which safely covers the maximum retry window for any single refresh cycle.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.ServiceBus;
using Microsoft.Extensions.Logging;

[FunctionName("CatalogIngestion")]
public async Task Run(
    [ServiceBusTrigger("catalog-feeds", "vendor-a",
        Connection = "ServiceBus:ConnectionString")] ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions,
    ILogger log)
{
    // Every message is expected to carry the composite dedup key as a property.
    var dedupeKey = message.ApplicationProperties
        .GetValueOrDefault("dedup_key")?.ToString();
    if (string.IsNullOrWhiteSpace(dedupeKey))
    {
        await messageActions.DeadLetterMessageAsync(message, "MissingDedupeKey",
            "No dedup_key property.");
        return;
    }

    // A known key means this is a redelivery: acknowledge it and stop.
    if (await _cache.ExistsAsync(dedupeKey))
    {
        log.LogInformation("Duplicate discarded. {DedupeKey}", dedupeKey);
        await messageActions.CompleteMessageAsync(message);
        return;
    }

    var payload = message.Body.ToObjectFromJson<VendorProductPayload>();
    var result = _validator.Validate(payload);
    if (!result.IsValid)
    {
        await messageActions.DeadLetterMessageAsync(message, result.ErrorCode,
            result.ErrorMessage);
        log.LogWarning("Rejected. {DedupeKey} {Error}", dedupeKey, result.ErrorCode);
        return;
    }

    // Write first, then record the key. If the host dies between the two calls,
    // the redelivered message repeats an upsert, which is harmless.
    await _catalog.UpsertAsync(payload.ToCanonical());
    await _cache.SetAsync(dedupeKey, TimeSpan.FromHours(24));
    await messageActions.CompleteMessageAsync(message);
    log.LogInformation("Ingested. {DedupeKey} {Sku}", dedupeKey, payload.Sku);
}
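The handler checks for the key and sets it in two separate calls, so two deliveries of the same message arriving at almost the same moment could both pass the check. With an idempotent upsert that is usually harmless, but where it matters, one common alternative is an atomic set-if-absent with expiry (in Redis, `SET key value NX EX seconds`). The sketch below models that behaviour in memory; `DedupClaims` is a hypothetical stand-in and says nothing about how the `_cache` wrapper in the handler is actually implemented.

```csharp
using System;
using System.Collections.Concurrent;

// In-memory stand-in for a cache that supports atomic set-if-absent with expiry.
public sealed class DedupClaims
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _expiry = new();
    private readonly Func<DateTimeOffset> _now;

    public DedupClaims() : this(() => DateTimeOffset.UtcNow) { }

    // The clock is injectable so expiry can be exercised in tests.
    public DedupClaims(Func<DateTimeOffset> now) => _now = now;

    // Returns true only for the first caller to claim the key within the TTL.
    public bool TryClaim(string key, TimeSpan ttl)
    {
        var now = _now();
        var newExpiry = now + ttl;
        while (true)
        {
            if (_expiry.TryAdd(key, newExpiry)) return true;              // first claim
            if (!_expiry.TryGetValue(key, out var existing)) continue;    // removed meanwhile: retry
            if (existing > now) return false;                             // live claim held by someone else
            if (_expiry.TryUpdate(key, newExpiry, existing)) return true; // expired: take over atomically
        }
    }

    // Releases a claim so that a redelivery can retry after a failed write.
    public void Release(string key) => _expiry.TryRemove(key, out _);
}
```

The trade-off is that claiming before the write means a failed write has to release the claim; otherwise the retry would be discarded as a duplicate.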
Composable Validation Chain
Rather than a monolithic validator, the pipeline registers a sequence of IValidationRule implementations in the DI container. Adding a new rule (say, when a new vendor introduces a field the others don't carry) requires one new class and one DI registration. No existing code changes. Cheap checks (required field presence) are registered first; expensive checks (image URL reachability) run last. The first failure short-circuits the chain immediately.
using System;
using System.Collections.Generic;
using System.Linq;

public interface IValidationRule
{
    ValidationResult Validate(VendorProductPayload payload);
}

// Rules run in DI registration order; the first failure short-circuits.
public sealed class ValidationPipeline
{
    private readonly IEnumerable<IValidationRule> _rules;

    public ValidationPipeline(IEnumerable<IValidationRule> rules) => _rules = rules;

    public ValidationResult Validate(VendorProductPayload payload)
    {
        foreach (var rule in _rules)
        {
            var result = rule.Validate(payload);
            if (!result.IsValid) return result;
        }
        return ValidationResult.Pass();
    }
}

public sealed class RequiredFieldsRule : IValidationRule
{
    private static readonly string[] _fields =
        { "Sku", "Title", "CategoryCode", "PriceGbp" };

    // Reports the first mandatory field that is missing or blank.
    public ValidationResult Validate(VendorProductPayload payload) =>
        _fields.Select(f => (Field: f, Value: payload.GetField(f)))
               .Where(x => string.IsNullOrWhiteSpace(x.Value))
               .Select(x => ValidationResult.Fail("MISSING_FIELD", x.Field))
               .FirstOrDefault() ?? ValidationResult.Pass();
}

public sealed class PriceRangeRule : IValidationRule
{
    // The message is formatted with the invariant culture and an explicit GBP label:
    // the "C" format specifier would print the host culture's currency symbol instead.
    public ValidationResult Validate(VendorProductPayload payload) =>
        payload.PriceGbp is >= 0.01m and <= 99_999.99m
            ? ValidationResult.Pass()
            : ValidationResult.Fail("PRICE_OUT_OF_RANGE",
                FormattableString.Invariant(
                    $"Price {payload.PriceGbp:F2} GBP outside accepted range."));
}
Dead-Letter as Structured Audit Trail
The conventional approach to validation failure is to log an exception and discard the message. We rejected this because it fragments the audit record: the error reason lives in the log stream, the original payload is gone, and correlating the two for an ops investigation requires cross-referencing timestamps across systems. Instead, rejected messages are dead-lettered with the full original payload and a structured error envelope attached as message properties. The dead-letter queue becomes a first-class data store. The operations team can review rejections in Service Bus Explorer, fix the upstream vendor data, and replay the message directly from the queue — without needing to regenerate the original feed file.
private async Task DeadLetterWithContextAsync(
    ServiceBusMessageActions messageActions,
    ServiceBusReceivedMessage message,
    ValidationResult failure,
    ILogger log)
{
    // The dead-lettered message keeps its original body, so the envelope carries
    // only the error context. Copying the body into the description as well would
    // risk exceeding the length limit on that field.
    var envelope = new
    {
        ErrorCode = failure.ErrorCode,
        ErrorMessage = failure.ErrorMessage,
        ReceivedAt = DateTimeOffset.UtcNow,
        MessageId = message.MessageId,
        DedupeKey = message.ApplicationProperties.GetValueOrDefault("dedup_key")
    };

    await messageActions.DeadLetterMessageAsync(
        message,
        deadLetterReason: failure.ErrorCode,
        deadLetterErrorDescription: JsonSerializer.Serialize(envelope));

    log.LogWarning("Dead-lettered {MessageId} Reason={Reason}",
        message.MessageId, failure.ErrorCode);
}
The Outcome
The first full production run completed in 2 hours 47 minutes — inside the weekend window with time to spare. The QA cycle that had previously added a day to every refresh did not run, because there was nothing to QA: every rejected item had a machine-readable reason in the dead-letter queue, and every accepted item had been validated by the pipeline before it reached the catalog.
The fifty staff who had previously been mobilised for the refresh cycle were not made redundant; they were reassigned. The three to four days of their calendar that had been blocked for every refresh returned to regular work. The downstream effect was subtler but equally real: the fear around catalog refreshes disappeared. Refreshes stopped being events that the CEO's calendar was blocked around. In the first full year after deployment, the client ran six catalog refreshes where they had previously run three. Supplier onboarding accelerated. New vendor relationships that had been avoided because the manual process couldn't absorb another feed source became feasible. The pipeline has been running without structural changes for more than four years.
What We'd Do Differently
The vendor validation rules are implemented as individual C# classes — one class per rule, one class per vendor quirk. This worked well for the first twelve months while the rule set was relatively stable. As new vendors were onboarded and existing vendors changed their feed schemas, the rule set grew. By the end of the first year, the repository had over forty validation rule classes, each requiring a code change and a deployment cycle to add or modify.
In retrospect, a small configuration-driven rule definition — something as simple as a JSON schema specifying field names, required/optional, type, and range constraints — would have paid back inside six months. The C# class-per-rule approach is transparent and unit-testable, properties we still value. A configuration-driven engine doesn't have to sacrifice those — it just needs the engine itself to be tested against its interpretation of the config. The operations team could have responded to a vendor schema change with a config deploy rather than a code deploy and a full release cycle.
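A configuration-driven engine of that kind can be quite small. The following is a rough sketch under stated assumptions, not something that was built for the client: the `FieldRule` shape, the JSON property names, and the `INVALID_TYPE` and `OUT_OF_RANGE` codes are hypothetical, and records are simplified to string dictionaries.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical rule shape, e.g.
// [{"field":"PriceGbp","required":true,"type":"decimal","min":0.01,"max":99999.99}]
public sealed class FieldRule
{
    public string Field { get; set; } = "";
    public bool Required { get; set; }
    public string Type { get; set; } = "string"; // "string" or "decimal"
    public decimal? Min { get; set; }
    public decimal? Max { get; set; }
}

public static class ConfigRuleEngine
{
    private static readonly JsonSerializerOptions Options =
        new() { PropertyNameCaseInsensitive = true };

    public static IReadOnlyList<FieldRule> Load(string json) =>
        JsonSerializer.Deserialize<List<FieldRule>>(json, Options) ?? new List<FieldRule>();

    // Applies the rules in config order and reports the first failure only.
    public static (bool IsValid, string ErrorCode, string Field) Validate(
        IReadOnlyList<FieldRule> rules, IReadOnlyDictionary<string, string> record)
    {
        foreach (var rule in rules)
        {
            record.TryGetValue(rule.Field, out var raw);
            if (string.IsNullOrWhiteSpace(raw))
            {
                if (rule.Required) return (false, "MISSING_FIELD", rule.Field);
                continue; // optional and absent: nothing more to check
            }

            if (rule.Type != "decimal") continue;

            const NumberStyles styles = NumberStyles.AllowDecimalPoint | NumberStyles.AllowLeadingSign;
            if (!decimal.TryParse(raw.Trim(), styles, CultureInfo.InvariantCulture, out var value))
                return (false, "INVALID_TYPE", rule.Field);

            if ((rule.Min.HasValue && value < rule.Min.Value) ||
                (rule.Max.HasValue && value > rule.Max.Value))
                return (false, "OUT_OF_RANGE", rule.Field);
        }
        return (true, "", "");
    }
}
```

The engine itself stays unit-testable in the usual way, and a vendor schema change becomes an edit to the JSON document rather than a new class.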
The Service Bus topic-per-feed-type design has aged extremely well. Under peak load with five vendor feeds processing concurrently, the isolation meant one feed's retry storm had no effect on the others. We'd make the same call again without hesitation.
The Cosmos DB partition key choice has not aged as well. We partitioned the catalog store by category code, which made sense when the largest vendor's catalog had a roughly even distribution across categories. Eighteen months in, one new vendor's catalog skewed heavily toward a single category — creating a hot partition that raised RU consumption by 30% and required a re-partitioning exercise. If designing this today, we'd partition by a compound key from the start, or take a harder look at Azure SQL with a regional read replica — which for this access pattern may be simpler and cheaper.
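One form a compound key could take is a synthetic key that combines the category code with a hash bucket of the SKU, so that a single heavy category is spread across several logical partitions. The sketch below is illustrative only: the bucket count, the key format, and the choice of FNV-1a as the hash are assumptions, not what the re-partitioning exercise actually used.

```csharp
using System;
using System.Text;

public static class PartitionKeys
{
    // Produces "<category>-<bucket>", spreading one category over `buckets` partitions.
    public static string For(string categoryCode, string sku, int buckets)
    {
        if (buckets <= 0) throw new ArgumentOutOfRangeException(nameof(buckets));
        var bucket = Fnv1a(sku) % (uint)buckets;
        return $"{categoryCode}-{bucket}";
    }

    // 32-bit FNV-1a. Chosen because it is stable across processes and restarts,
    // which string.GetHashCode() is not.
    private static uint Fnv1a(string text)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        var hash = offsetBasis;
        foreach (var b in Encoding.UTF8.GetBytes(text))
        {
            hash ^= b;
            hash = unchecked(hash * prime);
        }
        return hash;
    }
}
```

The cost of this design is on the read side: a point read needs the SKU to recompute the key, and a query for a whole category has to fan out across all of its buckets.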
If You're Solving This Today
This was built on the Azure Functions v3 in-process model, the standard in 2021, using [FunctionName] attributes and DI wired directly through Startup.cs. In 2026 we'd start with the isolated worker model (.NET 8 default, [Function] attribute): it gives cleaner dependency injection, proper middleware pipelines, and dramatically easier integration testing because the host no longer owns your DI container.
We'd also evaluate Durable Functions for the orchestration layer rather than binding directly to Service Bus topics: the fan-out/fan-in pattern would have simplified the batch-completion signalling we implemented as a Redis counter. And we'd look at Azure Container Apps for the host. For workloads with predictable burst patterns, the per-replica pricing often beats the consumption plan once you factor in cold-start latency at scale.
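For readers unfamiliar with counter-based completion signalling, the idea is sketched below with an in-memory counter standing in for Redis. The source says only that a Redis counter was used; `BatchTracker`, its method names, and the count-down approach are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class BatchTracker
{
    private sealed class Counter { public long Remaining; }

    private readonly ConcurrentDictionary<string, Counter> _batches = new();

    // Called once per batch, before its records are published.
    public void Start(string batchId, long expectedRecords)
    {
        if (expectedRecords <= 0) throw new ArgumentOutOfRangeException(nameof(expectedRecords));
        if (!_batches.TryAdd(batchId, new Counter { Remaining = expectedRecords }))
            throw new InvalidOperationException($"Batch '{batchId}' already started.");
    }

    // Called when a record is ingested or dead-lettered. Returns true for exactly
    // one caller: the one that settles the final outstanding record.
    public bool RecordSettled(string batchId)
    {
        if (!_batches.TryGetValue(batchId, out var counter))
            throw new InvalidOperationException($"Batch '{batchId}' was never started.");

        // Atomic decrement, analogous to a Redis DECR on a per-batch key.
        return Interlocked.Decrement(ref counter.Remaining) == 0;
    }
}
```

A counter like this is only correct if each record decrements it once, so it has to sit behind the dedup check; that coupling is part of what an orchestration framework would take off the application's hands.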
Questions about retail catalog automation.
- How long does a retail catalog automation project take to implement?
  The first production deployment shipped in under 12 weeks from initial diagnosis; the embedded engagement then continued for 14 months as new vendors and validation rules were added. Timeline depends on the complexity of source data formats and the number of upstream systems requiring integration.
- Can catalog automation integrate with an existing ERP or PIM system?
  Yes. The pipeline in this engagement was designed as a layer on top of the existing ERP, receiving events from upstream systems and writing to the catalog without replacing any existing infrastructure. No rip-and-replace was required.
- What Azure technologies are used for retail catalog automation?
  Azure Service Bus for event coordination, Azure Functions for stateless processing, Redis Cache for deduplication, Cosmos DB for catalog state, and Azure Blob Storage for raw ingestion. The architecture is event-driven and idempotent: each step can safely re-run under at-least-once delivery without producing duplicates.
- What ROI should we expect from catalog automation?
  The client in this case study reclaimed over 9,000 staff-hours per year from manual catalog refresh cycles. For any operation involving more than ten people across multiple days, full automation typically pays for itself within the first cycle and enables more frequent refreshes, which this client used to double their annual refresh count.