Published on

Integration by Design: Analytical Identifiers

Authors
Isometric line-art diagram of multiple identifiers - ISIN, FIGI, CUSIP, internal ID - converging through an Owner+Type prefix into a single instance node, rendered as a wireframe in Daana blue on near-black, signaling the integration-by-design thesis

Identification in Analytical Systems

A single financial instrument - Google's Class C stock, say - can be identified by an ISIN, a FIGI, a Bloomberg ticker, a CUSIP, and an internal source-system ID. Each of those is a legitimate way to refer to the same instance; none of them, on its own, is the instrument. Operational systems do not have to reconcile this multiplicity, because each system owns its own identifier and never has to agree with another. The moment data is brought together for analytics, the multiplicity becomes the problem.

The challenge is to represent that one instance as one instance, regardless of how many systems describe it or how many identifiers they bring along. Three concerns sit underneath this. Multiple sources arrive with different identifiers for the same instance, so without resolution the CRM's email-identified customer, billing's account_id-identified customer, and support's phone-identified customer become three customers instead of one. New identifiers appear over time as new sources are onboarded, and they need to attach to existing instances without fragmenting what has already been unified. And the same identifier value, in different contexts, can refer to entirely different instances - account "12345" in system A is not account "12345" in system B - so naively merging on the raw value corrupts the data. The third concern has a name: key collision.

Identity vs Identifier

Identity is the representation of a thing as a single instance - not a property of the thing itself. "Does this instance already exist?" is the identity question. Two identifiers share an identity when they refer to the same instance.

Identifier is a concrete, observable value used to refer to a thing - unique within its data space, but not necessarily covering the thing's full lifecycle. An email identifies a person today but may change tomorrow. Strive for full-lifecycle identifiers; design for the reality that many are not.

The relationship is many-to-one: multiple identifiers converge to a single instance.

DNA ─────────────┐
Iris scan ───────┤
Fingerprint ─────┼──→  1  ──→  One person, one instance
National ID ─────┤
Phone number ────┘

In a financial context:

ISIN ────────────────┐
FIGI ────────────────┤
Source system ID ─────┼──→  1  ──→  Google Stock (Class C), one instance
Bloomberg ID ────────┘

The number "1" is the surrogate key - the system-generated value that says "these identifiers all refer to the same instance." Without it, identifiers float independently; with it, the instance is unified.

Business Identifiers vs Analytical Identifiers

Business identifiers are what people use in daily work - an order number, a customer name, an account reference. They:

  • Do not need to be globally unique - "Account 12345" makes sense within one system
  • Identify an instance at a specific point in time, not necessarily across its full lifecycle
  • Are what arrives in the source data

Analytical identifiers are what the data warehouse uses to ensure one-instance-per-thing. They:

  • Must be globally unique - no two instances can share the same identifier
  • Must represent the instance from creation to deletion - cradle to grave
  • Bridge the messy reality of business identifiers and the clean requirement of unified instances

The analytical system's job is to resolve business identifiers from many sources into analytical identifiers that guarantee one instance.

Building Analytical Identifiers

An analytical identifier has three parts:

PartPurposeExample
Owner/IssuerWho issued this identifierISIN Organization, Bloomberg
TypeWhat kind of identifierSTOCK, CUSTOMER, ACCOUNT
ValueThe identifier itselfUS02079K1079

Combined into a token: {Owner+Type code}{Value}. For example, G0010001US02079K1079, where G0010001 is an 8-character code representing "ISIN Organization + STOCK type" and US02079K1079 is the ISIN value.

The Owner+Type prefix is fixed-width by design: 4 characters for the Owner, 4 characters for the Type. The exact widths are not the point - what matters is that every Owner and every Type is a unique string of the same length across the system. Concatenation without a separator is then collision-safe by construction.

The structure is easier to see with a second issuer. Suppose ISIN Organization is assigned the Owner code G001 and the Type "STOCK" is assigned 0001; an ISIN-identified stock carries the prefix G0010001. The same stock may also be identified by a CUSIP, issued by the American Bankers Association. Assigning that issuer the Owner code G002 while keeping the Type 0001 for STOCK yields G0020001. The CUSIP value 02079K107 then forms the full identifier G002000102079K107 - same stock, same Type, different Owner, different token.

A word on the Owner codes used throughout this article: the G series identifies global registries (ISIN Organization, CUSIP / American Bankers Association, Bloomberg, Reuters); local systems use A001 through A999, rolling into B past 999; country-specific issuers use L - a Swedish personal number, for example. The convention is a design choice, not a property of the system; what matters is that every Owner is a unique 4-character string.

The result is integration; collision avoidance is a consequence. Once the identifier class (ISIN + STOCK) is mapped to its Owner+Type code, that definition applies everywhere - three systems all sending stock data with ISIN codes produce the same analytical identifier and resolve to the same instance. No per-source matching rules, no manual mapping, no reconciliation at load time.

Collision prevention falls out as a side effect: two systems using the same raw value ("12345") for different purposes - one an internal account ID, the other a loyalty program number - produce different analytical identifiers because their Owner+Type codes differ.

The principle: standardise the identifier definition across systems, and integration follows automatically. Collision prevention is a structural consequence, not a separate mechanism.

Connecting Identifiers to an Instance

There is no magic - only relation. A new identifier connects to an existing instance only if the incoming data carries at least one identifier already linked to that instance.

Example: instance 1 is already linked by its ISIN US02079K1079 and its CUSIP 02079K107. A new Bloomberg source arrives with FIGI BBG009S3NB30 (Owner code G003, Type 0001 for STOCK). The FIGI links to instance 1 only if the same row also carries one of the known identifiers.

IDENTIFIERTIMESTAMPSTATUSINSTANCE
G0010001US02079K10792001-01-01ACTIVE1
G002000102079K1072001-01-01ACTIVE1

Later, a FIGI arrives alongside the known ISIN:

IDENTIFIERTIMESTAMPSTATUSINSTANCE
G0010001US02079K10792001-01-01ACTIVE1
G002000102079K1072001-01-01ACTIVE1
G0030001BBG009S3NB302001-04-01ACTIVE1

The FIGI gets instance 1 because the source row included the already-linked ISIN. The set of identifiers for this instance grows over time - but the instance remains one.

The mechanism is co-occurrence in source data; new identifiers attach to existing instances through the identifiers they share, without any need to know the full set upfront.

How This Differs from a Data Vault Hub

Readers familiar with Data Vault will recognise a related lineage. A Data Vault hub also assigns every business key a stable surrogate, and it also separates the durable identity of an instance from its attributes and relationships. The design decisions, however, diverge in where the integration of an instance actually happens.

Data Vault uses the same-as-link: when two different business keys in the same hub refer to the same real-world thing, a link records the fact that they are the same. But the link does not integrate the instance. You still have two hubs, each with its own surrogate key. The integration is deferred - pushed downstream into queries, satellites, or business-vault logic. Eventually, somewhere in the chain, the two surrogate keys have to be reconciled to one. Otherwise you end up with duplicate instances of the same real-world thing.

The IDFR pattern resolves identity in one step. Many identifiers map to one instance directly, in a single table, with the Owner+Type prefix doing the integration work at design time. By the time the instance is loaded, it already has one identity. Nothing about it is deferred.

Both approaches solve real problems. Data Vault is shaped to capture and version every source's view of a key over time. The IDFR pattern is shaped to make many identifiers resolve to one instance, up-front, without per-source matching. Same-as-links can simulate IDFR-style unification, but only by adding the reconciliation step the IDFR pattern was designed to remove.

Key Collision vs Key Integration

Two situations arise when identifiers from different sources meet:

Key integration: same Owner+Type, same value → same instance. Three systems all sending ISIN US02079K1079 for stocks unify to one instance by design.

Key collision: same raw value, different meaning → different instances. "Customer 100" in system A is not "Customer 100" in system B. A naive merge would collapse two real instances into one. The Owner+Type prefix prevents this: with 0002 as the Type code for CUSTOMER, system A's identifier becomes A0010002100 and system B's becomes A0020002100 - different analytical identifiers, different instances.

Implementation: The IDFR Table

The pattern described below comes from the Focal Framework, an analytical modelling approach established in the late 1990s with its own modelling technique called Focal point. The IDFR table structure and the code that loads it - both described below - give you a concrete pattern for key integration that any team can adopt on any analytical platform, regardless of modelling technique. Further reading: Focal Framework documentation. These patterns are also walked through in conversation in the AgileData Podcast episode with Shane Gibson. The same describe-the-definition-then-generate-the-code shape shows up one layer down too - see Contract-Driven Data Transformation for the DAS-layer instance.

The Owner+Type prefix discipline established earlier is a separate concern: a quality and consistency commitment, recommended as best practice, but not required by the code itself. The pipeline below matches identifier strings byte-for-byte; what those strings encode is the modeller's choice.

The _idfr Table

Every entity gets an {ENTITY}_idfr table - the identifier-to-instance mapping. The {ENTITY} prefix is a placeholder for the business entity you are modelling: an INSTRUMENT_idfr table for instruments, a CUSTOMER_idfr table for customers, and so on. The design below is a generic representation; depending on your platform you can add or remove columns or define them differently. Only the IDFR and KEY columns are essential.

ColumnTypePurpose
{ENTITY}_idfrVARCHAR(255)The identifier value
eff_tmstpTIMESTAMP(6)When this identifier became active
ver_tmstpTIMESTAMP(6)Version timestamp (for change tracking)
row_stVARCHAR(1)Row status: Y=active, N=soft-deleted
{ENTITY}_keyVARCHAR(256)The surrogate key - the instance
inst_keyBIGINTPipeline execution instance
popln_tmstpTIMESTAMP(6)When this row was loaded

Multiple rows share the same {ENTITY}_key - the many-to-one relationship. Three identifiers with the same key = one instance known by three names. The structure is deliberately generic: one column for the value, one for the instance key.

Multi-Identifier Slots

The pattern supports up to 5 identifier slots per entity (IDFR_01 through IDFR_05); unused slots stay NULL. Five slots is typical of the identifier counts found in customer, instrument, and account modelling - a customer might have an internal ID, email, phone, national ID, and a loyalty program number; a financial instrument might have ISIN, FIGI, CUSIP, SEDOL, and a source system ID.

When an entity legitimately needs a sixth slot beyond the default five, you extend the code by one more slot at every stage of the pipeline. The logic is exactly the same, only the slot count changes. The _idfr table itself is untouched and just receives more rows. The cost is a code change, not a schema change - destructive only in the deployment sense, since a code release is required, but the data and table structure stay stable.

Why the Structure Scales

Two properties of the design carry the scaling story, and they reinforce each other.

The first is that the pipeline matches identifier values, not slot positions. It searches for matching identifier strings across all slots in the _idfr table, regardless of where each one lands in the incoming row. Source column names are irrelevant - whether the source calls it isin, stock_id, or security_code makes no difference once the transformation expression has produced the prefixed value. Slot assignment order can vary across sources: Bloomberg might put its ID in IDFR_01 and the ISIN in IDFR_02, while Reuters puts RIC in IDFR_01 and the ISIN in IDFR_02. The shared ISIN matches because the integration LEFT JOINs each incoming slot against the entire _idfr table - not slot-to-slot. The value creates the connection; the position is incidental.

The second is that the table itself is generic. One column for the identifier value, one column for the instance key. It does not know or care which source attribute produced the identifier. Adding a new way to identify something - a new identifier from a new source, a new identifier type not previously anticipated - only creates new rows. There is no DDL change, no ALTER TABLE, no code change to the table itself. You add the new identifier to the transformation logic, redeploy, and the next execution inserts new identifier rows that link to existing instances through shared values. The table grows; the instance stays one.

The IDFR Pipeline

The pipeline follows the standard Focal pattern: READER → INTEGRATION → TRUEDELTA → INSERT. Stages are separate for the same reason DAS, DAB, and DAR are separate layers in the architectural blueprint - separation of concerns and testability in isolation.

Stage 1 - READER: extracts up to 5 identifier values from the source table into one set of IDFR_01..05 values per row, normalised regardless of source layout.

Stage 2 - INTEGRATION: LEFT JOINs each slot against _idfr to mark each identifier NEW or EXISTING and retrieve the focal_key of any match. If any slot matches, the row resolves to that instance. The only stage that reads existing state.

Stage 3 - TRUEDELTA: splits rows by match outcome. EXISTING_KEY (at least one match) reuses the matched focal_key and attaches any NEW identifiers in unmatched slots - this is how identifiers accumulate. NEW_KEY (no match anywhere) gets a fresh focal_key (UUID).

Stage 4 - INSERT: writes NEW identifiers to _idfr. Writes are deferred to the last stage so resolution stays side-effect-free until final.

The cost is real. Four stages run before any attribute data is loaded. For an entity with a single, stable primary key, this overhead is unjustified - the pipeline should be skipped entirely. Treat multi-identifier resolution as opt-in: turn it on only for entities that genuinely have multiple, source-spanning identifiers.

How Instance Resolution Works in Practice

Two loads make the mechanism concrete: a financial instrument arrives first from Bloomberg, then from Reuters. Both source rows carry the ISIN as one of their identifiers - the shared value that enables resolution. The Owner+Type prefixes are applied wherever the source values are turned into analytical identifiers, by whatever transformation layer the platform provides.

First load - Bloomberg sends an instrument. Suppose the entity is named INSTRUMENT, so the table is INSTRUMENT_idfr and the surrogate key column is INSTRUMENT_key. The source row carries IDFR_01 = G0030001BBG009S3NB30 and IDFR_02 = G0010001US02079K1079 - Bloomberg's FIGI and the ISIN, each prefixed with its Owner+Type code. INTEGRATION finds neither in _idfr. TRUEDELTA classifies the row as NEW_KEY and generates a fresh surrogate key value (UUID-ABC) which is stored in INSTRUMENT_key. INSERT writes both identifiers to _idfr:

_idfr:
  (G0030001BBG009S3NB30UUID-ABC)
  (G0010001US02079K1079UUID-ABC)

Second load - Reuters sends the same instrument under a different identifier. The row carries IDFR_01 = G0040001GOOG.OQ and IDFR_02 = G0010001US02079K1079. The RIC is unknown but the ISIN is already linked. That single match is enough: TRUEDELTA reuses UUID-ABC and tags the RIC as a NEW identifier on a known instance:

_idfr:
  (G0040001GOOG.OQUUID-ABC)new identifier, same instance

Three identifiers - the Bloomberg ID, the ISIN, and the RIC - now point at the same instance. Any data arriving via any of them lands on the same entity. Automatic integration, no duplicates, as long as the shared identifier appears in the source.

Downstream tables see the result, not the resolution. A table keyed by {ENTITY}_key returns rows from both sources for a single key; the surrogate key is the only thing the rest of the architecture needs to integrate the attribute data.

SELECT * FROM INSTRUMENT_desc WHERE INSTRUMENT_key = 'UUID-ABC';
-- Returns data from BOTH Bloomberg and Reuters sources
-- Because both resolved to the same instance through the shared ISIN

Failure Modes

Two failure modes shadow this mechanism:

  • False unification: a wrong Owner+Type taxonomy maps two source systems with disjoint identifier spaces to the same prefix, and integration becomes corruption. The prefix is the integration contract - assigning it sloppily is the failure mode.
  • Identifier orphaning: a new data source arrives with no incoming row that shares an identifier with an existing instance. Resolution creates a duplicate, not because the design failed but because the precondition - co-occurrence in source data - was not met.

These two failure modes are governance concerns at design time. At runtime, the _idfr table can become corrupt - a false-unification slipped through, source data was wrong, an external writer touched the table, or a new source row brings two previously-independent instances together and reveals they should have been one. The fix is the same in every case: identify the affected rows and repoint the identifiers to their correct keys. Corruption is maintenance, not a flaw in the mechanism.

Implementation: The Code

The code that produces the behaviour above is broken into four stages, walked through below. The example carries only the fundamental logic; handling of timestamps, batch IDs, and other bookkeeping is left to your platform's conventions.

The pipeline is parameter-driven: 'p_*' strings are substitution placeholders that a templating layer replaces at compile time with actual column or table names, and {{*}} are template placeholders for platform-specific syntax (data types, functions like NOW() or your key generator). Treat them as fill-in-the-blank slots when adapting the example to your stack.

The code refers to the surrogate key as focal_key throughout - this is the Focal Framework's generic name for the column you saw earlier as {ENTITY}_key in the schema and INSTRUMENT_key in the walkthrough. The naming is a Focal Framework convention; rename it to whatever fits your code standards.

Stage 1 - Reader

The reader extracts the identifier values from the source row into the five normalised IDFR_01..05 slots, applying the Owner+Type prefix in the CAST.

WITH idfr_reader AS (
SELECT DISTINCT
  CAST('p_FOCAL01_IDFR01' AS {{STRING}}) AS IDFR_01,    -- IDFR_01 definition, for example CONCAT('G0010001', ISIN_code)
  CAST('p_FOCAL01_IDFR02' AS {{STRING}}) AS IDFR_02,    -- IDFR_02 definition, for example CONCAT('G0020001', CU_ID)
  CAST('p_FOCAL01_IDFR03' AS {{STRING}}) AS IDFR_03,
  CAST('p_FOCAL01_IDFR04' AS {{STRING}}) AS IDFR_04,
  CAST('p_FOCAL01_IDFR05' AS {{STRING}}) AS IDFR_05,
  CAST('p_PROCINST_KEY'   AS {{BIGINT}}) AS procinst_key
FROM "p_SOURCE_PHYSICAL_SCHEMA"."p_SOURCE_TABLE"        -- the source table where the columns ISIN_code and CU_ID (CUSIP) reside
)

Stage 2 - Integration

The integration stage LEFT JOINs each incoming slot against the _idfr table to retrieve a focal_key for any match. The slot-by-slot classification (NEW_IDFR vs EXISTING_IDFR) is stored in DELTAFILTER_01..05 and used downstream to decide what to insert.

, idfr_integration AS (
SELECT
  CASE WHEN (KDW01."p_FOCAL01_KEY" IS NOT NULL
          OR KDW02."p_FOCAL01_KEY" IS NOT NULL
          OR KDW03."p_FOCAL01_KEY" IS NOT NULL
          OR KDW04."p_FOCAL01_KEY" IS NOT NULL
          OR KDW05."p_FOCAL01_KEY" IS NOT NULL)
       THEN 'EXISTING_KEY' ELSE 'NEW_KEY' END AS exist_new,
  STG.IDFR_01,
  KDW01."p_FOCAL01_KEY" AS focal_key_01,
  CASE WHEN STG.IDFR_01 IS NOT NULL AND KDW01."p_FOCAL01_KEY" IS NULL     THEN 'NEW_IDFR'
       WHEN STG.IDFR_01 IS NOT NULL AND KDW01."p_FOCAL01_KEY" IS NOT NULL THEN 'EXISTING_IDFR'
       ELSE NULL END AS DELTAFILTER_01,
  STG.IDFR_02,
  KDW02."p_FOCAL01_KEY" AS focal_key_02,
  CASE WHEN STG.IDFR_02 IS NOT NULL AND KDW02."p_FOCAL01_KEY" IS NULL     THEN 'NEW_IDFR'
       WHEN STG.IDFR_02 IS NOT NULL AND KDW02."p_FOCAL01_KEY" IS NOT NULL THEN 'EXISTING_IDFR'
       ELSE NULL END AS DELTAFILTER_02,
  STG.IDFR_03,
  KDW03."p_FOCAL01_KEY" AS focal_key_03,
  CASE WHEN STG.IDFR_03 IS NOT NULL AND KDW03."p_FOCAL01_KEY" IS NULL     THEN 'NEW_IDFR'
       WHEN STG.IDFR_03 IS NOT NULL AND KDW03."p_FOCAL01_KEY" IS NOT NULL THEN 'EXISTING_IDFR'
       ELSE NULL END AS DELTAFILTER_03,
  STG.IDFR_04,
  KDW04."p_FOCAL01_KEY" AS focal_key_04,
  CASE WHEN STG.IDFR_04 IS NOT NULL AND KDW04."p_FOCAL01_KEY" IS NULL     THEN 'NEW_IDFR'
       WHEN STG.IDFR_04 IS NOT NULL AND KDW04."p_FOCAL01_KEY" IS NOT NULL THEN 'EXISTING_IDFR'
       ELSE NULL END AS DELTAFILTER_04,
  STG.IDFR_05,
  KDW05."p_FOCAL01_KEY" AS focal_key_05,
  CASE WHEN STG.IDFR_05 IS NOT NULL AND KDW05."p_FOCAL01_KEY" IS NULL     THEN 'NEW_IDFR'
       WHEN STG.IDFR_05 IS NOT NULL AND KDW05."p_FOCAL01_KEY" IS NOT NULL THEN 'EXISTING_IDFR'
       ELSE NULL END AS DELTAFILTER_05,
  STG.procinst_key
FROM idfr_reader STG    -- the reader output above
LEFT OUTER JOIN "p_TARGET_PHYSICAL_SCHEMA"."p_FOCAL01_IDFRTABLE" KDW01 ON STG.IDFR_01 = KDW01."p_FOCAL01_IDFR"
LEFT OUTER JOIN "p_TARGET_PHYSICAL_SCHEMA"."p_FOCAL01_IDFRTABLE" KDW02 ON STG.IDFR_02 = KDW02."p_FOCAL01_IDFR"
LEFT OUTER JOIN "p_TARGET_PHYSICAL_SCHEMA"."p_FOCAL01_IDFRTABLE" KDW03 ON STG.IDFR_03 = KDW03."p_FOCAL01_IDFR"
LEFT OUTER JOIN "p_TARGET_PHYSICAL_SCHEMA"."p_FOCAL01_IDFRTABLE" KDW04 ON STG.IDFR_04 = KDW04."p_FOCAL01_IDFR"
LEFT OUTER JOIN "p_TARGET_PHYSICAL_SCHEMA"."p_FOCAL01_IDFRTABLE" KDW05 ON STG.IDFR_05 = KDW05."p_FOCAL01_IDFR"
)

Stages 3 & 4 - True Delta and Insert

The true delta and insert stages unfold across four chained CTEs. They classify each row into EXISTING_KEY or NEW_KEY (sharing the matched key across slots within EXISTING, generating a fresh UUID within NEW), union them, unpivot into one row per identifier-key pair, and finally insert only the NEW_IDFR rows.

The symmetric 5-slot expansion and the UNION ALL unpivot below are deliberately platform-neutral - they work on any SQL engine. If your platform offers compact alternatives (UNPIVOT, arrays, LATERAL views), feel free to substitute; the logic stays the same.

-- 3a. EXISTING_KEY rows: at least one slot found a match.
-- Spread the matched key across all slots so every IDFR in the row carries the same key.
, idfr_existing AS (
SELECT
  exist_new,
  DELTAFILTER_01, DELTAFILTER_02, DELTAFILTER_03, DELTAFILTER_04, DELTAFILTER_05,
  IDFR_01, IDFR_02, IDFR_03, IDFR_04, IDFR_05,
  CASE WHEN IDFR_01 IS NOT NULL THEN COALESCE(focal_key_01, focal_key_02, focal_key_03, focal_key_04, focal_key_05) ELSE NULL END AS focal_key_01,
  CASE WHEN IDFR_02 IS NOT NULL THEN COALESCE(focal_key_01, focal_key_02, focal_key_03, focal_key_04, focal_key_05) ELSE NULL END AS focal_key_02,
  CASE WHEN IDFR_03 IS NOT NULL THEN COALESCE(focal_key_01, focal_key_02, focal_key_03, focal_key_04, focal_key_05) ELSE NULL END AS focal_key_03,
  CASE WHEN IDFR_04 IS NOT NULL THEN COALESCE(focal_key_01, focal_key_02, focal_key_03, focal_key_04, focal_key_05) ELSE NULL END AS focal_key_04,
  CASE WHEN IDFR_05 IS NOT NULL THEN COALESCE(focal_key_01, focal_key_02, focal_key_03, focal_key_04, focal_key_05) ELSE NULL END AS focal_key_05,
  procinst_key
FROM idfr_integration
WHERE COALESCE(IDFR_01, IDFR_02, IDFR_03, IDFR_04, IDFR_05) IS NOT NULL
  AND exist_new = 'EXISTING_KEY'
)

-- 3b. NEW_KEY rows: no slot found a match. Generate a fresh key.
, idfr_new AS (
SELECT
  exist_new,
  DELTAFILTER_01, DELTAFILTER_02, DELTAFILTER_03, DELTAFILTER_04, DELTAFILTER_05,
  IDFR_01, IDFR_02, IDFR_03, IDFR_04, IDFR_05,
  {{KEY_GENERATOR}} AS focal_key_NEW,    -- a fresh surrogate key from your platform's key generator
  procinst_key
FROM idfr_integration
WHERE COALESCE(IDFR_01, IDFR_02, IDFR_03, IDFR_04, IDFR_05) IS NOT NULL
  AND exist_new = 'NEW_KEY'
)

-- 3c. Union the two sides into one stream of (IDFR slot, key) pairs.
, idfr_delta AS (
SELECT
  exist_new,
  DELTAFILTER_01, DELTAFILTER_02, DELTAFILTER_03, DELTAFILTER_04, DELTAFILTER_05,
  IDFR_01, IDFR_02, IDFR_03, IDFR_04, IDFR_05,
  CASE WHEN IDFR_01 IS NULL THEN NULL ELSE focal_key_NEW END AS focal_key_01,
  CASE WHEN IDFR_02 IS NULL THEN NULL ELSE focal_key_NEW END AS focal_key_02,
  CASE WHEN IDFR_03 IS NULL THEN NULL ELSE focal_key_NEW END AS focal_key_03,
  CASE WHEN IDFR_04 IS NULL THEN NULL ELSE focal_key_NEW END AS focal_key_04,
  CASE WHEN IDFR_05 IS NULL THEN NULL ELSE focal_key_NEW END AS focal_key_05,
  procinst_key
FROM idfr_new
UNION ALL
SELECT
  exist_new,
  DELTAFILTER_01, DELTAFILTER_02, DELTAFILTER_03, DELTAFILTER_04, DELTAFILTER_05,
  IDFR_01, IDFR_02, IDFR_03, IDFR_04, IDFR_05,
  focal_key_01, focal_key_02, focal_key_03, focal_key_04, focal_key_05,
  procinst_key
FROM idfr_existing
)

-- 4. Unpivot so each (IDFR, key) becomes its own row, keep only NEW_IDFR rows for insert.
, idfr_load AS (
SELECT
  exist_new,
  IDFR,
  COALESCE(CAST('p_EFF_TMSTP_DATA' AS {{TIMESTAMP}}), {{NOW}}) AS eff_tmstp,
  {{NOW}} AS ver_tmstp,
  'Y'     AS row_st,
  focal_key,
  procinst_key AS inst_key,
  {{NOW}} AS popln_tmstp
FROM (
  SELECT exist_new, IDFR_01 AS IDFR, DELTAFILTER_01 AS DELTAFILTER, focal_key_01 AS focal_key, procinst_key
  FROM idfr_delta WHERE IDFR_01 IS NOT NULL
  UNION ALL
  SELECT exist_new, IDFR_02, DELTAFILTER_02, focal_key_02, procinst_key
  FROM idfr_delta WHERE IDFR_02 IS NOT NULL
  UNION ALL
  SELECT exist_new, IDFR_03, DELTAFILTER_03, focal_key_03, procinst_key
  FROM idfr_delta WHERE IDFR_03 IS NOT NULL
  UNION ALL
  SELECT exist_new, IDFR_04, DELTAFILTER_04, focal_key_04, procinst_key
  FROM idfr_delta WHERE IDFR_04 IS NOT NULL
  UNION ALL
  SELECT exist_new, IDFR_05, DELTAFILTER_05, focal_key_05, procinst_key
  FROM idfr_delta WHERE IDFR_05 IS NOT NULL
) UNPIVOTED
WHERE DELTAFILTER = 'NEW_IDFR'
)
INSERT INTO "p_TARGET_PHYSICAL_SCHEMA"."p_FOCAL01_IDFRTABLE"   -- INSERT placement depends on platform syntax
SELECT
  IDFR        AS "p_FOCAL01_IDFR",
  eff_tmstp,
  ver_tmstp,
  row_st,
  focal_key   AS "p_FOCAL01_KEY",
  inst_key,
  popln_tmstp
FROM idfr_load
;

Closing: Integration as a Property of the Design

The promise at the top was automatic integration across systems. The mechanism is now visible: a fixed-width Owner+Type prefix turns identifier definitions into a contract that source systems share, and the IDFR pipeline resolves identifiers to a single instance without per-source logic. No matching engine, no reconciliation step, no rules table. The structure does the work.

The cost is real - a 5-slot default that requires a code release to extend, four pipeline stages before any attribute data loads, and two failure modes (false unification from sloppy taxonomy, identifier orphaning from missing co-occurrence) that demand active design attention. These are demands the mechanism makes on the modeller, not flaws in the mechanism itself. And the pattern is portable: any analytical platform with relational tables and a pipeline orchestrator can carry it, regardless of modelling technique. The design discipline is what does the work, not the runtime. (For the broader case of why this discipline matters - and what data engineering looks like when the model drives the code - see The Rise of the Model-Driven Data Engineer.)

This maps to the absolute principles from the architectural blueprint: integration is not a transformation applied after loading - it is a property of the analytical identifier itself, decided at design time, in the DAB layer. The IDFR pipeline is the gatekeeper doing its work, enforcing what the design already declared instead of asking the runtime to guess.