Dual Database Architecture:
- The Dual Database Architecture keeps the Knowledge Base of identities separate from the federated agency-specific data.
- The Knowledge Base (KB) is composed of two parts: the Knowledge-based Identity Management (KIM) system and a Trusted Identifier Manager (TIM).
Knowledge-based Identity Management (KIM) System:
- Kept offline (powered down) unless actively processing data.
- Able to keep track of multiple representations of the same person.
- Enables higher accuracy in matching.
- Protects privacy by re-using non-personal references in reports and studies.
- Without a KB, PII must be pulled and transmitted for each data request.
- Enables longitudinal research by providing a single path for data flow.
- Limits transmission of identity data to a single copy.
- Strips all agency data from the identity records, creating a wall of separation between identity data and agency data.
- Replaces personal information with a hashed cluster ID (KIM ID).
- Once the data have been processed by KIM, the personal information is no longer needed for research requests or for matching across agency data.
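The KIM behavior described above can be illustrated with a minimal sketch. Everything named here (the `Kim` class, `normalize`, `KIM_SECRET`) is hypothetical; the matching rule is a deliberately crude placeholder, and the keyed SHA-256 hash is only one plausible way to produce a "hashed cluster ID". The source does not specify how KIM matches records or derives its IDs.

```python
import hashlib
import hmac

# Assumption: a secret held only on the KIM server, so IDs cannot be recomputed elsewhere.
KIM_SECRET = b"example-secret-held-only-by-kim"


def normalize(record):
    """Reduce a personal record to a crude matching key (placeholder rule only)."""
    return (
        record["last"].strip().lower(),
        record["first"].strip().lower()[:1],
        record["dob"],
    )


class Kim:
    """Toy identity store: groups representations of a person into one cluster."""

    def __init__(self):
        self._clusters = {}         # matching key -> cluster number
        self._representations = {}  # cluster number -> every record variant seen

    def kim_id(self, record):
        key = normalize(record)
        # Reuse the existing cluster if this person was seen before, else open a new one.
        cluster = self._clusters.setdefault(key, len(self._clusters) + 1)
        self._representations.setdefault(cluster, []).append(record)
        # Only this hashed cluster ID leaves KIM; the personal fields stay behind.
        return hmac.new(KIM_SECRET, str(cluster).encode(), hashlib.sha256).hexdigest()
```

In this sketch, two spellings of the same person (for example "Jon Smith" and "Jonathan SMITH" with the same date of birth) resolve to the same KIM ID, so downstream systems never need the personal fields again.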
Trusted Identifier Manager (TIM):
- After receiving a KIM ID, TIM assigns an agency-specific research ID.
- Agency records cannot be matched without approval from the agencies.
- When records from two state agencies require linking, TIM creates a temporary crosswalk and a research-specific substitute ID.
- The server that contains KIM/TIM is offline unless actively processing data.
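As a rough illustration of the TIM role described above, the following sketch derives a different research ID per agency from the same KIM ID and only builds a crosswalk when a link has been approved. The `Tim` class, its method names, the per-agency keys, and the 16-character truncated IDs are all assumptions made for the example, not details from the source.

```python
import hashlib
import hmac
import secrets


class Tim:
    """Toy identifier manager: agency-specific IDs plus approved, temporary crosswalks."""

    def __init__(self):
        self._agency_keys = {}   # agency name -> secret key (assumed mechanism)
        self._approvals = set()  # frozenset({agency_a, agency_b}) pairs cleared to link

    def register_agency(self, agency):
        self._agency_keys[agency] = secrets.token_bytes(32)

    def research_id(self, agency, kim_id):
        # A different key per agency yields unrelated IDs for the same person.
        key = self._agency_keys[agency]
        return hmac.new(key, kim_id.encode(), hashlib.sha256).hexdigest()[:16]

    def approve_link(self, agency_a, agency_b):
        self._approvals.add(frozenset((agency_a, agency_b)))

    def crosswalk(self, agency_a, agency_b, kim_ids):
        if frozenset((agency_a, agency_b)) not in self._approvals:
            raise PermissionError("linking has not been approved by both agencies")
        # The study key is never stored, so the substitute IDs are specific to this request.
        study_key = secrets.token_bytes(32)
        return [
            {
                "id_a": self.research_id(agency_a, k),
                "id_b": self.research_id(agency_b, k),
                "study_id": hmac.new(study_key, k.encode(), hashlib.sha256).hexdigest()[:16],
            }
            for k in kim_ids
        ]
```

The crosswalk is returned to the caller rather than kept inside `Tim`, which is one simple way to model "temporary"; a real system would also need expiry and audit rules that are outside this sketch.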
Federated Agency-Specific Anonymized Data:
- Research data is on a physically separate server from personal information.
- Each agency retains ownership of its data and has its own set of research IDs; agency data cannot be linked elsewhere without TIM.
- The anonymized agency data is updated with KIM to allow longitudinal use.
- In the event of a data breach, there is no connection between separate agencies’ records, and agency-specific IDs cannot be joined without TIM.
- State agencies no longer need to supply full multi-year extracts; only recent updates are required.
- Long-term studies can be supported and privacy protected using the consistent research IDs, rather than re-matching personal information.
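The incremental-update and longitudinal points above can be sketched as an agency-side store keyed only by research ID. The `AgencyResearchStore` class and its field names are hypothetical; the sketch assumes each update arrives already stripped of personal information and labeled with the agency's research IDs.

```python
class AgencyResearchStore:
    """Toy per-agency research store: anonymized rows keyed by research ID."""

    def __init__(self, agency):
        self.agency = agency
        self._rows = {}  # research_id -> list of (year, measures)

    def load_update(self, year, rows):
        """Append only the newest extract; earlier years are already on file."""
        for research_id, measures in rows:
            self._rows.setdefault(research_id, []).append((year, measures))

    def history(self, research_id):
        """Return one person's records over time, with no personal fields involved."""
        return sorted(self._rows.get(research_id, []), key=lambda row: row[0])


# Usage: each year's update is appended, and the stable ID ties the years together.
store = AgencyResearchStore("education")
store.load_update(2021, [("r1", {"enrolled": True})])
store.load_update(2022, [("r1", {"enrolled": False})])
timeline = store.history("r1")
```

Because the research ID stays the same from year to year, a longitudinal view is a lookup rather than a fresh match on names and birth dates.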