Inside the HIPAA compliance pack: what we redact and why

Five customer interviews, one healthcare CISO, and a lot of re-reading the Privacy Rule. Notes from shipping the most technically demanding feature we've built.

When our third healthcare customer asked whether OrgLens was HIPAA-compliant, we said "yes" — because we are a SaaS product that processes metadata, not PHI, and we had signed a BAA with our hosting provider. That answer was technically accurate.

It was also the wrong answer for what they were actually asking. What they wanted to know was: "When OrgLens scans our Salesforce org and generates documentation, will it expose PHI that happens to be stored in field-level values?" That is a different question entirely, and answering it required us to build the HIPAA compliance pack.

Why HIPAA is hard for metadata tools specifically

Most software that claims HIPAA compliance focuses on protecting data at rest and in transit. That part is solved: TLS, AES-256, access controls. The HIPAA problem for metadata tools is different. It's a semantic problem.

A Salesforce org used by a healthcare organization often contains picklist values, default field values, formula text, and field descriptions that were written by a human and contain PHI. A picklist value labeled "Patient is experiencing..." is not a database record; it's metadata. It's in the Metadata API response, and it flows through our scanner. If we naively include that picklist value in the generated documentation, we have transferred PHI to our documentation system without authorization.

We also discovered something more subtle during the five customer interviews we ran before designing the pack. In three of the five orgs, custom fields had labels that included patient identifiers. Not field values: the field labels themselves. A field like SSN_John_Smith_Backup__c is obviously wrong, but fields like it exist in real production orgs, and our scanner, ingesting the field label, would propagate that identifier.

The 18 Safe Harbor identifiers and how they appear in Salesforce fields

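The HIPAA Safe Harbor method (45 CFR §164.514(b)(2)) defines 18 categories of identifiers that must be removed from a dataset before it can be considered de-identified. We built detectors for these categories as they appear in Salesforce metadata. Phone and fax share a single pattern, and age, which Safe Harbor treats as part of the dates category (ages over 89), has its own entry.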

```typescript
// One detector: a Safe Harbor category tag plus the regex that flags
// metadata strings referring to that category.
interface Detector {
  type: string;
  pattern: RegExp;
}

const PHI_DETECTORS: Detector[] = [
  { type: 'name',        pattern: /\b(patient|member|subscriber)\s+name\b/i },
  { type: 'geographic',  pattern: /\b(zip|postal|county|city|state)\b.*\bpatient\b/i },
  { type: 'date',        pattern: /\b(dob|date.of.birth|birth.?date|admit.?date|discharge)\b/i },
  { type: 'phone',       pattern: /\b(phone|fax|tel)\b.*\bpatient\b/i },
  { type: 'email',       pattern: /patient.*email|email.*patient/i },
  { type: 'ssn',         pattern: /\b(ssn|social.?security)\b/i },
  { type: 'mrn',         pattern: /\b(mrn|medical.?record|chart.?number)\b/i },
  { type: 'account',     pattern: /patient.*account.?number/i },
  { type: 'certificate', pattern: /\b(license|certificate|permit).?number\b.*patient/i },
  { type: 'vehicle',     pattern: /vehicle.?(vin|serial|plate)/i },
  { type: 'device',      pattern: /device.?(serial|identifier|id)\b/i },
  { type: 'url',         pattern: /patient.*\b(url|web|site)\b/i },
  // Leading \b keeps strings such as "ship address" from matching.
  { type: 'ip',          pattern: /\bip.?address.*patient/i },
  { type: 'biometric',   pattern: /\b(fingerprint|retina|iris|voice)\b/i },
  { type: 'photo',       pattern: /patient.*(photo|image|picture)/i },
  { type: 'age',         pattern: /patient.*\bage\b|\bage\b.*patient/i },
];
```

These detectors run against field labels, picklist value labels, default values, formula text, description fields, and help text. They do not run against actual record data — we never access record data. But they do run against every string in the Metadata API response.
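One detail matters when these patterns meet Salesforce names: in JavaScript regular expressions, `\b` treats `_` as a word character, so `\bssn\b` does not match `SSN_John_Smith_Backup__c`. The sketch below shows one way to handle that, assuming a hypothetical `FieldMetadata` shape (not a Metadata API type) and a three-pattern subset of the detector list. It normalizes underscores to spaces before matching and also scans the API name, since API names that match are flagged as findings.

```typescript
// Hypothetical shape for the metadata strings attached to one field.
// These property names are illustrative, not Salesforce Metadata API types.
interface FieldMetadata {
  apiName: string;
  label: string;
  picklistLabels: string[];
  defaultValue?: string;
  formula?: string;
  description?: string;
  helpText?: string;
}

interface Detector {
  type: string;
  pattern: RegExp;
}

interface Match {
  source: keyof FieldMetadata; // which metadata attribute matched
  type: string;                // which detector category fired
}

// Three-pattern subset of the detector list, for illustration only.
const DETECTORS: Detector[] = [
  { type: "ssn", pattern: /\b(ssn|social.?security)\b/i },
  { type: "date", pattern: /\b(dob|date.of.birth|birth.?date|admit.?date|discharge)\b/i },
  { type: "mrn", pattern: /\b(mrn|medical.?record|chart.?number)\b/i },
];

// `\b` treats `_` as a word character, so "SSN_John" has no boundary after
// "SSN". Turning runs of underscores into spaces restores the boundaries.
function normalize(text: string): string {
  return text.replace(/_+/g, " ");
}

// Run every detector over every metadata string on the field.
function scanField(field: FieldMetadata): Match[] {
  const candidates: Array<[keyof FieldMetadata, string | undefined]> = [
    ["apiName", field.apiName],
    ["label", field.label],
    ["defaultValue", field.defaultValue],
    ["formula", field.formula],
    ["description", field.description],
    ["helpText", field.helpText],
  ];
  for (const p of field.picklistLabels) candidates.push(["picklistLabels", p]);

  const matches: Match[] = [];
  for (const [source, text] of candidates) {
    if (!text) continue; // attribute absent on this field
    const normalized = normalize(text);
    for (const d of DETECTORS) {
      if (d.pattern.test(normalized)) matches.push({ source, type: d.type });
    }
  }
  return matches;
}
```

For example, `scanField({ apiName: "SSN_John_Smith_Backup__c", label: "SSN John Smith Backup", picklistLabels: [] })` returns two `ssn` matches, one for the API name and one for the label. Without `normalize`, the API name would slip through.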

What we redact: the exact rules

When the HIPAA pack is enabled (it's a per-org toggle and requires the Enterprise plan), the following happens:

Picklist values: Any picklist value label that matches a PHI detector is replaced with [REDACTED — HIPAA §164.514] in the generated documentation. The value's API name is preserved because it's needed for validation rule analysis. Only the label is suppressed.

Default field values: If a text or formula field has a default value containing a PHI pattern, the entire default value is suppressed from documentation output. We log the suppression to the audit trail.

Field labels: If a field label contains a PHI pattern, OrgLens flags it as a HIGH-severity finding and does not generate documentation for that field until the admin reviews and resolves the finding. We do not auto-redact field labels — that would silently corrupt the documentation structure.

Field descriptions and help text (existing): If the Salesforce org already has a field description or help text that contains a PHI pattern, we suppress it from the documentation export and log the finding. We do not delete it from the org — we only omit it from our output.
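The four rules above reduce to a lookup from where a match was found to what happens next. A minimal sketch, with hypothetical `Source` and `Action` names; the placeholder text and the audit-trail writer are not shown:

```typescript
// Hypothetical names for where a PHI match was found.
type Source = "picklistLabel" | "defaultValue" | "fieldLabel" | "description" | "helpText";

// What the documentation generator does with the matched string.
type Action =
  | { kind: "redactLabel"; keepApiName: true }              // label swapped for the HIPAA placeholder
  | { kind: "suppress"; logToAudit: true }                  // omitted from output; the org is untouched
  | { kind: "flag"; severity: "HIGH"; blockFieldDocs: true }; // field waits for admin review

function dispositionFor(source: Source): Action {
  switch (source) {
    case "picklistLabel":
      // The API name stays because validation rule analysis needs it.
      return { kind: "redactLabel", keepApiName: true };
    case "defaultValue":
    case "description":
    case "helpText":
      return { kind: "suppress", logToAudit: true };
    case "fieldLabel":
      // Labels are never auto-redacted; documentation for the field is held.
      return { kind: "flag", severity: "HIGH", blockFieldDocs: true };
    default: {
      // Compile-time exhaustiveness check: a new Source without a rule fails here.
      const unreachable: never = source;
      throw new Error(`unhandled source: ${String(unreachable)}`);
    }
  }
}
```

The `never` branch means that adding a new metadata source will not compile until someone decides how it should be handled, which is the behavior you want from a compliance rule table.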

What we never do: We do not send any detected PHI to the model. The context assembly step runs PHI detection first and replaces detected content with type-tagged placeholders before the model context is built. The model never sees the original string.
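The placeholder step can be sketched as follows, assuming a hypothetical `[PHI:<type>]` placeholder format and a two-pattern detector subset. This version replaces the whole string whenever any detector fires, rather than only the matched keyword, so that text next to the keyword (a name, for instance) cannot reach the model.

```typescript
interface Detector {
  type: string;
  pattern: RegExp;
}

// Two-pattern subset of the detector list, for illustration only.
const DETECTORS: Detector[] = [
  { type: "ssn", pattern: /\b(ssn|social.?security)\b/i },
  { type: "mrn", pattern: /\b(mrn|medical.?record|chart.?number)\b/i },
];

// If any detector fires, return only type-tagged placeholders; the original
// string is dropped entirely. Otherwise pass the string through unchanged.
function maskForModel(text: string): string {
  const normalized = text.replace(/_+/g, " "); // let \b work on API-name-style text
  const types = DETECTORS.filter((d) => d.pattern.test(normalized)).map((d) => d.type);
  if (types.length === 0) return text;
  return types.map((t) => `[PHI:${t}]`).join(" ");
}

// Mask every string before it is added to the model context.
function assembleContext(strings: Record<string, string>): Record<string, string> {
  const context: Record<string, string> = {};
  for (const key of Object.keys(strings)) {
    context[key] = maskForModel(strings[key]);
  }
  return context;
}
```

Calling `assembleContext({ label: "Backup SSN for John Smith", helpText: "Used by intake staff" })` yields `{ label: "[PHI:ssn]", helpText: "Used by intake staff" }`.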

"I expected to spend three months on the BAA review alone. OrgLens came in with a detailed technical spec of exactly what they access, what they detect, and what they suppress. Our legal team had questions answered before they could ask them."
— Chief Information Security Officer, Helios Bio (name withheld per request)

What we do not redact — and why the distinction matters

We do not redact field API names. An API name like Patient_DOB__c does not contain PHI — it contains a reference to the concept of a date of birth. Redacting it would make the documentation useless. We do flag field API names that match PHI patterns as findings, but we document them as-is.

We do not redact object names or relationship names. A custom object called Patient_Encounter__c is a healthcare concept, not PHI. Its documentation — "this object stores interactions between a member and a care team" — is clinical in nature but not a privacy violation.

We do not redact validation rule logic. A validation rule that checks IF(ISPICKVAL(Diagnosis_Code__c, 'ICD-10'), ...) contains a clinical concept but not a patient identifier. Including it in documentation is necessary for admins to understand the data model.

The distinction: PHI is information about a specific patient. Clinical concepts and field structures that describe a care delivery data model are not PHI, even when they use clinical language.

What a healthcare CISO actually wanted

We interviewed the CISO of a 9,000-employee health system during the design phase. Their actual concern was not our privacy controls; they assumed those were solved. Their concern was auditability. When an external auditor asks "what tools have access to your Salesforce metadata?", the health system needs to be able to show exactly what OrgLens accessed, what it processed, what it sent to a model, and what it stored.

That conversation shaped the audit trail architecture more than any compliance requirement did. The interview produced four requirements that weren't in our original spec: per-scan access logs with API call counts, per-field model context logs (what we sent to the model for each field), a PHI detection log showing every positive match and its disposition, and a monthly data retention report showing what we've purged.

The audit trail: why it's not an afterthought

The audit trail is a first-class feature, not a logging system bolted on after the fact. Every scan generates four log streams:

Access log — timestamped list of every Metadata API call, with the object and field targeted and the response size in bytes. No field values, no PHI.
Detection log — every PHI pattern match, with the field reference, the pattern type that matched, and the action taken (redacted, flagged, suppressed). The matched string itself is stored encrypted at rest and auto-deleted after 90 days.
Model context log — what we sent to the inference API per field, with PHI replacements applied. Reviewable by the org admin. Auto-deleted after 30 days.
Disposition log — admin decisions on every generated description: approved, edited (with before/after), rejected. Permanent retention.
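The retention windows above (matched strings deleted after 90 days, model context entries after 30, disposition entries kept permanently) can be applied with a small sweep. This is a sketch under assumptions: the entry shapes are hypothetical, encryption is not shown (`encryptedMatch` is just a stand-in for the ciphertext), and the access log is left out because its retention period isn't stated. The purge counts are the kind of figures a monthly retention report would need.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical entry shapes for three of the four streams.
interface DetectionEntry {
  fieldRef: string;
  patternType: string;
  action: "redacted" | "flagged" | "suppressed";
  loggedAt: Date;
  encryptedMatch?: string; // stand-in for the encrypted matched string
}

interface ModelContextEntry {
  fieldRef: string;
  contextSent: string; // PHI placeholders already applied
  loggedAt: Date;
}

interface DispositionEntry {
  fieldRef: string;
  decision: "approved" | "edited" | "rejected";
  before?: string;
  after?: string;
  loggedAt: Date;
}

interface Logs {
  detection: DetectionEntry[];
  modelContext: ModelContextEntry[];
  disposition: DispositionEntry[];
}

function olderThan(loggedAt: Date, days: number, now: Date): boolean {
  return now.getTime() - loggedAt.getTime() > days * DAY_MS;
}

// Apply the retention windows and count what was purged.
function applyRetention(logs: Logs, now: Date): { logs: Logs; matchesDeleted: number; contextEntriesDeleted: number } {
  let matchesDeleted = 0;
  const detection = logs.detection.map((e) => {
    if (e.encryptedMatch !== undefined && olderThan(e.loggedAt, 90, now)) {
      matchesDeleted++;
      // Keep the entry (field, pattern type, action); drop only the matched string.
      return { fieldRef: e.fieldRef, patternType: e.patternType, action: e.action, loggedAt: e.loggedAt };
    }
    return e;
  });
  const modelContext = logs.modelContext.filter((e) => !olderThan(e.loggedAt, 30, now));
  return {
    logs: { detection, modelContext, disposition: logs.disposition }, // disposition: permanent
    matchesDeleted,
    contextEntriesDeleted: logs.modelContext.length - modelContext.length,
  };
}
```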

All four streams are exportable as JSON or CSV. They feed into the Trust Center, which healthcare customers use as evidence in their annual audits.
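CSV export needs some care because descriptions and before/after edits can contain commas, quotes, and line breaks. Here is a minimal sketch, assuming each log entry is a flat object. `toCsv` and `toJson` are hypothetical helper names, and the quoting follows RFC 4180.

```typescript
type Row = Record<string, unknown>;

// Render one cell per RFC 4180: quote cells containing a comma, double quote,
// CR, or LF, and double any embedded double quotes. Dates become ISO 8601.
function csvCell(value: unknown): string {
  if (value === undefined || value === null) return "";
  const s = value instanceof Date ? value.toISOString() : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row first, then one line per entry; RFC 4180 uses CRLF line breaks.
function toCsv(rows: Row[], columns: string[]): string {
  const lines = [columns.map(csvCell).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvCell(row[c])).join(","));
  }
  return lines.join("\r\n");
}

// JSON.stringify already serializes Dates as ISO strings via toJSON().
function toJson(rows: Row[]): string {
  return JSON.stringify(rows, null, 2);
}
```

Passing an explicit `columns` list keeps the CSV header stable even when some entries omit optional fields such as `before` and `after`.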

What's not in v1 and why

We did not include automated de-identification of live field values in v1. This is the natural extension of the compliance pack: detecting PHI not just in metadata but in the values stored on records themselves. We didn't build it because it requires record-level access, which requires a separate data processing agreement and a substantially different security model. That's v2.

We also didn't build per-field access controls in v1. The HIPAA pack applies at the org level — either the whole org has HIPAA mode on, or it doesn't. A field-level toggle would require us to maintain a field ACL system that mirrors Salesforce's own permission model. That's architecturally complex and creates its own compliance surface. We're evaluating it for v2 based on customer demand.

If you operate a Salesforce org in a healthcare context and want to see how the HIPAA compliance pack applies to your specific data model, book a call with our compliance team. We'll walk through your org's structure and identify the exact findings the pack would surface on day one.
