TL;DR

Protect user data with anonymization, differential privacy, federated learning, and on-device AI. Never send sensitive data to public AI APIs.

Privacy risks with AI

Data sent to APIs:

  • May be retained by the provider
  • May be used to train future models
  • Could be exposed in a breach

Model memorization:

  • LLMs can memorize training data
  • May leak private information

Inference attacks:

  • Adversaries query the model to extract training data or infer whether a specific record was used in training (membership inference)

Anonymization techniques

PII removal:

  • Strip names, addresses, emails, phone numbers
  • Replace them with placeholders
  • Use NER (named-entity recognition) models to catch identifiers that simple patterns miss

Example:
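
A minimal sketch using regular expressions; the patterns and placeholder tags below are illustrative, not exhaustive. Note that the bare name "Jane" slips through, which is exactly the gap an NER pass would close:

  import re

  # Illustrative patterns only; real PII detection needs broader coverage
  # plus an NER model for names the regexes can't see.
  PATTERNS = {
      "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
      "[PHONE]": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
      "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
  }

  def scrub_pii(text: str) -> str:
      for placeholder, pattern in PATTERNS.items():
          text = pattern.sub(placeholder, text)
      return text

  print(scrub_pii("Reach Jane at jane.doe@example.com or 555-123-4567."))
  # "Reach Jane at [EMAIL] or [PHONE]."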

K-anonymity:

  • Generalize quasi-identifiers so each record is indistinguishable from at least k-1 others
  • Harder to identify individuals
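
A toy sketch of generalization, assuming age and ZIP code are the quasi-identifiers; a real pipeline would also verify that every generalized group actually contains at least k records:

  def generalize(record: dict) -> dict:
      # Coarsen quasi-identifiers so records blend into groups.
      decade = (record["age"] // 10) * 10
      return {
          "age_band": f"{decade}-{decade + 9}",
          "zip_prefix": record["zip"][:3] + "**",
          "diagnosis": record["diagnosis"],  # sensitive attribute, kept as-is
      }

  print(generalize({"age": 34, "zip": "94110", "diagnosis": "flu"}))
  # {'age_band': '30-39', 'zip_prefix': '941**', 'diagnosis': 'flu'}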

Pseudonymization:

  • Replace identifiers with pseudonyms
  • Reversible via a key or lookup table, stored separately from the data
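
A minimal sketch where an in-memory mapping table stands in for what should be a separately secured store (a vault or dedicated database), since whoever holds the table can reverse the pseudonyms:

  import secrets

  pseudonym_map: dict[str, str] = {}  # store this separately from the data

  def pseudonymize(identifier: str) -> str:
      if identifier not in pseudonym_map:
          pseudonym_map[identifier] = f"user_{secrets.token_hex(8)}"
      return pseudonym_map[identifier]

  def re_identify(pseudonym: str) -> str | None:
      # Reversal is only possible with access to the mapping.
      for original, alias in pseudonym_map.items():
          if alias == pseudonym:
              return original
      return None

  alias = pseudonymize("jane.doe@example.com")
  print(alias, "->", re_identify(alias))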

Differential privacy

Concept:

  • Add calibrated random noise to query results or training updates
  • No individual record can be confidently identified
  • Aggregate statistics remain useful

Use cases:

  • Analytics and reporting
  • Model training
  • Survey data

Trade-off:

  • Privacy vs accuracy, tuned by the privacy budget ε
  • More noise (smaller ε) = more privacy, less accuracy (see the sketch below)
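
A minimal sketch of the Laplace mechanism for a count query: the noise scale is sensitivity/ε, and a count has sensitivity 1 since one person changes it by at most 1. The epsilon values are illustrative:

  import numpy as np

  def dp_count(true_count: int, epsilon: float) -> float:
      # Smaller epsilon -> larger noise -> more privacy, less accuracy.
      noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
      return true_count + noise

  for eps in (0.1, 1.0, 10.0):
      print(eps, round(dp_count(1000, eps), 1))
  # e.g. 0.1 -> 987.4, 1.0 -> 1001.7, 10.0 -> 1000.1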

On-device AI

Concept:

  • Run AI models locally
  • No data sent to cloud

Examples:

  • Apple's on-device Siri
  • Google Photos face detection (on-device mode)

Benefits:

  • Strong privacy: data never leaves the device
  • Works offline
  • Faster response

Limitations:

  • Smaller models only
  • Device hardware constraints
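
Even within those constraints, small models run comfortably on local hardware. A minimal sketch with Hugging Face's transformers library, assuming a small sentiment model (distilbert-base-uncased-finetuned-sst-2-english); after the one-time model download, inference runs entirely on the local machine:

  from transformers import pipeline

  # All inference happens locally; the input text never leaves the device.
  classifier = pipeline(
      "sentiment-analysis",
      model="distilbert-base-uncased-finetuned-sst-2-english",
  )

  print(classifier("This message never leaves my laptop."))
  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]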

Federated learning

Concept:

  • Train models across devices without centralizing data
  • Devices send model updates, not raw data
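
A toy simulation of federated averaging (FedAvg) on a linear model, assuming five clients with synthetic least-squares data; only weight vectors cross the client/server boundary, never the raw data:

  import numpy as np

  def local_update(weights, data, lr=0.1):
      # Each "device" takes a gradient step on its own data and returns
      # updated weights -- the data itself stays put.
      X, y = data
      grad = X.T @ (X @ weights - y) / len(y)
      return weights - lr * grad

  def federated_round(global_weights, clients):
      # The server only ever sees and averages the model updates.
      updates = [local_update(global_weights, d) for d in clients]
      return np.mean(updates, axis=0)

  rng = np.random.default_rng(0)
  true_w = np.array([2.0, -1.0, 0.5])
  clients = []
  for _ in range(5):
      X = rng.normal(size=(50, 3))
      clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

  w = np.zeros(3)
  for _ in range(100):
      w = federated_round(w, clients)
  print(w)  # converges toward true_w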

Use cases:

  • Mobile keyboard prediction (Google's Gboard trains this way)
  • Healthcare models trained across hospitals without pooling patient records

Benefits:

  • Privacy preserved
  • Learn from distributed data

Secure enclaves

  • Hardware-protected memory regions (e.g., Intel SGX, AWS Nitro Enclaves)
  • Code and data stay encrypted while in use
  • Even the cloud provider can't inspect them

Compliance strategies

GDPR:

  • Lawful basis for processing
  • Data minimization
  • Right to deletion
  • Transparency about AI use and automated decision-making

CCPA:

  • Disclosure of data collection
  • Opt-out rights
  • Honor "Do Not Sell My Personal Information" requests

HIPAA (healthcare):

  • PHI protection
  • Business Associate Agreements (BAAs) with any AI vendor that touches PHI
  • Audit logs

Best practices

Data minimization:

  • Collect only what's needed
  • Delete after use
  • Don't send PII to public APIs

Access controls:

  • Role-based permissions
  • Audit who accesses data
  • Encrypt at rest and in transit
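
A minimal sketch of encryption at rest using the cryptography package's Fernet recipe; in production the key would live in a KMS or vault, never next to the data it protects:

  from cryptography.fernet import Fernet

  key = Fernet.generate_key()  # in practice: fetch from a KMS, never hardcode
  f = Fernet(key)

  token = f.encrypt(b"patient notes: ...")  # store the ciphertext, not the plaintext
  print(f.decrypt(token))  # b'patient notes: ...'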

Transparency:

  • Privacy policies
  • Consent mechanisms
  • Explain AI use

Vendor management:

  • Vet AI providers' privacy practices
  • Use enterprise/private deployments
  • Data Processing Agreements

Implementing privacy

Pre-processing:

  • Anonymize before sending to AI
  • Remove PII programmatically
  • Use test data for development
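
A minimal sketch of the pre-processing step, reusing the scrub_pii helper from the anonymization example above; call_model is a hypothetical stand-in for whatever API client you actually use:

  def call_model(prompt: str) -> str:
      # Hypothetical placeholder for a real SDK call.
      return f"(model response to: {prompt})"

  def private_completion(prompt: str) -> str:
      safe_prompt = scrub_pii(prompt)  # PII removed before text leaves the process
      return call_model(safe_prompt)

  print(private_completion("Summarize the email from jane.doe@example.com"))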

Enterprise AI:

  • Private deployments (no data sharing)
  • e.g., Azure OpenAI Service, Amazon Bedrock (customer data isolated, not used for training)

Self-hosting:

  • Run open-source models
  • Complete control
  • More complex to maintain

What's next

  • Responsible AI Deployment
  • Compliance and AI
  • Security Best Practices