Deploy Securely

Sensitive Data Generation

StackAware

I’m worried about data leakage from LLMs, but probably not for the reason you think.

While unintended training is a real risk that can’t be ignored, something else is going to be a much more serious problem: sensitive data generation (SDG).

A recent paper (https://arxiv.org/pdf/2310.07298v1.pdf) shows that LLMs can infer a wide range of personal information from seemingly innocuous Reddit comments.

This phenomenon will have huge impacts on:

- Material nonpublic information
- Executive moves
- Trade secrets

and the ability to keep them confidential.

Check out the full post in Deploy Securely for a breakdown: https://blog.stackaware.com/p/sensitive-data-generation