OpenAI’s GPT-2 language model does know how to reach a certain Peter W (name redacted for privacy). When prompted with a short snippet of Internet text, the model accurately generates Peter’s contact information, including his work address, email, phone, and fax:
In our recent paper, we evaluate how large language models memorize and regurgitate such rare snippets of their training data.
We focus on GPT-2 and find that at least 0.1% of its text generations (a very conservative estimate) contain long verbatim strings that are “copy-pasted” from a document in its training set.
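To make this kind of test concrete, here is a minimal sketch of how one could probe for verbatim memorization using the Hugging Face transformers library: sample a continuation from GPT-2 and flag it if it shares a long contiguous substring with a suspected training document. The prompt, the document path, and the 50-character threshold below are illustrative assumptions, not the exact setup from our paper.

```python
# Minimal sketch: sample from GPT-2 and check for long verbatim overlap
# with a candidate document. Prompt, file path, and threshold are
# illustrative assumptions, not the paper's methodology.
from difflib import SequenceMatcher

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")


def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Sample one continuation of the prompt from GPT-2."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,                       # random sampling, not greedy decoding
        top_k=40,                             # common truncation choice, assumed here
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous string shared by a and b."""
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[m.a : m.a + m.size]


# Hypothetical prompt and "training document", used only for illustration.
prompt = "For more information, please contact"
training_document = open("suspected_training_document.txt").read()

sample = generate(prompt)
overlap = longest_common_substring(sample, training_document)
if len(overlap) > 50:  # "long verbatim string" threshold, chosen arbitrarily
    print(f"Possible memorization ({len(overlap)} shared characters):\n{overlap}")
```

A real evaluation would repeat this over many prompts and many candidate documents; the point of the sketch is simply that memorization can be detected by string matching between generations and training text, without any access to the model’s internals.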
Such memorization would be an obvious issue for language models that are trained on private data, e.g., on users’ emails, as the model might inadvertently output a user’s sensitive conversations. Yet, even for models that are trained on