Files from Krishnamurthy Dvijotham

First Active	2024-03-13
Last Active	2024-03-13

Stealing Part Of A Production Language Model
Posted Mar 13, 2024
Authored by David Rolnick, Jonathan Hayase, Eric Wallace, Nicholas Carlini, Arthur Conmy, Thomas Steinke, Matthew Jagielski, Florian Tramer, Krishnamurthy Dvijotham, Daniel Paleka, Katherine Lee, Milad Nasr, A. Feder Cooper

In this whitepaper, the authors introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, their attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, their attack extracts the entire projection matrix of OpenAI's ada and babbage language models. They thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. They also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. They conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend this attack.

tags | exploit, paper, vulnerability
SHA-256 | 35bb26fb1fe58d91b595fbecc219b129076e6cc3ae746288dc27c6fa0d128e6a