Listen up all talented early-stage researchers! 👂🤖
We're hiring for a 6-month residency in my team at @AISecurityInst to assist cutting-edge research on how frontier AI influences humans!
It's an exciting & well-paid role for MSc/PhD students in ML/AI/Psych/CogSci/CompSci 🧵
Come to LLMSEC at ACL & hear Niloofar's keynote
"What does it mean for agentic AI to preserve privacy?" - @niloofar_mire, Meta/CMU
(Friday 1st Aug, 11.00; Austria Center Vienna Hall B)
See you there!
#acl2025#acl2025nlp
First keynote at LLMSEC 2025, ACL:
"A Bunch of Garbage and Hoping: LLMs, Agentic Security, and Where We Go From Here" Erick Galinkin
Friday 09.05 Hall B
Details: sig.llmsecurity.net/workshop/ - #ACL2025NLP
Gritty Pixy
"We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty…
ChatTL;DR – You Really Ought to Check What the LLM Said on Your Behalf 🌶️
"assuming that in the near term it’s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course…
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
"we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting…
LLMmap: Fingerprinting For Large Language Models
"With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM…
Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis 🌶️
"Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features…
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
(-- look at that perf/latency pareto frontier. game on!)
"State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose…
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
"To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal…
Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge
"This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information."
"for unlearning methods with utility constraints, the…
unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting the companies that bake it into their own software.
Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt injections, the #1 security threat by OWASP, via Structured Queries (StruQ). Preserving utility, StruQ discourages all existing prompt injections to an ASR <2%.
6 Followers 95 Followingcomputer science student at University of Yaoundé 1,interested in computer security.Passionate about everything related to computer science,economics,geopolitic
845 Followers 129 FollowingSolidity development framework for Ethereum by @AckeeBlockchain with fuzzing, testing, and detectors.
Securing smart contracts of Lido, Aave, Axelar & more.
28 Followers 489 FollowingI am a Helpdesk Technician studying to change jobs and work in cyber security, I have started my learning by joining https://t.co/fE587NTOdO
75 Followers 755 FollowingIntellica. The Ultimate Assistant. Official Account for the Intellica Token ($INTCA). Follow for the latest updates. Official Telegram: https://t.co/TmOluQTuXA
74K Followers 4K FollowingSocialPilot is a powerful social media suite for agencies, SMBs, & multi-location brands.
Streamlined scheduling, powerful analytics & easy collaboration.
497 Followers 1K FollowingLecturer (Assistant Professor) in #NLProc @SheffieldNLP @shefcompsci opinions are my own (which are shaped by media and random things unfortunately)
1K Followers 1K Followingprobe to improve | Ph.D. @VTEngineering | Amazon Research Fellow | #AI_safety 🦺 #AI_security 🛡 | I deal with the dark side of machine learning.
852 Followers 304 FollowingSenior Lecturer at @shefcompsci Member of @SheffieldNLP Natural Language Processing, Text Analytics, Data Science, Artificial Intelligence
1K Followers 1K FollowingAI researcher. Current postdoc at @mldcmu, Ph.D. from @Penn, B.S. & B.A. from @Swarthmore, working with @GraySwanAI, formerly @GoogleAI, @Livermore_Lab.
34K Followers 832 FollowingProfessor in Computer Science at UC Berkeley, co-Director of Berkeley RDI Center; Building safe, secure, decentralized AI; Serial entrepreneur
1K Followers 298 FollowingCS Ph.D. Candidate at UIUC @IllinoisCS, focusing on scalable foundation models. I’m on the industry job market, seeking full-time research scientist positions!
25 Followers 2 FollowingThe Security & AI Podcast by @nataliepis (@OpenAI Dev Ambassador) and @justicerage (Senior Security Researcher @kaspersky) | We’re on Apple Podcasts and Spotify
123K Followers 1 FollowingTrue stories from the dark side of the Internet. Host @jackrhysider.
New episodes released on the first Tuesday of each month.
Discord: https://t.co/bZZRR8C59R
16K Followers 39 FollowingWe want to help as many people as possible understand their networks as much as possible.
Shared amongst several of the core team, but mostly @GeraldCombs.
2K Followers 219 FollowingSharing the latest in AI safety research.
"One who says they have no time to read papers will never read papers even when they have time a-plenty."
No recent Favorites. New Favorites will appear here.