Hongbin Na | 那洪彬 @HongbinNLP

Second Year MRes student @UTSEngage. Prev @XJTLU, @LivUni. AI for Social Good. #NLProc. INTJ 5w4 hongbin-ze.github.io Sydney Joined October 2021

Tweets: 97 · Followers: 47 · Following: 625 · Likes: 274

Yulong Chen @Yulongchen1010

4 days ago

Can LLMs learn a new language using only a grammar book and a dictionary, the way adult human L2 learners do? Check out our in-progress paper! The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang arxiv.org/pdf/2509.00425

2 6 28 5K 11

Omar Shaikh @oshaikh13

a month ago

BREAKING NEWS! Most people aren’t prompting models with IMO problems :) They’re prompting with tasks that need more context, like “plz make talk slides.” In an ACL oral, I’ll cover challenges in human-LM grounding (in 60K+ real interactions) & introduce a benchmark: RIFTS. 🧵

5 52 272 36K 177

Dora Zhao @dorazhao9

a month ago

While we’re building amazing new human-AI systems, how do we actually know if they work well for people? In our #ACL2025 Findings Paper, we introduce SPHERE, a framework for making evaluations of human-AI systems more transparent and replicable. ✨aclanthology.org/2025.findings-…

1 26 91 12K 38

Kobi Hackenburg @KobiHackenburg

2 months ago

Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵

14 129 436 69K 372

Hongbin Na | 那洪彬 @HongbinNLP

2 months ago

Excited to present our survey on LLMs in Psychotherapy at #ACL2025! Let's discuss a coherent path forward! Find me at my poster: 📍Location: Hall 4/5 🗓️Session: IP-Poster Session 5, Mon, July 28, 18:00-19:30 #LLM #AI #MentalHealth #Psychotherapy

0 0 5 324 0

马东锡 NLP @dongxi_nlp

2 months ago

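A highlight of Kimi K2 is that it successfully carries the token-based approach of text tasks over to the tool-call level in agentic settings: in agentic tasks, a tool call is the equivalent of an “action token.” What does that mean? Here is the explanation. In text tasks, CoT is a sequence of tokens, whereas in agentic settings, CoT is a sequence of tool calls, i.e., planning…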

18 31 226 117K 160

Tim Althoff @timalthoff

2 months ago

Most complex tasks feature ill- and underdefined goals, at least initially. Check out @vysrini's paper demonstrating that agents need to do a lot more than follow instructions, including refining goals and balancing competing objectives. Current frontier models fail at these tasks.

Vidya Srinivas @vysrini

2 months ago

1 3 8 2K 6

0 1 10 1K 0

马东锡 NLP @dongxi_nlp

2 months ago

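Because agentic papers rely on similar post-training recipes, their methods have become highly homogeneous, which easily leads to aesthetic fatigue; I often lose interest right after the abstract. Still, some papers, working within RL's inherently task-specific limits, manage to make the task itself go beyond the benchmark, which is quite elegant. My personal favorites from the past few months: Absolute Zero: …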

8 30 236 24K 194

Nikhil Prakash @nikhil07prakash

2 months ago

How do language models track the mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step toward demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief tracking task, and surprisingly found that it…

9 97 568 95K 618

Andrew Piper @_akpiper

3 months ago

Amazing thread. So much to unpack but for starters: “After writing, only 17 % of ChatGPT users could quote their own sentences, versus 89 % in the brain-only group.”

Rohan Paul @rohanpaul_ai

3 months ago

324 3K 12K 2.3M 10K

1 6 36 4K 15

Isabel Papadimitriou @isabelpapad

3 months ago

Check out our ACL paper! We use Shapley interactions to see which words (and phones) interact non-linearly, and what we lose when we assume linear relationships between features. Chat to Diganta in Vienna!

Naomi Saphra @nsaphra

3 months ago

1 4 42 7K 32

0 7 36 3K 6

Andrew Piper @_akpiper

3 months ago

"Fiction-brained". This needs to be the next big cultural analytics project.

Roman Helmet Guy @romanhelmetguy

3 months ago

516 2K 19K 1.2M 5K

0 3 4 596 0

Omar Shaikh @oshaikh13

3 months ago

What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs? In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use. 🧵

18 93 357 62K 210

Lindia Tjuatja @lltjuatja

3 months ago

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9

1 33 153 32K 66

Jocelyn Shen @jocelynjshen

3 months ago

Happy to share our preprint "Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication" 🧵(1/9) 📄Paper: arxiv.org/pdf/2505.21451

1 4 31 4K 4

Myra Cheng @ ACL2025 🎶 @chengmyra1

3 months ago

Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.

5 26 171 19K 73

Andrew Piper @_akpiper

3 months ago

In case you're curious whether you can use LLMs (including small local ones) for narrative topic modeling: the answer is yes! LLMs perform on par with or better than humans at the task.

1 4 19 1K 8

Diyi Yang @Diyi_Yang

4 months ago

@ysu_nlp second! also to "human-AI interaction" 😃

1 1 26 2K 1

Kabir @kabirahuja004

5 months ago

📢 New Paper! Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories: inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎 W/ @melaniesclar and @tsvetshop 1/n

3 51 262 41K 126

Dimitris Papailiopoulos @DimitrisPapail

4 months ago

Do you want to do RL for coding and agentic workflows? Do you want to do science, and figure out when RL kicks in? What is the right algorithm (it's not GRPO)? How much reasoning do you need in your base (you def need some! but is it a lot or A LOT)? Do you want to figure out how…