Xinting Huang @timhuangxt

Senior Researcher @TencentGlobal, working on LLMs. Ph.D. at @UniMelb; Ex @BytedanceTalk, @MSFTResearch
timhuang1.github.io
Melbourne, Australia
Joined June 2016

Tweets: 12
Followers: 140
Following: 344
Likes: 30

Longyue Wang @ACL2025 @wangly0229

5 months ago

🌺GPT-4o’s image generation is stunning — but how well does it handle complex scenarios? 🤔 We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly…

2 22 50 4K 20

Xinting Huang @timhuangxt

9 months ago

These findings resonate with my impressions. AFAIC, structured prompting outperforms CoT & ICL by steering LLMs through workflows. Great to see this ‘rebuttal’ backed by such rigorous analysis — reminds me of the insights in LLMs Cannot Self-Correct. We need more like this!

Philipp Schmid @_philschmid

9 months ago

10 38 312 34K 237

0 0 0 206 0

Xinting Huang @timhuangxt

11 months ago

Exciting to see our old friend continuing to push the real-world boundaries of LLM applications (shoutout to MT here)!

Longyue Wang @ACL2025 @wangly0229

11 months ago

0 9 21 1K 3

0 0 0 91 0

AK @_akhaliq

a year ago

To Code, or Not To Code? Exploring Impact of Code in Pre-training
discuss: huggingface.co/papers/2408.10…
Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been…

6 66 304 30K 137

Longyue Wang @ACL2025 @wangly0229

a year ago

🚀Check out VideoVista, a comprehensive video-LMMs evaluation benchmark! videovista.github.io
🚀 Dive into our leaderboard:
- 📊 Evaluating 33 Video-LMMs across 27 tasks;
- 🥉 The latest GPT-4o-Mini clinches 3rd place;
- 🏆 InternLM-XComposer-2.5 emerges as the…

Yunxin Li @LyxTg

a year ago

3 13 26 5K 6

0 3 7 1K 1

Xinting Huang @timhuangxt

a year ago

Open-sourced Multimodal models -- fascinating
Open-sourced MOE models -- fascinating
Open-sourced Multimodal MOE models -- WOW!
check this out 👇

Longyue Wang @ACL2025 @wangly0229

a year ago

4 44 145 17K 62

0 0 0 61 0

Longyue Wang @ACL2025 @wangly0229

2 years ago

🚀 A game-changer benchmark: LLM-Uncertainty-Bench 🌟 📚 We introduce "Benchmarking LLMs via Uncertainty Quantification", which challenges the status quo in LLM evaluation. 💡 Uncertainty matters too: we propose a novel uncertainty-aware metric, which tests 8 LLMs across 5…

7 175 250 33K 162

AK @_akhaliq

2 years ago

FuseChat
Knowledge Fusion of Chat Models
While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, this approach incurs substantial costs and may lead to potential redundancy in competencies. An alternative…