About objc_direct, a thread. I should have probably anticipated that people would raise eyebrows and spent more time explaining the point in the LLVM commit, so here it is...
Obj-C dynamic dispatch comes with many costs; this is common "knowledge". However, the details are rarely known. Besides the obvious cost of the h-lookup, it comes with 4 other kinds of costs:
- codegen size
- optimization barrier
- static metadata
- runtime metadata
(1) Codegen size: In addition to `self`, `_cmd` is passed to objc_msgSend so that it can look up your IMP. A selector is loaded from a GOT-like slot, called a selref, which on arm64 generates assembly akin to:
    adrp x1, mySelector@PAGE
    ldr x1, [x1, mySelector@PAGEOFF]
This is 8 bytes that you pay at every call site. The number of calls to objc_msgSend is large enough that these 8 useless bytes add up. For example, in CloudKit, these 2 instructions represent 10.7% of __text. This is fairly typical of Objective-C heavy code.
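To make that arithmetic concrete, here is a back-of-the-envelope sketch in C. The function names are made up for illustration; the only figure taken from the thread is the 8 bytes (two 4-byte arm64 instructions) paid per call site.

```c
#include <assert.h>
#include <stddef.h>

/* adrp + ldr: two 4-byte arm64 instructions to load the selector per call site. */
enum { SELREF_LOAD_BYTES = 2 * 4 };

/* Hypothetical helper: bytes of __text spent loading selectors
   for `call_sites` calls to objc_msgSend. */
size_t selref_load_overhead(size_t call_sites) {
    return call_sites * SELREF_LOAD_BYTES;
}

/* Hypothetical helper: share of __text, in percent, taken by those loads. */
double selref_load_share(size_t call_sites, size_t text_bytes) {
    assert(text_bytes > 0);
    return 100.0 * (double)selref_load_overhead(call_sites) / (double)text_bytes;
}
```

Feeding it a binary's real call-site count and __text size (both of which you would have to measure yourself) gives the kind of percentage quoted above.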
(2) Optimization barrier: Swizzling is a powerful tool, but it requires huge guarantees from the compiler. People will call out "inlining" as the lost optimization, but it's way worse. Even without inlining, the optimizer cannot be allowed to know that a trivial readonly...
... integer property won't as a side effect release `self`, which in turn causes ARC to insert superfluous objc_retain()/objc_release() calls around the property access. Even without inlining, in an LTO world, ARC has the opportunity to be smarter because it sees more.
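A toy model of that problem, in plain C rather than Objective-C. Everything here is a hypothetical stand-in: `Object`, `value_imp` and the manual retain-count bumps only loosely mimic the runtime's selector lookup and ARC's objc_retain()/objc_release(), they are not how either is implemented.

```c
#include <assert.h>

/* Hypothetical toy object: a retain count plus one integer "property". */
typedef struct {
    int retain_count;
    int value;
} Object;

typedef int (*Imp)(Object *self);

/* The implementation as written: a trivial read-only getter. */
int value_getter(Object *self) {
    return self->value;
}

/* A replacement that could be installed at run time: same result,
   but it drops a reference to self as a side effect. */
int swizzled_getter(Object *self) {
    self->retain_count -= 1;
    return self->value;
}

/* One-slot "method table" standing in for the selector -> IMP lookup. */
Imp value_imp = value_getter;

/* Dynamic dispatch: the callee is only known at run time, so the caller
   has to protect self around the call. */
int read_value_dynamic(Object *self) {
    assert(self && value_imp);
    self->retain_count += 1;   /* stands in for objc_retain(self) */
    int v = value_imp(self);   /* may run any implementation */
    self->retain_count -= 1;   /* stands in for objc_release(self) */
    return v;
}

/* Direct call: the callee is fixed at compile time and visibly has no
   side effects, so the protective pair is not needed. */
int read_value_direct(Object *self) {
    assert(self);
    return value_getter(self);
}
```

As long as `value_imp` can be swapped for something like `swizzled_getter`, the retain/release pair in `read_value_dynamic` cannot be dropped, even though the original getter never needed it.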
@pedantcoder You say "optimization barrier", I hear "modularity enforcer" and "separate-compilation enabler".