tile-ai/TileOPs
High-performance LLM operator library built on TileLang.
First Claude commit: Mar 15, 2026
Last Claude commit: 1mo ago
Discovered: Mar 16, 2026
Recent Claude Commits
[Perf] Add lru_cache to remaining 40 builder functions (#600)
1d55896, 1mo ago
[Chore] Codify lru_cache builder rule into DEVELOPMENT.md and .claude/rules (#602)
417b210, 1mo ago
[Fix][Elementwise] Clamp scalar literals to dtype range in kernel layer (#598)
12fa73e, 1mo ago
[Perf] Add lru_cache to builder functions (gated_deltanet, flash_attn, flash_decode) (#597)
684143b, 1mo ago
[Refactor][LinearAttn] Rename files to match DEVELOPMENT.md convention (#592)
301b3b7, 1mo ago
[Chore] Scaffold foundry overlay directory structure (#568)
59a199a, 1mo ago
[Fix][MoE] Lower small-batch expert threshold to prevent JIT hang (#564)
3fde98c, 1mo ago
[Doc] Add elementwise kernel performance checklist and evidence (#556)
d4852d2, 1mo ago
[BugFix][CI] Fix warmup error handling, monkeypatch cleanup, and portable cache paths (#548)
27c6688, 1mo ago
[Refactor][Elementwise] Extract shared _wrap_fp8_accumulation helper for fp8 kernels (#535)
449250a, 1mo ago
[Bench][Elementwise] Validate performance risk points and determine optimal defaults (#537)
21a85e6, 1mo ago
[CI] Separate kernel compilation from profiling and parallelize compilation in nightly (#542)
7b94393, 1mo ago
[Perf][Elementwise] Optimize binary max/min kernels to close bandwidth gap with PyTorch (#539)
81c0eaf, 1mo ago
[Feat][Reduce] Implement logical_reduce sub-category (any/all/count_nonzero) (#531)
fa298d4, 1mo ago
[BugFix][Linear-Attn] Cleanup dead kernels and add custom_op wrappers (#476)
b53390e, 1mo ago
[Fix][MHC] Fix MHC pre auto-tuning: sigmoid serialization and scalar params (#520)
2900875, 1mo ago