Why "LLM for Compiler" Is a Reasonable Research Question
May 27, 2025
This document captures Genghan's personal ideas on compiler construction in an era shaped by rapidly advancing AI agents. The ideas are clearly rooted in the community Genghan is part of. This document may also stand as a small historical record—should future readers ever look back at these ancient programmers—to understand their ambitions, their unease, and their determination. It preserves the tension between anxiety and excitement, confidence and confusion, hope and doubt. And perhaps, with hindsight, it may reveal how their pathetic naivety contributed to whatever future followed.
2025/05/28
Large Language Models (LLMs) are statistical systems trained to compress and reconstruct knowledge drawn from vast corpora of human-created data. In that sense, they implicitly store and recombine known information. Programming languages—whether general-purpose, domain-specific (DSLs), or architecture-specific (ASPLs)—are often carefully crafted abstractions that build upon decades of existing language design. Their syntax and semantics tend to evolve incrementally, shaped by the cognitive constraints of human users: a language too difficult to learn simply won’t gain traction.
This evolutionary pattern implies that many programming abstractions share structural similarities with one another. Since LLMs are trained on large amounts of code and documentation, they are well-positioned to recognize and recombine these shared abstractions. With well-chosen in-context examples, LLMs can be prompted to generate valid and even semantically meaningful programs by mimicking the structure of previously seen patterns.
In parallel, compiler research is often seen as a field that advances slowly. A common view is that making meaningful progress in compiler design requires a deep accumulation of systems knowledge, domain-specific constraints, and algorithmic insights. This high barrier to entry can lead to conservatism in innovation—new techniques are often minor extensions of established ones.
That’s precisely why LLMs offer an intriguing angle: if much of compiler innovation involves recombining deep but well-established patterns, then a statistical tool like an LLM—trained on the output of expert human effort—might be able to surface novel combinations or optimizations that are still grounded in known techniques but not previously explored.
In this view, prompting an LLM with expert-curated examples is a form of injecting inductive bias—we guide the model toward the known-good regions of the solution space. However, the innovation happens when the model deviates from that expert knowledge and finds non-obvious solutions—combinations or transformations that haven't been explicitly taught, yet still perform well. In this setting, rewarding outputs (e.g., correctness, performance, or simplicity) rather than mimicking processes may be a better optimization strategy.
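To make the output-based reward idea concrete, here is a minimal sketch of such a reward function. All names, weights, and the particular scoring formula are illustrative assumptions, not a prescription; the point is only that the score depends on what the program achieves, never on how closely it imitated an expert's derivation.

```python
# Hypothetical sketch: score a generated program by its outcomes
# (correctness, speedup, simplicity), not by process imitation.
# The weights 0.7/0.3 and the simplicity formula are arbitrary choices.

def reward(candidate_output, reference_output,
           runtime_s: float, baseline_s: float, lines_of_code: int) -> float:
    """Combine correctness, speedup, and simplicity into one scalar score."""
    if candidate_output != reference_output:
        return 0.0                          # hard gate: incorrect programs score zero
    speedup = baseline_s / runtime_s        # > 1.0 means faster than the baseline
    simplicity = 1.0 / (1.0 + lines_of_code / 100.0)  # mildly prefer shorter programs
    return 0.7 * speedup + 0.3 * simplicity
```

A hard correctness gate like this keeps the search honest: a fast but wrong program earns nothing, while among correct programs the model is free to find non-obvious ways to raise the speedup term.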
2026/01/31
Moreover, DSLs are important for validating the programs generated by "vibe coding". Humans can follow only a limited number of reasoning steps, and a DSL abstracts away the details so that programs can be easily understood. Another interesting direction is to let coding agents build DSLs, which might make the codebase more manageable.
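The validation argument can be illustrated with a toy DSL. This is a hypothetical miniature language invented for the sketch (the operation names and structure are assumptions, not from any real system): because a program is just a short sequence of steps drawn from a tiny vocabulary, a human reviewer can check it at a glance, and a validator can mechanically reject anything outside the vocabulary—exactly the property that makes generated code auditable.

```python
# Hypothetical miniature DSL: a program is a list of (op, arg) steps.
# The small, closed vocabulary is what makes validation trivial.

ALLOWED_OPS = {"add", "mul", "clip"}

def validate(program) -> bool:
    """Accept only programs built from the DSL's tiny vocabulary."""
    return all(op in ALLOWED_OPS and isinstance(arg, (int, float))
               for op, arg in program)

def run(program, x):
    """Interpret a validated DSL program on a single number."""
    for op, arg in program:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
        elif op == "clip":
            x = min(x, arg)
    return x

# An agent-generated program is easy to audit step by step:
prog = [("add", 2), ("mul", 3), ("clip", 10)]
```

A general-purpose language offers no such cheap check; here, anything a coding agent emits is either obviously within the abstraction or rejected outright.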
2026/02/11
As coding agents become more stable and capable than junior programmers, compiler development will demand far more thorough testing. The paradigm where researchers propose a programming model and validate it on only a handful of workloads will become less relevant. Instead, we can expect production-ready compilers to emerge even from small research teams.
Scientific discovery is limited not only by the data we collect, but by how quickly humans can interpret it and decide what to do next. If agents can absorb results and propose sensible next steps at human-level quality—but far faster—then discovery could accelerate dramatically during the “experience accumulation” phase. What remains unclear is whether simply accumulating more experience is enough to trigger paradigm shifts. Put differently: as AI systems generate and refine knowledge faster than human institutions can keep up, how can individuals still understand, critique, and ultimately trust what those systems produce?
We’ve already seen this cycle repeat: an AI system reproduces what experts can do, people claim experts will be displaced, experts identify flaws, and the next model release fixes many of them. The more interesting challenge for experts could be to think bigger, to pursue ambitious projects that previously felt out of reach. As I always believe: every life is a miracle and no entity can measure or optimize away human values.
2026/02/13
Human society runs on finite energy. Time isn’t shareable. When something new rises, something else often fades. It can begin quietly: thirty minutes spent with AI instead of reading a compiler paper. Over time, fewer newcomers enter a field. Eventually, a subarea can vanish—not because it was solved, but because no one chose to continue it.
Trust does not stem from the code generator—whether human or agent—but from the reviewers who approve the code on behalf of the organization. The real issue, therefore, is not the end user’s trust in the software, but the level of confidence reviewers have in the generated code before it is endorsed and integrated. Moreover, there is a distinction between committing and initiating code.
2026/02/14
The question is not whether "it's all about software". Instead, it's about the gap between system integration and isolated tests. Mappings from real workload parallelization strategies to hardware primitives often exist only in architects' mental models. Those assumptions can easily drift from actual behavior in the full system.
2026/02/17
Are we preparing for an "intelligence outage" caused by the outage of some physical resource in the future? Community leaders' opinions can also influence the adoption of AI agents in a field. AI could help humans evolve. By automating routine cognitive work, it frees humans from getting lost in implementation details and gives them more space to focus on deeper questions. Meanwhile, such deep thinking is not metaphysical. Instead, it is grounded in practice and reality. In that sense, AI does not replace human thought; it elevates it. It creates room for reflection and self-examination.
2026/02/21
Chris Lattner makes a good case for the unique value of compiler construction in "What are Compilers? Why do they matter as an AI Benchmark?". It is clear that compiler engineers will design systems while coding agents will be in charge of implementation. Will there be a "language" that compiler engineers use to describe the recipes for compiler construction? It could be a norm (like the general principles people follow to write essays). Compiler construction is an example of strategic exploration. In strategic exploration tasks, the final criteria are not binary, but continuous and multidimensional. The criteria might not even be quantitative. For example, insights could also be the outcome of strategic exploration. However, that doesn't mean the criteria are subjective. Instead, the goal is objective, such as generating faster kernels for a set of PyTorch programs.
There can be different levels of compiler construction.
L0: code completion, as in other software engineering.
L1: improve programs compiled by existing compilers (kernel optimization, phase-order optimization, etc.).
L2: implement documented new concepts to achieve the human-defined goal (new intermediate representations, new optimization models, new ways of structuring programs and hardware interaction).
L3: invent and implement new concepts to achieve the human-defined goal.