KAIST CS420: Compiler Design (2020 Spring)
Due to COVID-19, we're going to conduct online sessions.
Online sessions will be provided via this YouTube channel.
You're required to watch the videos and, based on their contents, solve pop quizzes that will be posted at gg.kaist.ac.kr. The details will be announced in the issue tracker, e.g., https://github.com/kaist-cp/cs420/issues/3
If it's difficult to understand English, please turn on the subtitles in the YouTube videos. Auto-transcribed subtitles will be shown.
Compilers bridge the gap between humans and machines. Humans want to express complex ideas easily. Machines, on the other hand, understand only a few words (instructions) so that they can be implemented efficiently in silicon. Compilers transform programs from a form in which humans can easily express complex ideas into a form that machines can execute efficiently. Because the gap between humans and machines is fundamentally wide, compilers have been constructed and widely used since the beginning of the history of computing. The first practical compiler even predates the first practical operating systems (according to Wikipedia)!
In response to industry shifts, new compilers must be written again and again. First, humans want to express more and more complex ideas, especially in the era of artificial intelligence and big data. Second, machines change in response to physics (e.g., the end of Dennard scaling and Moore's law) and industrial needs (e.g., the Internet of Things and distributed systems). New compilers must be constructed to close the new gap between changing humans and changing machines. For this reason, industrial demand for (and the salaries of) compiler engineers has been consistently high.
In this class, we will learn how to construct a compiler by actually building one. You are going to benefit from the provided skeleton code of a clean-slate educational compiler, dubbed KECC: KAIST Educational C Compiler (think: KENS for networking, or Pintos and xv6 for operating systems). We are going to discuss parsing only briefly, because the topic is assumed to be covered in CS322: Formal Languages and Automata. (You don't need to know parsing to take this course, though.) We will focus on translation from a human-friendly form to a machine-friendly form, and on compiler optimizations. Specifically, we will discuss (1) how to transform a C program into an SSA-based intermediate representation (IR); (2) how to perform register promotion, static single assignment, global value numbering, and register allocation optimizations on the IR; and (3) how to transform an IR program into a RISC-V assembly program. KECC will provide a significant amount of skeleton code so that you can focus on the topics of this course.
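To give a flavor of the optimizations listed above, here is a minimal sketch of value numbering restricted to a single basic block (local value numbering, which is simpler than the global variant discussed in class). The `Inst` type, the register numbering, and the function `value_number` are hypothetical names made up for this illustration; they are not KECC's actual IR or API. The sketch assumes the input is in SSA form, i.e., every register is defined exactly once.

```rust
use std::collections::HashMap;

/// A toy three-address instruction over SSA registers r0, r1, ...
/// Hypothetical IR for illustration only; it is not KECC's actual IR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Inst {
    /// dst = value
    Const { dst: usize, value: i64 },
    /// dst = lhs + rhs
    Add { dst: usize, lhs: usize, rhs: usize },
    /// dst = lhs * rhs
    Mul { dst: usize, lhs: usize, rhs: usize },
    /// dst = src
    Mov { dst: usize, src: usize },
}

/// The "shape" of a computed value, used as the lookup key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Key {
    Const(i64),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Returns the earliest register known to hold the same value as `r`.
fn find(repr: &HashMap<usize, usize>, r: usize) -> usize {
    repr.get(&r).copied().unwrap_or(r)
}

/// Replaces each recomputation of an already available value with a `Mov`
/// from the register that first computed it. Assumes SSA input.
pub fn value_number(block: &[Inst]) -> Vec<Inst> {
    let mut repr: HashMap<usize, usize> = HashMap::new();
    let mut table: HashMap<Key, usize> = HashMap::new();
    let mut out = Vec::with_capacity(block.len());

    for &inst in block {
        let (dst, canon, key) = match inst {
            Inst::Const { dst, value } => (dst, inst, Key::Const(value)),
            Inst::Add { dst, lhs, rhs } => {
                let (a, b) = (find(&repr, lhs), find(&repr, rhs));
                // Addition is commutative, so the key sorts its operands.
                (dst, Inst::Add { dst, lhs: a, rhs: b }, Key::Add(a.min(b), a.max(b)))
            }
            Inst::Mul { dst, lhs, rhs } => {
                let (a, b) = (find(&repr, lhs), find(&repr, rhs));
                // Multiplication is commutative as well.
                (dst, Inst::Mul { dst, lhs: a, rhs: b }, Key::Mul(a.min(b), a.max(b)))
            }
            Inst::Mov { dst, src } => {
                // A move creates no new value; just record the alias.
                let s = find(&repr, src);
                repr.insert(dst, s);
                out.push(Inst::Mov { dst, src: s });
                continue;
            }
        };
        match table.get(&key).copied() {
            Some(prev) => {
                // The same value was already computed into `prev`.
                repr.insert(dst, prev);
                out.push(Inst::Mov { dst, src: prev });
            }
            None => {
                table.insert(key, dst);
                out.push(canon);
            }
        }
    }
    out
}
```

For example, given `r2 = r0 + r1` followed by `r3 = r1 + r0`, the second instruction is rewritten to `r3 = r2`, and later uses of `r3` are rewritten to use `r2`. The resulting moves are left in place; removing them would be the job of a separate cleanup pass.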
We will also briefly study compiler theory, focusing on compiler correctness. In general, in what sense is a compiler correct, and how can we prove it? Specifically, how can we prove the correctness of KECC's transformations and optimizations? As it turns out, this theory of compiler correctness will greatly help you build your own compiler efficiently.
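One common reading of "correct" is semantic preservation: the transformed program must compute the same result as the original under a reference semantics. The sketch below illustrates that idea on a toy expression language with a constant-folding pass. The names `Expr`, `eval`, `fold`, and `agrees_on` are hypothetical and exist only for this illustration; checking agreement on one input is testing, not a proof, but it states the property a proof would establish for all inputs.

```rust
/// A toy arithmetic expression language.
/// Hypothetical, for illustration only; it is not KECC's IR.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(i64),
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Reference semantics: evaluates `e`, looking up variables through `env`.
/// Arithmetic wraps on overflow.
pub fn eval<F: Fn(&str) -> i64>(e: &Expr, env: &F) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Var(x) => env(*x),
        Expr::Add(a, b) => eval(a, env).wrapping_add(eval(b, env)),
        Expr::Mul(a, b) => eval(a, env).wrapping_mul(eval(b, env)),
    }
}

/// The optimization under test: folds operations on two constants.
/// It must use the same wrapping arithmetic as `eval` to stay correct.
pub fn fold(e: &Expr) -> Expr {
    match e {
        Expr::Num(_) | Expr::Var(_) => e.clone(),
        Expr::Add(a, b) => match (fold(a), fold(b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x.wrapping_add(y)),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(a), fold(b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x.wrapping_mul(y)),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
    }
}

/// Semantic preservation on one environment: the folded expression
/// evaluates to the same value as the original.
pub fn agrees_on<F: Fn(&str) -> i64>(e: &Expr, env: &F) -> bool {
    eval(&fold(e), env) == eval(e, env)
}
```

For instance, `(2 + 3) * x` folds to `5 * x`, and both evaluate to the same value for any binding of `x`. If `fold` used a different overflow behavior than `eval`, the two could disagree on some inputs, which is exactly the kind of bug a correctness argument rules out.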
Slides. If you have any suggestions for improving the slides, please leave comments in the slides.
Make sure you're capable of using the following development tools:
IMPORTANT: you should not expose your work to others. In particular, you should not fork the upstream repository and push there. Please follow these steps:
$ git clone --origin upstream https://cp-git.kaist.ac.kr/cs420/kecc-public.git
$ cd kecc-public
$ git remote -v
upstream https://cp-git.kaist.ac.kr/cs420/kecc-public.git (fetch)
upstream https://cp-git.kaist.ac.kr/cs420/kecc-public.git (push)
$ git fetch upstream
$ git merge upstream/master
If you want to manage your development in a Git server, please create your own private repository.
$ git remote add origin ssh://[email protected]:9001//kecc-public.git
$ git remote -v
origin ssh://[email protected]:9001//kecc-public.git (fetch)
origin ssh://[email protected]:9001//kecc-public.git (push)
upstream https://cp-git.kaist.ac.kr/cs420/kecc-public.git (fetch)
upstream https://cp-git.kaist.ac.kr/cs420/kecc-public.git (push)
$ git push -u origin master
Rust: the implementation language for the homework. We chose Rust because its ownership type system greatly simplifies the development of large-scale system software. If you want to "opt out", you can also use FFI and implement your compiler in C/C++.
We recommend that you read this page, which describes how to study Rust.
Visual Studio Code (optional): for developing your homework. If you prefer other editors, you're good to go.
CodeLLDB Extension into the remote server, please follow the steps:
If you see the "fail to create hard link" error message, please follow the steps:
If you see the "Connection timed out" error message, try again after a few minutes.
Comments options in the filter menu at the top of the compiler window.
-O0 -Xclang -disable-O0-optnone -emit-llvm flags.
If you want, you'll be provided with a Linux server account. Please submit your SSH key here. You can connect to the server by ssh [email protected] -p10005, e.g., ssh [email protected] -p10005.
id_ed25519 should be in
Host cs420
  Hostname cp-service.kaist.ac.kr
  Port 10005
  User s
Then you can connect to the server by ssh cs420. Now you can use it as a VSCode remote server as in the video.
It is strongly recommended that students have already taken courses on:
Without a proper understanding of these topics, you will likely struggle in this course.
Other recommendations which would help you in this course:
You will implement translations and optimizations on KECC. All homework submissions will be automatically graded online so that you can immediately see your score. If your compiler is correct and the generated assembly performs comparably to that generated by gcc -O1, you're going to get an A+ (if not A#) even if you miss the final exam.
Since compiler construction is a nontrivial undertaking, you're encouraged to ask questions about the homework in the issue tracker early in the semester.
The exam will evaluate your understanding of compiler theory. There will not be a midterm exam.
You should submit a token to the Course Management website for each session, within 12 hours of the beginning of that session.
Course-related announcements and information will be posted on the website as well as on the GitHub issue tracker. You are expected to read all announcements within 24 hours of their being posted. It is highly recommended that you watch the repository so that new announcements are automatically delivered to your email address.
Ask your questions via email only if they are either confidential or personal. Otherwise, ask questions in this repository's issue tracker. Questions that do not follow this rule (e.g., emailed questions about course materials) will not be answered.
Emails to the instructor or TAs should begin with "CS420:" in the subject line, followed by a brief description of the purpose of your email. The content should contain at least your name and student number. Emails that do not follow this rule (e.g., emails without a student number) will not be answered.