Princeton Programming Languages Group

Committee

Andrew Appel (advisor)
David Walker
David August

Title

Ergonomics and verification of a foreign function interface between Coq and C

Abstract

CertiCoq is a compiler from Coq to C that is verified in Coq. Thanks to the mechanically checked proof of compiler correctness, users can be sure that programs they prove correct in Coq's rich type system output the same results when compiled. However, in practice, large programs are rarely written in a single language. Different languages are often used for better performance, for code reuse, or for capabilities that the primary language lacks. In particular, Coq lacks a mechanism to perform input/output actions, thus CertiCoq-compiled code must interact with another language to achieve that. Namely, we want to allow Coq code to call C code and vice versa. But what happens to the correctness proofs when these two languages interact?

In my talk, I will present the foreign function interface (FFI) for CertiCoq and how it generates glue code in C that lets the client C program interact with the compiled Coq code. Glue functions can construct new Coq values by creating their C representations, inspect existing Coq values, and deal with higher-order Coq values. These helper functions are designed to be used easily even by C programmers who do not know how CertiCoq represents Coq values and closures in memory. These functions serve as building blocks for FFI code; if we provide proofs for the small blocks, users will have an easier time proving programs made out of these blocks correct. Towards that goal, I will explain how we can use the Verified Software Toolchain, a separation logic framework in Coq for verifying specifications of C functions, to prove that these glue functions respect the memory representation of Coq values.

Reading List

Papers

"No-longer-foreign: Teaching an ML compiler to speak C "natively"". Blume. 2001. (link)
"Checking type safety of foreign function calls". Furr, Foster. 2005. (link)
"Verifying Efficient Function Calls in CakeML". Owens, Norrish, Kumar, Myreen, Tan. 2017. (link)
"A modular foreign function interface". Yallop, Sheets, Madhavapeddy. 2017. (link)
"Iris from the ground up: A modular foundation for higher-order concurrent separation logic". Jung, Krebbers, Jourdan, Bizjak, Birkedal, Dreyer. 2018. (link)
"Modular Verification of Programs with Effects and Effect Handlers in Coq". Letan, Régis-Gianas, Chifflier, Hiet. 2018. (link)
"Oeuf: Minimizing the Coq Extraction TCB". Mullen, Pernsteiner, Wilcox, Tatlock, Grossman. 2018. (link)
"The MetaCoq project". Anand, Boulier, Cohen, Sozeau, Tabareau. 2019. (link)
"The next 700 compiler correctness theorems (functional pearl)". Patterson, Ahmed. 2019. (link)

Textbooks

Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998
- 13. Garbage Collection
- 15. Functional Programming Languages
- 16. Polymorphic Types
Benjamin C. Pierce. Types and Programming Languages. The MIT Press, 2002.
- Part I. Untyped Systems
- Part II. Simple Types
- Part IV. Recursive Types
- Part V. Polymorphism (Chapters 22-25)
Software Foundations
- Volume 3. Verified Functional Algorithms (link)
- Volume 5. Verifiable C (link)

Slides

They can be found here in PDF format.

Questions

(including follow-up questions)

(Appel) How do the two papers you read about foreign function interfaces represent C types in other languages? Especially how does Blume's paper embed C types in ML types? What is the relation of that to GADTs?
(Walker) Can you tell me about existential types and how you would add that to a language? What would you add to the grammar of terms? What kind of values does it have? What are the typing rules and small-step operational semantics rules for it?
(August) (Showed an example control flow graph) Convert this to static single assignment form. (Appel) Convert the control graph to a functional program, with mutually recursive functions, in the style of "SSA is Functional Programming". Can you remove unused arguments in some of these functions? How?
(Appel) How does the polymorphic mutable hash table you implemented know how to hash the key values it is given? How does garbage collection work for your mutable hash table? Especially for a generational garbage collector? Would a CertiCoq program ever have any pointers from an older generation to a newer generation? If not, when you put the mutable hash table in the CertiCoq heap, would that break that property? What would you do to fix that?
(Walker) Why would we want a nameless representations of variables? What are the downsides of a nameless representation? How does beta reduction and substitution work for that? What's an example De Bruijn term that would "capture" if you don't have the right offset?