Princeton Programming Languages Group

Committee

Aarti Gupta (advisor)
Andrew Appel
Mae Milano

Title

Equality saturation: Term extraction and an application to network synthesis

Abstract

Equality saturation is a recent technique that mitigates the phase ordering problem in compilers. Instead of performing destructive rewrite rules over the input program, equality saturation memorizes all the equivalent representations compactly in a data structure called e-graph (equality graphs) and executes multiple rewrite rules simultaneously over e-graphs.

Designing an efficient algorithm for term extraction from e-graphs is challenging because the candidate space can be exponentially large and candidate terms can be cyclic. We present a Weighted Partial MaxSAT encoding of the extraction problem by identifying cycles in the e-graph and encoding acyclic constraints with Tseitin transformation to avoid extracting cyclic terms. Our evaluation of real-world workload shows a ~3x speed-up in comparison with extraction with the ILP encoding. Next, we present a case study of applying equality saturation to resource synthesis for packet programs. We propose CatsTail, a packet program compilation pipeline that leverages equality saturation to find feasible ALU mappings to reconfigurable switches. We benchmarked CatsTail using real-world switch programs and compared its synthesis performance with a previous work CaT. CatsTail shows comparable stage utilization optimality while being up to two orders of magnitude faster than CaT in resource synthesis for real-world P4 programs.

Finally, if time permits, we will talk about an ongoing project of automated invariant synthesis for distributed systems.

Reading List

Textbooks

Software Foundations, Benjamin C. Pierce. University of Pennsylvania, 2010
- Volume 1, Chapters 1~11
- Volume 2, Chapters 1~11
Logic in Computer Science, Michael Huth and Mark Ryan. Cambridge University Press, 2004
- Chapters 1, 2, 3.1-3.5, 4, 6.1

Papers

Equality Saturation and its Applications

Willsey, Max, Chandrakana, Nandi, Yisu Remy, Wang, Oliver, Flatt, Zachary, Tatlock, Pavel, Panchekha. "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages 5. POPL(2021): 1–29. (link)
Smith, Gus Henry, Andrew, Liu, Steven, Lyubomirsky, Scott, Davidson, Joseph, McMahan, Michael, Taylor, Luis, Ceze, Zachary, Tatlock. "Pure tensor program rewriting via access patterns (representation pearl)." Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming. ACM, 2021. (link)
Gao, Xiangyu, Divya, Raghunathan, Ruĳie, Fang, Tao, Wang, Xiaotong, Zhu, Anirudh, Sivaraman, Srinivas, Narayana, Aarti, Gupta. "CaT: A Solver-Aided Compiler for Packet-Processing Pipelines." Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. Association for Computing Machinery, 2023. (link)
Yifan Li, Jiaqi Gao, Ennan Zhai, Mengqi Liu, Kun Liu, Hongqiang Harry Liu. "Cetus: Releasing P4 Programmers from the Chore of Trial and Error Compiling." 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, 2022. (link)

Verification for Distributed Systems

Alur, Rajeev, Rastislav, Bodik, Garvit, Juniwal, Milo M. K., Martin, Mukund, Raghothaman, Sanjit A., Seshia, Rishabh, Singh, Armando, Solar-Lezama, Emina, Torlak, Abhishek, Udupa. "Syntax-guided synthesis." 2013 Formal Methods in Computer-Aided Design. 2013. (link)
Yao, Jianan, Tao, Runzhou, Gu, Ronghui, Nieh, Jason. "DuoAI: Fast, Automated Inference of Inductive Invariants for Verifying Distributed Protocols." 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). USENIX Association, 2022. (link)
Mora, Federico, Ankush, Desai, Elizabeth, Polgreen, Sanjit A., Seshia. "Message Chains for Distributed System Verification". Proc. ACM Program. Lang. 7. OOPSLA2(2023). (link)
Wilcox, James R., Doug, Woos, Pavel, Panchekha, Zachary, Tatlock, Xi, Wang, Michael D., Ernst, Thomas, Anderson. "Verdi: a framework for implementing and formally verifying distributed systems." Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery, 2015. (link)

Formal Methods Background

Jhala, Ranjit, Rupak, Majumdar. "Software model checking". ACM Comput. Surv. 41. 4(2009). (link)
Biere, Armin, Heule, Marijn, van Maaren, Hans and Walsch, Toby. "Handbook of Satisfiability". IOS Press, 2008. Chapter 12. (link)

Slides

They can be found here in PDF format.

Questions

(including follow-up questions)

(Milano) What are the differences between small-step and big-step semantics? When do you prefer one over the other and why? (I first answered that big-step cannot model diverging programs but I was wrong; Andrew corrected me, and then I gave an example about preferring big-step over small-step when reasoning the soundness of Hoare logic; but ultimately, they wanted the answer that big-step semantics cannot differentiate non-terminating programs)
(Appel) Prove the following program correct using Hoare logic.
```
while (x != y) {
  x = x + 2;
  y = y + 1;
}
```
(I first asked about what property are we proving about since there is no pre- and post-conditions annotated; Andrew said pick anything you would like to verify, and I chose x = y, which was a bit trivial but OK)
(Appel) What is the invariant of the while loop? (I gave x < y first but later realized it should be x <= y since otherwise the propagated post-condition of the while loop has a contradiction).
(Gupta) How can you use SMT solvers to prove Hoare triples for a program? Give an example tool that implements the algorithm. (I talked about weakest pre-conditions and Dafny)
(Gupta) What are the queries made to the SMT solver called? (I answered Verification Conditions)
Write down the VCs for a while loop. (I made a mistake when writing the VC, Q /\ not c ⇒ I, which should have been I /\ not c ⇒ Q, and corrected following Aarti’s hints)
(Milano) If one does not annotate loop invariants, what approaches can help finding them? (I thought the question was about inferring loop invariants so I answered strongest post-conditions and logical abduction, which was not on my reading list; Aarti gave me hints that it’s about synthesis so then I answered we could use syntax-guided synthesis)
(Milano) How does Verdi verify a distributed system implementation? What kind of invariants can it support? (I answered Verdi verifies trace invariants and checks the inductiveness of the invariants)
(Milano) How does Verdi verify trace invariants? (I answered proved in Coq)
(Milano) What are Verified System Transformers, what do they do? (I answered to give different network semantics, e.g. packet reordering, node failure, to systems which can be separated from the core implementations)
(Appel drew this diagram on the board)

Suppose this is a controller system, where z and s are two sensors and m is a pusher machine. z detects whether the arm of the pusher machine is fully retracted back to m and s detects whether there is a block on the top of the plane. Let variables l and r denote the arm is retracting (moving toward left) and expanding (moving toward right) respectively. The block will be pushed away from the machine towards an edge of the plane and can fall off. Write down LTL formulae that describe the property of the environment and then of the system. (I started with a simple one which says that if the arm continuously moving toward right, then the block eventually falls off the edge)
(Appel) What are the properties of the system? (I stated them informally and then wrote the formula: if there is something on s then the arm will continuously move towards right; if there’s nothing, then it will retract the arm; finally if s continuously detects the block, then the block eventually falls off the edge. When writing down the formulae I forgot to include z and corrected it afterwards)