Synthesizing Bijective String Transformers
Bidirectional transformers are collections of functions that convert between data formats. The most common bidirectional transformers are parsers and pretty printers: a parser converts data from a string representation into a structured representation, and a pretty printer converts structured data back into a string representation. Other bidirectional transformers convert between data models and GUIs, between database views and their underlying tables, and between different string formats. Bidirectional transformers are usually expected to satisfy inversion guarantees: converting data from one representation to another and back should not alter its content, beyond unimportant details such as whitespace. Writing these transformers in general-purpose programming languages requires writing multiple functions and reasoning manually about their invertibility. Researchers have designed domain-specific languages, such as Boomerang and biXid, that express a bidirectional transformation with a single term while providing invertibility guarantees. However, programming in these languages requires learning a new paradigm, which has hindered their adoption. We aim to make one of these languages, Boomerang, more accessible by synthesizing programs in its bijective fragment. Boomerang is a bidirectional language for converting between string representations. The type of a Boomerang program is a pair of regular expressions that specify the source and target data formats. Our synthesizer takes two regular expressions and a set of examples as input, and outputs a Boomerang program that is typed by the input regular expressions and satisfies the examples. It does so using a method called "type-directed synthesis", in which the types guide the synthesizer's search through the space of possible programs. Existing work on type-directed synthesis operates on type systems with relatively few isomorphisms between types.
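The following Python sketch illustrates, under heavily simplifying assumptions, what it means to synthesize a bijective transformer that is typed by a pair of regular expressions and constrained by examples. It is not Boomerang syntax and not the synthesis algorithm of this work: the hypothetical `Format` class models a string format as a sequence of regex-typed fields joined by literal separators, and the hypothetical `synthesize` function only considers transformers that reorder fields. The field types prune the search (a target field may only be filled from a source field of the same type), and the examples select among the remaining candidates.

```python
import itertools
import re
from dataclasses import dataclass


# Hypothetical sketch, not Boomerang's API: a string format is a sequence of
# regex-typed fields joined by fixed literal separators.
@dataclass(frozen=True)
class Format:
    fields: tuple  # regex per field; must not contain capturing groups
    seps: tuple    # len(fields) + 1 literals: before, between, and after fields

    def regex(self) -> str:
        parts = [re.escape(self.seps[0])]
        for field, sep in zip(self.fields, self.seps[1:]):
            parts.append(f"({field})")
            parts.append(re.escape(sep))
        return "".join(parts)

    def parse(self, s: str) -> tuple:
        m = re.fullmatch(self.regex(), s)
        if m is None:
            raise ValueError(f"{s!r} does not match {self.regex()!r}")
        return m.groups()

    def render(self, values) -> str:
        out = [self.seps[0]]
        for value, sep in zip(values, self.seps[1:]):
            out += [value, sep]
        return "".join(out)


def make_transformer(src: Format, tgt: Format, perm: tuple):
    """Target field j is copied from source field perm[j]."""
    inv = [0] * len(perm)
    for j, i in enumerate(perm):
        inv[i] = j

    def forward(s: str) -> str:
        vals = src.parse(s)
        return tgt.render([vals[i] for i in perm])

    def backward(t: str) -> str:
        vals = tgt.parse(t)
        return src.render([vals[j] for j in inv])

    return forward, backward


def synthesize(src: Format, tgt: Format, examples):
    """Return the first field permutation that type-checks and fits the examples."""
    if len(src.fields) != len(tgt.fields):
        return None
    for perm in itertools.permutations(range(len(src.fields))):
        # Type-directed pruning: a target field may only be filled from a
        # source field with the same (syntactically equal) regex type.
        if any(src.fields[i] != tgt.fields[j] for j, i in enumerate(perm)):
            continue
        forward, backward = make_transformer(src, tgt, perm)
        if all(forward(s) == t for s, t in examples):
            return forward, backward
    return None


NAME = "[A-Z][a-z]*"
src = Format(fields=(NAME, NAME), seps=("", ", ", ""))
tgt = Format(fields=(NAME, NAME), seps=("", " ", ""))
fwd, bwd = synthesize(src, tgt, [("Lovelace, Ada", "Ada Lovelace")])
print(fwd("Hopper, Grace"))  # Grace Hopper
print(bwd("Grace Hopper"))   # Hopper, Grace
```

Assuming both formats parse unambiguously, every candidate only moves field contents around, so `backward(forward(s)) == s` holds for each `s` in the source format; this is the kind of invertibility guarantee described above. Comparing field types by syntactic equality is a deliberate simplification: two equivalent regular expressions written differently would be rejected.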
However, each regular expression is equivalent to infinitely many other regular expressions, so synthesis must search through equivalent regular expression types as well as through the possible terms of those types. Furthermore, the types for complicated data formats are much larger and more complex than the types used in existing work on type-directed synthesis, causing a combinatorial explosion when searching through terms. We resolve these issues by converting the types into a different language with fewer equivalences and by treating user-defined types semi-opaquely. We evaluate our procedure on 25 examples drawn from Augeas (a program for encoding bidirectional transformations of Linux configuration files), from other papers, and from microbenchmarks built to highlight the strengths and weaknesses of the synthesizer. Our synthesis algorithm synthesizes all of these programs, taking less than half a second on average.
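One simple way to shrink the space of equivalent types is to rewrite regular expressions into a normal form that quotients out some equivalences. The Python sketch below is an illustration under stated assumptions, not necessarily the representation used in this work: the hypothetical `normalize` function flattens nested unions and concatenations (associativity), drops ε from concatenations, collapses `(r*)*` to `r*`, and stores union alternatives as a sorted, duplicate-free tuple (commutativity and idempotence of union).

```python
from dataclasses import dataclass


# Hypothetical regex AST used only for this illustration.
@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Eps:  # the empty string
    pass


@dataclass(frozen=True)
class Cat:  # concatenation of parts, in order
    parts: tuple


@dataclass(frozen=True)
class Alt:  # union of options
    options: tuple


@dataclass(frozen=True)
class Star:  # Kleene star
    body: object


def normalize(r):
    """Rewrite r into a normal form that removes some (not all) equivalences."""
    if isinstance(r, (Char, Eps)):
        return r
    if isinstance(r, Star):
        body = normalize(r.body)
        return body if isinstance(body, Star) else Star(body)  # (r*)* == r*
    if isinstance(r, Cat):
        flat = []
        for p in map(normalize, r.parts):
            if isinstance(p, Cat):
                flat.extend(p.parts)  # associativity of concatenation
            elif not isinstance(p, Eps):
                flat.append(p)        # eps is the unit of concatenation
        if not flat:
            return Eps()
        return flat[0] if len(flat) == 1 else Cat(tuple(flat))
    if isinstance(r, Alt):
        if not r.options:
            raise ValueError("Alt needs at least one option")
        opts = set()
        for o in map(normalize, r.options):
            if isinstance(o, Alt):
                opts.update(o.options)  # associativity of union
            else:
                opts.add(o)             # a set gives idempotence
        ordered = tuple(sorted(opts, key=repr))  # canonical order: commutativity
        return ordered[0] if len(ordered) == 1 else Alt(ordered)
    raise TypeError(f"not a regex: {r!r}")


a, b, c = Char("a"), Char("b"), Char("c")
r1, r2 = Alt((Alt((a, b)), a)), Alt((b, a))            # (a|b)|a  vs  b|a
r3, r4 = Cat((Cat((a, b)), Eps(), c)), Cat((a, Cat((b, c))))  # (ab)εc  vs  a(bc)
r5, r6 = Star(Alt((a, b))), Star(Cat((Star(a), Star(b))))    # (a|b)*  vs  (a*b*)*
print(normalize(r1) == normalize(r2))  # True
print(normalize(r3) == normalize(r4))  # True
print(normalize(r5) == normalize(r6))  # False, though the languages are equal
```

The last comparison shows why such a normal form only reduces, rather than eliminates, the equivalences a synthesizer must consider: `(a|b)*` and `(a*b*)*` denote the same language but keep distinct normal forms.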