Balder's professional life

I am currently a software engineer at Google, where I help improve the search engine using machine learning techniques.

 

Before joining Google in 2015, I was an associate adjunct professor at UC Santa Cruz and a computer scientist at LogicBlox. I have also held short term visiting/adjunct positions at Stanford (2013 and 2019) and INRIA / ENS Cachan (2009). My alma mater is the University of Amsterdam, where I got my PhD in 2005.

 

Although most of my work at Google has been more applied, and I stopped advising graduate students in 2017 when my last student graduated, I still enjoy occasionally teaching and publishing in academic venues. 

 

Research Profile

 

Most of my academic research over the last 15 years has been on the intersection of logic and computer science and focused on applications in data management, knowledge representation, and computational learning theory. I describe some of the high level themes in my research below (in, roughly, chronological order). 

 

For a complete publication list see DBLP as well as Google Scholar.

My current research interests lie very much at the interface of logic (more generally, declarative specification languages and structured data) and machine learning. Both at a foundational level (logic as a framework for studying algorithmic learnability and for understanding the success of deep learning methods) and at a more applied level (for instance, techniques for leveraging structured data and declarative representations in deep learning). 

Extended modal languages and well-behaved fragments of first-order logic

My 2005 PhD thesis “Model Theory for Extended Modal Languages” systematically mapped out the model theory, as well as proof theory and algorithmic complexity aspects, for a variety of extended modal languages, including the family of “hybrid logics”. It has become (together also with the 2007 handbook chapter that I co-wrote with Carlos Areces) a well-cited reference on hybrid logics. In later years, I worked extensively on “guarded-negation logics”. Guarded-negation first-order logic, first introduced in a joint paper with Vince Barany and Luc Segoufin (ICALP 2011) has become a well studied fragment of first-order logic that generalizes multiple previously known decidable fragments, and is well-behaved both algorithmically and model theoretically. In a series of follow-up papers (AIML 2012, VLDB 2012, MFCS 2013, LMCS 2013, LICS 2014, LICS 2015 x2, JACM 2015, ToCL 2016, JSL 2018) with a variety of other co-authors, we further studied guarded negation logic and its fixed-point extensions, mapping out their finite model theory and establishing the decidability and pinpointing the computational complexity of various problems such as effective preservation theorems, Craig interpolation, fixed-point boundedness, first-order definability of certain-answers, etc. Many decidable logical formalisms admit translations into guarded-negation logics, and as such, they have become a convenient framework proving decidability results and complexity upper bounds.

 

XML query languages; automata and logics for finite trees 

Back in the heyday of XML, I published a series of papers with different co-authors, solving a number of problems regarding the computational complexity of static analysis tasks for the XPath navigational languages, sound and complete rewriting systems for query optimization, and expressive power (PODS 2006, PODS 2007, ICDT 2007, SIGMOD Record 2007, PODS 2008, TCS 2009, JACM 2009, JACM 2010). In joint work with Luc Segoufin (JACM 2010), motivated by questions about XPath, we solved a long-standing open problem in automata theory, namely the strict containment of monadic transitive-closure logic in monadic second-order logic on finite trees (through a novel, automata-theoretic characterization of monadic transitive-closure logic). Other papers in the general area of logics and automata for finite trees include LFCS 2009, FICS 2009, CSL 2009, IPL 2009, FOSSACS 2010, MFCS 2011, LMCS 2012.

Knowledge representation and reasoning

Over the years, I have published a number of papers in the area of knowledge representation (KR 2002, KR 2006, IJCAI 2011, PODS 2013, JAIR 2013, TODS 2014, PODS 2015, DL 2020 Keynote). In one paper that I am particular fond of (PODS 2013), we studied the data complexity of ontology-mediated database queries, with respect to a wide variety of ontology languages, and we obtained decision procedures for first-order rewritability and Datalog-rewritability, by establishing a surprising, intimate connection with core concepts and open problems in the area of constraint satisfaction.

 

Data interoperability and schema mappings

From 2008 onwards, I have done extensive research on schema mappings and their role as a declarative specification language for data exchange and data integration (ICDT 2009, VLDB 2009, CP 2010, CACM 2010  Research Highlights, SIGMOD 2011, VLDB 2011, ICDT 2012 Best Paper Award, EDBT 2013, TODS 2013, RR2014, AMW 2015, JDS 2016, TODS 2017, PODS 2018). Most of this focused on structural characterizations of schema mapping languages, algorithms for computing efficient solutions in data exchange, and example-driven approaches to schema mapping specification. In EDBT 2013 I gave a tutorial on “schema mappings and data examples” together with Phokion Kolaitis and Wang-Chiew Tan.

 

Declarative languages for data analysis and machine learning.

And, conversely, techniques for learning declarative specification.

In several papers (SIGMOD 2015, ICDT 2016, TODS 2017, ICDT 2019), we presented new languages as well as techniques and systems, that follow a declarative, rule-based approach towards the specification of data analysis tasks including text analytics, machine learning, and probabilistic programming. While part of this work is focused more on practical applications, some of the more technical results draw on techniques from finite model theory and descriptive complexity theory. Conversely, I have also studied a number of foundational problems in the area of machine learning for learning declarative specifications (CP 2010, SIGMOD 2011, TODS 2011, VLDB 2011, ICDT 2012 Best Paper Award, TODS 2013, EDBT 2013 Tutorial, AMW 2015, ICDT 2015, TODS 2015, TODS 2017, PODS 2018, DL 2020 Keynote, ICDT 2021). These papers range from foundational to experimental. On the more foundational side, we showed that certain core problems in this area (e.g., polynomial-time exact learnability and the existence of small uniquely characterizing sets of examples) have an intimate connection to techniques and problems studied in finite model theory and constraint satisfaction.