What Parsers Are They Using?

This is a rather boring post on programming-language trivia that doesn’t dig into anything deep.

GCC and Clang

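GCC is implemented mostly in C (since GCC 4.8 it has been built as C++), and according to its Wikipedia page it originally used Bison to generate its parsers. By default, Bison generates a table-driven LALR(1) parser: a bottom-up parser that scans the input left to right and produces a rightmost derivation in reverse.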
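To make “table-driven LALR” more concrete, here is a toy shift-reduce parser in Python for the grammar E → E + NUM | NUM. It is only a sketch under simplifying assumptions: the ACTION and GOTO tables were built by hand (with the simpler SLR method, which yields the same tables as LALR for a grammar this small), whereas Bison computes its tables automatically from a grammar file. The parser never recurses; it only looks up what to do next, and the left-recursive rule poses no problem for it.

```python
# Toy table-driven shift-reduce (LR) parser for the left-recursive grammar
#   rule 1: E -> E '+' NUM
#   rule 2: E -> NUM
# The tables were built by hand for this sketch; a generator such as Bison
# derives tables like these automatically from a grammar file.

# ACTION[(state, terminal)] -> (operation, argument)
ACTION = {
    (0, "NUM"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "NUM"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
# GOTO[(state, nonterminal)] -> state to enter after a reduction
GOTO = {(0, "E"): 1}
# Number of right-hand-side symbols popped when reducing by each rule
RHS_LEN = {1: 3, 2: 1}


def tokenize(text):
    """Split '1 + 2' into (kind, value) pairs, ending with the '$' marker.

    In this sketch, tokens must be separated by whitespace.
    """
    tokens = [("+", None) if word == "+" else ("NUM", int(word))
              for word in text.split()]
    tokens.append(("$", None))
    return tokens


def lr_parse(text):
    """Evaluate a sum by shifting tokens and reducing via table lookups."""
    tokens = tokenize(text)
    states = [0]  # stack of parser states
    values = []   # semantic values, one per grammar symbol on the stack
    pos = 0
    while True:
        kind, value = tokens[pos]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at token {pos}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            values.append(value)
            pos += 1
        elif op == "reduce":
            n = RHS_LEN[arg]
            rhs = values[-n:]
            del states[-n:]
            del values[-n:]
            # Semantic actions: rule 1 adds, rule 2 passes the number through.
            result = rhs[0] + rhs[2] if arg == 1 else rhs[0]
            states.append(GOTO[(states[-1], "E")])
            values.append(result)
        else:  # accept: the single remaining value is the result
            return values[-1]
```

For example, lr_parse("1 + 2 + 3") returns 6, while lr_parse("1 +") raises a SyntaxError because state 3 has no action for the end-of-input marker.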

According to the same page, GCC later replaced these generated parsers with hand-written recursive-descent parsers for C, C++ and Objective-C. The GCC release notes confirm this: GCC 3.4 replaced the YACC-derived C++ parser, and GCC 4.1 replaced the Bison-based C and Objective-C parser.

Clang, like LLVM, is implemented in C++. According to the LLVM Clang page, it also uses a single unified recursive-descent parser for C, Objective C, C++ and Objective C++. Both GCC and Clang now use recursive-descent parsers, citing better speed among the benefits. The Clang page also states that recursive descent:

… makes it very easy for new developers to understand the code, it easily supports ad-hoc rules and other strange hacks required by C/C++, and makes it straight-forward to implement excellent diagnostics and error recovery.
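The Clang page’s argument is easiest to see in code. The sketch below is a minimal hand-written recursive-descent parser in Python for a made-up arithmetic grammar (it is not taken from Clang or GCC, and all names are hypothetical): each grammar rule becomes one method, left recursion is replaced by a loop, and errors report what was expected and at which column.

```python
DIGITS = "0123456789"


def tokenize(text):
    """Turn '2 * (3 + 4)' into (kind, text, column) triples ending with 'end'."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text[i] in DIGITS:
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("num", text[start:i], start))
        else:
            tokens.append(("sym", text[i], i))
            i += 1
    tokens.append(("end", "", len(text)))
    return tokens


class Parser:
    """Recursive-descent parser and evaluator; one method per grammar rule:

        expr   := term (('+' | '-') term)*
        term   := factor ('*' factor)*
        factor := NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def error(self, expected):
        # Build a diagnostic naming what was expected and where.
        _, text, col = self.peek()
        found = repr(text) if text else "end of input"
        return SyntaxError(f"column {col}: expected {expected}, found {found}")

    def parse(self):
        value = self.expr()
        if self.peek()[0] != "end":
            raise self.error("an operator or end of input")
        return value

    def expr(self):
        # A loop replaces the left-recursive rule expr := expr '+' term,
        # which would make a naive recursive-descent parser recurse forever.
        value = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek()[1] == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        kind, text, _ = self.peek()
        if kind == "num":
            self.advance()
            return int(text)
        if text == "(":
            self.advance()
            value = self.expr()
            if self.peek()[1] != ")":
                raise self.error("')'")
            self.advance()
            return value
        raise self.error("a number or '('")
```

For example, Parser("2 * (3 + 4) - 5").parse() returns 9, and Parser("(1 + 2").parse() raises SyntaxError("column 6: expected ')', found end of input"). Adding an ad-hoc rule or a better error message is just a matter of editing the relevant method.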

Python

Here Python refers to the CPython implementation. Its repo’s “Parser” directory shows that CPython uses Zephyr ASDL (Abstract Syntax Description Language) to describe its abstract syntax tree. Zephyr ASDL is also described on a Princeton CS Department page.

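Until Python 3.9, CPython’s grammar was LL(1) and was parsed by a generated LL(1) parser; since 3.9 the default parser is PEG-based (PEP 617). The AST code (Python-ast.c) is generated from the ASDL description of the language. The detailed process is described in PEP 339.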
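The generated AST node classes are also exposed to Python programs through the standard ast module. As an illustration (not CPython’s own code), the sketch below parses an expression and evaluates it by walking the tree; the attribute names it relies on, such as left, op and right on BinOp, are the fields declared for BinOp in the ASDL description. The evaluate function and the OPS table are hypothetical helpers for this sketch.

```python
import ast
import operator

# Python.asdl declares BinOp(expr left, operator op, expr right),
# so the generated ast.BinOp class has exactly these fields.
assert ast.BinOp._fields == ("left", "op", "right")

# Hypothetical table mapping the ASDL "operator" node types this sketch
# supports to ordinary Python functions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(node):
    """Evaluate a small arithmetic AST by following its ASDL-defined fields."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    raise ValueError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("1 + 2 * 3", mode="eval")
print(ast.dump(tree))
print(evaluate(tree))  # 7
```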

Ruby

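Ruby MRI is implemented in C. According to the Bison Wikipedia page, Ruby also uses Bison for parser generation, which means its parser is a table-driven LALR(1) parser. (Newer releases have moved on: Ruby 3.3 replaced Bison with the Lrama parser generator, and Ruby 3.4 made the hand-written Prism parser the default.)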

The grammar and parser source can be found in parse.y in its repo.

JavaScript

Google’s V8 JavaScript engine is implemented in C++. According to a Quora post, it uses a hand-written top-down parser.

I haven’t had time to dig into its codebase, and doing so is beyond the scope of this post anyway.

Haskell

According to the Quora post above, GHC uses a parser generator called Happy, which, like yacc, generates LALR(1) parsers by default. Judging from its official website, it appears to have been created originally to generate GHC’s parser.

Its grammar is defined in the file “compiler/parser/Parser.y” in the GHC codebase (in recent versions, “compiler/GHC/Parser.y”).

Julia

Interestingly, after searching its GitHub codebase for a while, I found that Julia’s frontend and parser are actually written in Scheme (femtolisp, a small Scheme-like Lisp bundled with Julia), even though most of its other core source files are in C.

It appears to use a recursive-descent parser as well. (Since Julia 1.10, the default parser is JuliaSyntax.jl, which is written in Julia itself.)

Golang

The Go parser is now implemented in Go itself. (Wow!) In fact, most of the Go implementation is written in Go, according to its GitHub mirror repo; the compiler itself has been written in Go since Go 1.5. When I checked, the repo contained 75.7% Go, 19.4% C, 3.0% Assembly, and 1.9% other.

From its parser source code, it looks like it also uses a recursive-descent parser.