Explore the Creation of Programming Language — Goby ❤

By ・Sep 27 2017・技術文章

Explore the Creation of Programming Language — Goby ❤

Goby — New Programming Language Created by Stan Lo for Micro-services

Brief Introduction

Goby — a brand new programming language created by Stan Lo, inspired by Ruby’s beautiful syntax style and implemented by Go programming language. It has several special features such as built-in HTTP server, threads and channels are handled by Go and its Plugin System can let you access and control Go object via Goby. You can use Go package in your Goby project!

Moreover, because Goby is based on Ruby’s syntax style (although Goby is not 100% compatible to Ruby), in most cases, you can simply paste and run your Ruby code in Goby project! I’m a contributor of Goby and you can click here to see the sample site which its backend is entirely using Goby code, it just looks similar to the Sinatra in Ruby!

I’m From Fancy Front-End World, But…

How do I contribute to a project which builds a new programming language?

Initially, I think I’m absolutely crazy even insane that I join the project without having any knowledge of the infrastructure of the programming language. However, at that time, somehow I feel that I wanted to explore a new field in computer science. I always think of the ones who creates a new language as if a ruler who have a mighty magical wand that defined a new standard for creating interesting softwares.

One day, when the creator of Goby presented his project, I have a strong feeling to join the project as a contributor. However, in the mean while, I was so nervous cause I know nothing and dived into the new world entirely about creating a new programming language. This adds on more excitement during the journey of contribution. Actually, learning the basic structure of language is actually quite simple and easy to understand. (Well, implementation is another story ~)

Infrastructure of Programming Language

Understanding the Know How About What You’ve Already Know

Just think about a simple example even it’s not a programming language, it just pure human language. Let’s ask a simple question: How do we know the structure of the sentence in English grammar?

Sentence: "Alexius dreams to become a DJ."

Follow your intuition, first thing you’ll do is break up the sentence into a collection of words.

Words: ["Alexius", "dreams", "to", "become", "a", "DJ", "."]

Tokenizing (Lexing)

Now, in order to understand the structure of the sentence, you need to know the meaning and the type of the word. For instance, an apple is a noun (type) which is a kind of round fruit of a tree of the rose family — typically has thin green or red skin and crisp flesh. (Wow, never thought that the explanation of apple in dictionary is very meticulous.)

Anyway, the process about labelling the type of the word is called tokenizing. (Basically, there is also a term called lexing which is almost the same as tokenizing, but in this case, we can view them as synonym) We convert each plain words into series of meaningful tokens one by one.

The first word "Alexius" didn’t match any kinds of verbs, adjectives or prepositions, so we assign its type as a noun.

{ value: "Alexius", type: "noun" }

The second word "dreams" is a kind of verb, so we can represent as the token below.

{ value: "dreams", type: "verb" }

Actually, the word “dreams” can also be a noun, but in this case I want to simplify the example in order to let you easily understanding the notion of processing programming language. So, let’s view the word "dreams" as a type of verb token.

The third word — to, is kind of interesting, because it can be a type of infinitive if the following word is a kind of verb or preposition in English grammar. For now, we can just assign it as so-called an identifier in this case (just like the keywords in programming language).

{ value: "to", type: "identifier" }

And iterate through rest of the words, I assumed that we can get the results which is presented below.

Tokenized Result:
[
  { value: "Alexius", type: "noun"       },
  { value: "dreams",  type: "verb"       },
  { value: "to",      type: "identifier" },
  { value: "become",  type: "verb"       },
  { value: "a",       type: "article"    },
  { value: "DJ",      type: "noun"       },
  { value: ".",       type: "period"     }
]

Parsing with LALR Algorithm

The next step is to analyze and process these tokens, this step in Computer Science is basically called parsing. With the use of the algorithm called LALR (Look-Ahead Left Reversed Rightmost Derivation), we process the tokens from left to right and in the meanwhile, we need to “look-ahead” in order to match the English grammar rule if necessary.

In order to illustrate this concept, we begin from the first token pushes into a “grammar rule stack”, or you can say shift a token into the grammar stack.

[ "Alexius" ] "dreams" "to" "become" "a" "DJ" "."
[  (noun)   ] ACTION: SHIFT

Hmmm… It’s a noun and we know that “Alexius” might be the subject of the sentence in grammar rule. Let’s shift in the next token!

[ "Alexius" "dreams" ] "to" "become" "a" "DJ" "."
[  (noun)    (verb)  ] ACTION: SHIFT

Current token is the word “dreams”, in this case we can “look-ahead” and get to match to the basic grammar in English, which is just Subject(a noun) + Verb combination and simplify the grammar rule. This action is called reduce.

[ "Alexius dreams" ] "to" "become" "a" "DJ" "."
[  (Sub. + Verb)   ] ACTION: REDUCE

Then we shall push the next token and we get the result.

[ "Alexius dreams"     "to"     ] "become" "a" "DJ" "."
[  (Sub. + Verb)   (identifier) ] ACTION: SHIFT

Because this time we looked-ahead but didn’t know which kind of the word “to” belongs to, it can be a kind of preposition or an infinitive. So in this case the grammar rule stack maintains a stable state. Then we need to shift the next token in!

[ "Alexius dreams"     "to"     "become"] "a" "DJ" "."
[  (Sub. + Verb)   (identifier)  (verb) ] ACTION: SHIFT

Look-ahead from current token and then we found out that the phrase “to become” matches a combination in English grammar, which represents the form: to + verb, namely, infinitive. The phrase means a kind of planning in the future. (E.g. I plan to do sth in the future.) So we can reduce the “to become” combination as a infinitive in grammar rule stack.

[ "Alexius dreams" "to become"  ] "a" "DJ" "."
[  (Sub. + Verb)   (Infinitive) ] ACTION: REDUCE

Look-ahead again, we can now combine the Subject + Verb rule with the infinitive rule according to this grammar rule, which construct a complete structure of the sentence. The action — reduce triggers again.

[ "Alexius dreams to become" ] "a" "DJ" "."
[ (Sub. + Verb + Infinitive) ] ACTION: REDUCE

However, the structure of the sentence haven’t done yet, we need to push the next token and we get the article — the word “a”.

[ "Alexius dreams to become"    "a"    ] "DJ" "."
[ (Sub. + Verb + Infinitive) (article) ] ACTION: SHIFT

We shift the next token because the article cannot be standalone and reduce with the previous sentence.

[ "Alexius dreams to become"    "a"     "DJ"  ] "."
[ (Sub. + Verb + Infinitive) (article) (noun) ] ACTION: SHIFT

Then we can match the phrase — “a DJ” to the grammar rule when we look-ahead, in this case, it is a kind of indefinite article.

[ "Alexius dreams to become"        "a DJ"       ] "."
[ (Sub. + Verb + Infinitive) (indefinite article)] ACTION: REDUCE

Then we can again reduce the first grammar structure and the second grammar structure, merge into a complete grammar rule.

[  "Alexius dreams to become a DJ" ] "."
[ (Sub. + Verb + Infinitive + Obj.)] ACTION: REDUCE

According to the grammar rule, we can reduce the last indefinite article into the object because of its position is at the end of the sentence. Finally, we pushes the last token.

[  "Alexius dreams to become a DJ"     "."   ]
[ (Sub. + Verb + Infinitive + Obj.) (period) ] ACTION: SHIFT

The token “period” denotes the end of the sentence, so we perform the reduce action.

[   "Alexius dreams to become a DJ."    ]
[ (Sub. + Verb + Infinitive + Obj. + .) ] ACTION: REDUCE

Finally, we derived our sentence structure using the LALR algorithm according to the English grammar rules. You can conclude that the LALR algorithm mainly does these things:

Look-Ahead — Try to look-ahead of the tokens in order to match the grammar rules
Left — It means the parser will parse tokens from left to right
Rightmost Reversed Derivation — The parser will keep using the shift-reduce technique: shifting tokens to the right and reduce in the reversed order to derive the result.

And actually, this is also how Ruby parses the code. Cool, right?

AST (Abstract Syntax Tree)

Parser will not only parsing the result but also construct something tree-structured data called the AST. For example, to parse a simple expression presented below.

a += 1 - 2 * (3 + 4)

We can rewrite as below.

a = a + 1 - 2 * (3 + 4)

The expression in parentheses got the highest precedence. Then the multiplication/division got the secondary precedence. The addition/subtraction got the third precedence and at last, the assignment got the lowest precedence.

The following illustration describes how AST constructed step by step.

I took a lot effort on visualizing this process, somehow it looks quite harmonic XD — Created by Jun-Xin Huang (Maxwell)

When the abstract syntax tree being construct complete, we can then traverse through the tree and get the result of the expression!

Wrap it Up!

During the time when I contribute to the Goby project, I’ve learned a lot and feel even more powerful about knowing how the programming language being created! It’s cool and I’d never learned this kind of knowledges before. From tokenizing to parsing with a simple introduction to AST, aside from the process mentioned above, there’s still more knowledge that I learned from this project, such as the notion of VM and byte code processing…etc.

There’s a lot of details covered by this book called — Ruby Under a Microscope. It illustrates the way how Ruby process the codes. Goby implement in a similar way in Golang and process codes which is almost 100% to Ruby syntax.

Ruby Under a Microscope

A great appreciation to the creator of Goby — Stan Lo, who guided me through this fantastic journey (although I have a lot of concept about compiler still waited for me to understand), and thanks to the other Goby contributors that make this project better. I never thought that my first open source contribution experience would be step into this mysterious field and accomplished more than I thought!

Recommended Article: Orbital Motion Image Gallery Using JS & JQuery

快速找原始碼的好幫手：The Silver Searcher

踏入新的程式語言的世界 — Goby ❤