June 20, 2018

Computer and Human Languages

Basic knowledge of computer languages (mainly JavaScript) is assumed.

Some time ago, a friend of mine asked how skilled I’d say I was with JavaScript. I told him I was fluent.

– Fluent? What does it mean to be fluent?

– Well, think Portuguese. (We both speak Portuguese.) I don’t know every word, but I know the language well enough to express anything I need somehow.

It turned out that “fluent” was just the perfect word.


Fluency is a powerful and straightforward concept for expressing your ability with a language. When it comes to human languages, it’s the ability to fully communicate your intentions to a human; for computer languages, the ability to fully communicate your intentions to a computer.

That line of thought is shared by one of the most important and revolutionary introductory books on Computer Science. In Structure and Interpretation of Computer Programs, right before introducing Lisp (the programming language used throughout the book), a similar analogy is made:

Just as our everyday thoughts are usually expressed in our natural language (such as English, French, or Japanese), and descriptions of quantitative phenomena are expressed with mathematical notations, our procedural thoughts will be expressed in Lisp. 1

For both kinds of languages, the levels are the same: one cannot communicate at all, can do it poorly (beginner), can do it (native/fluent), or can do it brilliantly (think book authors, public speakers, and senior software engineers).

Why do they match so well?

They match because the linguistics of computer and natural languages intersect wonderfully. Both have syntax, semantics, and pragmatics, which sit at the core of theoretical linguistics.

In their ecosystems, we also find syntax trees and style guides.

Syntax

Syntax is the set of rules that govern how language elements are combined to form a valid expression.

In English, the basic rule is that elements like nouns, verbs and punctuation marks must follow the subject-verb-object order to produce a valid sentence. In JavaScript, to produce a valid conditional statement, elements such as if, parentheses, braces, and comparison operators (===, <=) must be combined in a certain order too.

In the same way that “Do they car a put?” (??) is not valid English, the following code is not valid JavaScript:

if {1 === 2} (
  return true
)
// => SyntaxError: missing ( before condition

add x y = x + y  // Elm-style function definition: not valid in JavaScript

And just like a pedantic friend would do when you write shitty English, the computer will throw a syntax error when you try to run your program. Forgot to close a brace? Wrote an if statement condition using curly braces instead of parentheses? Well, SyntaxError!

This, however, is valid JavaScript (inside a function body, since return is only allowed there):

if (1 === 2) {
    return true
} // "if (1 === 2) return true" is also fine!

const add = (x, y) => x + y
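One way to watch the parser accept or reject these snippets is to hand them to the Function constructor, which parses its argument as a function body without running it. This is only a sketch: the helper name isValidSyntax is made up for illustration, and the exact error message differs between JavaScript engines.

```javascript
// Hypothetical helper: returns true when `source` parses as a function body.
// `new Function` only parses the code here; the resulting function is never called.
function isValidSyntax(source) {
  try {
    new Function(source)
    return true
  } catch (error) {
    if (error instanceof SyntaxError) return false
    throw error // anything else is unexpected, so let it surface
  }
}

console.log(isValidSyntax("if {1 === 2} ( return true )")) // false
console.log(isValidSyntax("if (1 === 2) { return true }")) // true
console.log(isValidSyntax("add x y = x + y"))              // false
console.log(isValidSyntax("const add = (x, y) => x + y"))  // true
```

Because the source is treated as a function body, the return statement is allowed in the second snippet.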

In HTML there are syntax rules too. Putting a div directly inside a ul, for instance, is invalid: an unordered list (ul) only allows list items (li) as children, and div is not one.

Languages have different degrees of syntax strictness, which is what usually makes some harder to grasp than others. Think of German: it imposes very strict rules for articles, which vary with number, gender, and function. English, on the other hand, has a definite article (the) and an indefinite article (a/an) – and that’s it.

The same applies to computer languages: some happen to be extremely strict when it comes to syntax, while others are designed for greater syntactic flexibility, letting you achieve the same result in different ways. Think semicolons: they are optional when ending statements in JavaScript, but mandatory in C.
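As a small illustration of that flexibility, here is the same function written three ways, with and without semicolons. The function names are made up for this sketch.

```javascript
// Three ways of defining the same function; JavaScript accepts all of them.
function addDeclared(x, y) { return x + y; } // declaration, with semicolons
const addExpressed = function (x, y) { return x + y } // function expression, no semicolons
const addArrow = (x, y) => x + y // arrow function with an implicit return

console.log(addDeclared(1, 2), addExpressed(1, 2), addArrow(1, 2)) // 3 3 3
```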

Semantics

Semantics is what the language expression evaluates to. In human languages, expressions evaluate to thoughts, questions, answers; while computers evaluate expressions to CPU instructions that comprise a program’s flow.

Consider the following English statement:

The ant ran over the car.

I don’t know what ants you are familiar with, but even though the sentence contains the necessary elements in correct order and grammar, that doesn’t make sense. The statement is syntactically correct but semantically wrong – we can’t conceive a meaning from it.

(“The car ran over the ant”, however, would be definitely more acceptable to our ears.)

We can relate semantics to computer languages in several ways (like “solving the wrong problem”, basically), but the word “semantic” comes up more often when we think about HTML. Consider the following markup:

<span>What is a fruit?</span>
<div>
    A fruit is something that humans eat. Some examples of fruits are:
    watermelon, pineapple and apples.
</div>

The elements span and div do exist in the language and are validly marked up in this case (I didn’t do >div<, right?), so the browser will understand the markup and show it to the user accordingly. The syntax is correct; yet, span and div are non-semantic elements, so computers won’t get any meaning from them. Again, syntactically correct but semantically wrong.

That text, though, is clearly a title and a paragraph. With a bit of HTML knowledge, we can communicate this in a meaningful way to computers! The language provides elements to achieve that.

The following code improves the previous markup for better semantics.

<h1>What is a fruit?</h1>
<p>
    A fruit is something that humans eat. Some examples of fruits are:
    watermelon, pineapple and apples.
</p>

The element h1 is being used to communicate the main title, and p communicates the paragraph. Both syntax and semantics make sense now!
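The same gap between valid syntax and intended meaning shows up in JavaScript. In the sketch below (function names are made up for illustration), both functions parse without complaint, but only one of them actually means “average”.

```javascript
// Parses fine, but does not mean "average": division binds tighter than
// addition, so this computes a + (b / 2).
const averageWrong = (a, b) => a + b / 2

// Same tokens plus parentheses, and the expression now means what we intended.
const average = (a, b) => (a + b) / 2

console.log(averageWrong(2, 4)) // 4
console.log(average(2, 4))      // 3
```

The computer never complains about the first version, because nothing about it is syntactically wrong.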

Pragmatics

Pragmatics is what the language expression evaluates to within its interactional context – which might totally affect semantics and syntax.

In a natural language, the context is essentially built around the cultural aspects of the speaker, place, time, manner, and several other unpredictable aspects that humans introduce when communicating.

For computer languages, we can see pragmatics as the methodology or implementation used when approaching a problem: composition vs. inheritance, functional vs. imperative, picking a Design Pattern etc.

Consider the following examples:

  • A child tells a beginner who only recently started cooking that the food is horrible.
  • A new joiner at the company filters items in a JavaScript array using forEach, but the team expected filter to be used – a more functional approach.
  • A programmer implements Fibonacci with recursion instead of iteration.
  • Your friend uses very erudite language to post a cake recipe on the internet.

What do they all have in common? By themselves, they are neither right nor wrong; we can only judge them by considering the context in which they happen.

Neither syntax nor semantics skills will help you understand what’s wrong, since pragmatics is more about what’s appropriate within a context. Some approaches are more elegant, antiquated, or dialect-based than others, and we can only tell how valid they are considering what we are trying to achieve.
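The two programming examples from the list can be sketched as follows. Each pair produces the same result, so choosing between them is a matter of context rather than correctness. All names here are made up for illustration.

```javascript
const numbers = [1, 2, 3, 4, 5, 6]

// Imperative: push matching items into an array while looping with forEach.
const evensWithForEach = []
numbers.forEach((n) => {
  if (n % 2 === 0) evensWithForEach.push(n)
})

// Functional: describe the condition and let filter build the new array.
const evensWithFilter = numbers.filter((n) => n % 2 === 0)

// Fibonacci, recursive: close to the mathematical definition, but the number
// of calls grows exponentially with n.
const fibRecursive = (n) => (n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2))

// Fibonacci, iterative: more bookkeeping, but a single pass over n steps.
function fibIterative(n) {
  let previous = 0
  let current = 1
  for (let i = 0; i < n; i++) {
    const next = previous + current
    previous = current
    current = next
  }
  return previous
}

console.log(evensWithForEach.join(", "), "|", evensWithFilter.join(", ")) // 2, 4, 6 | 2, 4, 6
console.log(fibRecursive(10), fibIterative(10)) // 55 55
```

A team that favours a functional style would likely prefer filter, while a hot code path with large inputs would favour the iterative Fibonacci.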

Syntax trees

Both kinds of languages can be represented using syntax trees, which build on the mathematical notion of partially ordered sets.

Human languages are analysed through the Concrete Syntax Tree (“CST”), a context-free analysis that tells us about the function of each language token (word, punctuation, etc.) used.

This is the Concrete Syntax Tree for another sentence that is syntactically valid but semantically odd, “The water drank a glass of Mary.”:

Concrete Syntax Tree of the sentence 'The water drank a glass of Mary.'
Concrete Syntax Tree for a sentence in a human language, where "S" stands for "sentence", "N" for "noun", "V" for "verb", and "P" for "phrase". See Wikipedia on Parse Trees for more information.

Computer languages are also represented using syntax trees! When the code (high-level) gets translated into instructions for the computer (low-level), a program called a compiler makes use of them. The Concrete Syntax Tree for a program contains a lot of detailed information, from whitespace to language expressions, so you can imagine a CST for a real-world JavaScript program as something really lengthy.

What compilers do might be tough to understand at first, but think of a program inside us that converts languages to meaning – our internal compiler. Kids improve their compilers while communicating with their families, so every day they get to understand their mother tongue even more. If you understand English yet struggle with German, we can say your German compiler is pretty bad and has to be improved.

The compiler simplifies the CST down to the expressions that truly represent the program after syntax analysis. That’s where the Abstract Syntax Tree (AST) comes in: the syntax tree that actually lies at the heart of semantic definitions!

Let’s consider the following JavaScript program:

let x = 0

if (x) {
    x++
}

This is the Abstract Syntax Tree for it:

Abstract Syntax Tree for a JavaScript program.

It’s really about the essentials when considering language grammar. If whitespace means nothing in a language, it won’t show up in the AST.
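To make this concrete, here is a hand-written sketch of roughly what a parser following the ESTree conventions would produce for the program above. It is an approximation: position information and other metadata are left out, and the collectNodeTypes helper is made up for illustration.

```javascript
// Hand-written approximation of an ESTree-style AST for:
//   let x = 0
//   if (x) { x++ }
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "x" },
          init: { type: "Literal", value: 0 },
        },
      ],
    },
    {
      type: "IfStatement",
      test: { type: "Identifier", name: "x" },
      consequent: {
        type: "BlockStatement",
        body: [
          {
            type: "ExpressionStatement",
            expression: {
              type: "UpdateExpression",
              operator: "++",
              prefix: false,
              argument: { type: "Identifier", name: "x" },
            },
          },
        ],
      },
      alternate: null,
    },
  ],
}

// Hypothetical helper: walks the tree depth-first and lists node types in order.
function collectNodeTypes(node, types = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectNodeTypes(child, types))
  } else if (node !== null && typeof node === "object") {
    if (typeof node.type === "string") types.push(node.type)
    Object.values(node).forEach((child) => collectNodeTypes(child, types))
  }
  return types
}

console.log(collectNodeTypes(ast).join(", "))
```

Notice that the parentheses, braces, and line breaks of the source are nowhere to be found: only the structure that carries meaning remains.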

Compare that against a Concrete Syntax Tree for another program of similar size, which is way more detailed, and therefore more useful at the language-specification level.

Style guides

When the same expression can be written in several ways because the language is not strict about it, style guides aim to answer the question of which style to go for. Curiously, writing style guides are just as useful for programmers as they are for journalists!

Let’s say I’m telling someone my favourite fruits. I could do it in different ways:

My favourite fruits are watermelon, bananas and strawberries.

// or using the Oxford comma
My favourite fruits are watermelon, bananas, and strawberries.

The Oxford comma comes from the Oxford Style Guide.

In JavaScript, you can write an object literal in different ways too:

const hero = {
    firstName: "Ada",
    lastName: "Lovelace",
    birthYear: 1815,
    superPower: "computers",
}

// or comma-first style
const hero = {
    firstName: "Ada"
  , lastName: "Lovelace"
  , birthYear: 1815
  , superPower: "computers"
}

Either way, there’s no definitive rule. Both mean the same thing; only the style of writing changes.
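A quick way to convince yourself that style does not change meaning is to compare the serialized objects. This is a minimal sketch with shortened objects, and the variable names are made up for illustration.

```javascript
// Trailing-comma style
const heroTrailing = {
    firstName: "Ada",
    lastName: "Lovelace",
}

// Comma-first style
const heroCommaFirst = {
    firstName: "Ada"
  , lastName: "Lovelace"
}

// Same keys and values in the same order, so the serialized forms match.
console.log(JSON.stringify(heroTrailing) === JSON.stringify(heroCommaFirst)) // true
```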

Some examples of style guides are Airbnb’s JavaScript Style Guide, for writing JavaScript, and The Chicago Manual of Style, for general writing.

Where they don’t match

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. […] Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separately from the construct itself. It prints results, draws pictures, produces sounds, moves arms. 2

Even with all the similarities, programming languages are still artificial, while human languages are natural. Computers are rigorous, won’t allow you to be ambiguous, and only get what you mean if you put it the right way.

You won’t have filler words in a computer language, because computers don’t benefit from your thought process. Humans do benefit, though, and I’d even say that those communication imperfections are what connect us to others.

Apart from that, a programming language is an interface between you and a computer, while a natural language is an interface between you and another human being. You are communicating with a computer, but as a human – and you probably won’t be the only one doing it.

Writing in a way that’s meaningful for both is therefore part of the writing process – and probably the most demanding one.

Final thoughts

Just as experienced programmers look at documentation and easily apply language features to create pieces of software, fluent English speakers look at a dictionary and easily come up with phrases.

Because both computer and natural languages share linguistic structures, speaking about programming skills in terms of fluency works just as well.

After all, fluency is all about how easily our intentions are expressed with a language.


For further reading, I'd highly recommend Exploring the Linguistics Behind Regular Expressions if you are interested in the linguistics of computer languages.



1. “Programming in Lisp” in “Structure and Interpretation of Computer Programs”, 1979.

2. Fred Brooks. “The Mythical Man-Month: Essays on Software Engineering”, 1975, page 7.