C++ trigraphs and digraphs

I am still working through the videos from CppCon 2015. One of the lightning talks I watched today was about the fancy syntax we can use in C++. Let's start with an example:

int main() {
  /??/
  * A comment *??/
  /
  return 0;
}

Is this correct C++ code?

The answer is yes. It does nothing: main returns 0 and there is a comment before the return. But how? The ??/ sequence has a special meaning. It is a so-called trigraph, and as the C++ standard says:

Before any other processing takes place, each occurrence of one of the following sequences of three characters (“trigraph sequences”) is replaced by the single character indicated in Table 1.


----------------------------------------------------------------------------
| trigraph | replacement | trigraph | replacement | trigraph | replacement |
----------------------------------------------------------------------------
| ??=      | #           | ??(      | [           | ??<      | {           |
| ??/      | \           | ??)      | ]           | ??>      | }           |
| ??'      | ^           | ??!      | |           | ??-      | ~           |
----------------------------------------------------------------------------

You can deduce that ??/ is the same as the backslash character \. With this knowledge, consider code like this:

 // Will the next line be executed????????????????/
 a++;

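With trigraphs enabled, the answer is no: the trailing ??/ becomes a backslash, and a backslash at the end of a line splices the next line onto it, so a++; becomes part of the comment and is never executed. The same splicing turns the three odd lines in the first example into /* A comment */. Both examples come from Wikipedia.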
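To make the two steps concrete, here is a minimal sketch that imitates what the compiler does early in translation: first trigraph replacement, then line splicing. This is only an illustration, not how any real compiler is implemented, and the names trigraph_replacement, replace_trigraphs and splice_lines are my own hypothetical helpers. The string handling deliberately avoids writing a literal trigraph in the source, so the sketch behaves the same whether or not your compiler has trigraphs enabled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps the third character of a trigraph to its
// replacement, or returns '\0' if the character does not complete one.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Sketch of the first step: scan left to right and replace each
// "two question marks plus one character" sequence from the table.
std::string replace_trigraphs(const std::string& source) {
    std::string out;
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char replacement = trigraph_replacement(source[i + 2]);
            if (replacement != '\0') {
                out += replacement;
                i += 3;  // consume the whole trigraph
                continue;
            }
        }
        out += source[i];
        ++i;
    }
    return out;
}

// Sketch of the second step: a backslash immediately followed by a
// newline is deleted, joining the two physical lines into one.
std::string splice_lines(const std::string& source) {
    std::string out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
            ++i;  // skip the newline together with the backslash
            continue;
        }
        out += source[i];
    }
    return out;
}
```

Calling splice_lines(replace_trigraphs(text)) on the two-line example above gives back a single line, with a++; sitting at the end of the comment. When you test it, write the input with an escaped question mark (for example "?\?/") so that the test string itself is not touched by trigraph replacement.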

 

There are even fancier constructions, for example this one from the video mentioned above:

int main(){<:]()<%[](){[:>()<%}();}();}();}

This is also valid C++ code :) It uses digraphs, which are defined like this:

-------------------------
| digraph | replacement |
-------------------------
| <:      | [           |
| :>      | ]           |
| <%      | {           |
| %>      | }           |
| %:      | #           |
-------------------------

So you can translate the line above into this:

int main()
{
    []()
    {
        []()
        {
            [](){}();
        }();
    }();
}
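Digraphs are not limited to puzzles like the one above. As a small sketch, here is the same function written twice, once with ordinary punctuation and once with digraphs. The function names sum_plain and sum_digraphs are made up for this illustration. Both versions should compile to the same thing, because the compiler treats each digraph as just another spelling of the token it stands for.

```cpp
#include <cstddef>

// Sums n integers using ordinary punctuation.
int sum_plain(const int* values, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}

// The same function spelled with digraphs:
// <% and %> stand for the braces, <: and :> for the square brackets.
int sum_digraphs(const int* values, std::size_t n) <%
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) <%
        total += values<:i:>;
    %>
    return total;
%>
```

Unlike trigraphs, digraphs are only recognized as separate tokens, so a digraph inside a string literal or a comment stays exactly as you typed it.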

You may ask: why are such things in the C++ language at all? In fact they come from C. The usual explanation is that keyboards in some countries lacked symbols commonly used in programming, such as | or ~, and that is why trigraphs and digraphs were introduced.

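In fact, code containing trigraphs may not compile, at least with clang and gcc, even with older versions such as gcc 4.3, because in their default mode these compilers ignore trigraphs. To enable them you need to pass the -trigraphs flag (-Wtrigraphs is only the name of the warning). Without the flag, the compiler will probably emit a warning like this one, and code that relies on the replacement will then fail to parse: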

warning: trigraph ??/ ignored, use -trigraphs to enable [-Wtrigraphs]


But you can freely use digraphs, so the lambda example compiles without any warnings.

I remember wanting to post about trigraphs and digraphs much earlier, but I forgot. How "fancy" a program can you write using digraphs or trigraphs?

PS. Mentioned video is here:

 

 
