embedded software boot camp

A foreign perspective on variable names

Wednesday, August 18th, 2010 by Nigel Jones

This blog is read by people from all over the world. I make this point not to brag, but rather to demonstrate that designing embedded systems is a truly global effort. Remarkably, despite this, it appears that a huge amount of embedded code is commented in English and / or uses English nomenclature for variable and function names. This is of course wonderful for those of us that are native English speakers. However I’ve often thought that designing embedded systems is hard enough without having the additional burden of working in a foreign language.

Anyway, I mention this as preamble because last week I found myself in a rather unusual situation for me. Namely I was handed a fairly sophisticated driver which was written by a native German speaker. Now one of the things I have always liked about the Germans is that they don’t kowtow to the altar of the English language – and so I found myself looking at code that was commented entirely in German and that used almost exclusively German for function and variable names. I was thus faced with trying to understand it – which with my limited knowledge of German was not at all easy.

Anyway, as I went through the code I found myself entering variable names into an online German-English dictionary – with very limited success. Now while part of the problem was undoubtedly the technical nature of the words, I don’t have the slightest doubt that the real problem was that the author was using abbreviations / slang / jargon as well as concatenating words (e.g. (in English) bufferindex) such that the online dictionaries were flummoxed. The net result was that I had a much harder time interpreting the code than would have been the case if I had understood the variable names. Needless to say this got me thinking. How many times has a non-native English speaker looked at some of my code and entered variable names into a dictionary only to be told that there is no such word? If you subscribe to the belief that you write code for other people to read then it follows that one should take the spoken language barrier into consideration. If one does, then certain ‘rules’ become apparent:

  1. Don’t abbreviate unless you have to. While BufrWrtLmt may be understandable to native English speakers, it must be really hard to comprehend for others.
  2. In concatenating words, make the word boundary clear either via underscore or via camel-case. Thus buffer_index or bufferIndex.
  3. Pay attention to your spelling. A simple spelling mistake such as writing temprature when you meant temperature can completely stymie someone using a dictionary. While I don’t know of a tool that spell checks variable names, there are several tools available for spell checking comments.
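
The three rules can be illustrated with a short C fragment. This is only a sketch: the type `temperature_log_t`, the function `temperature_log_write` and the constant `BUFFER_WRITE_LIMIT` are hypothetical names invented for the example, chosen so that each word can be looked up in a dictionary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rule 1: spelled out in full instead of BufrWrtLmt. */
#define BUFFER_WRITE_LIMIT 8u

typedef struct
{
    uint8_t data[BUFFER_WRITE_LIMIT];
    size_t  buffer_index;   /* Rule 2: underscore marks the word boundary */
} temperature_log_t;        /* Rule 3: "temperature", not "temprature" */

/* Store one temperature sample.
   Returns false once the write limit has been reached. */
bool temperature_log_write(temperature_log_t *log, uint8_t temperature)
{
    assert(log != NULL);

    if (log->buffer_index >= BUFFER_WRITE_LIMIT)
    {
        return false;
    }

    log->data[log->buffer_index] = temperature;
    log->buffer_index++;
    return true;
}
```

Every identifier above splits into whole dictionary words, which is exactly what a reader armed with only a translation tool needs.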

As a passing observation, not only will these changes make your code easier to comprehend for non-native English speakers, they will also do wonders for those of us who purport to speak English as our native tongue!

33 Responses to “A foreign perspective on variable names”

  1. Austin Morgan says:

    While it is not perfect, the Eclipse IDE has a feature that will try to spell check your variable/function/method names. It often has difficulty with camel case, especially when mixed with TLAs, but I hate TLAs anyway.

  2. Darren says:

    Seeing as though this is about spelling and English, I think you’ll find it’s an _altar_ 🙂

  3. Bernhard Weller says:

    Being a native German speaker, I find myself wondering every time on which language I should use to name my variables and to comment things.
    In most cases the product is something with mixed languages. I’m under the impression that almost every data sheet is provided in English; very few of them are also available in German, and some are also supplied in Japanese. That said, I suspect that everyone doing embedded software development has at least some understanding of the English language.
    So in my case I often end up writing comments in German, but naming variables, functions and classes in English.
    I guess that’s fine as long as the English is good enough to be understood by most people, but someone who only speaks poor English should probably stick to his or her native language because, as you said, poor spelling will most likely create more problems than it solves.

  4. FrankSansC says:

    Very interesting topic as always. I don’t care if someone writes code in English or French as long as (s)he sticks to it. I’ve seen too much code with mixed languages and it’s unbearable. Personally I write all my code in French because, as you may have already noticed, I’m not really good at writing proper English.
    Regarding abbreviations, you’re right, but it’s sometimes really difficult not to use them, especially in languages like French or German where most words are much longer than in English (e.g. “get” can be translated as “récupérer”, “retourner” or “recevoir” in French, which is 3 letters against ~9 letters! How lucky you English-speaking people are 🙂 ). And as most compilers complain when variable or function names are longer than 31 letters, you sometimes have no choice but to abbreviate if you want the name you’ve chosen to mean something (and something useful!).

  5. I am English and write it rather well because I was taught properly. I am also 63 years old. Most programmers are younger than I am and many of the British ones haven’t learnt how to spell or punctuate much of what they write – and seem not to care about it. Judging by things I’ve seen in discussion groups, many Americans fare just as badly. And don’t get me started on the grammar!

    Those who learn English as a second language tend to do it properly, so, Bernhard and Frank, don’t worry too much about your own efforts or those of your countrymen!

    Having said all that, I do hate bad spelling and I share the mixed feelings expressed about abbreviations. The kind of abbreviation I cannot bear is that which uses numbers or letters phonetically to indicate words, or parts of them, for example:

    gr8 = great
    4 = for
    2 = to
    u = you
    i = I (not quite in this category, but equally hateful)

    Any business displaying this kind of nonsense in its name or its internet domain name is unlikely ever to sell me anything! We have already foisted two languages on the world – the written one and the spoken one, which in English, particularly, are not well related; combining the two, as above, is doubly inconsiderate!

  6. Lundin says:

    I’m Swedish, and I always code in English. I strongly advise people to do the same. In my company we have introduced a complete ban on source code written in any language other than English. There are two main reasons for this: you cannot export your code outside your country, and perhaps even more importantly, you cannot get support for your code.

    Whenever one encounters some tricky problem one can’t solve by oneself, a wise person asks for help. This applies particularly when encountering silicon bugs, compiler bugs, poor compiler conformance etc., but perhaps also when you are just truly stuck on a bug you caused yourself. You can then ask for help either through an “official channel”, such as the manufacturer or some tech support, or through one of the many helpful, free-of-charge online communities that are becoming increasingly important for engineers. In either case, you can’t come dragging some mumbo-jumbo code written in Swedish with you. Nobody will want to touch that code.

    On one occasion we had some big code written in Swedish and it was a pain to troubleshoot because of this, as the external support we needed couldn’t be found unless we translated the whole thing first. The project was delayed several weeks solely because of this, and since then, “Swedish code” has been banned.

    Another aspect of this is indeed learning, but not in the way Nigel puts it, rather the contrary. Yeah, some may find it tricky to learn technical terms in another language than their own. But the -only- alternative is learning the English term -plus- the native term. This is sheer madness!

    I remember studying C++ back in school and they fed us a book in Swedish. Of course all terms like constructors, destructors, polymorphism, inheritance etc. were translated to some entirely different Swedish word. So not only did I need to learn the English terms, I also had to learn some nonsense Swedish terms I never actually need to use. Double the effort for learning a language that was already incredibly complex to begin with!

    On top of that, most technical terms come in different English flavours or synonymous words. “floats/float numbers/floating point numbers”. Then in the native language there are about as many versions as well. For example, float would translate to “decimaltal” (“decimal number”) in Swedish… I guess we call them that because they have a decimal comma. Then after translating it to Swedish, go ask an English programmer if the format of a variable is a “decimal number”… and he will answer… “err no it is actually hex”. Total confusion guaranteed.

    Also note that I wrote “decimal comma” above… in Sweden we use the “,” character for floats, and not the “.” character. I believe the same is used in Germany? At least it is very annoying in programming.

    int x = 10,000; // oops, it will compile, perhaps even without warnings. Happy debugging!

    Yet another argument is that C/C++ and every other programming language uses English keywords. So even if you would name variables etc in your native language, the reader would still have to know English to understand anything. What’s the point?
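
A caveat on the snippet above: written as a declaration with an initializer, it should be rejected by a conforming C compiler, because the comma in a declaration separates declarators rather than acting as the comma operator. The silent failure shows up in a plain assignment instead. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>

/* Returns what a "decimal comma" assignment really stores in an int. */
int comma_assignment_value(void)
{
    int x;

    /* Parsed as (x = 10), 000; the constant 000 is evaluated and discarded. */
    x = 10,000;

    return x;
}

/* Same pitfall with a float: 12.345 was intended, 12 is stored. */
float comma_float_value(void)
{
    float x;

    /* Parsed as (x = 12), 345; */
    x = 12,345;

    return x;
}

/* Self-check documenting the surprising results. */
void check_comma_pitfall(void)
{
    assert(comma_assignment_value() == 10);
    assert(comma_float_value() == 12.0f);
}
```

With warnings enabled, many compilers can flag the discarded operand (GCC, for example, has `-Wunused-value`), which is one more reason to keep warnings switched on.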

    • Bernhard Weller says:

      Yeah, the “,” is also used in Germany, and it is a constant pain (luckily in a function call you get an error). Exporting numbers from one program to another in particular can really cost you time. Just have one program expecting German numbers with a decimal comma and another one giving you English numbers with commas as thousands separators, as in 1,000.5. Or the other way round: 1.000,5, which is basically the same value in the other notation.
      I really hate this kind of confusion. And the next thing I really don’t like is different units of measurement for the same thing, and best of all mixing them up like it often happens in PCB layout where you have parts which have .5 spacing and others which have .5 spacing, while one is inch and the other mm. Or SMD packages named 0402, there is one in inch and one in mm, completely different patterns needed, so I check every time which one it is.
      I sometimes feel like I’m still stuck in the building of the Tower of Babel.

      • Nigel Jones says:

        The comma as a decimal separator issue is one I am acutely aware of. For products with a UI that will be sold in Europe I always try and make the decimal separator a configuration option. It isn’t something I see done much, so I’d be interested to know if you bother, or whether you just go with a decimal point.

        • Lundin says:

          For the products I’m designing, which are mainly sold in Europe, I used to do various projects with LCDs and added optional “comma notation” as a feature. Nobody ever requested it, or frowned when they got a “dot notation” delivered. So now I just use the “dot notation” everywhere, the customers will simply have to deal with it.

        • Bernhard Weller says:

          Well, I always keep my intended users in mind. Do I write something for engineers? Do I write something for usage in the production line?

          If it is something for engineers, I think that they will be able to understand and use software which uses a decimal point. There should just be a hint somewhere, such as a note that all displayed values use a point, or a completely English interface suggesting the use of a point, where a German interface points to a comma as separator.
          If it’s something for the production line, I’m faced with people who do a lot on a basis of common sense, which is of course using a comma as separator here in Germany. So I try to use this as well.
          If I’m unsure about what users I’ll have I try to handle both cases, which leads to quite some overhead in the code, but it’s more or less working for everyone.

          But I hope that someday we switch over to using the point everywhere; it’s just annoying to spend time on issues like this.

          • Gauthier says:

            One of the first things I do when setting up a new computer for work is to set the decimal separator to a dot. You can do that in Windows, and it helps when importing and exporting to and from different programs (Excel, Octave, you name it).
            You still have the UI issue of the product you develop, but at least you know that your machine is less likely to mess things up for you.

            The best would be to have a US OS, but sometimes it’s not in your company’s policy (sigh).

            Nigel: dot *could* be confusing if you displayed exactly three digits after it. In that case you don’t know if it’s a decimal separator or a thousands delimiter. Otherwise I think people wouldn’t even think about it.

          • Nigel Jones says:

            A few years ago a friend of mine was doing some .NET work for a product used mainly in China. As a result he wanted to get the Chinese version of Windows. Apparently Microsoft was baffled as to why anyone in the USA would want to do such a thing, and as a result he went through a lot of hassle before finally getting it.

          • Kubik says:

            Actually, Microsoft seems to find the idea of someone in a particular country using a non-localised software quite odd. When I needed VS2008, I bought it online from a German MS site (I am located in Germany) as I wanted fastest delivery possible. To my surprise, the VS2008 I received was a German version – note that there was no language option during the purchase process 🙂 According to MS helpdesk, the only way to get an English version would be to return what I got and purchase via US or UK MS site.
            Anyway, back to the topic… I would suggest one more rule to the original article: “Try to stick with common English words.” Our chief SW architect is one of the smartest guys I ever met, but he tends to use words like “pristine” and “salient” that don’t really ring any bells (honestly, I still don’t remember what those mean, and I did look them up in dictionary not so long ago).

          • Nigel Jones says:

            I think “stick to common English words” is an excellent suggestion. I have to admit that I’m guilty of using a rather wide (and hence obscure) vocabulary. I’ll try and clean up my act.

        • Tomas B says:

          In our products we have chosen to combine the decimal separator setting with the language setting. This means that if you set the UI language to German or Swedish, the decimal separator will also switch (to “,”). This way, for instance, CSV files exported to an SD card will import into Excel or OOo without any hassle (since most users use Windows set up for their language/country). However, we still keep a separate decimal separator setting available for “experts”, but for most users everything “just works” when they set the correct language in the embedded device.
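
The approach Tomas describes, a separator that follows the UI language unless an expert overrides it, might look roughly like this in C. Everything here is an assumption made for illustration: the names `ui_settings_t`, `decimal_separator` and `format_thousandths` are hypothetical, and the value is held as an integer number of thousandths so that no floating point or locale support is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef enum
{
    UI_LANGUAGE_ENGLISH,
    UI_LANGUAGE_GERMAN,
    UI_LANGUAGE_SWEDISH
} ui_language_t;

typedef struct
{
    ui_language_t language;
    char separator_override;   /* '\0' means: follow the language setting */
} ui_settings_t;

/* Pick the decimal separator: the expert override wins, else the language decides. */
char decimal_separator(const ui_settings_t *settings)
{
    assert(settings != NULL);

    if (settings->separator_override != '\0')
    {
        return settings->separator_override;
    }

    switch (settings->language)
    {
        case UI_LANGUAGE_GERMAN:
        case UI_LANGUAGE_SWEDISH:
            return ',';
        default:
            return '.';
    }
}

/* Format a value given in thousandths, e.g. 12345 as "12.345" or "12,345".
   Returns the snprintf result (number of characters needed). */
int format_thousandths(char *out, size_t out_size, int32_t thousandths,
                       const ui_settings_t *settings)
{
    assert(out != NULL);

    const char *sign = (thousandths < 0) ? "-" : "";
    /* Unsigned arithmetic keeps the negation well defined even for INT32_MIN. */
    uint32_t magnitude = (thousandths < 0)
                             ? (uint32_t)0 - (uint32_t)thousandths
                             : (uint32_t)thousandths;

    return snprintf(out, out_size, "%s%lu%c%03lu",
                    sign,
                    (unsigned long)(magnitude / 1000u),
                    decimal_separator(settings),
                    (unsigned long)(magnitude % 1000u));
}
```

Keeping the separator choice in one small function means that display code and file export code cannot disagree about which character to use.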

  7. Lundin says:

    I just realized that the example of the Swedish comma fiasco didn’t make much sense. This is closer to what I meant:

    float x = 12,345f;

  8. Kyle Bostian says:

    Just out of curiosity, did you notice any convention for capitalization in the code? Since all nouns in German are capitalized, what’s intuitive and natural for a German speaker to write may be at odds with what English speaking programmers may expect when reading code.

    • Lundin says:

      That isn’t really dependent on culture, but on personal preference in coding style. For the C language, the two most common variable/function naming notations by far are:

      only_lower_case
      MixedUpperCase

      If you would write in some other style than these two most common, you would however confuse me a bit. For example, I personally find it harder to read the typical Ada style: Mixed_Upper_Case. Not to mention code from “assembler programmer going C” where everything is in upper-case. But that could be because I’m not used to reading Ada or assembler. I think that the programming language has a much higher influence on coding style than our native language.

      But then it is all personal preference. As long as you are consistent I see no problems with style. The least we can demand is that the programmer and/or their company have a consistent coding style, so that all code in a project looks the same.

    • Nigel Jones says:

      Sorry for the delay in replying. An interesting question! Capitalization was used extensively in the comments (which is not something you see much unfortunately), but was not used in variable names. Here’s an example:
      byte bitzahl; /* Bitanzahl des Lagewertes */

    • Bernhard Weller says:

      If you’re writing comments in a specific language, I think you should keep the language as correct as possible. So write the comments in plain English, German or whatever and write it like you would do in a book.

      As for variable names, I guess it’s like Lundin says, a matter of personal preference or coding guidelines. The guideline I adhere to now says to write everything in CamelCase with a lowercase type prefix for variables, like ubExampleUnsignedByte. I don’t like the type prefix, but I guess that’s another point of personal preference.

      I guess some sort of marking where a new word begins in a concatenated name is very helpful for understanding, so I would have named the variable either bitZahl or BitZahl. Although of course in German you can concatenate words like crazy, so maybe we Germans are not aware of this problem.
      (It is a fun thing if you like playing with your language 😉 like Donaudampfschifffahrtsgesellschaft or even better/worse Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz and yes these words are legal and not something I just made up)

      • Nigel Jones says:

        Thanks for making me laugh this morning Bernhard. On a more relevant level I wonder if native German speakers are inherently better at parsing concatenated words used for variable names? BTW I also do not like type prefixes on variable names. I find it makes the variable name a lot harder to read, and thus it makes the code harder to read. I find that Lint is a far more effective tool for eliminating data type problems than using a prefix that may or may not reflect the underlying type.
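
The point that a prefix may or may not reflect the underlying type can be shown with a small C example. It is purely hypothetical: the names `ubSampleCount` and `copy_trusting_prefix` are invented here, and the scenario assumes the variable was widened at some point without being renamed.

```c
#include <assert.h>
#include <stdint.h>

/* The "ub" prefix promises an unsigned byte, but the type was widened
   later and the name was never updated. */
uint16_t ubSampleCount = 300u;

/* A caller who trusts the prefix copies the value into a byte. */
uint8_t copy_trusting_prefix(void)
{
    uint8_t count = ubSampleCount;   /* implicit narrowing: 300 becomes 44 */
    return count;
}

/* Self-check documenting the mismatch between name and type. */
void check_prefix_mismatch(void)
{
    assert(sizeof ubSampleCount != sizeof(uint8_t));
    assert(copy_trusting_prefix() == 44);
}
```

The compiler accepts the narrowing assignment, whereas a static analysis tool looks at the real types rather than the name and can usually report the possible loss of data.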

  9. Dick Selwood says:

    Nigel

    A “foreign” perspective? Don’t want to sound too picky, but don’t you mean “non-American” (or even “non-US”) or “non-English” perspective?

    Using foreign as a word suggests that in some way that non-foreign is the norm. It might be for you, but as you have seen by the reaction, you are writing for an international audience. What percentage of embedded work is carried out in the US? 25%?

    Don’t worry. At a rough estimate 990% of US companies write everything from a US perspective and then as an afterthought add in “foreign” or “international”

    Best wishes from England

    Dick

    • Nigel Jones says:

      Hi Dick. I think I really did mean “foreign” in the sense that no matter what language you choose to comment / name variables in, it will be foreign to someone else. I’m a UK citizen who was raised in Germany and now lives in the USA. Furthermore, the two main projects I am currently working on are for Swedish companies – thus I think I have a fairly international perspective. That being said, your observation about US companies being completely US centric is right on the money. The attitude that everyone should speak American (which I’m sure you know is rather different to English) and use imperial units is widespread. However, as the USA moves to being a bilingual (American / Spanish) society, I see signs that attitudes are changing – albeit not without a fight!

  10. Clint Hobson says:

    Well, I came here after reading what I felt was a rather obnoxious comment on Stack Overflow to a German programmer asking for help – one of the first comments on his reasonable question was “It is not normal to be writing your variable names in a language other than English!” Not only was it unhelpful to the original question but even as a native English speaker (from Australia) I thought this was just plain rude and ignorant as well – probably American, one would assume.

    But after reading Lundin’s comment about “banning non-English code” in Sweden, and FrankSansC’s comment about very long French words compared to the same word in English, I can see the argument for this. It seems we just would like to stick to one language, and that one language is the one that happens to be popular and also has shorter words. (I still think the Stack Overflow comment was rude though!)

    Actually I really came here after working with a new graphics engine called Tilengine, written by a native Portuguese speaker, and noticed that his comments were in Portuguese, but his code (variable names, etc.) was consistently English all the way. It got me thinking about that comment I read a little while back, and I started my search.
