Why you really shouldn’t steal source code


Saturday, February 11th, 2012 by Nigel Jones

As an embedded systems consultant, I spend a substantial part of my work time working on your typical embedded systems projects. However I also spend a significant amount of time working as an expert witness in legal proceedings. While the expert witness work is quite varied, one of the things I have noticed in the last few months is an increase in the number of cases related to source code theft. Typically these cases involve the plaintiff claiming that the defendant has stolen their source code and is using it in a competing product. These claims are often plausible, because as we all know, it’s trivial to walk out of a company with Gigabytes of information in your pocket. Even in companies with strong security measures, it’s normally the case that the engineers are smart enough to work out how to bypass the security systems, so the ‘there’s no way I could have got the code out of there’ defense isn’t usually very plausible.

Thus given how easy it is to steal source code, why shouldn’t you do it? Well let’s start with the obvious – it’s wrong. If you don’t understand this, go and have a chat with your mother – I’m sure she’ll spell it out for you. Notwithstanding the morality (and legality) of the issue, here’s another reason why you shouldn’t do it – there’s a great chance you’ll be found out. If that happens, you can find yourself in serious legal jeopardy.

So just how easy is it to show that someone has stolen your code?

Typically the first step is for the (future) plaintiff to have their suspicions aroused. If half the engineering department leaves and starts up a company with a competing product, then it’s hardly surprising that your ex-employer will be suspicious. Of course suspicions aren’t grounds for a lawsuit. The plaintiff needs at least some evidence of your malfeasance. Now sometimes this can be done purely by the functionality / look and feel of a product. However in other cases it’s necessary for the plaintiff to get at your code’s binary image. You can make this very hard (and hence expensive) to do. However for your typical microprocessor, this step is surprisingly easy. Indeed there are any number of organizations around the world that are quite adept at extracting binary images from processors.

So what, you may ask? I took the code, moved stuff around, used a different compiler and compiled it for a different processor, so good luck with showing that I used your code. Well, the trouble with this is that, using tools such as IDA Pro, it’s easy to count the number of functions in the code and the arguments they take. These metrics are a remarkably good signature. [BTW, there are other metrics as well, but I really don’t want to give the whole game away]. Thus if the original code base and the stolen code base have a very similar function call signature, then there’s an excellent chance that the plaintiffs have enough evidence to file a lawsuit.
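As a rough illustration of why a function-count-and-argument profile makes a useful signature, here is a minimal Python sketch. It is not the method used in any real case, and it assumes the (name, argument count) pairs have already been recovered from each binary by a disassembler. The helper names `arity_signature` and `signature_similarity`, and the sample data, are hypothetical.

```python
from collections import Counter

def arity_signature(functions):
    """Build a histogram of argument counts from (name, arg_count) pairs.

    Names are deliberately ignored: they do not survive compilation,
    and renaming is the first thing a copier would do anyway.
    """
    return Counter(arg_count for _name, arg_count in functions)

def signature_similarity(sig_a, sig_b):
    """Return the histogram overlap of two signatures, from 0.0 to 1.0."""
    shared = sum((sig_a & sig_b).values())  # multiset intersection
    total = max(sum(sig_a.values()), sum(sig_b.values()))
    return shared / total if total else 1.0

# Hypothetical data: the original firmware and a suspect binary in which
# every function has been renamed and one helper has been added.
original = [("uart_init", 2), ("uart_send", 3), ("crc16", 2), ("timer_isr", 0)]
suspect = [("sub_4010", 2), ("sub_4050", 3), ("sub_40a0", 2),
           ("sub_40f0", 0), ("sub_4120", 1)]

score = signature_similarity(arity_signature(original), arity_signature(suspect))
print(f"signature overlap: {score:.2f}")
```

A high overlap on its own proves nothing; in this sketch it is only the kind of coarse indicator that might justify a closer look.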

It’s at this point that you are really in trouble. As part of the lawsuit, the plaintiffs are allowed to engage in discovery. In a case like this, it means quite simply that the court will require you to turn your source code over to an expert that has been retained by the plaintiffs (i.e. someone like yours truly). At this point, I can use any number of tools that are available for comparing code bases. Some of the tools are designed expressly for litigation purposes, while others are just some of the standard tools we use as part of our everyday work. Anyway, the point is this: these tools are really good at finding all sorts of obfuscations, including things such as:

Renaming variables, constants, functions, etc.
Changing function parameter orders
Replacing comments
Adding / deleting white space
Splitting / merging files

In many cases, they can even detect the plagiarism (theft) even if you have switched languages. In other words, if you have indeed stolen the source code, then the chances of it not being conclusively proven at this stage are pretty slim. In short, life is about to get very unpleasant.
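To give a flavour of why the simpler obfuscations in the list above are so weak, here is a toy Python sketch that normalizes C-like source before comparing it. It is an assumption-laden illustration, not how any particular litigation tool works: comments are stripped with naive regular expressions (which would misfire on comment markers inside string literals), every non-keyword identifier is collapsed to a placeholder, and white space is discarded. The names `normalize` and `similarity`, and the abbreviated keyword list, are hypothetical. Parameter reordering and file splitting are not handled here.

```python
import difflib
import re

# A deliberately incomplete keyword list; enough for the example below.
C_KEYWORDS = {"int", "char", "void", "const", "return", "if", "else",
              "for", "while", "unsigned", "static", "struct"}

# Identifiers, then numeric literals, then any other single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d\w*|\S")

def normalize(source):
    """Return a token list that ignores comments, white space and names."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)               # line comments
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in C_KEYWORDS:
            tokens.append("ID")  # every renamed identifier looks the same
        else:
            tokens.append(tok)
    return tokens

def similarity(src_a, src_b):
    """Ratio in [0, 1] of matching normalized tokens between two sources."""
    return difflib.SequenceMatcher(None, normalize(src_a), normalize(src_b)).ratio()

original_src = """
/* sum the buffer */
int checksum(const char *buf, int len) {
    int total = 0;
    for (int i = 0; i < len; i++) total += buf[i];
    return total;
}
"""

renamed_src = """
// totally original work
int frobnicate(const char *p,int n){
  int acc=0;
  for(int k=0;k<n;k++)
      acc+=p[k];
  return acc;
}
"""

print(similarity(original_src, renamed_src))
```

Once names, comments and layout are thrown away, the two functions in this example reduce to the same token sequence, which is why that class of disguise buys so little.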

Having said the above, I like to think that the readers of this blog are not the type that would engage in source code theft. However I suspect that some of you have been tempted to go into business competing against your current employer. If this describes you, then what should you do to ensure that you don’t get hit with a lawsuit a year or two after starting your own business? Well clearly the best bet is not to go into a competing business. However if you must do this, then get some legal advice (please don’t rely on what is written here – I’m just an engineer!) before you start. You will probably be advised to do a ‘clean room’ design, which in a nutshell will require you to demonstrate that the code in your competing product was designed from scratch, using nothing from your former employer. Be advised that even in these cases, if you adopt the same algorithms, then you may still be in trouble.

This entry was posted on Saturday, February 11th, 2012 at 8:41 pm and is filed under Compilers / Tools, Consulting.

25 Responses to “Why you really shouldn’t steal source code”

Lundin says:

February 13, 2012 at 3:31 am

“Indeed there are any number of organizations around the world that are quite adept at extracting binary images from processors.”

Is this really that easy on modern MCUs? Modern as in younger than 10 years. With the new flash security features, I can’t even extract the binary out of my own products any longer, let alone someone else’s. But maybe them copy cats have found ways around this?

- Nigel Jones says:
  
  February 13, 2012 at 6:34 am
  
  Yes it is. For a very entertaining read on how this is done, go to http://www.cl.cam.ac.uk/~rja14/tamper.html . I think you’ll be both impressed and depressed at just how easy it is to attack some processors. Even so-called security processors used in smart cards have proven to be vulnerable. However to be fair I have heard (but don’t know for sure) that the latest security processors are very good.
  
- David Garcia says:
  
  February 13, 2012 at 9:59 am
  
  Well, as a matter of fact, even with current microprocessors the job is trivial.
  
  Check http://www.cl.cam.ac.uk/~sps32/sec_news.html where you get paragraphs like:
  
  “This paper is a short summary of a real world AES key extraction performed on a military grade FPGA marketed as ‘virtually unbreakable’ and ‘highly secure’. We demonstrated that it is possible to extract the AES key from the … in a time of 0.01 seconds using a new side-channel analysis technique called Pipeline Emission Analysis (PEA) developed by Quo Vadis Labs (QVL) … We will show that with a very low cost hardware setup made with parts obtained from a local electronics distributor ….”
  
  You can also check http://www.flylogic.net and get depressed.
  
  In summary, it is not worth the time protecting your code.
  
  - Nigel Jones says:
    
    February 13, 2012 at 8:14 pm
    
    I guess it depends on your definition of trivial. Flylogic is always interesting. I’m actually very familiar with Chris Tarnovsky’s work (the proprietor of Flylogic). However I’ll leave that story for another day.
    
- Dan says:
  
  July 7, 2015 at 12:51 am
  
  What about the reverse, fake companies stealing code? For example, I just got an email from a company offering remote working. I applied, then they came back asking me to send them code I’m proud of that should be a complete product, not just a snippet. Imagine how much free code they could be receiving – what’s to stop them publishing it?
  
Miro Samek says:

February 13, 2012 at 10:05 am

Thank you for this post. Disregard for intellectual property rights can cause more damage than many people realize.

While this post is mostly about plagiarizing commercial software, the problems are much worse in the open source domain. This is because by nature open source is, well… open for anybody to take and the developers of open source typically don’t have the budgets for litigation. On top of this, there is still a lot of misunderstanding of open source and many people don’t distinguish between BSD-type and GPL-type licenses.

But there is also the opposite facet of the widespread software theft problem. How do you prove that *your* software is clean?

Unfortunately, again, the burden of proving originality of code is much heavier for open source vendors than commercial vendors. When you buy a software license from a traditional closed-source vendor, you just presume without much proof that they didn’t steal the code that they are selling you. However, when people license open source code (e.g., dually-licensed open source can be licensed for closed-source use), the originality of the code is always questioned. This is illogical, and in fact, should be the opposite.

For example, Quantum Leaps (my company) provides dually-licensed QP state machine frameworks. QP software is available as open source (under GPL) and under commercial closed-source licenses. The point is that because QP has been open for over a decade now, it has been scrutinized much more thoroughly than any closed-source project out there.

The bottom line is that I don’t know of any better way of proving originality of a piece of software than to put it out there for anybody to see and check. This is, of course, no different than checking legality of any other human activity. It has to be a transparent, public process.

- Nigel Jones says:
  
  February 13, 2012 at 8:17 pm
  
  A very thoughtful post Miro. I hadn’t really given much thought to the problem of proving your code is clean. However thinking about it, I can envisage the nightmare scenario where you hire someone to write say a USB driver, only to find out years later that what they have done is plagiarize someone’s work without your knowledge – but leaving you with the legal liability.
  
Ufuk Sevim says:

February 20, 2012 at 5:44 am

I have been working in a small company for 4 years now, and I designed and wrote all of its embedded software libraries from scratch by myself. If I start my own embedded software company, then there is a good chance that the design and algorithms will be similar to those, even if I do my design from scratch. So, the question is how to avoid stealing code from yourself?

- Nigel Jones says:
  
  February 20, 2012 at 6:52 am
  
  Well the first observation is that from the law’s perspective, you aren’t stealing from yourself – you are stealing from your ex-employer. Secondly, a common defense is that I wrote it for my ex-employer and now I have rewritten it – so it’s not surprising they are similar. While this has some merit, particularly with regards to coding style, where you can get tripped up is the algorithms. Regardless of whether you rewrite the code from scratch, if it uses the same algorithms, then you are probably on the hook for stealing. I can’t emphasize enough how important it is for you to get some legal advice before you go down this road.
  
  - Ian Johns says:
    
    March 1, 2012 at 11:46 am
    
    Well, then almost every company has “stolen” their memcpy()/strcpy() implementations from the original alpha source. Which shows the ridiculousness of some aspects of copyright/trademark ownership. “Oh, your brand tires are black and round like ours. Clearly, you stole our design.”
    
Mehdi says:

February 21, 2012 at 2:07 pm

My investment partner and I are starting a small mobile software startup and we’re planning to hire 4 engineers during the upcoming months. We are now facing this dilemma: do we leave relatively “free” access to our employees and hope they will remain loyal and fair since we show them we trust them? Or do we try to implement a strict IT policy (blocking file uploading, semantic e-mail search, no USB …), knowing that there will always be at least one breach in the system that an ill-intentioned developer could use? In the latter case, do you have an idea about some information resources on the web where we could find such tools? We’re planning to use a cloud based file management system.
Thanks!

- Nigel Jones says:
  
  February 21, 2012 at 2:13 pm
  
  Personally I find the trust your employees route to be the best policy (although I am of course a bit naive). Putting in place strict IT controls will usually slow you down. Furthermore anyone intent on stealing the code will almost certainly be able to circumvent whatever procedures you put in place. I think your best bet would be to use a simple file download tracker, so that you have proof that employee X had the code at some point.
  
Jeremy says:

March 13, 2012 at 10:45 am

Although I agree with the article 100%, stealing source code is wrong, extracting a binary to investigate is perhaps a legal gray area, and I would expect those more knowledgeable in the law than I to have varied opinions.

My understanding of the DMCA is that even if a product is using stolen software, you are also breaking the law if you prove that by reverse engineering the product. You then potentially open yourself up to litigation from the device manufacturer and counter-suits citing the DMCA as grounds for getting the case tossed. If tossed, a counter-suit for harassment can be even more expensive, as suing people that left your company is seen by many judges as being retaliatory, and using greater financial means to ruin someone is not viewed favorably.

Lawsuits of retaliation are more common than many of us would believe, as engineering professionals. At least in my experience, a lot of non-engineering managers have the “you don’t quit me, I quit you” mentality.

- Nigel Jones says:
  
  March 13, 2012 at 11:28 am
  
  I do a lot of DMCA related work. However I’m an engineer and not a lawyer, so take what follows with a large grain of salt. The DMCA normally kicks in when one has circumvented a copy protection mechanism (typically encryption). Whether taking the lid off a chip and reading the ROM contents constitutes a DMCA violation is debatable.
  
cnxsoft says:

March 29, 2012 at 7:05 am

Source code theft is common practice in China. They may even start a new company while they still work with their current company. Some companies tried to split offices in different cities (without contact between teams), but developers finally found each other and started their own project based on the existing source code.

The problem is that it’s very difficult to do anything, at least in China.

Another type of theft is with dual license open source code, where it can be GPL (release the modifications) or commercial (keep your modifications). In that case, many companies do not bother, although some will obfuscate the code (e.g. rename functions) to avoid potential legal issues.

Herman says:

April 23, 2012 at 10:28 am

Devil’s advocate: Enforcement of intellectual property rights actually infringes on the legitimate property rights of others. I.e., Monopoly privilege granted by government over ideas-whether for books, music, source code, etc.-dictates how the owners of legitimate property-paper, hard discs, minds, etc.-can utilize their property. Ideas, especially when made digital, are infinitely reproducible and non-rivalrous. X’s use of Y’s code does not prohibit Y’s use of said code without limitation.

This does not mean Y cannot obfuscate, secure, or even boobytrap his ideas to prevent external use. It also does not mean X can trespass, burgle, or breach contracts to obtain Y’s ideas. Commercialization of ideas should rest solely on Y’s ability to satisfy the wants of consumers, not his ability to strong arm government and bureaucracy to acquire special protection from competition.

More reading: http://mises.org/books/against.pdf

- Enzo says:
  
  April 25, 2013 at 2:47 am
  
  Interesting PDF book there. I was rather amused at the header stamp which read “Copyright © 2008 Ludwig von Mises Institute”
  
  One thing I do know for sure: someone who is clever enough and wanted to steal a few functions could surely modify their signature by changing the order of operations and adding a few extra arguments without breaking the functionality. It could probably be done automatically using a randomizing algorithm. I personally think that if they can hide their theft well enough to evade detection, then they should be welcome to it. Simply relabeling someone else’s work with their brand name is another issue…
  
  Notwithstanding, the hypocrisy of even the most vehement opponent to intellectual property and copyrights would still become evident if they were presented with the right circumstances.
  
Juan Hernandez says:

August 20, 2013 at 9:57 am

According to Judge William Alsup, APIs can’t be copyrighted because it would “monopolize ideas”.
Check this link:
http://arstechnica.com/tech-policy/2012/05/google-wins-crucial-api-ruling-oracles-case-decimated/

- Nigel Jones says:
  
  August 20, 2013 at 10:03 am
  
  Very interesting. Thanks for the link.
  
Enrique Flores says:

November 12, 2013 at 4:34 pm

“In many cases, they can even detect the plagiarism (theft) even if you have switched languages. ”

Hi Nigel, I’m interested if you know some work that can detect plagiarism between different languages.

Modern obfuscators can work nicely and decompose a simple function into hundreds or thousands of functions. Of course, the source code will be unreadable, but no one could prove that you are committing plagiarism.

- Nigel Jones says:
  
  November 13, 2013 at 7:51 am
  
  There are indeed tools that can do this. Obviously the more the code is obfuscated, the harder it is to meet the evidentiary standard of proof. At a certain point you are spending more time hiding the crime than you’d have spent doing a clean room design.
  
Mike says:

March 1, 2016 at 11:53 pm

You can try the WSCSA service to protect your website source code: https://www.wscsa.net
Website Source Code Stealing Alarm (WSCSA) works like a CCTV and alarm system. It is mainly designed to assist in identifying and deterring someone who attempts to steal your website source code. WSCSA is now offered in two versions: Free and Premium.

Ryan says:

March 16, 2017 at 9:31 pm

Is it possible to get their github history in discovery?

- Nigel Jones says:
  
  June 2, 2017 at 1:08 pm
  
  Yes – but it’s not easy.
  
Pascal Bourguignon says:

November 23, 2017 at 7:29 pm

I think that countries should not allow the sale (and much less the importation) of any device that doesn’t come with the sources of their firmware provided, inspected, and compiled (by the customer preferably, but in the country if not). For reasons of the various recent scandals (Toyota, Volkswagen), but also for sovereignty reasons.

I hear that 90% of the security cameras used on US military bases are China made. Wouldn’t you want to check their firmware?

This could also apply in general to software, but there it is easier to use free or open-source software and thus avoid US-made or China-made software, both in private use and for governments, than in the case of firmware, which may hide in all kinds of apparently physical and mechanical devices.
