Posts Tagged ‘opensource’

Tools to Detect Software Copyright Infringement

Thursday, September 23rd, 2010 Michael Barr

An emerging class of tools makes it easy to automatically detect copying of copyrighted software source code, even if it came from one of the hundreds of thousands of open source packages.

I am presently providing litigation support in a case of alleged software copyright infringement.  In a nutshell, the plaintiff brought suit against the defendant for allegedly continuing to use plaintiff’s copyrighted software source code in defendant’s products after termination of a license agreement between the parties.  Fortunately, automated tools are making it easier than ever to quickly and inexpensively detect copying of software source code.

Some of the most powerful tools for doing direct comparisons between a pair of source code sets are from S.A.F.E. Their CodeMatch tool works by comparing each file of source code in the first set with every file of code in the second set.  Results are presented in a table that is sorted by the relative amount of matching code in the files.  And CodeMatch is clever enough to detect copying in which variable and function names and other details were subsequently changed; CodeMatch can even detect code that was copied from one programming language into another.  The only weakness of CodeMatch is that you have to have the source code for each product, which is not always possible early in litigation.
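The every-file-against-every-file comparison and the sorted results table can be pictured with a small sketch. This is not how CodeMatch works internally; it is a minimal illustration that assumes plain line-by-line similarity from Python's standard `difflib` module, and the function name `rank_file_pairs` is made up for this example. Unlike CodeMatch, it does nothing to see through renamed variables or a change of programming language.

```python
from difflib import SequenceMatcher
from itertools import product


def rank_file_pairs(set_a, set_b):
    """Compare every file in set_a with every file in set_b.

    set_a and set_b map file names to source text. Returns a list of
    (similarity, name_a, name_b) tuples, most similar pair first.
    """
    results = []
    for (name_a, text_a), (name_b, text_b) in product(set_a.items(), set_b.items()):
        # Compare as sequences of lines, so a partially copied file
        # still scores above zero.
        matcher = SequenceMatcher(None, text_a.splitlines(), text_b.splitlines())
        results.append((matcher.ratio(), name_a, name_b))
    # Highest similarity first, like a results table sorted by amount of match.
    results.sort(key=lambda row: row[0], reverse=True)
    return results
```

Because the work grows with the product of the two file counts, this kind of exhaustive comparison is practical for a pair of products but not for a database of every open source package.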

Other tools from S.A.F.E. provide additional help.  For example, BitMatch can compare a pair of executable binary programs or one party’s source code against another’s executable code.  It works by matching strings that appear in both programs.  Meanwhile, SourceDetective helps rule out that the two programs are only similar because they both borrowed from some third program—by automatically searching the Internet for hundreds or thousands of matching phrases.  CodeMatch, BitMatch, and SourceDetective are part of a suite of related tools called CodeSuite.  CodeSuite is a free download that runs on Microsoft Windows, with license keys sold based on the amount of code to be compared.
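The idea of matching strings that appear in both programs can be sketched in a few lines. This is only an assumption about the general technique, not BitMatch's actual algorithm: it pulls runs of printable ASCII out of each binary, much like the Unix `strings` utility, and reports the runs the two have in common. The names `printable_strings` and `shared_strings` and the six-character minimum are choices made for this example.

```python
import re


def printable_strings(blob, min_len=6):
    """Return the set of printable-ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.decode("ascii") for match in pattern.findall(blob)}


def shared_strings(blob_a, blob_b, min_len=6):
    """Return, in sorted order, the strings found in both binary images."""
    common = printable_strings(blob_a, min_len) & printable_strings(blob_b, min_len)
    return sorted(common)
```

In practice the inputs would be the bytes read from two executable files. Strings that come from a shared compiler runtime or third-party library would also show up here, which is the kind of innocent overlap a tool like SourceDetective is meant to help rule out.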

Of course, sometimes code may be copied from open source software.  Much open source software is subject to so-called copyleft licenses, a special type of copyright license that keeps the source code open to the public.  Copyleft language is drafted to ensure that the source code for certain categories of derived work is also open to the public.  This creates problems for companies that wish to keep their source code private but also rely upon open source software.

Fortunately, there are also tools to detect the presence of part or all of an open source software package within a proprietary program.  I have used such tools from Black Duck Software and Protecode.  Both work similarly: each company maintains a database of hundreds of thousands of known open source packages against which the source code you provide is tested. Results are presented as a list of open source packages from which code may have been copied. This testing can be done entirely on a personal computer running Microsoft Windows, so that proprietary source code need not be sent outside a trusted network.  Both tools are generally licensed for an expected level of use on an annual basis.

Unfortunately, the precision of CodeMatch is lost in trying to cast such a broad net for potential copying.  The tools from Black Duck and Protecode don’t actually compare your code against each and every one of the millions of source code files in their database.  Instead, they reduce each file of your source code to a simpler representation of its structure and then compute a unique mathematical signature for that new file.  This signature is subsequently compared to a similar representation of the files in their database.  In plain English, this means that you get lots of false positives.  Some open source packages that weren’t actually copied usually turn up in the results list.
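A toy version of the reduce-then-sign approach shows where the false positives come from. The vendors' actual representations and signatures are not described here, so everything in this sketch is an assumption: identifiers and numbers are collapsed to placeholders, the result is hashed with SHA-256, and the database is an ordinary dictionary from signature to package names. The keyword list, the function names, and the package name `libexample` used below are all hypothetical.

```python
import hashlib
import re

# A deliberately tiny keyword list; a real tool would know the whole language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def structural_signature(source):
    """Hash a simplified, name-free view of the source text."""
    tokens = []
    for tok in TOKEN.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # every identifier looks the same
        elif tok[0].isdigit():
            tokens.append("NUM")  # every number looks the same
        else:
            tokens.append(tok)    # punctuation and operators are kept
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def find_candidates(files, database):
    """Map each file name to the packages whose signature it matches."""
    hits = {}
    for name, source in files.items():
        packages = database.get(structural_signature(source))
        if packages:
            hits[name] = packages
    return hits
```

With this scheme, `int add(int a, int b) { return a + b; }` and `int sum(int x, int y) { return x + y; }` produce the same signature. That makes the lookup fast and tolerant of renaming, but it also means two independently written files with the same shape will be reported as a possible match.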

When searching for potential copying of open source code, I recommend searching the database from Black Duck or Protecode first.  Then, to eliminate the false positives, a more thorough analysis should be performed by obtaining the listed open source packages and using CodeMatch to compare the proprietary code against them file-by-file.

With the help of tools like those mentioned here, it is possible to quickly ascertain whether source code copying has taken place.  Prior to the appearance of these tools, it was necessary for an expert in software development to manually perform dozens of searching and comparison steps.  This strategy can be used early in litigation with the benefit of dramatically reducing the cost of such analysis.  The same tools can also be employed proactively by companies seeking to reduce their risks of copyright infringement litigation.

Free as in, well, Free Software

Wednesday, September 27th, 2006 Michael Barr

There’s no such thing as free beer. But free software abounds. It seems that everywhere I look these days companies are offering their embedded operating systems and tools for free evaluation. Often, the price includes full access to the source code.

Examples just this week include the announcement that Quantum Leaps would make the source code for its previously proprietary QP-nano product available under the GPL, that Micrium would release the source code for the TCP/IP stack it developed at great expense under a 45-day evaluation license, and that Hitachi’s brand new Entier relational database could be downloaded for use in 30-day trials.

Given access to the source code of a complicated product such as an operating system, network stack, or relational database, how many people pay? A restrictive license is perhaps a legal consideration, but if these guys can afford to give their source code out willy-nilly, how do you feel about being the only schmuck to actually cough up dough? Does anyone buy embedded software components anymore?

Open Sores

Saturday, January 5th, 2002 Michael Barr

In the past two years, increasing numbers of embedded programmers have been getting to know Linux and other open source software packages intimately. What has primarily attracted this interest is the non-existent pricing structure. But some of the initial enthusiasm—particularly for Linux—seems to be fading.

Is the use of open source software as building blocks for embedded systems just a fad?

I’ve just found a couple of interesting insights about Linux buried within a recent survey of embedded developers by Evans Data Corporation. The survey asked a number of questions focused on Linux, and the results are cross-tabulated in interesting ways. One table, titled “Perceptions of Linux’ Biggest Technical Difficulties by Degree of Community Interaction,” presents data gleaned from a question asked of those considering and already using Linux to various degrees, sorted by their experience level. Developers who hadn’t actually done anything with Linux yet (about 84% of those surveyed) perceived its biggest technical hurdles to be “availability of device drivers” and “lack of board support packages.” However, developers with hands-on Linux experience including kernel modifications (about 6%) were most concerned about the “size” of the package.

You’d think that the size of the Linux code (which is measured in megabytes), its worst-case interrupt latency and other performance characteristics, and RAM requirements (also megabytes) would be the overriding concerns for embedded programmers. And yet the big issues that I hear everyone complain about are legalities surrounding open source licensing terms and fragmentation of the widely distributed code base. In reality, the latter are not big problems for embedded programmers, as those who’ve actually investigated Linux already know. It’s the memory and performance issues that really get in our way.

As the reality begins to overtake the hype, a consultant/author friend had this to say about the evolving market for his Linux services:
Two years ago I was pumped up on embedded Linux. You said it would pass; I thought you were crazy. Well… I just stopped work on my book. I only found two Linux clients and I ran out of money. Back to VxWorks to pay the bills—and get me out of debt for the time and effort I put into Linux.

Though there are certainly companies out there embedding Linux, the market isn’t growing as rapidly as most analysts predicted it would.