VendorID - Signing Python Interpreters and Modules

Reference Guide

Contact: ulrich.berning@desys.de
Version: 1.0.0
Copyright: Copyright (c) 2005 DESYS GmbH

1   Introduction

If you are building commercial applications based on Python and want to restrict the use of your own dynamically loadable extension modules to your applications, or you use commercial extension modules that you cannot make publicly available for general use with Python, the VendorID package may be of interest to you.

The idea of this software is to restrict the usability of dynamically loadable extension modules to specific Python interpreters. Only specific Python interpreters are allowed to import such extension modules. A generic Python interpreter should fail to import such extensions.

By a specific Python interpreter, I mean a binary executable that runs a single specific Python application, something you create with tools like freeze or py2exe.

The motivation for the VendorID package was the following term in section 10 of Trolltech's Qt commercial license agreement:

...
(vii) Applications may not pass on functionality which in any way makes it
      possible for others to create software with the Licensed Software;
...

Because I want to build commercial applications based on Python, Qt and PyQt, I either have to build the PyQt extension modules into the application-specific executables, or I have to find a solution where the PyQt extension modules can be dynamically loaded at runtime only by the application-specific executables. Another motivation is extension modules that we have created in the past and that we do not want to make publicly available to our customers.

1.1   License

VendorID is licensed under the same terms and conditions as Python itself. See the file LICENSE.txt for more details. VendorID places no restrictions on the license you may apply to interpreters and extension modules that make use of the VendorID functionality.

1.2   How Does it Work

A dynamically loadable extension module that should be restricted queries the identity of the Python executable that imports it. If the identity string of the executable matches the identity string that the module knows, the import succeeds; otherwise the import fails with an error. To provide the identity string, a specific Python executable (from now on called a signed interpreter) provides a builtin module called vendorid. This module has a single PyCObject named _C_API providing a pointer to a function named get_identity. An extension module imports the vendorid module, gets the pointer to the get_identity function and calls it to get the identity string. The module then compares the identity string with its own identity string. If they are identical, the import succeeds; otherwise it fails.
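
The check described above can be modelled in a few lines of Python. This is only a conceptual sketch: the real implementation is C code in the static library, the dictionary standing in for the interpreter's builtin modules and the helper names make_vendorid_module and init_restricted_module are hypothetical, and the encryption of the identity string is left out.

```python
import types


def make_vendorid_module(identity):
    """Hypothetical stand-in for the builtin vendorid module of a signed interpreter."""
    module = types.ModuleType("vendorid")
    # The real module exposes a PyCObject named _C_API; a dict models it here.
    module._C_API = {"get_identity": lambda: identity}
    return module


def vendorid_check(builtin_modules, expected_identity):
    """Return 1 if the interpreter's identity matches, 0 on any failure."""
    try:
        get_identity = builtin_modules["vendorid"]._C_API["get_identity"]
        return 1 if get_identity() == expected_identity else 0
    except Exception:
        # Like the C function, swallow every error and just report failure.
        return 0


def init_restricted_module(builtin_modules, expected_identity):
    """Model of a restricted module's init function (assumed shape)."""
    if not vendorid_check(builtin_modules, expected_identity):
        raise RuntimeError("Module foo is not usable with this Python interpreter")
    return "foo initialized"


signed = {"vendorid": make_vendorid_module(b"identity-A")}
generic = {}  # a generic interpreter has no vendorid module
print(init_restricted_module(signed, b"identity-A"))
print(vendorid_check(generic, b"identity-A"))
```

The point of the model is that a missing vendorid module and a wrong identity are treated the same way: the check reports failure and the restricted module refuses to initialize.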

To make this functionality easy to use, the distutils setup of the package creates a static library containing exactly two public functions and a header file containing the function declarations:

  • void vendorid_init(void) - This is called by signed interpreters just before Py_Initialize gets called. It appends the vendorid module to the list of builtin modules.
  • int vendorid_check(void) - This is called by restricted extension modules just before Py_InitModule gets called. It returns 1 if the identity of the signed interpreter could be verified. It returns 0 if the identity could not be verified, either because the identity string is wrong or because anything else failed. This function clears any exception that may have occurred before it returns.

Because you normally create signed interpreters only just before delivery and not early during development, the distutils setup of the package also creates a dynamically loadable version of the builtin vendorid module. You should never deliver this module to your customers. It is only needed at development time to use restricted modules with your generic Python interpreter.

1.3   Security Details

To make it difficult to circumvent the described restriction functionality by creating a new generic interpreter that contains the necessary code, the identity string is a totally random string that is encrypted with a totally random password before it is returned by the get_identity function. The extension module has to decrypt the identity string, before it can be compared. Because the identity and the password must be stored somewhere in static storage of the interpreter and the module binaries, someone could compare both binaries and search for identical byte sequences. So the identity string and the password are stored encrypted with different passwords in both binaries.

All in all we need six random byte strings:

  1. A password to decrypt the real password in the interpreter binary.
  2. A password to decrypt the real identity in the interpreter binary.
  3. A password to decrypt the real password in the module binary.
  4. A password to decrypt the real identity in the module binary.
  5. The real password.
  6. The real identity.

I do not claim that it is impossible to identify the passwords in the binaries and create a new generic interpreter, but it should be difficult enough. If anybody has a better idea, suggestions for improvement are highly appreciated. The encryption/decryption algorithm is a synchronous stream cipher called ARC4 ("alleged RC4"). It is the publicly known algorithm that is compatible with RC4. RC4 itself was protected as a trade secret and a trademarked name rather than by a patent, so ARC4 should be free of any patents.
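
To make the role of the six byte strings concrete, here is a minimal Python sketch. The arc4 function is the standard ARC4 algorithm. Everything else is an assumption made for illustration: the 16-byte length, the dictionary layout, the function names, and the exact way the six strings are combined are not taken from the VendorID sources.

```python
import os


def arc4(key, data):
    """Encrypt or decrypt data with ARC4 (the operation is its own inverse)."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # keystream generation, XORed onto the data
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def _static_data(pw_key, id_key, password, identity):
    """What one binary would keep in static storage (hypothetical layout)."""
    return {"pw_key": pw_key, "enc_pw": arc4(pw_key, password),
            "id_key": id_key, "enc_id": arc4(id_key, identity)}


def make_binaries(length=16):
    """Create the six random strings and the data for interpreter and module."""
    keys = [os.urandom(length) for _ in range(6)]
    interp_pw_key, interp_id_key, mod_pw_key, mod_id_key, password, identity = keys
    return (_static_data(interp_pw_key, interp_id_key, password, identity),
            _static_data(mod_pw_key, mod_id_key, password, identity))


def get_identity(interpreter):
    """Interpreter side: return the identity encrypted with the real password."""
    password = arc4(interpreter["pw_key"], interpreter["enc_pw"])
    identity = arc4(interpreter["id_key"], interpreter["enc_id"])
    return arc4(password, identity)


def module_accepts(module, token):
    """Module side: decrypt the token and compare it with the module's identity."""
    password = arc4(module["pw_key"], module["enc_pw"])
    identity = arc4(module["id_key"], module["enc_id"])
    return arc4(password, token) == identity
```

Because each binary encrypts the shared password and identity under its own keys, the stored byte sequences differ between the interpreter and the module even though both recover the same values at run time.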

2   Installation

2.1   Building the Package

If you build the package for the first time, a password file and the source files are generated by the distutils setup script. Subsequent builds do not recreate these files. The password file containing the six random byte strings is created by using either /dev/urandom, if available, or Python's standard random module (this is not cryptographically strong, but should be random enough).
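
The fallback strategy for generating the random strings might look like the following sketch. It is not the code from setup.py: it uses os.urandom instead of reading /dev/urandom directly, and the function names and the 16-byte length are assumptions.

```python
import os
import random


def random_bytes(length):
    """Return random bytes from the OS, falling back to the random module."""
    try:
        return os.urandom(length)
    except NotImplementedError:
        # No OS randomness source: not cryptographically strong, but usable.
        return bytes(random.randrange(256) for _ in range(length))


def generate_keys(count=6, length=16):
    """Generate the six random byte strings (count and length are assumed)."""
    return [random_bytes(length) for _ in range(count)]
```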

You build the package with:

$ python setup.py build

After the first build, you should find the following generated files:

  • key/vendorid_secrets - The key file containing six random keys.
  • mod/vendoridmodule.c - The code for the dynamically loadable vendorid module.
  • lib/vendorid.h - The include file containing the function declarations of the library code.
  • lib/vendorid_init.c - The code used in a signed interpreter. This goes into the static library.
  • lib/vendorid_check.c - The code used in a restricted module. This goes into the static library.

2.2   Installing the Package

You install the package (as root) with:

# python setup.py install

This installs the extension module (vendorid.so on UNIX/Linux, vendorid.pyd on Windows) in the standard site-packages directory.

The static library (libvendorid.a on UNIX/Linux, vendorid.lib on Windows) is installed into the config subdirectory on UNIX/Linux and into the libs subdirectory on Windows. The header file is installed in Python's standard include subdirectory.

2.3   Distribution Notes

If you want to distribute the VendorID package, do not use the distutils sdist command, because then the key file and the generated sources become part of the distribution package (freshly created if not already there). Use an ordinary tar or zip command after deleting the mod, lib and key subdirectories. Use the sdist command only to create internal in-house distributions.
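
As an alternative to deleting the directories by hand, the archive can be built so that the generated directories are skipped. The following sketch uses Python's tarfile module. Note that it excludes mod, lib and key instead of deleting them, and the function name and archive layout are assumptions.

```python
import os
import tarfile

GENERATED_DIRS = {"mod", "lib", "key"}  # must never reach a public archive


def make_public_tarball(package_dir, archive_path):
    """Pack package_dir into a .tar.gz, leaving out the generated directories."""
    top = os.path.basename(os.path.abspath(package_dir))

    def skip_generated(info):
        # Member names look like "<top>/key/vendorid_secrets".
        parts = info.name.split("/")
        if len(parts) > 1 and parts[1] in GENERATED_DIRS:
            return None  # returning None drops the member from the archive
        return info

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(package_dir, arcname=top, filter=skip_generated)
```

A call such as make_public_tarball("VendorID", "VendorID.tar.gz") would then produce an archive without the key file and the generated sources. It is still worth listing the archive contents before publishing it.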

Because the source code is generated slightly differently for different Python versions, you may need to delete the mod and lib subdirectories when you build the package for another Python version. Code generated with a lower version should always work with higher versions. Code generated with a higher version will not work with lower versions.

2.4   Macintosh Platforms

Because I don't have access to Macintosh platforms, I need some help from Macintosh users.

3   Using the Package

3.1   Creating Restricted Extension Modules

An extension module foo that should be restricted includes the vendorid header file and calls vendorid_check() before Py_InitModule():

foomodule.c:

...
#include <Python.h>
...
#include <vendorid.h>
...
PyMODINIT_FUNC
initfoo(void)
{
    PyObject *m;
    ...
    if(!vendorid_check())
    {
        PyErr_SetString(PyExc_RuntimeError,
                        "Module foo is not usable with this Python interpreter");
        return;
    }
    m = Py_InitModule("foo", FooMethods);
    ...
}

This module must be linked with the vendorid static library.

NOTE: Of course, this does not prevent the Python interpreter from loading the module into memory. It only prevents the interpreter from initializing the module successfully.

3.2   Creating a Signed Interpreter

The main source file of a signed interpreter includes the vendorid header file and calls vendorid_init() before Py_Initialize():

python.c:

...
#include <Python.h>
...
#include <vendorid.h>
...
int main(int argc, char **argv)
{
    ...
    vendorid_init();
    ...
    Py_Initialize();
    ...
}

This interpreter must be linked with the vendorid static library.

If you use one of the tools to create application-specific interpreters (freeze, py2exe, ...), you have to modify these tools so that they generate the necessary code and link with the vendorid static library. Alternatively, you can use SIB, which is part of this package.

3.3   Developing Software Using Restricted Modules

Because the package installs the dynamically loadable extension module vendorid, your generic Python interpreter effectively becomes a signed interpreter. At development time, you can use your normal interpreter and need not worry about creating a signed interpreter. At deployment time, when you have finished your development and testing phase, create an application-specific signed interpreter using the vendorid static library.

NOTE: Never distribute the vendorid extension module or the static library. Doing so would defeat the whole purpose of the package.
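
A small check can help confirm which kind of vendorid module an interpreter provides, for example before delivery. The sketch below is not part of the package. The function name and the three result labels are assumptions. It relies only on sys.builtin_module_names and importlib.util.find_spec from the standard library.

```python
import importlib.util
import sys


def module_origin(name="vendorid"):
    """Report where a module comes from: builtin, external, or missing."""
    if name in sys.builtin_module_names:
        return "builtin"   # compiled into the interpreter (signed interpreter)
    if importlib.util.find_spec(name) is not None:
        return "external"  # found on the import path (development setup)
    return "missing"       # a generic interpreter without VendorID
```

In a delivered application, the expected result for vendorid would be "builtin". A result of "external" would suggest that the development-time module is being picked up.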

4   SIB

Part of the VendorID package is SIB. SIB is a simple tool for converting Python applications into binary executables. So far it is a single script file called sib.py (this may change in the future). You can find it in the subdirectory sib of the VendorID package.

The name SIB stands for:

There are a number of other tools to convert Python applications into binary executables:

  • freeze - This tool is part of the Python distribution. I think this tool is no longer maintained. As far as I know, Macintosh platforms are not supported.
  • py2exe - This is a distutils extension from Thomas Heller. It runs only on Windows platforms.
  • py2app - This is a distutils extension from Bob Ippolito. It runs only on Macintosh platforms.
  • Installer - This tool is from Gordon McMillan. It can create executables for UNIX/Linux and Windows platforms. I don't know if it supports Macintosh platforms.

The approach of these tools is to package the application itself and everything that is needed from the Python installation into a single file or a single directory. The advantage is that the target machines do not need to have Python installed, but there are a number of disadvantages:

SIB's approach is to package only the application's main script, or the main script together with the application-specific packages and/or modules. The disadvantage is that you need to have Python installed on the target machines. The advantage is that the application is small, because it does not contain any portions of the Python runtime.

4.1   Usage

You call SIB with:

$ python sib.py [options] <application-main-script>

The options are:

Options for all platforms:

-n <name>
    The name of the executable. The default is based on the name of the main script (basename of the script without extension).

-u
    Do not create a signed interpreter. The default is to create a signed interpreter (linked with the vendorid static library).

-v
    Add support for Python's verbose flag. If you use this option, the created interpreter parses the argument list for -v or -vv and sets the verbose flag. The advantage is that you can use this to track the imports if anything goes wrong. The disadvantage is that you can no longer use -v or -vv in your application (after parsing the arguments, this flag is removed from the argument list).

-o #
    Optimize flag. This sets the optimize flag for the created interpreter. It does not affect the bytecode generation; it only affects the setting of the optimize flag of the created executable. The possible values are:

      0  no optimization (the default).
      1  normal optimization (like python -O).
      2  additionally removes doc strings (like python -OO).

    If you want to have optimized bytecode generated, use -O or -OO when you call sib:

      python -O sib.py ...
      python -OO sib.py ...

-p <prefix>
    Use <prefix> as path prefix for compiled Python code. The default is the absolute path of the Python source file. Note that this is normally not the path where the executable is installed on the target machines. You should use something like -p "" or -p "application-name:". See the Python documentation of the filename argument of the builtin compile function for the meaning of the prefix.

-d <dir>
    Use <dir> for the generated output files. The default is build_ followed by the base name of the Python main script without extension.

-m <package>
    <package> is a package or module that should be included in the executable. Use this option for every top-level package or module that should be part of the executable. Subpackages of a package are included automatically. Note that only the source files of a package (.py) are used, not precompiled files (.pyc or .pyo). The sources are always compiled before they are converted into C code. Module names must include the extension .py.

-t <dir>
    Install the application in <dir> when you call make install for the generated Makefile. The default is to install it into the same directory where the python binary is installed.

The following options are only for Windows platforms:

-c
    Create a console application. The default is to build a Windows application. Make sure that your application does not write anything to stdout or stderr if you create a Windows application.

-i <file.ico>
    Use <file.ico> as the application's icon file. This must be a valid ico file. If this option is not used, a builtin default icon is used. The builtin default icons are the original Python icons (py.ico and pycon.ico).
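
The effect of the -p option listed above can be illustrated with the builtin compile function, whose filename argument ends up in the code object and therefore in tracebacks. This is a sketch only. The helper name is hypothetical, and how SIB actually joins the prefix and the file name is an assumption.

```python
def compile_with_prefix(source, script_name, prefix):
    """Compile source so that tracebacks report prefix + script_name."""
    # The second argument of compile() is stored as co_filename.
    return compile(source, prefix + script_name, "exec")


code = compile_with_prefix("answer = 42\n", "main.py", "myapp:")
print(code.co_filename)
```

With a prefix such as "myapp:", tracebacks on the target machine do not show the absolute path of the build machine.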

After SIB has run, you should find a number of generated C files and a Makefile in the build directory. The file frozen.c is the main interpreter code, the file __main__.c is the converted main script. All other C files are the converted additional modules or packages. On Windows, there is also a resource file and an icon file.
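
The conversion of a Python source file into a C file can be pictured as follows: the source is compiled to a code object, the code object is serialized with the marshal module, and the resulting bytes are written out as a C array. This mirrors the general approach of the freeze tool. Whether sib.py does exactly this, as well as the function name and the symbol name used here, are assumptions.

```python
import marshal


def code_to_c_array(source, filename, symbol):
    """Compile source and return (C array text, size in bytes)."""
    code = compile(source, filename, "exec")
    data = marshal.dumps(code)  # serialized bytecode
    lines = ["static unsigned char %s[] = {" % symbol]
    for offset in range(0, len(data), 12):
        chunk = data[offset:offset + 12]
        # Iterating over bytes yields integers, one per array element.
        lines.append("    " + ",".join(str(b) for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines), len(data)


c_text, size = code_to_c_array("x = 1\n", "myapp:__main__.py", "M___main__")
```

At run time, an embedded interpreter can load such an array back into a code object and execute it, which is why the executable only needs the Python runtime on the target machine and not the original source files.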

You can now call make or nmake to create the executable and then make install or nmake install to install the executable. If the executable has problems finding the appropriate Python runtime, you can set the PYTHONHOME environment variable.
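
When tracking down import problems in a generated executable, the -v option of SIB is useful. The argument handling it describes (recognize -v or -vv, set the verbose level, remove the flag from the argument list) could look like the sketch below. The real logic lives in the generated C code. The function name and the rule that the highest level wins are assumptions.

```python
def split_verbose_flags(argv):
    """Return (verbose_level, argv without -v/-vv flags)."""
    levels = {"-v": 1, "-vv": 2}
    verbose = 0
    remaining = list(argv[:1])  # keep the program name untouched
    for arg in argv[1:]:
        if arg in levels:
            verbose = max(verbose, levels[arg])  # flag is consumed here
        else:
            remaining.append(arg)
    return verbose, remaining
```

The application then only sees the remaining arguments, which is why it cannot use -v or -vv for its own purposes.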

4.2   Tested Platforms

I have tested SIB with Python 2.3.4 on the following platforms:

  • Windows 2000 Professional with Visual C++ V 6.0
  • SuSE Linux 8.2 with gcc V 3.3
  • AIX 4.3.3 with VisualAge C++ V 6.0
  • HP-UX B 11.00 with HP aC++ Compiler C.03.50

5   ToDo List


If you find bugs or have any enhancement suggestions, please let me know.

I would also like to hear from people who use this software.

As said before, I need help from Macintosh users if the VendorID package is to be used on this platform.