OTToolbox_v0.1.2 : Documentation

Table of Contents

Introduction About System Requirements Disclaimer & License Contact Installation Overview Compilation Import in Python Examples Examples for ShortCut Examples for SparseSinkhorn References

Introduction

About

OTToolbox is a toolbox for numerical optimal transport for Python 3 with numpy and scipy. The front end is written in Python 3 and several functions are provided by shared c++ libraries.

It provides implementations for various algorithms for solving discrete optimal transport problems, via the linear programming formulation due to Kantorovich. It includes a naive dense solver for reference and convenience. The main functionality are implementations of the sparse multi-scale linear program solver described in [Schmitzer 2016a] and the stabilized sparse Sinkhorn algorithm based on entropy regularization, as described in [Schmitzer 2016b]. These two components will be referred to by ShortCut and SparseSinkhorn in the following. References to figures refer to figures in the respective articles. Parts of the ShortCut algorithm have recently been included in the numerical benchmark [Schrieber et al. 2016].

In this early stage there is no extensive documentation available. It is recommended to try to complete the installation and then look at some of the example files.

Note that this is highly experimental software. It does virtually no safety checks on the users input data and will likely crash your Python session if invalid data is provided. See also Disclaimer & License.

System Requirements

Currently the toolbox is in a very early beta stage and no systematic testing for system compatibility has been performed.
So far it has been tested with:

Due to the way communication between Python and C++ is handled, it is likely that the toolbox will only work on 64bit operating systems.
If you are new to Python we recommend the Anaconda distribution (or Miniconda) which provides a convenient way to set up a working Python environment. Moreover, while the toolbox can be operated from standard Python scripts, we highly recommend using jupyter for interactive testing and development.

In addition, ShortCut requires an external linear program solver. Currently, interfaces are provided for the following solvers:

These external libraries cannot be directly provided with the toolbox. The user is kindly asked to obtain either of the two by themselves.

Disclaimer & License

This toolbox is prototypical software, developed as part of ongoing research on efficient algorithms and should be considered experimental. It is distributed such that other researchers can test and reproduce published results as well as extend it.
It comes without any guarantees or warranty. Use at your own risk.

This toolbox is released under the `Modified BSD License'. The full license text is given below.

Copyright (c) 2016, Bernhard Schmitzer
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

(3) The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contact

The toolbox is developed by Bernhard Schmitzer, currently a post-doctoral reasearcher at the University of Münster, (website). Feedback on how to make the toolbox more user-friendly, about bugs, successful installation on a new system configuration or requests for additional features is much appreciated.

Installation

Overview

Installation consists of two steps: compiling the c++ libraries and importing the front end modules into Python. Currently both steps require some manual handling by the user.

Compilation

Installing ShortCut requires steps 1-3, installing SparseSinkhorn requires steps 2 & 4.
  1. Preparation for ShortCut: If you want to compile the ShortCut component with some of the external libraries, you need to configure the build script first. To do so, open the file Solvers/compile_constants.sh with a text editor. Find the sections for the CPLEX or Lemon specific settings and adjust the location to the libraries on your system. At least one must be provided to use the toolbox. Be sure to remove the comments in front of the lines that specify the include and linker information!
  2. Compile Common Components: ShortCut and SparseSinkhorn share some components. For compilation execute the script Solvers/compile_common.sh from a terminal. Check for possible compiler and linker errors.
  3. Compile ShortCut: First execute the script Solvers/compile_shortcut.sh from a terminal. Then, depending on which LP solver you want to use, run the corresponding compilation script: Solvers/compile_cplex.sh for CPLEX and Solvers/compile_lemon.sh for LEMON. Again, carefully check the output for compiler and linker errors.
  4. Compile SparseSinkhorn: Execute the script Solvers/compile_sinkhorn.sh from a terminal.

Import in Python

After compilation it is important that Python knows where to look for the toolbox. This is achieved by the following commands (run within Python), where you must replace XXXX by the appropriate location of the toolbox on your system. The directory /XXXX/OTToolbox_v0.1.2/ should contain the file CClasses.py.

import sys
sys.path.append("/XXXX/OTToolbox_v0.1.2/")

Note: In the example files, this is already done via relative paths, see the file Examples/lib/header_rootdir.py for details.

After this, the various modules of the toolbox can be imported in Python for example via:

# dense linear program solvers, only available if compiled with corresponding external libraries
import Solvers.OT_CPLEX as OT_CPLEX
import Solvers.OT_Lemon as OT_Lemon

# basic convenience functions, hierarchical partition handling
import OTTools
import OTTools.HierarchicalPartition as HierarchicalPartition
import Solvers.CostFunctionComputation as SolverCFC

# ShortCut, only available if compiled with corresponding external libraries
import Solvers.ShortCutSolver as ShortCutSolver
import Solvers.ShortCutSolver_CPLEX as ShortCutSolver_CPLEX
import Solvers.ShortCutSolver_Lemon as ShortCutSolver_Lemon

# SparseSinkhorn
import Solvers.Sinkhorn as Sinkhorn

We refer to the section on Examples to find out what the various modules do.

Examples

In the directory Examples/ some example files are provided. Most of them are IPython / jupyter notebooks (see Requirements), which allow for a more interactive demonstration with embedded plots. All examples contain some explaining comments with the code.

Examples for ShortCut

These examples require that the ShortCut Code with either CPLEX or Lemon libraries is compiled.
FilenameDescription
Basic LP Solver
Dense-01-CPLEX.pyVery simple test file which solves a dense transport problem with CPLEX. Can be used to verify whether installation worked properly.
Dense-01-Lemon.py

Very simple test file which solves a dense transport problem with LEMON. Can be used to verify whether installation worked properly.

Note: The LEMON solver is a bit more elaborate to set up than the CPLEX solver, since marginals and cost function must be truncated to a discrete grid. Therefore most further examples use the CPLEX solver. Should CPLEX not be available to you, this script explains how to set up problems with LEMON and how to adapt other scripts if required. For the set-up of the ShortCut-Solver with Lemon see also Example-ShortCutSolver-01-Basic-Lemon.ipynb.

ShortCut Solver
ShortCut-01-Basic-CPLEX.ipynbBasic demonstration of the ShortCut-Solver on a small test problem, comparison with the naive dense solver, using dense data structures. Uses CPLEX as internal LP solver.
ShortCut-01-Basic-Lemon.ipynbBasic demonstration of the ShortCut-Solver on a small test problem, comparison with the naive dense solver, using dense data structures. Uses Lemon as internal LP solver.
ShortCut-02-SparseDataStructures.ipynbSolving a larger test problem with the ShortCut-Solver on sparse data structures. Comparison to the dense solver is no longer possible.
ShortCut-03-PaddingTest.ipynbThe construction of shielding neighourboods is compared with the padding method proposed in [Oberman & Ruan 2015]. It is shown on an extreme test problem that the padding method will not always provide a globally optimal result.

Examples for SparseSinkhorn

These examples require that the SparseSinkhorn Code is compiled. The SparseSinkhorn solver is more complicated to set up, as it requires much more parameters to be set. This is due to the wide variety in modelling aspects that need to be chosen (e.g. standard transport, unbalanced transport, barycenter or gradient flows with various parameters), and computational choices (e.g. data structure for kernel [dense or truncated], epsilon-scaling, convergence criteria...). Also the code is currently kept extremely modular, to simplify adjustments and development. Therefore, the introduction sequence is somewhat lengthy.

For these reasons the examples are run in a different way. There are two `execution notebooks', SparseSinkhorn-01-OptimalTransport.ipynb and SparseSinkhorn-02-Barycenter.ipynb which first load configuration parameters from separate files and then solve the specified problem. For the first, there is also a script version, SparseSinkhorn-01-OptimalTransport.py. The script only solves the problem and provides verbose output on the multi-scale scheme, log-stabilization and runtime. It terminates and discards any results. The notebooks provide some elementary post-processing for visualizing the results. How to select the configuration files is explained in the jupyter notebooks and the header section of the py script. We will now briefly discuss the prepared configuration files.

Configuration File TagDescription
(Un-)balanced transport: run with SparseSinkhorn-01-OptimalTransport.ipynb or SparseSinkhorn-01-OptimalTransport.py
cfg/Sinkhorn/CompareEnhancements/1 Example for comparison of enhancements in Fig. 2. This uses the standard dense Sinkhorn algorithm, only with log-stabilization, at epsilon=0.1. Will take many iterations and a lot of time. We recommend only running this with the script.
cfg/Sinkhorn/CompareEnhancements/2 Example for comparison of enhancements in Fig. 2. This uses log-stabilization and epsilon-scaling for final value epsilon=0.1. We recommend only running this with the script.
cfg/Sinkhorn/CompareEnhancements/3 Example for comparison of enhancements in Fig. 2. This uses log-stabilization, epsilon-scaling and truncated kernels for final value epsilon=0.1. We recommend only running this with the script.
cfg/Sinkhorn/CompareEnhancements/4 Example for comparison of enhancements in Fig. 2. This uses the fully enhanced algorithm with log-stabilization, epsilon-scaling, truncated kernels and the coarse-to-fine scheme for final value epsilon=0.1. We recommend only running this with the script.
cfg/Sinkhorn/ImageBenchmark/OT_256 Example for Fig. 3 for 256x256 grids. Uses standard optimal transport.
cfg/Sinkhorn/ImageBenchmark/WF_256 Similar, but uses Wasserstein-Fisher-Rao distance, not explicitly shown in paper.
cfg/Sinkhorn/Gaussians/OT_128_000-001 Computes standard optimal transport between two pairs of Gaussians with different mass. We recommend trying the post-processing with displacement interpolation in the jupyter notebook. It shows how one Gaussian must send mass to the other to balance the overall distribution.
cfg/Sinkhorn/Gaussians/WF_128_000-001 Computes Wasserstein-Fisher-Rao transport between two pairs of Gaussians with different mass. We recommend trying the post-processing with displacement interpolation in the jupyter notebook. Here the mass can be balanced by creation / annihilation, leading to a more natural interpolation.
(Un-)balanced barycenters: run with SparseSinkhorn-02-Barycenter.ipynb
cfg/Sinkhorn/Barycenter/OT_simple_1-2-1 Computes a Wasserstein barycenter, as shown in Fig. 6.
cfg/Sinkhorn/Barycenter/WF_three_1-2-1 Computes a Wasserstein-Fisher-Rao barycenter, as shown in Fig. 7.

References

[Oberman & Ruan 2015] A. Oberman and Y. Ruan: An efficient linear programming method for Optimal Transportation, http://arxiv.org/abs/1509.03668

[Schmitzer 2016a] B. Schmitzer: A Sparse Multi-Scale Algorithm for Dense Optimal Transport, Journal of Mathematical Imaging and Vision, 56 (2): 238-259 (2016), http://arxiv.org/abs/1510.05466

[Schmitzer 2016b] B. Schmitzer: Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, https://arxiv.org/abs/1610.06519

[Schrieber et al. 2016] J. Schrieber, D. Schuhmacher and C. Gottschlich: DOTmark - A Benchmark for Discrete Optimal Transport, https://arxiv.org/abs/1610.03368