Table of Contents
Introduction About System Requirements Disclaimer & License Contact Installation Overview Compilation Import in Python Examples Examples for ShortCut Examples for SparseSinkhorn ReferencesIntroduction
About
OTToolbox is a toolbox for numerical optimal transport for Python 3 with numpy and scipy. The front end is written in Python 3 and several functions are provided by shared c++ libraries.
It provides implementations for various algorithms for solving discrete optimal transport problems, via the linear programming formulation due to Kantorovich. It includes a naive dense solver for reference and convenience. The main functionality are implementations of the sparse multi-scale linear program solver described in [Schmitzer 2016a] and the stabilized sparse Sinkhorn algorithm based on entropy regularization, as described in [Schmitzer 2016b]. These two components will be referred to by ShortCut and SparseSinkhorn in the following. References to figures refer to figures in the respective articles. Parts of the ShortCut algorithm have recently been included in the numerical benchmark [Schrieber et al. 2016].
In this early stage there is no extensive documentation available. It is recommended to try to complete the installation and then look at some of the example files.
Note that this is highly experimental software. It does virtually no safety checks on the users input data and will likely crash your Python session if invalid data is provided. See also Disclaimer & License.
System Requirements
Currently the toolbox is in a very early beta stage and no systematic testing for system compatibility has been performed.
So far it has been tested with:
- Ubuntu [12.04, 14.04, 16.04], Python 3.4.3, numpy 1.11.2, scipy 0.15.1, g++ [4.6.3, 4.8.4, 5.4.0]
- OS X 10.9.5, Python 3.4.3, numpy 1.9.0, scipy 0.14.0
If you are new to Python we recommend the Anaconda distribution (or Miniconda) which provides a convenient way to set up a working Python environment. Moreover, while the toolbox can be operated from standard Python scripts, we highly recommend using jupyter for interactive testing and development.
In addition, ShortCut requires an external linear program solver. Currently, interfaces are provided for the following solvers:
- ILOG CPLEX Optimization Studio (tested with versions 12.5 and 12.6.1). Online documentation. For academic researchers CPLEX is available under the IBM Academic Initiative.
- LEMON Graph Library 1.3.1
Disclaimer & License
This toolbox is prototypical software, developed as part of ongoing research on efficient algorithms and should be considered experimental. It is distributed such that other researchers can test and reproduce published results as well as extend it.It comes without any guarantees or warranty. Use at your own risk.
This toolbox is released under the `Modified BSD License'. The full license text is given below.
Copyright (c) 2016, Bernhard Schmitzer
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
(3) The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Contact
The toolbox is developed by Bernhard Schmitzer, currently a post-doctoral reasearcher at the University of Münster, (website). Feedback on how to make the toolbox more user-friendly, about bugs, successful installation on a new system configuration or requests for additional features is much appreciated.
Installation
Overview
Installation consists of two steps: compiling the c++ libraries and importing the front end modules into Python. Currently both steps require some manual handling by the user.
Compilation
Installing ShortCut requires steps 1-3, installing SparseSinkhorn requires steps 2 & 4.- Preparation for ShortCut: If you want to compile the ShortCut component with some of the external libraries, you need to configure the build script first. To do so, open the file Solvers/compile_constants.sh with a text editor. Find the sections for the CPLEX or Lemon specific settings and adjust the location to the libraries on your system. At least one must be provided to use the toolbox. Be sure to remove the comments in front of the lines that specify the include and linker information!
- Compile Common Components: ShortCut and SparseSinkhorn share some components. For compilation execute the script Solvers/compile_common.sh from a terminal. Check for possible compiler and linker errors.
- Compile ShortCut: First execute the script Solvers/compile_shortcut.sh from a terminal. Then, depending on which LP solver you want to use, run the corresponding compilation script: Solvers/compile_cplex.sh for CPLEX and Solvers/compile_lemon.sh for LEMON. Again, carefully check the output for compiler and linker errors.
- Compile SparseSinkhorn: Execute the script Solvers/compile_sinkhorn.sh from a terminal.
Import in Python
After compilation it is important that Python knows where to look for the toolbox. This is achieved by the following commands (run within Python), where you must replace XXXX by the appropriate location of the toolbox on your system. The directory /XXXX/OTToolbox_v0.1.2/ should contain the file CClasses.py.
import sys
sys.path.append("/XXXX/OTToolbox_v0.1.2/")
Note: In the example files, this is already done via relative paths, see the file Examples/lib/header_rootdir.py for details.
After this, the various modules of the toolbox can be imported in Python for example via:
# dense linear program solvers, only available if compiled with corresponding external libraries
import Solvers.OT_CPLEX as OT_CPLEX
import Solvers.OT_Lemon as OT_Lemon
# basic convenience functions, hierarchical partition handling
import OTTools
import OTTools.HierarchicalPartition as HierarchicalPartition
import Solvers.CostFunctionComputation as SolverCFC
# ShortCut, only available if compiled with corresponding external libraries
import Solvers.ShortCutSolver as ShortCutSolver
import Solvers.ShortCutSolver_CPLEX as ShortCutSolver_CPLEX
import Solvers.ShortCutSolver_Lemon as ShortCutSolver_Lemon
# SparseSinkhorn
import Solvers.Sinkhorn as Sinkhorn
Examples
In the directory Examples/ some example files are provided. Most of them are IPython / jupyter notebooks (see Requirements), which allow for a more interactive demonstration with embedded plots. All examples contain some explaining comments with the code.Examples for ShortCut
These examples require that the ShortCut Code with either CPLEX or Lemon libraries is compiled.Filename | Description |
Basic LP Solver | Dense-01-CPLEX.py | Very simple test file which solves a dense transport problem with CPLEX. Can be used to verify whether installation worked properly. |
Dense-01-Lemon.py | Very simple test file which solves a dense transport problem with LEMON. Can be used to verify whether installation worked properly. Note: The LEMON solver is a bit more elaborate to set up than the CPLEX solver, since marginals and cost function must be truncated to a discrete grid. Therefore most further examples use the CPLEX solver. Should CPLEX not be available to you, this script explains how to set up problems with LEMON and how to adapt other scripts if required. For the set-up of the ShortCut-Solver with Lemon see also Example-ShortCutSolver-01-Basic-Lemon.ipynb. |
ShortCut Solver | |
ShortCut-01-Basic-CPLEX.ipynb | Basic demonstration of the ShortCut-Solver on a small test problem, comparison with the naive dense solver, using dense data structures. Uses CPLEX as internal LP solver. |
ShortCut-01-Basic-Lemon.ipynb | Basic demonstration of the ShortCut-Solver on a small test problem, comparison with the naive dense solver, using dense data structures. Uses Lemon as internal LP solver. |
ShortCut-02-SparseDataStructures.ipynb | Solving a larger test problem with the ShortCut-Solver on sparse data structures. Comparison to the dense solver is no longer possible. |
ShortCut-03-PaddingTest.ipynb | The construction of shielding neighourboods is compared with the padding method proposed in [Oberman & Ruan 2015]. It is shown on an extreme test problem that the padding method will not always provide a globally optimal result. |
Examples for SparseSinkhorn
These examples require that the SparseSinkhorn Code is compiled. The SparseSinkhorn solver is more complicated to set up, as it requires much more parameters to be set. This is due to the wide variety in modelling aspects that need to be chosen (e.g. standard transport, unbalanced transport, barycenter or gradient flows with various parameters), and computational choices (e.g. data structure for kernel [dense or truncated], epsilon-scaling, convergence criteria...). Also the code is currently kept extremely modular, to simplify adjustments and development. Therefore, the introduction sequence is somewhat lengthy.
For these reasons the examples are run in a different way. There are two `execution notebooks', SparseSinkhorn-01-OptimalTransport.ipynb and SparseSinkhorn-02-Barycenter.ipynb which first load configuration parameters from separate files and then solve the specified problem. For the first, there is also a script version, SparseSinkhorn-01-OptimalTransport.py. The script only solves the problem and provides verbose output on the multi-scale scheme, log-stabilization and runtime. It terminates and discards any results. The notebooks provide some elementary post-processing for visualizing the results. How to select the configuration files is explained in the jupyter notebooks and the header section of the py script. We will now briefly discuss the prepared configuration files.
Configuration File Tag | Description |
(Un-)balanced transport: run with SparseSinkhorn-01-OptimalTransport.ipynb or SparseSinkhorn-01-OptimalTransport.py | |
cfg/Sinkhorn/CompareEnhancements/1 | Example for comparison of enhancements in Fig. 2. This uses the standard dense Sinkhorn algorithm, only with log-stabilization, at epsilon=0.1. Will take many iterations and a lot of time. We recommend only running this with the script. |
cfg/Sinkhorn/CompareEnhancements/2 | Example for comparison of enhancements in Fig. 2. This uses log-stabilization and epsilon-scaling for final value epsilon=0.1. We recommend only running this with the script. |
cfg/Sinkhorn/CompareEnhancements/3 | Example for comparison of enhancements in Fig. 2. This uses log-stabilization, epsilon-scaling and truncated kernels for final value epsilon=0.1. We recommend only running this with the script. |
cfg/Sinkhorn/CompareEnhancements/4 | Example for comparison of enhancements in Fig. 2. This uses the fully enhanced algorithm with log-stabilization, epsilon-scaling, truncated kernels and the coarse-to-fine scheme for final value epsilon=0.1. We recommend only running this with the script. |
cfg/Sinkhorn/ImageBenchmark/OT_256 | Example for Fig. 3 for 256x256 grids. Uses standard optimal transport. |
cfg/Sinkhorn/ImageBenchmark/WF_256 | Similar, but uses Wasserstein-Fisher-Rao distance, not explicitly shown in paper. |
cfg/Sinkhorn/Gaussians/OT_128_000-001 | Computes standard optimal transport between two pairs of Gaussians with different mass. We recommend trying the post-processing with displacement interpolation in the jupyter notebook. It shows how one Gaussian must send mass to the other to balance the overall distribution. |
cfg/Sinkhorn/Gaussians/WF_128_000-001 | Computes Wasserstein-Fisher-Rao transport between two pairs of Gaussians with different mass. We recommend trying the post-processing with displacement interpolation in the jupyter notebook. Here the mass can be balanced by creation / annihilation, leading to a more natural interpolation. |
(Un-)balanced barycenters: run with SparseSinkhorn-02-Barycenter.ipynb | |
cfg/Sinkhorn/Barycenter/OT_simple_1-2-1 | Computes a Wasserstein barycenter, as shown in Fig. 6. |
cfg/Sinkhorn/Barycenter/WF_three_1-2-1 | Computes a Wasserstein-Fisher-Rao barycenter, as shown in Fig. 7. |
References
[Oberman & Ruan 2015] An efficient linear programming method for Optimal Transportation, http://arxiv.org/abs/1509.03668
:[Schmitzer 2016a] A Sparse Multi-Scale Algorithm for Dense Optimal Transport, Journal of Mathematical Imaging and Vision, 56 (2): 238-259 (2016), http://arxiv.org/abs/1510.05466
:[Schmitzer 2016b] Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, https://arxiv.org/abs/1610.06519
:[Schrieber et al. 2016] DOTmark - A Benchmark for Discrete Optimal Transport, https://arxiv.org/abs/1610.03368
: