The Python C-API

Wed 03 April 2019 by Michael Olberg

Extending the Python Interpreter with C/C++

The Python interpreter can be extended via C (or C++) code to

  1. make existing Python code execute faster by replacing critical sections with compiled code
  2. interface to existing/legacy C (or C++) libraries
  3. make C structures (C++ classes) first-class objects within Python

In my work I was guided by the following documentation (Python 3.0): Extending and Embedding the Python Interpreter.

In my demonstration I'll be using an interface I wrote to the CLASSIC data format (used by CLASS/GILDAS). Please ask me for the source code if you are interested.

To start with, I wrote some C++ classes which implement a reader for the two types of CLASSIC files, V1 and V2. My C++ code opens a file, determines the file type and returns a reader, which retrieves the header, data vector or frequency vector for a given id. The id ranges from 1 to the number returned by the getDirectory method. The example below shows the V2 reader class:

/**
 * A class to read a CLASSIC file of type 2.
 *
 */
class Type2Reader : public ClassReader {

 public:
    Type2Reader(const char *);
    ~Type2Reader();

    int getDirectory();
    SpectrumHeader getHead(int scan);
    std::vector<double> getFreq(int scan);
    std::vector<double> getData(int scan);

 private:
    void getFileDescriptor();
    void getEntry(int k);

    FileDescriptor2 fdesc;
    Type2Entry centry;
    ClassSection2 csect;
    long int ext[MAXEXT];
};
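
The open, detect and return-a-reader flow described above amounts to a small factory. A minimal Python sketch of that idea follows. The class names `V1Reader` and `V2Reader`, the function `open_reader` and the one-byte version marker are all hypothetical stand-ins chosen for illustration; they are not taken from the real CLASSIC file layout or from my C++ code.

```python
import io


class BaseReader:
    """Common interface shared by both reader types (hypothetical)."""

    version = 0

    def __init__(self, stream):
        self.stream = stream


class V1Reader(BaseReader):
    version = 1


class V2Reader(BaseReader):
    version = 2


# Hypothetical marker bytes, not taken from the real CLASSIC format.
READERS = {b"1": V1Reader, b"2": V2Reader}


def open_reader(stream):
    """Inspect the first byte and return a reader matching the file type."""
    marker = stream.read(1)
    try:
        reader_class = READERS[marker]
    except KeyError:
        raise ValueError("unknown file type marker: %r" % marker)
    stream.seek(0)  # let the reader start from the beginning of the file
    return reader_class(stream)


# In-memory bytes stand in for a file on disk.
reader = open_reader(io.BytesIO(b"2 rest of the file"))
print(type(reader).__name__, reader.version)
```

The caller never needs to know which file type it got, since both readers expose the same methods.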

In my Python code I wanted to be able to use code like the following:

import sys
import pandas as pd
import numpy as np

import classic  # my python module!

print(classic.version())

if len(sys.argv) == 1:
    print("usage: %s <filename>" % (sys.argv[0]))
    sys.exit(1)

classfile = sys.argv[1]
foo = classic.Reader(classfile)

# get number of spectra in file
nscans = foo.getDirectory()
print("number of spectra = %d (%d)" % (nscans, foo.count))

headers = []
for i in range(nscans):
    headers.append(foo.getHead(i+1))

df = pd.DataFrame(headers)

iscan = 1
if len(sys.argv) > 2:
    iscan = int(sys.argv[2])

freq = foo.getFreq(iscan)
data = foo.getData(iscan)
nchan = len(freq)
for i in range(nchan):
    print("%10.5f %10.5f" % (freq[i], data[i]))
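
To try the calling pattern above without the compiled module, a pure-Python stand-in can mimic the same interface. This is only a sketch: `FakeReader` and its header fields are invented for illustration, and it assumes that `getHead` returns a dict, so that a list of headers forms a list of records. Scan ids start at 1, as in the loop above.

```python
class FakeReader:
    """Hypothetical stand-in for classic.Reader, holding spectra in memory."""

    def __init__(self, spectra):
        # spectra: list of (header dict, frequency list, data list) tuples
        self._spectra = spectra
        self.count = len(spectra)

    def getDirectory(self):
        """Return the number of spectra; valid ids are 1..count."""
        return self.count

    def _entry(self, scan):
        if not 1 <= scan <= self.count:
            raise IndexError("scan id %d out of range 1..%d" % (scan, self.count))
        return self._spectra[scan - 1]  # ids are 1-based, lists are 0-based

    def getHead(self, scan):
        return self._entry(scan)[0]

    def getFreq(self, scan):
        return self._entry(scan)[1]

    def getData(self, scan):
        return self._entry(scan)[2]


# Made-up example values, not taken from a real CLASSIC file.
foo = FakeReader([
    ({"id": 1, "source": "A"}, [100.0, 100.5], [0.1, 0.2]),
    ({"id": 2, "source": "B"}, [200.0, 200.5], [0.3, 0.4]),
])

nscans = foo.getDirectory()
headers = [foo.getHead(i) for i in range(1, nscans + 1)]
for f, d in zip(foo.getFreq(2), foo.getData(2)):
    print("%10.5f %10.5f" % (f, d))
```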

All code is available at this git repository.


OCR and digitization of plots

Wed 28 November 2018 by Michael Olberg

Optical character recognition

The tool of choice on Linux and macOS is tesseract, an open-source tool originally developed by HP.

Installation

On Linux (with apt package manager)

    sudo apt install imagemagick
    sudo apt install tesseract-ocr

and then install individual language packages

    apt-cache search --names-only tesseract-ocr
    sudo apt install tesseract-ocr-XXX …

Writing R extensions

Mon 16 April 2018 by Michael Olberg

Overview

  • R makes it easy to use legacy C and Fortran code
  • comes with beautiful graphics and powerful statistical tools
  • 10000 packages on CRAN
  • packages follow high standards for documentation
  • think of R as python with built-in matplotlib and pandas

Take this simple Fortran subroutine:

      subroutine facto(n, answer)
C …

Emacs org-mode

Thu 01 March 2018 by Michael Olberg

Org mode

Here is the document I presented, with some comments and keyboard shortcuts added. S stands for Shift, C for Ctrl, and M for Meta (either Esc or Alt). For example, M-g means: press the Esc key first, then press the g key; alternatively, press Alt and …


Virtualization -- About virtual machines and containers

Mon 29 January 2018 by Michael Olberg

What is this about

The problems we are trying to solve:

  • run software not supported by our operating system
  • run software in well defined environments

Assume that you are asked to deploy a web application which stores and retrieves data from an SQL database engine. For security, reliability and scalability …


Git version control system

Mon 11 December 2017 by Michael Olberg

The Abomination

foo.txt
foo1.txt
foo1_john.txt
foo2_jane_feb12.txt
foo2_jane_feb12_with_additional_comments_by_john.txt

Please. don't. do. this!

History

The competition

  • rcs: revision control system
  • cvs: concurrent versions system
  • svn: subversion
  • hg: mercurial

... and the winner is:

  • git, developed in 2005 by Linus Torvalds, used for the development of the Linux kernel …

Overleaf

Mon 04 December 2017 by Michael Olberg

Overleaf - collaborative Writing and Publishing

Overview

  • a merger of the former WriteLaTeX and ShareLaTeX
  • find it at Overleaf
  • watch the introductory movie and follow the tutorial by clicking on the "?" in the toolbar

Features

  • platform for collaboration on LaTeX documents
  • comfortable editor with template completion
  • version control with powerful comparison feature …