Reproducible and collaborative research
Wed 29 May 2019 by Luis Velilla Prieto
Notes from the CodeRefinery workshop in Gothenburg which covered:
- version control (GitHub)
- licenses
- workflow management tools (Snakemake)
- documentation (Read the Docs)
- automated testing (pytest, Travis CI)
scripting BIFROST
Observing with the Onsala 20m using scripting and BIFROST
- The following (abbreviated) script was discussed:
# Define/import global settings
SETUP POINT loops=2 azdist=21 eldist=21 ontime=20 caltime=10 calwhen=3 obsmode=ssw dop=eachcal adjpwr=first rawdata=skip addpol=save backend=SPEa baseline=nobaselines startpos=last apply …
The Python C-API
Extending the Python Interpreter with C/C++
The Python interpreter can be extended via C (or C++) code to
- make existing python code execute faster by replacing critical sections with compiled code
- interface to existing/legacy C (or C++) libraries
- make C structures (C++ classes) first class objects within python …
Observation planning with gildas
Context/Motivation
- Create visibility plots to plan your observations
- Obtain good quality plots to be enclosed in your proposals
Known tools:
- ASTRO from the GILDAS package
- ASTROPLAN from ASTROPY (not covered here)
The best starting points for GILDAS tutorials are
Some features of numpy arrays
- check out the numpy website for docs and the like
- feel free to download and play around with the jupyter notebook on numpy
Installation
- assuming you have 'pip' installed run the following
pip install numpy
Selection of some useful bash tools
- Sometimes it is quicker to write a short bash script to 'get the job done' instead of doing it in python or something else
- bash comes with a load of very useful little programmes that allow you to modify files on the fly
- if you 'pipe' things together (with the …
pandas
pandas: powerful Python data analysis toolkit
Useful links:
A tip: Whenever you are using pandas in your own code, try and stick to the following convention:
import pandas as pd
This will make it easier for other people to understand your …
OCR and digitization of plots
Optical character recognition
The tool of choice on Linux and macOS is tesseract, an open source tool originally developed by HP.
Installation
On Linux (with apt package manager)
sudo apt install imagemagick
sudo apt install tesseract-ocr
and then install individual language packages
apt-cache search --names-only tesseract-ocr
sudo apt install tesseract-ocr-XXX …
astropy - the python astronomy package
- get it from here
- there are a number of great examples to get you started (the examples I showed in the seminar came from here)
- and further tutorials
- also check out the astropy-page on the WCS
What it can be used for (biased, certainly not complete list)
- reading+writing standard …
Writing R extensions
Overview
- R makes it easy to use legacy C and Fortran code
- comes with beautiful graphics and powerful statistical tools
- 10000 packages on CRAN
- packages follow high standards for documentation
- think of R as python with built-in matplotlib and pandas
Take this simple Fortran subroutine:
subroutine facto(n, answer)
C …
Emacs
A short introduction to emacs
given as part of the tech talk series.
"Emacs outshines all other editing software in approximately the same way that the noonday sun does the stars. It is not just bigger and brighter; it simply makes everything else vanish." -Neal Stephenson, "In the Beginning …
Coding in python in a Jupyter Notebook
- I made some slides showing how to get started
- check out the jupyter website for docs and the like
- for further info on the jupyter extensions check out their github page
- feel free to download and play around with the notebook we worked on; (tarball contains both the notebook and …
Anaconda
The Anaconda distribution for Python
- The slides are in this anaconda.pdf
- Anaconda is a distribution for python which
  - comes with pre-built and pre-configured collection of packages
  - package manager (conda)
  - version management
  - ... and more
- Freemium software from Continuum Analytics
- Works basically the same on Win, Mac and Linux
- Self-contained, install …
Emacs org-mode
Org mode
Here is the document I presented, with some comments and keyboard
shortcuts added: (S stands for Shift, C stands for Ctrl, M stands for
meta (either Esc or Alt). For example M-g would mean, press the Esc
key first then press the g key, alternatively press Alt and …
Containers
What is a container?
- It’s a file system image (eg. tar archive) that gets used instead of the local file system
- An application to set things up (eg. singularity, docker)
- When you run an application via a container that application is isolated from the rest of the OS
- Made …
Rsync and Cron jobs
- The slides I showed are in this rsync-cronjobs.pdf
- in a nutshell rsync is a fancy way to copy data: instead of copying everything it will synchronise source and destination by comparing time tags and/or sizes of files
- can be used like scp to sync different machines in a …
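The comparison described above (time tags and sizes) can be sketched in Python as follows. This is only an illustration of the idea, not rsync's actual implementation; the function name needs_copy is hypothetical, and real rsync offers further options (such as checksums) that are not modelled here.

```python
import os


def needs_copy(src, dst):
    """Return True if dst is missing or differs from src in size or mtime.

    Hypothetical helper: mimics the idea of skipping files whose size and
    modification time already match, so only changed files get copied.
    """
    if not os.path.exists(dst):
        return True
    src_stat = os.stat(src)
    dst_stat = os.stat(dst)
    # Compare whole seconds so filesystems with coarser timestamps still match.
    same_size = src_stat.st_size == dst_stat.st_size
    same_mtime = int(src_stat.st_mtime) == int(dst_stat.st_mtime)
    return not (same_size and same_mtime)
```

A copy loop would call needs_copy for every file and transfer only those for which it returns True.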
Gimp -- The GNU Image Manipulation program
What is it?
Wikipedia
GIMP (GNU Image Manipulation Program) is a free and open-source raster graphics editor used for image retouching and editing, free-form drawing, converting between different image formats, and more specialized tasks.
Gimp - what I …
Inkscape -- A powerful, free design tool
- Also check out the notes on gimp
What is it?
Wikipedia
Inkscape is a free and open-source vector graphics editor; it can be used to create or edit vector graphics such as illustrations, diagrams, line arts, charts, logos and complex paintings. Inkscape's primary vector graphics format is Scalable Vector Graphics …
Virtualization -- About virtual machines and containers
What is this about
The problems we are trying to solve:
- run software not supported by our operating system
- run software in well defined environments
Assume that you are asked to deploy a web application which stores and retrieves data from an SQL database engine. For security, reliability and scalability …
Tmux - a terminal multiplexer
- The slides are in this tmux.pdf
- if you know screen: tmux is a similar thing but I like it a lot more
- tmux vs screen cheat sheet
What is it?
Wikipedia
tmux is a software application that can be used to multiplex several virtual consoles, allowing a user to …
Git version control system
The Abomination
foo.txt
foo1.txt
foo1_john.txt
foo2_jane_feb12.txt
foo2_jane_feb12_with_additional_comments_by_john.txt
Please. don't. do. this!
History
The competition
- rcs: revision control system
- cvs: concurrent versions system
- svn: subversion
- hg: mercurial
... and the winner is:
git: developed in 2005 by Linus Torvalds, used for the development of the Linux kernel …
Lyx
Lyx - document editor based on latex
Overview
- encourages writing based on the idea of what-you-see-is-what-you-mean (WYSIWYM)
- find it at Lyx
- the Documentation/Wiki is great
- allows you to write a LaTeX document without ever having to remember latex-commands
Features
- import of LaTeX files
- export of Lyx-files to LaTeX
- vast amount …
Overleaf
Overleaf - collaborative Writing and Publishing
Overview
- a merger of the former WriteLaTeX and ShareLaTeX
- find it at Overleaf
- watch the introductory movie and follow the tutorial by clicking on the "?" in the toolbar.
Features
- platform for collaboration on LaTeX documents
- comfortable editor with template completion
- version control with powerful comparison feature …
ssh-agent
- The slides are in this ssh-agent.pdf
What is it?
- A separate program that loads your keys and passes them to ssh
- ssh will see if ssh-agent is running and if so will ask it for your keys
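As a rough illustration of the lookup described above: OpenSSH clients locate a running agent through the SSH_AUTH_SOCK environment variable, which holds the path of the agent's socket. The sketch below uses a hypothetical helper, find_agent_socket, and only checks whether that variable is set; it does not talk to the agent or verify that the socket is alive.

```python
import os


def find_agent_socket(env=None):
    """Return the agent socket path from SSH_AUTH_SOCK, or None if unset.

    Hypothetical helper for illustration; `env` defaults to the real
    environment but can be any mapping, which makes it easy to try out.
    """
    if env is None:
        env = os.environ
    path = env.get("SSH_AUTH_SOCK")
    # Treat an empty string the same as a missing variable.
    return path or None


if __name__ == "__main__":
    socket_path = find_agent_socket()
    if socket_path is None:
        print("no ssh-agent advertised in this environment")
    else:
        print("ssh-agent socket:", socket_path)
```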
What's the point?
- To decrypt your keys temporarily
- Stores them in …
Using ssh and friends
- The slides are in this ssh.pdf
# we spoke about the following tools:
ssh # utility to access remote machine
ssh-keygen # used to generate key-pairs (to access machines without a password)
ssh-copy-id # to copy the public key to the remote machine
scp # 'secure copy', to copy files …