Posts tagged "technote"

Docker

Docs | QuickRef | Cheatsheet

Use docker help, man dockerfile or man docker-<command> (e.g. man docker-run).

Concepts:

  • Major components:
    • A server/daemon which
      • manages Docker objects, such as images, containers, networks and data volumes
      • has a REST API
      • and a CLI (which uses the REST API)
    • A client which is the primary way to interact with Docker.
    • A registry (e.g. Docker Hub) that stores images
  • App hierarchy:
    • Stack
    • Service: a container from a docker-compose file
    • Container:
      • When stopped: a runnable instance of an image
      • When started: a running image (i.e. image + state)
      • may be connected to a network and/or storage
      • can be stored as an image
    • Image:
      • A read-only template with instructions for creating a Docker container
      • an executable package
      • may be based on another image
  • Files:
    • A Dockerfile defines an image
    • A docker-compose.yml defines one or more containers (a.k.a. services) that work together
  • Tips:
    • You start a container (after it has been stopped) and run an image (which creates and starts a container).
    • Docker services can address each other using their container names as host names.

Basic docker commands

command (docker …) effect
build -t friendlyhello . Create image using this directory's Dockerfile
run -p 4000:80 friendlyhello Run "friendlyhello", mapping host port 4000 to container port 80
run -d -p 4000:80 friendlyhello Same thing, but in detached mode
container ls List all running containers
container ls -a List all containers, even those not running
container stop <hash> Gracefully stop the specified container
container kill <hash> Force shutdown of the specified container
container rm <hash> Remove specified container from this machine
container rm $(docker container ls -a -q) Remove all containers
image ls -a List all images on this machine
image rm <image id> Remove specified image from this machine
image rm $(docker image ls -a -q) Remove all images from this machine
login Log in this CLI session using your Docker credentials
tag <image> username/repository:tag Tag <image> for upload to registry
push username/repository:tag Upload tagged image to registry
run username/repository:tag Run image from a registry

(src)

More advanced docker commands

command (docker …) effect
volume rm $(docker volume ls -q) Remove all volumes
exec -i -t container_name /bin/bash Open a terminal
system prune Remove stopped containers, unused networks, dangling images and build cache (WARNING: use with caution)

docker-compose

command (docker-compose ...) effect
exec <service_name> bash Run bash in a running service container
-f <file> up start from a custom file
down --volumes also remove volumes attached to the container

Org-mode cheat sheet

Manual | RefCard | Org4Beginners | Glossary | Cookbook | 5 useful features

Markup: bold, italic, underlined, strikethrough, verbatim, code

  • list
    • other list
      • Numbered list

Links [1]: http://otech.nl, OTech, otech.jpg

Cycle to do items with S-LEFT and S-RIGHT

Table and keys

key context effect
M-RET   New headline
TAB / S-TAB   Fold / Unfold
M-LEFT / M-RIGHT   Promote / Demote
  table Move column
M-UP / M-DOWN table Move row
M-S-DOWN table Insert row
C-c RET table Insert horizontal line
C-c ^ table Sort lines
S-RIGHT / S-LEFT task Cycle workflow
  list Cycle bullet type
S-UP / S-DOWN   Cycle priority
C-c C-e   Export menu
C-c a   Agenda
C-c C-c heading edit tags
  on top refresh local setup
C-c ' code block edit in native mode
C-c ;   Toggle COMMENT of subtree

Literal examples

Some example from a text file.

Also available: VERSE, QUOTE and CENTER

Source code blocks

(defun org-xor (a b)
  "Exclusive or."
  (if a (not b) b))

Use Ditaa for figures.

Footnotes:

[1] Show markup by removing the last (hidden) symbol of the link

Blockchain

Key concepts

  • Chain of immutable blocks (through hashes)
  • Public, distributed replication (through peer to peer - p2p)
  • Decentralized, trustless consensus (by collective self-interest)
  • Forks are resolved through scoring (resulting in orphan blocks)
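
The first concept, a chain that becomes immutable through hashes, can be sketched in a few lines. This is a toy illustration and not how any particular blockchain is implemented: the block layout and the names block_hash, make_block and is_valid are made up for this example.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps([index, data, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(index, data, prev_hash):
    """Create a block (a plain dict, hypothetical layout) linked to its predecessor."""
    return {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = [make_block(0, "genesis", "0" * 64)]
for i, data in enumerate(["alice pays bob", "bob pays carol"], start=1):
    chain.append(make_block(i, data, chain[-1]["hash"]))

print(is_valid(chain))  # the untouched chain validates

chain[1]["data"] = "alice pays mallory"  # tamper with an old block
print(is_valid(chain))  # the stored hash no longer matches the contents
```

Changing any old block invalidates its stored hash, and recomputing that hash would break the link from the next block, which is why tampering is detectable.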

Web scraping with Scrapy

You will need three components for web scraping:

  1. a tool to GET files from the web,
  2. a tool to figure out how to process these files, and
  3. a tool to do the actual processing.

The one in the middle is where we, the humans, come in. The Chrome developer tools (or whatever they are called in your browser of choice) are our friends here. We can use the `Elements` tab to figure out the structure of a page and the identifiers we need to navigate that structure:

Chrome Developer Tools

You can use Beautiful Soup to automate the processing, combined with requests or the default urllib to do the file transfers. But Scrapy provides an all-in-one package.
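
As a rough sketch of the processing step without any third-party packages, the standard library's html.parser can stand in for Beautiful Soup. The HTML snippet, the LinkCollector class and the URL in the comment are invented for illustration; a real page would be fetched with urllib or requests first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# The GET step is replaced by an inline document so the sketch runs offline.
# With network access it could be something like:
#   html = urllib.request.urlopen("https://example.com").read().decode("utf-8")
html = """
<html><body>
  <a href="/posts/docker">Docker</a>
  <a href="/posts/python">Python</a>
  <a name="no-link">anchor without href</a>
</body></html>
"""

# Processing: feed the document to the parser and read the result.
collector = LinkCollector()
collector.feed(html)
print(collector.links)
```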

Scrapy

There are two ways to use Scrapy, the hard way and the quick-'n-dirty way:

  • For the hard way you let Scrapy generate a project for you with the startproject command. This will give you all the bells and whistles you need for extensive web-scraping, including items, middleware and pipelines. This will allow you to write a whole nest of related (or unrelated) spiders and deploy them to the cloud. But for most use cases this is far more than is needed, and the easy way is sufficient.
  • The easy way is to simply write a spider and run it with the runspider command. This will give you just one, simple spider, but I find that in most cases this is sufficient.

The scrapy shell command allows you to research the page and experiment with selectors.

The scrapy view command (both in the Scrapy shell and from the command line as a parameter to the scrapy command) opens a page in your browser, as seen by Scrapy. This lets you spot differences between how your browser GETs a page and how Scrapy sees it.

Data Science with Pandas

Home | Cookbook | Handbook

Data science is learning from data in order to gain useful predictions and insights. It consists of the steps below:

  1. Ask an interesting question:
    1. What is the scientific goal?
    2. What would you do if you had all the data?
    3. What do you want to predict or estimate?
  2. GET the data:
    1. How were the data sampled?
    2. Which data are relevant?
    3. Are there privacy or copyright issues?
  3. EXPLORE the data:
    1. Plot the data.
    2. Are there anomalies?
    3. Are there patterns?
  4. MODEL the data:
    1. Build a model.
    2. Fit the model.
    3. Validate the model.
  5. Communicate and visualize the results:
    1. What did we learn?
    2. Do the results make sense?
    3. Can we tell a story?

Data science can roughly be split into data engineering and data analysis. Data engineering consists of gathering and preparing data for analysis by scraping, cleaning, correcting, integrating, re-ordering, scaling, converting, etc. In other words, data engineers transform data into formats that data scientists can analyze. For a good introduction to data analysis, sign up for the free Udacity course.

Python packages

The PyData Python Open Data Science Stack:

  • numpy as np
    • axis=0 runs down the rows (one result per column) and axis=1 runs across the columns (one result per row)
  • scipy
  • sklearn
    • preprocessing
    • linear_model
    • model_selection (formerly cross_validation)
    • metrics (e.g. confusion_matrix)
    • svm
    • multiclass
  • pandas as pd
    (The framework for data engineering, although others exist, like Bubbles.)
  • bonobo for ETL
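
The axis convention in the numpy bullet above is easy to mix up, so a small check helps. This sketch assumes NumPy is installed; the array values are arbitrary.

```python
import numpy as np

# Two rows, three columns.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# axis=0 collapses the rows: one result per column.
col_sums = a.sum(axis=0)

# axis=1 collapses the columns: one result per row.
row_sums = a.sum(axis=1)

print(col_sums)
print(row_sums)
```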

Preprocessing

  • binarization
  • mean removal
  • scaling
  • normalization
  • label encoding
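
The steps above, written out with plain NumPy to show what each one computes. This is a sketch with made-up data and an arbitrary threshold; scikit-learn's preprocessing module offers ready-made versions.

```python
import numpy as np

data = np.array([[3.0, -1.5, 2.0],
                 [0.0, 4.0, -0.5],
                 [1.0, 2.5, 6.5]])

# Binarization: 1 where a value exceeds the threshold, else 0.
binarized = (data > 1.4).astype(int)

# Mean removal: centre every column (feature) on zero.
centered = data - data.mean(axis=0)

# Scaling: map every column onto the range [0, 1].
col_min, col_max = data.min(axis=0), data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

# Normalization (L1): make the absolute values in every row sum to 1.
normalized = data / np.abs(data).sum(axis=1, keepdims=True)

# Label encoding: replace each distinct label by an integer.
labels = ["red", "green", "red", "blue"]
classes = sorted(set(labels))
encoded = [classes.index(label) for label in labels]

print(binarized, centered, scaled, normalized, encoded, sep="\n")
```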

Machine learning

Applications of AI

  • Computer Vision (CV)
  • Natural Language Processing (NLP)
  • Speech Recognition
  • Expert Systems (rule based)
  • Games
  • Robotics (all of the above)

Branches of AI

  • Machine learning and pattern recognition
  • Logic-based AI
  • Search
  • Knowledge representation
  • Planning
  • Heuristics
  • Genetic Programming

Types of models

  • analytical
  • learned
    • supervised: uses labeled training data
    • unsupervised: without labeled training data

Techniques

  • classification: arrange data into a fixed number of distinct categories

    • if the number of samples is insufficient, the algorithm will overfit the training data

    Classifiers:

    • logistic regression: not actually a classifier, but often used as such
    • Bayes theorem: describes the probability of an event occurring based on different conditions related to this event (naïve Bayes assumes these conditions are independent of each other)
    • Support Vector Machine (SVM): defines a separating hyperplane between classes (the best hyperplane maximizes the margin, i.e. the distance to the nearest points of each class)
  • regression: explain the relationship between independent / input / predictor variables and dependent / output variables
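
Bayes' theorem, mentioned under the classifiers above, is easiest to grasp with numbers. The figures below (a 1% prior, a detector that fires on 95% of real events and on 5% of the rest) are invented for illustration, as is the function name posterior.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) expanded over A and not-A."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Invented numbers: the event occurs 1% of the time, the detector fires
# on 95% of the events and on 5% of the non-events.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with a fairly accurate detector, the low prior keeps the probability of a real event given a positive signal far below the 95% one might naively expect.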

Metrics

  • Confusion matrix: shows the performance of a classifier in terms of true/false positives/negatives
  • F1 score: harmonic mean of…
    • precision: #true positives / #predicted positives
    • recall: #true positives / #actual positives
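
The metrics above can be computed by hand from the counts in a confusion matrix. This is a minimal sketch for binary labels with made-up data; the function name is hypothetical, and scikit-learn's metrics module provides precision_score, recall_score and f1_score for real use.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```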

Concepts

  • Cognitive modeling: simulating the human thinking process
  • Deep learning: feature extraction and transformation using a cascade of multiple layers (hence deep) of nonlinear processing units (e.g. neural nets, belief networks), each using the output from the previous layer as input.
  • Rational agent: does the 'right' thing in a given context, using sensors, actuators and an inference engine
  • General Problem Solver (GPS)
  • Cross validation: repeatedly divide your data set into training and test subsets, so that every sample is used for testing once
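
Cross validation, the last concept above, comes down to bookkeeping with indices. The sketch below shows a k-fold split without shuffling; the function name k_fold_indices is made up, and scikit-learn's model_selection.KFold is the library equivalent.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fitted on each train subset and scored on the matching test subset, after which the scores are averaged.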

Python

I use Python for work and play. I make money with Python, and I have fun with it.

There's excellent documentation for Python itself and for many of the Python packages. But how do you use Python in practical, day-to-day use? Automate the boring stuff provides a thorough overview of the basics for beginners. This page shows how I use Python. There are many other possibilities, but this one works for me.

Second best?

Many consider Python 'the second best language for anything'. PHP may be the leading open source language for the web, Java may dominate the enterprise, R may be the dedicated language for data science, and bash may be the go-to language for shell scripting, but in all these cases Python is at least a viable alternative.

What this means is that by learning Python you can be competitive in all these areas. And learning Python is relatively easy, because it is considered to be very beginner-friendly (while at the same time being powerful enough to intrigue the most advanced hackers).

But don't get carried away, because just learning Python (the programming language) is not enough. To be productive in any area, you will have to learn the appropriate package(s) as well.

The only area I can think of where Python is not at home, is the web client. There are client-side implementations like Skulpt, but these are still rather obscure. Javascript rulez (sic) in the browser. And with it, its elegant asynchronicity and use of callbacks. Both work in Python (the former since v3.4 through asyncio), but they are not as common.

History

Python was originally developed by Guido van Rossum, who was also its BDFL for decades. On July 12, 2018, he announced that he would step down.

Version 2.7 of Python has long been my default, but a couple of years ago I switched to version 3, which rectified some of the technical debt that had built up in version 2, at the cost of losing backwards compatibility.

Pythonic

To describe something as clever is not considered a compliment in the Python culture. Alex Martelli - Python Cookbook (O’Reilly)

When working with Python, you will often encounter the phrase pythonic, indicating if something is 'proper python' or not. But what does this mean?

PEP20, or the Zen of Python, describes guiding principles for Python's design:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one –and preferably only one– obvious way to do it.
  • Although that way may not be obvious at first unless you're Dutch.
  • Now is better than never.
  • Although never is often better than right now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea – let's do more of those!

Note that the Zen of Python uses terms like "is better than" and "beats", indicating that these are relative, rather than absolute, values. So, flat may be better than nested, but in specific conditions you may still use nesting.

PEP8 contains the Python style guide. Here you will find conventions about layout, whitespace, etc.

The Hitchhiker’s Guide to Python contains an extensive section on writing great code.

To me, constructs like list comprehension and generators feel very Python-specific, although they also occur in other languages.
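
A short illustration of both constructs; the names and numbers are arbitrary.

```python
from itertools import islice

# List comprehension: build the whole list at once.
squares = [n * n for n in range(10) if n % 2 == 0]


def fibonacci():
    """Generator: produce Fibonacci numbers lazily, one at a time."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# The generator is infinite, so take only what is needed.
first_fibs = list(islice(fibonacci(), 8))

# Generator expression: no intermediate list is built.
total = sum(n * n for n in range(10))

print(squares, first_fibs, total)
```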

Packages

Pipenv, as its name suggests, combines pip and virtualenv:

Pipenv is primarily meant to provide users and developers of applications with an easy method to setup a working environment. It harnesses Pipfile, pip, and virtualenv into one single command. It features very pretty terminal colors.

Pipenv simplified my Python workflow significantly. Instead of creating a virtual environment, activating it, running pip install, etc., I just do pipenv shell and I'm in business.

Here's a list of the packages I use most:

Web development

With Flask

Web projects differ widely, so there's no one-size-fits-all web framework for me. What I need is just the basics that can be extended with anything the project at hand asks for. Flask is my go-to package for web development.

Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions. Flask is fun and easy to set up.

Out of the box, Flask just provides minimal HTTP request/response, routing and template support. This means that Flask is unopinionated, but also that, for anything that's not completely trivial, you need to add extensions. Most of my projects need at least SQLAlchemy, WTForms and security. For convenience, I have bundled these in Barrel, which optionally also includes admin, REST and datatables modules.
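
To give an idea of what "minimal" means here, this sketch shows routing, request handling and template rendering with nothing but Flask itself. It assumes Flask is installed; the route, the template and the name parameter are invented for the example and have nothing to do with Barrel.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# An inline Jinja2 template; real projects usually keep these in a templates/ directory.
PAGE = "<h1>Hello, {{ name }}!</h1>"


@app.route("/hello")
def hello():
    """Read a query parameter and render it through the template."""
    name = request.args.get("name", "world")
    return render_template_string(PAGE, name=name)


# Exercise the route with Flask's test client, without starting a server.
with app.test_client() as client:
    response = client.get("/hello?name=Flask")
    status = response.status_code
    body = response.get_data(as_text=True)

print(status, body)
```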

The most advanced part of Barrel is the db module, which provides CRUD operations and relation decorators. I am working on making that a separate package as SQLAngelo. More on that later.

With Django

Testing

Unit testing

Python has many unit testing frameworks, like PyTest and Nose2. But I prefer the default unit testing, because Python provides it out of the box, and it satisfies all of my requirements.
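
A minimal example of the built-in unittest module. The function under test, slugify, is a hypothetical helper made up for this sketch.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lower-case and join words with dashes."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  Web   scraping "), "web-scraping")


# Run the tests programmatically; from the command line,
# `python -m unittest` discovers and runs them as well.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```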

Functional testing

For functional (or behavioral) testing I prefer Behave, which uses Gherkin to describe behavior in (near) natural language.

Utilities

Tips

  • For a REPL, ipython is the de facto standard, but in some cases I prefer ptpython.
  • The command python -m site shows the module search path (sys.path) and the user site directories.

Packages to look into

Productivity

GTD-workflow.png (Advanced)

Javascript

Once upon a time, Javascript was the domain of script kiddies. No more.

The last few years have been a roller coaster ride for the language and its thriving community. The result is ECMAScript 6 (ES6, ES2015) with a long list of state-of-the-art features.

Documentation

Clearly, reStructuredText (reST) is the most Pythonic documentation format. And Sphinx is the generator to go with it.

However, for most use cases, I find reST far too complicated and I prefer Org-mode (and sometimes Markdown) for documentation.

On occasion I use an Online Syntax Highlighter to format source code.

Nikola

I sometimes blog with Markdown files and publish them with the static site generator Nikola. While writing, I can check the live result with the nikola auto command.

One of the lesser-known features of Nikola is shortcodes: simple snippets that you can use throughout your blog. They come in the varieties built-in, community-provided, and home-made. The simplest way to roll your own is by using templates in the shortcodes directory. For example, I have defined a shortcodes/attention.tmpl that gives me {{% attention %}} whenever I want it.

Virtual development environments with Vagrant

Python's virtual environments are a blessing, but as they manage Python dependencies only, they are also limited.

For example, I use SQLite for most of my projects, but in some cases, I need something else, like a more sophisticated SQL database (e.g. MariaDB) or a NoSQL database (e.g. MongoDB). In situations like that, I use Vagrant:

Vagrant provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.

What this means is that with Vagrant I can specify all my project's dependencies in configuration files and scripts. This keeps my projects nicely separated and allows me to reproduce my project's environment at will (this process is called provisioning). I can even use this provisioning to transfer the project to a different environment (e.g. from development to test) and to other developers.

If you want to get Vagrant up and running, the Getting Started section provides a good introduction. For a more beginner-friendly tutorial, head to Scotch.io instead. If you want to skip all that, just go to the official documentation or to the PyCharm Vagrant page if that's your IDE of choice.

In short, Vagrant is about provisioning a virtual machine with all the assets you need for your project. There are many provisioners to choose from to help you with that, like Ansible, Puppet and Chef. But I prefer vanilla bash provisioning, because I am already familiar with bash and would have to learn any of the other provisioners. In other words, my needs are not complex enough to warrant the time needed to learn a more advanced provisioner. And, to be honest, I found their learning curve rather steep.

To make my life (and maybe yours) a little easier, I wrote some basic scaffolding which is available here.

Emacs

Home

Keymaps

Emacs standard

key effect
C-^ Join with next line
C-_ Undo
C-c d Duplicate line
C-g Abort operation
C-h HELP! (and learn)
C-x 1 Focus window
C-x 2 Split window into top and bottom
C-x 3 Split window into left and right
C-x C-f Find file (open into buffer)
C-x C-s Save buffer to file
C-x C-w Save buffer to file as …
C-x C-x Exchange point and mark
C-x b Go to other buffer
C-x d Edit directory
C-x C-e Evaluate the elisp expression before the cursor
C-x o Go to other window
M-. Go to definition
M-f / M-b Move word forward / backward

Prelude

key effect
C-c c Clean buffer
C-c d Duplicate line
C-c s Swap windows

Custom

key effect
C-c / Toggle region comment
C-c j Join lines
C-c l List packages
C-x k Kill buffer immediately (no confirmation)

IT wisdom

Code / design

Good software [Beck]:

  1. Passes the tests
  2. Reveals intention
    • I will contend that Conceptual Integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas. [Brooks]
    • Simplicity and clarity —in short: what mathematicians call "elegance"— are not a dispensable luxury, but a crucial matter that decides between success and failure [Dijkstra]
    • Convention over configuration
    • Separation of Concerns
    • Maximum coherence, minimum dependency
  3. No duplication
    • Don’t Repeat Yourself (DRY)
    • Single Point of Truth (SPOT)
  4. Fewest elements
    • A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away. [Antoine de Saint-Exupéry]
    • Less is More
    • Occam's razor: the simplest option is usually correct
    • Keep It Small and Simple (KISS) until proven complex
    • Make everything as simple as possible, but not simpler [Einstein]
    • Measuring programming progress by lines of code is like measuring aircraft building progress by weight. [Gates]

Snippets:

  • From the Gang of Four:
    • Program to an interface, not an implementation
    • Favor object composition over class inheritance
  • Generic names:
    • Foo, bar
    • Alice, Bob, Carol & Dave
  • Make the change easy. Then make the easy change.
  • Technology develops from the primitive via the complex to the simple. [Antoine de Saint-Exupéry]
  • Command Query Separation: Functions that change state should not return values and functions that return values should not change state.
  • Death by Arguments
  • Refactoring Catalog
  • Principles of OOD:
    • Class design:
      • SRP (Single Responsibility Principle): A class should have one, and only one, reason to change.
      • OCP (Open Closed Principle): You should be able to extend a class's behavior without modifying it.
      • LSP (Liskov Substitution Principle): Derived classes must be substitutable for their base classes.
      • ISP (Interface Segregation Principle): Make fine grained interfaces that are client specific.
      • DIP (Dependency Inversion Principle): Depend on abstractions, not on concretions.
    • Package [1] cohesion (what to put inside packages):
      • REP (Release Reuse Equivalency Principle): The granule of reuse is the granule of release.
      • CCP (Common Closure Principle): Classes that change together are packaged together.
      • CRP (Common Reuse Principle): Classes that are used together are packaged together.
    • package couplings (metrics that evaluate the package structure of a system):
      • ADP (Acyclic Dependencies Principle): The dependency graph of packages must have no cycles.
      • SDP (Stable Dependencies Principle): Depend in the direction of stability.
      • SAP (Stable Abstractions Principle): Abstractness increases with stability.
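
Command Query Separation, from the snippets above, is perhaps the easiest of these principles to show in code. The Stack class below is a made-up example: its methods either change state or return a value, never both. Python's own list.pop does both, which is why it is split into top and drop here.

```python
class Stack:
    """A small stack that keeps commands and queries apart."""

    def __init__(self):
        self._items = []

    # Commands: change state, return nothing.
    def push(self, item):
        self._items.append(item)

    def drop(self):
        self._items.pop()

    # Queries: return a value, change nothing.
    def top(self):
        return self._items[-1]

    def size(self):
        return len(self._items)


stack = Stack()
stack.push("a")
stack.push("b")
print(stack.top(), stack.size())
stack.drop()
print(stack.top(), stack.size())
```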

Projects / management

About IT:

  • IT connects people and systems
  • IT is a craft
  • IT is human labour

Snippets:

  • Deploy Early and Often (DEO) / Release Early, Release Often (RERO)
  • Problem vs Work
  • Realistic ambition
  • Pareto principle: the 80/20 rule (roughly 80% of the effects come from 20% of the causes)
  • Lacking quality, rules abound.

Aphorisms

Brooks's Law Adding manpower to a late software project makes it later
Clarke's third law Any sufficiently advanced technology is indistinguishable from magic.
Reverse any technology that is not like magic, is insufficiently advanced
Conway's Law organizations which design systems are constrained to produce designs
  which are copies of the communication structures of these organizations
Law of Demeter For all classes C, and for all methods M attached to C, all objects to which M sends a
  message must be M’s argument objects (including the self object)
Hanlon's razor Never attribute to malice that which is adequately explained by stupidity.
Hofstadter's Law It always takes longer than you expect, even when you take into account Hofstadter's Law.
Murphy's law Anything that can go wrong, will go wrong.
Finagle's corollary …at the worst possible moment.
Muphry’s law Any correction will introduce new errors.
Parkinson's law work expands so as to fill the time available for its completion
Peter principle managers rise to the level of their incompetence

Abbrs

REPL Read Evaluate Print Loop
TL;DR Too Long, Didn't Read

Footnotes:

[1] binary deliverable

Meteor overview and major packages (deprecated)

This repository contains a brief overview of Meteor and a list of its major packages. Meteor is an innovative platform for developing web apps. The following features distinguish Meteor from other web platforms:

These are impressive unique selling points, but in the end it turned out to be too much (at least for me). There are just too many moving parts. So, since 2017Q4 I haven't worked with Meteor. Its premises are a developer's dream, but I always struggled to get a stable app.

Overview

The mindmap below shows how I understand Meteor:

Meteor mindmap

The client handles routing and rendering. Routing is taken care of by Flow router, which uses Blaze as template engine (Angular and React are also supported). Templates are divided in three layers:

  • layouts define the general structure of your site, using pages
  • pages are smart, which means they collect data and feed them to components
  • components do not interact with anything except through parameters, which makes them highly reusable

Templates can be controlled through helpers and event handlers (including onCreated).

The server uses MongoDB as a datastore. Data is stored in collections, which are basically persistent JSON documents. A schema language is used to define the data structures, so data can be validated. Schemas also drive the autoform package. Collections can be controlled through helpers and hooks.

The client and server communicate through a publish/subscribe mechanism and through methods. The server controls data access by selectively publishing data to the client. The client (pages) collects data through subscriptions. Methods are Meteor's remote procedure calls (RPCs) and can use schemas for data validation.

Some final tips & tricks:

  • waitOn (package Iron Router) lets you defer execution until a subscription has finished
  • Meteor methods can be called synchronously on the server, but must be called asynchronously on the client
  • Global variables aren't available from templates. Access them with a template helper.
  • Add c:\Windows\System32 to your path if you get a tasklist.exe error
  • The autorun function lets you define a function that is run automatically when a reactive data source changes.

Packages

Just like all other modern frameworks, Meteor relies heavily on third-party packages from its thriving ecosystem. The Meteor Guide gives an opinionated overview of which packages to use. The table below shows these packages in the middle column, along with some additional packages I prefer in the final column. For more packages visit Atmosphere.js. You can import all these package definitions by importing the file packages from this repository into the file packages of your project and then removing any package you don't need.

subject guide extra

Out of the box http  
  jQuery  
  markdown  
  meteor:ecmascript  
  underscore  
Collections aldeed:collection2  
  aldeed:simple-schema (jagi:astronomy)  
  dburles:collection-helpers  
  percolate:migrations  
Data-loading percolate:find-from-publication  
  reywood:publish-composite  
  simple:rest  
  tmeasday:publish-counts  
Methods mdg:validated-method  
User accounts alanning:roles didericis:permissions-mixin
  arillo:flow-router-helpers matb33:collection-hooks
  useraccounts:flow-routing ongoworks:security
  useraccounts:core ostrio:user-status
  useraccounts:unstyled (or tmeasday:presence)
Routing arillo:flow-router-helpers ostrio:flow-router-extra
  kadira:flow-router  
  kadira:blaze-layout  
  nimble:restivus  
  zimme:active-route  
UI-UX aldeed:autoform aldeed:tabular
  percolate:momentum aldeed:template-extension
  tap:i18n (or universe:18n) aslagle:reactive-table
    chrismbeckett:toastr
    fortawesome:fontawesome
    matb33:bootstrap-glyphicons
    raix:push
    semantic:ui
    twbs:bootstrap
Other dburles:google-maps  
  easy:search  
  momentjs:moment  
  sach:flow-db-admin  
Testing dburles:factory Chimp (not really a package)
  hwillson:stub-collections  
  johanbrook:publication-collector  
  meteortesting:mocha  
  practicalmeteor:mocha  
  velocity:meteor-stubs  
  xolvio:cleaner  
Deployment dferber:prerender  
  kadira:dochead  
  mdg:seo  
  okgrow:analytics  