Python Language

Topics related to Python Language:

Getting started with Python Language

Python is a widely used programming language. It is:

  • High-level: Python automates low-level operations such as memory management. This leaves the programmer with somewhat less control, but brings many benefits, including code readability and concise expression.

  • General-purpose: Python is built to be used in all contexts and environments. An example of a non-general-purpose language is PHP: it is designed specifically as a server-side web-development scripting language. In contrast, Python can be used for server-side web development, but also for building desktop applications.

  • Dynamically typed: Every variable in Python can reference any type of data. A single expression may evaluate to data of different types at different times. Due to that, the following code is possible:

    if something:
        x = 1
    else:
        x = 'this is a string'
    print(x)
    
  • Strongly typed: During program execution, you are not allowed to do anything that's incompatible with the type of data you're working with. For example, there are no hidden conversions from strings to numbers; a string made out of digits will never be treated as a number unless you convert it explicitly:

    1 + '1'  # raises an error
    1 + int('1')  # results in 2
    
  • Beginner friendly :): Python's syntax and structure are very intuitive. It is high-level and provides constructs intended to enable writing clear programs on both a small and large scale. Python supports multiple programming paradigms, including object-oriented, imperative, functional, and procedural styles. It has a large, comprehensive standard library and many easy-to-install third-party libraries.

Its design principles are outlined in The Zen of Python.

Currently, there are two major release branches of Python which have some significant differences. Python 2.x is the legacy version though it still sees widespread use. Python 3.x makes a set of backwards-incompatible changes which aim to reduce feature duplication. For help deciding which version is best for you, see this article.

The official Python documentation is also a comprehensive and useful resource, containing documentation for all versions of Python as well as tutorials to help get you started.

There is one official implementation of the language supplied by Python.org, generally referred to as CPython, and several alternative implementations of the language on other runtime platforms. These include IronPython (running Python on the .NET platform), Jython (on the Java runtime) and PyPy (implementing Python in a subset of itself).

List comprehensions

Filter

In most cases a comprehension or generator expression is more readable, more powerful and more efficient than filter() or ifilter().
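
As a quick sketch of the comparison (ifilter is Python 2's itertools.ifilter; the example below is Python 3):

    # filter() returns a lazy iterator in Python 3
    evens = filter(lambda x: x % 2 == 0, range(10))

    # the equivalent generator expression is usually clearer
    evens = (x for x in range(10) if x % 2 == 0)

    # a list comprehension materializes the whole result at once
    evens = [x for x in range(10) if x % 2 == 0]   # [0, 2, 4, 6, 8]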

List

list is a particular type of iterable, but it is not the only one that exists in Python. Sometimes it will be better to use set, tuple, or dictionary.

list is the name given in Python to dynamic arrays (similar to vector<void*> from C++ or Java's ArrayList<Object>). It is not a linked list.

Accessing elements is done in constant time and is very fast. Appending elements to the end of the list is amortized constant time, but once in a while it might involve allocation and copying of the whole list.

List comprehensions are related to lists.

Functions

5 basic things you can do with functions:

  • Assign functions to variables

    def f():
      print(20)
    y = f
    y()
    # Output: 20
    
  • Define functions within other functions (nested functions)

    def f(a, b, y):
        def inner_add(a, b):      # inner_add is hidden from outer code
            return a + b
        return inner_add(a, b)**y
    
  • Functions can return other functions

    def f(y):
        def nth_power(x):
            return x ** y
        return nth_power    # returns a function
    
    squareOf = f(2)         # function that returns the square of a number           
    cubeOf = f(3)           # function that returns the cube of a number
    squareOf(3)             # Output: 9
    cubeOf(2)               # Output: 8
    
  • Functions can be passed as parameters to other functions

    def a(x, y):
        print(x, y)
    def b(fun, str):        # b has two arguments: a function and a string 
        fun('Hello', str)
    b(a, 'Sophia')           # Output: Hello Sophia
    
  • Inner functions have access to the enclosing scope (closure)

    def outer_fun(name):
        def inner_fun():     # the variable name is available to the inner function
            return "Hello "+ name + "!"
        return inner_fun
    greet = outer_fun("Sophia")
    print(greet())            # Output: Hello Sophia!
    

Additional resources

Decorators

Math Module

Loops

Random module

Comparisons

Importing modules

Importing a module makes Python evaluate all top-level code in that module, so it learns all the functions, classes, and variables that the module contains. When you want a module of yours to be imported somewhere else, be careful with your top-level code: encapsulate it in an if __name__ == '__main__': block if you don't want it to be executed when the module gets imported.
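
A minimal sketch of this pattern, assuming a hypothetical file my_module.py:

    # my_module.py
    def useful_function():
        return 42

    if __name__ == '__main__':
        # Runs only when executed as "python my_module.py",
        # not when another module does "import my_module".
        print(useful_function())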

Sorting, Minimum and Maximum

Operator module

Variable Scope and Binding

Basic Input and Output

Files & Folders I/O

Avoiding the cross-platform Encoding Hell

When using Python's built-in open(), it is best practice to always pass the encoding argument if you intend your code to run cross-platform. The reason for this is that a system's default encoding differs from platform to platform.

While Linux systems do indeed use utf-8 as the default, this is not necessarily true for macOS and Windows.

To check the default encoding that open() will use on your system, run this from any Python interpreter:

import locale
locale.getpreferredencoding()

Hence, it is wise to always specify an encoding, to make sure the strings you're working with are encoded as what you think they are, ensuring cross-platform compatibility.

with open('somefile.txt', 'r', encoding='UTF-8') as f:
    for line in f:
        print(line)

JSON Module

For full documentation including version-specific functionality, please check the official documentation.

Types

Defaults

The json module will handle encoding and decoding of the types below by default:

De-serialisation types:

JSON             Python
object           dict
array            list
string           str
number (int)     int
number (real)    float
true, false      True, False
null             None

The json module also understands NaN, Infinity, and -Infinity as their corresponding float values, which is outside the JSON spec.

Serialisation types:

Python                                   JSON
dict                                     object
list, tuple                              array
str                                      string
int, float, (int/float)-derived Enums    number
True                                     true
False                                    false
None                                     null

To disallow encoding of NaN, Infinity, and -Infinity you must encode with allow_nan=False. This will then raise a ValueError if you attempt to encode these values.
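
A small sketch of both behaviours:

    import json

    json.dumps(float('nan'))    # 'NaN' -- accepted by default, though outside the JSON spec

    try:
        json.dumps(float('nan'), allow_nan=False)
    except ValueError:
        print('NaN was rejected')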

Custom (de-)serialisation

There are various hooks which allow you to handle data that needs to be represented differently. Use of functools.partial allows you to partially apply the relevant parameters to these functions for convenience.

Serialisation:

You can provide a function that operates on objects before they are serialised like so:

# my_json module

import json
from functools import partial

def serialise_object(obj):
    # Do something to produce json-serialisable data, e.g.:
    return obj.__dict__

dump = partial(json.dump, default=serialise_object)
dumps = partial(json.dumps, default=serialise_object)

De-serialisation:

There are various hooks that are handled by the json functions, such as object_hook and parse_float. For an exhaustive list for your version of python, see here.

# my_json module

import json
from functools import partial

def deserialise_object(dict_obj):
    # Do something custom with the decoded dict
    return dict_obj

def deserialise_float(str_obj):
    # Do something custom with the raw float string
    return float(str_obj)

load = partial(json.load, object_hook=deserialise_object, parse_float=deserialise_float)
loads = partial(json.loads, object_hook=deserialise_object, parse_float=deserialise_float)

Further custom (de-)serialisation:

The json module also allows for extension/substitution of the json.JSONEncoder and json.JSONDecoder to handle miscellaneous types. The hooks documented above can be added as defaults by creating an equivalently named method. To use these simply pass the class as the cls parameter to the relevant function. Use of functools.partial allows you to partially apply the cls parameter to these functions for convenience, e.g.

# my_json module

import json
from functools import partial

class MyEncoder(json.JSONEncoder):
    # Do something custom, e.g. override default()
    pass

class MyDecoder(json.JSONDecoder):
    # Do something custom, e.g. pass object_hook to __init__
    pass

dump = partial(json.dump, cls=MyEncoder)
dumps = partial(json.dumps, cls=MyEncoder)
load = partial(json.load, cls=MyDecoder)
loads = partial(json.loads, cls=MyDecoder)

String Methods

String objects are immutable, meaning that they can't be modified in place the way a list can. Because of this, methods on the built-in type str always return a new str object, which contains the result of the method call.
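
For example:

    s = 'hello'
    s.upper()       # returns a new string 'HELLO'
    print(s)        # 'hello' -- the original is unchanged
    s = s.upper()   # rebind the name to keep the result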

Metaclasses

When designing your architecture, consider that many things which can be accomplished with metaclasses can also be accomplished using simpler mechanisms:

  • Traditional inheritance is often more than enough.
  • Class decorators can mix functionality into classes on an ad-hoc basis.
  • Python 3.6 introduces __init_subclass__(), which allows a class to participate in the creation of its subclasses (a sketch follows this list).
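
A minimal sketch, with hypothetical class names (Python 3.6+):

    class PluginBase:
        plugins = []

        def __init_subclass__(cls, **kwargs):
            # called whenever a subclass of PluginBase is defined
            super().__init_subclass__(**kwargs)
            PluginBase.plugins.append(cls)

    class MyPlugin(PluginBase):
        pass

    print(PluginBase.plugins)   # [<class '__main__.MyPlugin'>]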

Indexing and Slicing

You can unify the concept of slicing strings with that of slicing other sequences by viewing strings as an immutable collection of characters, with the caveat that a unicode character is represented by a string of length 1.

In mathematical notation you can consider slicing to use a half-open interval [start, end), that is to say that the start is included but the end is not. The half-open nature of the interval has the advantage that len(x[:n]) == n where len(x) >= n, while the interval being closed at the start has the advantage that x[n:n+1] == [x[n]] where x is a list with len(x) > n, thus keeping consistency between indexing and slicing notation.
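
A short illustration of these identities:

    x = ['a', 'b', 'c', 'd']
    x[:2]           # ['a', 'b']  -- len(x[:2]) == 2
    x[2:3]          # ['c']       -- x[n:n+1] == [x[n]]
    x[:2] + x[2:]   # ['a', 'b', 'c', 'd']  -- the two halves recombine exactly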

Generators

Simple Mathematical Operators

Numerical types and their metaclasses

The numbers module contains the abstract base classes for the numerical types:

subclasses           numbers.Number   numbers.Integral   numbers.Rational   numbers.Real   numbers.Complex
bool                 ✓                ✓                  ✓                  ✓              ✓
int                  ✓                ✓                  ✓                  ✓              ✓
fractions.Fraction   ✓                ✗                  ✓                  ✓              ✓
float                ✓                ✗                  ✗                  ✓              ✓
complex              ✓                ✗                  ✗                  ✗              ✓
decimal.Decimal      ✓                ✗                  ✗                  ✗              ✗

Reduce

reduce might not always be the most efficient function. For some types there are equivalent functions or methods:

  • sum() for the sum of a sequence containing addable elements (not strings):

    sum([1,2,3])                                 # = 6
    
  • str.join for the concatenation of strings:

    ''.join(['Hello', ',', ' World'])            # = 'Hello, World'
    
  • next together with a generator could be a short-circuit variant compared to reduce:

    # First falsy item:
    next((i for i in [100, [], 20, 0] if not i)) # = []  
    

Map Function

Everything that can be done with map can also be done with comprehensions:

list(map(abs, [-1,-2,-3]))    # [1, 2, 3]
[abs(i) for i in [-1,-2,-3]]  # [1, 2, 3]

Though you would need zip if you have multiple iterables:

import operator
alist = [1,2,3]
list(map(operator.add, alist, alist))  # [2, 4, 6]
[i + j for i, j in zip(alist, alist)]  # [2, 4, 6]

List comprehensions are efficient and can be faster than map in many cases, so test the times of both approaches if speed is important for you.

Exponentiation

Searching

All searching algorithms on arbitrary iterables containing n elements have O(n) complexity. Only specialized algorithms like bisect.bisect_left(), which requires an already-sorted sequence, can be faster, with O(log n) complexity.
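
For example, bisect.bisect_left() finds the insertion point for a value in a sorted sequence in O(log n):

    import bisect

    sorted_list = [1, 3, 5, 7, 9]
    index = bisect.bisect_left(sorted_list, 5)   # 2

    # O(log n) membership test on a sorted list
    found = index < len(sorted_list) and sorted_list[index] == 5   # True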

Dictionary

Helpful items to remember when creating a dictionary:

  • Every key must be unique (otherwise its value will be overwritten)
  • Every key must be hashable (one the hash function can hash; otherwise a TypeError will be thrown, see the sketch below)
  • There is no particular order for the keys.
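
A short sketch of these rules:

    d = {'a': 1, 'a': 2}   # duplicate key: the later value wins
    print(d)               # {'a': 2}

    d[(1, 2)] = 'ok'       # tuples are hashable, so they make valid keys
    try:
        d[[1, 2]] = 'no'   # lists are mutable and unhashable
    except TypeError as e:
        print(e)           # unhashable type: 'list'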

Classes

Counting

Manipulating XML

Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module’s API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output.

Date and Time

Python provides both builtin methods and external libraries for creating, modifying, parsing, and manipulating dates and times.

Set

Collections module

There are three other types available in the collections module, namely:

  1. UserDict
  2. UserList
  3. UserString

They each act as a wrapper around the wrapped object, e.g., UserDict acts as a wrapper around a dict object. In each case, the class simulates its named type. The instance's contents are kept in a regular object of that type, which is accessible via the data attribute of the wrapper instance. In each of these three cases, the need for these types has been partially supplanted by the ability to subclass directly from the basic type; however, the wrapper class can be easier to work with because the underlying object is accessible as an attribute.

Parallel computation

Multithreading

Writing extensions

Unit Testing

Regular Expressions (Regex)

Bitwise Operators

Incompatibilities moving from Python 2 to Python 3

There are currently two supported versions of Python: 2.7 (Python 2) and 3.6 (Python 3). Additionally versions 3.3 and 3.4 receive security updates in source format.

Python 2.7 is backwards-compatible with most earlier versions of Python, and can run Python code from most 1.x and 2.x versions of Python unchanged. It is broadly available, with an extensive collection of packages. It is also considered deprecated by the CPython developers, and receives only security and bug-fix updates. The CPython developers intend to abandon this version of the language in 2020.

According to Python Enhancement Proposal 373 there are no planned future releases of Python 2 after 25 June 2016, but bug fixes and security updates will be supported until 2020. (It doesn't specify what exact date in 2020 will be the sunset date of Python 2.)

Python 3 intentionally broke backwards-compatibility, to address concerns the language developers had with the core of the language. Python 3 receives new development and new features. It is the version of the language that the language developers intend to move forward with.

Over the time between the initial release of Python 3.0 and the current version, some features of Python 3 were back-ported into Python 2.6, and other parts of Python 3 were extended to have syntax compatible with Python 2. Therefore it is possible to write Python that will work on both Python 2 and Python 3, by using future imports and special modules (like six).

Future imports have to be at the beginning of your module:

from __future__ import print_function
# other imports and instructions go after __future__
print('Hello world')

For further information on the __future__ module, see the relevant page in the Python documentation.

The 2to3 tool is a Python program that converts Python 2.x code to Python 3.x code, see also the Python documentation.

The package six provides utilities for Python 2/3 compatibility:

  • unified access to renamed libraries
  • variables for string/unicode types
  • functions for methods that were removed or renamed

A reference for differences between Python 2 and Python 3 can be found here.

Virtual environments

Virtual environments are sufficiently useful that they probably should be used for every project. In particular, virtual environments allow you to:

  1. Manage dependencies without requiring root access
  2. Install different versions of the same dependency, for instance when working on different projects with varying requirements
  3. Work with different python versions

Copying data

Tuple

Parentheses are only needed for empty tuples or when used in a function call.

A tuple is a sequence of values. The values can be any type, and they are indexed by integers, so in that respect tuples are a lot like lists. The important difference is that tuples are immutable and hashable, so they can be used in sets and as dictionary keys.

Context Managers (“with” Statement)

Context managers are defined in PEP 343. They are intended to provide a more succinct mechanism for resource management than try ... finally constructs. The formal definition is as follows.

In this PEP, context managers provide __enter__() and __exit__() methods that are invoked on entry to and exit from the body of the with statement.

It then goes on to define the with statement as follows.

with EXPR as VAR:
    BLOCK

The translation of the above statement is:

   mgr = (EXPR)
   exit = type(mgr).__exit__  # Not calling it yet
   value = type(mgr).__enter__(mgr)
   exc = True
   try:
       try:
           VAR = value  # Only if "as VAR" is present
           BLOCK
       except:
           # The exceptional case is handled here
           exc = False
           if not exit(mgr, *sys.exc_info()):
               raise
           # The exception is swallowed if exit() returns true
   finally:
       # The normal and non-local-goto cases are handled here
       if exc:
           exit(mgr, None, None, None)
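
Beyond the PEP's formal translation, here is a minimal sketch of a user-defined context manager implementing the two methods:

    class Resource:
        def __enter__(self):
            print('acquire')
            return self   # bound to VAR in "with ... as VAR"

        def __exit__(self, exc_type, exc_value, traceback):
            print('release')
            return False  # returning a false value propagates exceptions

    with Resource() as r:
        print('using', r)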

Hidden Features

Enum

String Formatting

Conditionals

Complex math

Unicode and bytes

The __name__ special variable

The Python special variable __name__ is set to the name of the containing module. At the top level (such as in the interactive interpreter, or in the main file) it is set to '__main__'. This can be used to run a block of statements if a module is being run directly rather than being imported.

The related special attribute obj.__name__ is found on classes, imported modules and functions (including methods), and gives the name of the object when defined.
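
A quick illustration of both, assuming the snippet is run directly as a script:

    import math

    def greet():
        pass

    print(math.__name__)    # 'math'
    print(greet.__name__)   # 'greet'
    print(__name__)         # '__main__' when run directly, not imported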

Checking Path Existence and Permissions

Python Networking

Asyncio Module

The Print Function

os.path

Creating Python packages

The pypa sample project contains a complete, easily modifiable template setup.py that demonstrates a large range of capabilities setuptools has to offer.

Parsing Command Line arguments

HTML Parsing

Subprocess Library

setup.py

List slicing (selecting parts of lists)

Sockets

Itertools Module

Recursion

Recursion needs a stop condition in order to terminate.

Any variable whose value the recursive calls depend on must be passed along as an argument, so its state is preserved across calls.

Boolean Operators

The dis module

Type Hints

Type hinting, as specified in PEP 484, is a formalized solution to statically indicate the type of a value in Python code. Together with the typing module, type hints offer Python users the capability to annotate their code, thereby assisting type checkers and, indirectly, documenting their code with more information.
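
A minimal sketch of annotated code using the typing module (mean is a hypothetical function):

    from typing import List, Optional

    def mean(values: List[float]) -> Optional[float]:
        """Return the arithmetic mean, or None for an empty list."""
        if not values:
            return None
        return sum(values) / len(values)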

pip: PyPI Package Manager

The locale Module

Exceptions

Web scraping with Python

Useful Python packages for web scraping (alphabetical order)

Making requests and collecting data

requests

A simple, but powerful package for making HTTP requests.

requests-cache

Caching for requests; caching data is very useful. In development, it means you can avoid hitting a site unnecessarily. While running a real collection, it means that if your scraper crashes for some reason (maybe you didn't handle some unusual content on the site...? maybe the site went down...?) you can repeat the collection very quickly from where you left off.

scrapy

Useful for building web crawlers, where you need something more powerful than using requests and iterating through pages.

selenium

Python bindings for Selenium WebDriver, for browser automation. Using requests to make HTTP requests directly is often simpler for retrieving webpages. However, this remains a useful tool when it is not possible to replicate the desired behaviour of a site using requests alone, particularly when JavaScript is required to render elements on a page.

HTML parsing

BeautifulSoup

Query HTML and XML documents, using a number of different parsers (Python's built-in HTML parser, html5lib, lxml or lxml.html)

lxml

Processes HTML and XML. Can be used to query and select content from HTML documents via CSS selectors and XPath.

Deque Module

This class is useful when you need an object similar to a list that allows fast append and pop operations from either side (the name deque stands for “double-ended queue”).

The methods provided are indeed very similar, except that some like pop, append, or extend can be suffixed with left. The deque data structure should be preferred to a list if one needs to frequently insert and delete elements at both ends, because it allows doing so in constant time O(1).
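
For example:

    from collections import deque

    d = deque([1, 2, 3])
    d.append(4)        # add on the right
    d.appendleft(0)    # add on the left -- O(1), unlike list.insert(0, ...)
    d.pop()            # 4, removed from the right
    d.popleft()        # 0, removed from the left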

Distribution

Property Objects

Note: In Python 2, make sure that your class inherits from object (making it a new-style class) in order for all features of properties to be available.

Overloading

Debugging

Reading and Writing CSV

Dynamic code execution with `exec` and `eval`

In exec, if globals is locals (i.e. they refer to the same object), the code is executed as if it were at module level. If globals and locals are distinct objects, the code is executed as if it were in a class body.

If a globals object is passed in but doesn't specify a __builtins__ key, then Python built-in functions and names are automatically added to the global scope. To suppress the availability of functions such as print or isinstance in the executed scope, give globals the key __builtins__ mapped to the value None. However, this is not a security feature.
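
A small sketch of the idea, here using an empty __builtins__ mapping to make the built-in names unavailable (again, not a security boundary):

    env = {'__builtins__': {}}
    try:
        exec("print('hello')", env)
    except NameError:
        print('print was not available inside exec')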

The Python 2-specific syntax shouldn't be used; the Python 3 syntax will work in Python 2. Thus the following forms are deprecated:

  • exec object
  • exec object in globals
  • exec object in globals, locals

PyInstaller - Distributing Python Code

Iterables and Iterators

Data Visualization with Python

The Interpreter (Command Line Console)

*args and **kwargs

There a few things to note:

  1. The names args and kwargs are used by convention; they are not part of the language specification. Thus, these are equivalent:

     def func(*args, **kwargs):
         print(args)
         print(kwargs)
    

     def func(*a, **b):
         print(a)
         print(b)
    
  2. You may not have more than one args parameter or more than one kwargs parameter (and neither is required)

     def test(*args1, *args2):
     #   File "<stdin>", line 1
     #     def test(*args1, *args2):
     #                      ^
     # SyntaxError: invalid syntax
    

     def test(**kwargs1, **kwargs2):
     #   File "<stdin>", line 1
     #     def test(**kwargs1, **kwargs2):
     #                       ^
     # SyntaxError: invalid syntax
    
  3. Any parameters specified after *args are keyword-only arguments that can only be passed by name. A single star may be used instead of *args to force values to be keyword arguments without providing a variadic parameter list. Keyword-only parameter lists are only available in Python 3.

     def func(a, b, *args, x, y):
         print(a, b, args, x, y)
    
     func(1, 2, 3, 4, x=5, y=6)
     # Output: 1 2 (3, 4) 5 6
    

     def func(a, b, *, x, y):
         print(a, b, x, y)
    
     func(1, 2, x=5, y=6)
     # Output: 1 2 5 6
    
  4. **kwargs must come last in the parameter list.

     def test(**kwargs, *args):
     #   File "<stdin>", line 1
     #     def test(**kwargs, *args):
     #                      ^
     # SyntaxError: invalid syntax
    

Functools Module

Garbage Collection

At its core, Python's garbage collector (as of 3.5) is a simple reference counting implementation. Every time you make a reference to an object (for example, a = myobject) the reference count on that object (myobject) is incremented. Every time a reference gets removed, the reference count is decremented, and once the reference count reaches 0, we know that nothing holds a reference to that object and we can deallocate it!

One common misunderstanding about how Python memory management works is that the del keyword frees an object's memory. This is not true. What actually happens is that the del keyword merely decrements the object's refcount; the object is only deallocated once its refcount reaches zero, which will not happen as long as other references to the object are still held elsewhere in your code.
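
A small sketch using sys.getrefcount (which itself temporarily adds one reference to its argument):

    import sys

    a = object()
    b = a                       # a second reference to the same object
    print(sys.getrefcount(a))   # 3: a, b, and getrefcount's own argument
    del b                       # removes one reference; the object survives
    print(sys.getrefcount(a))   # 2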

Python aggressively creates or cleans up objects the first time it needs them. If I perform the assignment a = object(), the memory for the object is allocated at that time (CPython will sometimes reuse certain types of objects, e.g. lists, under the hood, but mostly it doesn't keep a free object pool and will perform allocation when you need it). Similarly, as soon as the refcount is decremented to 0, GC cleans it up.

Generational Garbage Collection

In the 1960s John McCarthy discovered a fatal flaw in refcounting garbage collection when he implemented the refcounting algorithm used by Lisp: what happens if two objects refer to each other in a cyclic reference? How can you ever garbage collect those two objects, even if there are no external references to them, if they will always refer to each other? This problem also extends to any cyclic data structure, such as ring buffers or any two consecutive entries in a doubly linked list. Python attempts to fix this problem using a slightly interesting twist on another garbage collection algorithm called generational garbage collection.

In essence, any time you create an object in Python it adds it to the end of a doubly linked list. On occasion Python loops through this list, checks what objects the objects in the list refer to, and if they're also in the list (we'll see why they might not be in a moment), further decrements their refcounts. At this point (actually, there are some heuristics that determine when things get moved, but let's assume it's after a single collection to keep things simple) anything that still has a refcount greater than 0 gets promoted to another linked list called "Generation 1" (this is why all objects aren't always in the generation 0 list), which has this loop applied to it less often.

This is where the generational garbage collection comes in. There are 3 generations by default in Python (three linked lists of objects): the first list (generation 0) contains all new objects; if a GC cycle happens and the objects are not collected, they get moved to the second list (generation 1), and if a GC cycle happens on the second list and they are still not collected they get moved to the third list (generation 2). The third generation list (called "generation 2", since we're zero indexing) is garbage collected much less often than the first two, the idea being that if your object is long lived it's not as likely to be GCed, and may never be GCed during the lifetime of your application, so there's no point in wasting time checking it on every single GC run. Furthermore, it's observed that most objects are garbage collected relatively quickly. From now on, we'll call these "good objects" since they die young. This is called the "weak generational hypothesis" and was also first observed in the 60s.

A quick aside: unlike the first two generations, the long lived third generation list is not garbage collected on a regular schedule. It is checked when the ratio of long lived pending objects (those that are in the third generation list, but haven't actually had a GC cycle yet) to the total long lived objects in the list is greater than 25%. This is because the third list is unbounded (things are never moved off of it to another list, so they only go away when they're actually garbage collected), meaning that for applications where you are creating lots of long lived objects, GC cycles on the third list can get quite long. By using a ratio we achieve "amortized linear performance in the total number of objects"; aka, the longer the list, the longer GC takes, but the less often we perform GC (here's the original 2008 proposal for this heuristic by Martin von Löwis for further reading). The act of performing a garbage collection on the third generation or "mature" list is called "full garbage collection".

So generational garbage collection speeds things up tremendously by not requiring that we scan over objects that aren't likely to need GC all the time, but how does it help us break cyclic references? Probably not very well, it turns out. The function for actually breaking these reference cycles starts out like this:

/* Break reference cycles by clearing the containers involved.  This is
 * tricky business as the lists can be changing and we don't know which
 * objects may be freed.  It is possible I screwed something up here.
 */
static void
delete_garbage(PyGC_Head *collectable, PyGC_Head *old)

The reason generational garbage collection helps with this is that we can keep the length of the list as a separate count; each time we add a new object to the generation we increment this count, and any time we move an object to another generation or dealloc it we decrement the count. Theoretically at the end of a GC cycle this count (for the first two generations anyway) should always be 0. If it's not, anything in the list that's left over is some form of circular reference and we can drop it.

However, there's one more problem here: what if the leftover objects have Python's magic method __del__ on them? __del__ is called any time a Python object is destroyed. However, if two objects in a circular reference have __del__ methods, we can't be sure that destroying one won't break the other's __del__ method. For a contrived example, imagine we wrote the following:

class A(object):
    def __init__(self, b=None):
        self.b = b
 
    def __del__(self):
        print("We're deleting an instance of A containing:", self.b)
     
class B(object):
    def __init__(self, a=None):
        self.a = a
 
    def __del__(self):
        print("We're deleting an instance of B containing:", self.a)

and we set an instance of A and an instance of B to point to one another, and then they end up in the same garbage collection cycle? Let's say we pick one at random and dealloc our instance of A first; A's __del__ method will be called, it will print, then A will be freed. Next we come to B, we call its __del__ method, and oops! Segfault! A no longer exists. We could fix this by calling the __del__ methods of everything that's left over first, then doing another pass to actually dealloc everything; however, this introduces another issue: what if one object's __del__ method saves a reference to the other object that's about to be GCed, and that object has a reference to us somewhere else? We still have a reference cycle, but now it's not possible to actually GC either object, even if they're no longer in use. Note that even if an object is not part of a circular data structure, it could revive itself in its own __del__ method; Python does have a check for this and will stop GCing if an object's refcount has increased after its __del__ method has been called.

CPython deals with this by sticking those un-GC-able objects (anything with some form of circular reference and a __del__ method) onto a global list of uncollectable garbage and then leaving them there for all eternity:

/* list of uncollectable objects */
static PyObject *garbage = NULL;

Indentation

Security and Cryptography

Many of the methods in hashlib will require you to pass values interpretable as buffers of bytes, rather than strings. This is the case for hashlib.new().update() as well as hashlib.pbkdf2_hmac. If you have a string literal, you can convert it to a bytes literal by prepending the character b to the start of the string (an existing str object can be converted with .encode()):

  "This is a string"
 b"This is a buffer of bytes"

Pickle data serialisation

Pickleable types

The following objects are picklable.

  • None, True, and False
  • numbers (of all types)
  • strings (of all types)
  • tuples, lists, sets, and dicts containing only picklable objects
  • functions defined at the top level of a module
  • built-in functions
  • classes that are defined at the top level of a module
    • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see the official docs for details).

Based on the official Python documentation.

pickle and security

The pickle module is not secure. It should not be used when receiving the serialized data from an untrusted party, such as over the Internet.

urllib

Binary Data

Python and Excel

Idioms

Method Overriding

Difference between Module and Package

It is possible to put a Python package in a ZIP file, and use it that way if you add these lines to the beginning of your script:

import sys
sys.path.append("package.zip")

Data Serialization

Why use JSON?

  • Cross-language support
  • Human readable
  • Unlike pickle, it doesn't have the danger of running arbitrary code

Why not use JSON?

  • Doesn't support Pythonic data types
  • Dictionary keys must be strings

Why use Pickle?

  • Great way to serialize Pythonic data types (tuples, functions, classes)
  • Dictionary keys can be of any data type

Why not use Pickle?

  • Cross-language support is missing
  • It is not safe for loading arbitrary data

Python concurrency

The Python developers made sure that the API between threading and multiprocessing is similar so that switching between the two variants is easier for programmers.

Introduction to RabbitMQ using AMQPStorm

The latest version of AMQPStorm is available at pypi or you can install it using pip

pip install amqpstorm

PostgreSQL

Descriptor

Common Pitfalls

Multiprocessing

tempfile NamedTemporaryFile

Working with ZIP archives

If you try to open a file that is not a ZIP file, the exception zipfile.BadZipFile is raised.

In Python 2.7, this was spelled zipfile.BadZipfile, and this old name is retained alongside the new one in Python 3.2+.

Stack

From Wikipedia:

In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the most recently added element that was not yet removed.

Due to the way their elements are accessed, stacks are also known as Last-In, First-Out (LIFO) stacks.

In Python one can use lists as stacks with append() as push and pop() as pop operations. Both operations run in constant time O(1).

Python's deque data structure can also be used as a stack. Compared to lists, deques allow push and pop operations with constant time complexity at both ends.
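
A minimal sketch of both approaches:

    from collections import deque

    stack = []          # a list used as a stack
    stack.append('a')   # push
    stack.append('b')
    stack.pop()         # 'b' -- the most recently added element

    dstack = deque()    # a deque used as a stack
    dstack.append('a')
    dstack.pop()        # 'a'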

Profiling

User-Defined Methods

Working around the Global Interpreter Lock (GIL)

Why is there a GIL?

The GIL has been around in CPython since the inception of Python threads, in 1992. It's designed to ensure thread safety of running python code. Python interpreters written with a GIL prevent multiple native threads from executing Python bytecodes at once. This makes it easy for plugins to ensure that their code is thread-safe: simply lock the GIL, and only your active thread is able to run, so your code is automatically thread-safe.

Short version: the GIL ensures that no matter how many processors and threads you have, only one thread of a python interpreter will run at one time.

This has a lot of ease-of-use benefits, but it also carries significant drawbacks.

Note that a GIL is not a requirement of the Python language; it is an implementation detail, and consequently you can't access the GIL directly from standard Python code. Not all implementations of Python use a GIL.

Interpreters that have a GIL: CPython, PyPy, Cython (but you can disable the GIL with nogil)

Interpreters that do not have a GIL: Jython, IronPython

Details on how the GIL operates:

When a thread is running, it locks the GIL. When a thread wants to run, it requests the GIL, and waits until it is available. In CPython, before version 3.2, the running thread would check after a certain number of python instructions to see if other code wanted the lock (that is, it released the lock and then requested it again). This method tended to cause thread starvation, largely because the thread that released the lock would acquire it again before the waiting threads had a chance to wake up. Since 3.2, threads that want the GIL wait for the lock for some time, and after that time, they set a shared variable that forces the running thread to yield. This can still result in drastically longer execution times, though. See the links below from dabeaz.com (in the references section) for more details.

CPython automatically releases the GIL when a thread performs an I/O operation. Image processing libraries and numpy number crunching operations release the GIL before doing their processing.

Benefits of the GIL

For interpreters that use the GIL, the GIL is systemic. It is used to preserve the state of the application. Benefits include:
  • Garbage collection - thread-safe reference counts must be modified while the GIL is locked. In CPython, all of garbage collection is tied to the GIL. This is a big one; see the python.org wiki article about the GIL (listed in References, below) for details about what must still be functional if one wanted to remove the GIL.
  • Ease for programmers dealing with the GIL - locking everything is simplistic, but easy to code against
  • Eases the import of modules from other languages

Consequences of the GIL

The GIL only allows one thread to run python code at a time inside the python interpreter. This means that multithreading of processes that run strict python code simply doesn't work. When using threads against the GIL, you will likely have worse performance with the threads than if you ran in a single thread.

References:

https://wiki.python.org/moin/GlobalInterpreterLock - quick summary of what it does, fine details on all the benefits

http://programmers.stackexchange.com/questions/186889/why-was-python-written-with-the-gil - clearly written summary

http://www.dabeaz.com/python/UnderstandingGIL.pdf - how the GIL works and why it slows down on multiple cores

http://www.dabeaz.com/GIL/gilvis/index.html - visualization of the data showing how the GIL locks up threads

http://jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/ - simple to understand history of the GIL problem

https://jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/ - details on ways to work around the GIL's limitations

Deployment

Logging

Processes and Threads

The os Module

Comments and Documentation

Developers should follow the PEP257 - Docstring Conventions guidelines. In some cases, style guides (such as Google Style Guide ones) or documentation rendering third-parties (such as Sphinx) may detail additional conventions for docstrings.

Database Access

Python HTTP Server

Alternatives to switch statement from other languages

List destructuring (aka packing and unpacking)

Accessing Python source code and bytecode

Mixins

Adding a mixin to a class looks a lot like adding a superclass, because it pretty much is just that. An object of a class with the mixin Foo will also be an instance of Foo, and isinstance(instance, Foo) will return True.
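
A minimal sketch, with hypothetical class names:

    class ReprMixin:
        """Mixin that adds a generic __repr__ based on the instance __dict__."""
        def __repr__(self):
            return '%s(%r)' % (type(self).__name__, self.__dict__)

    class Point(ReprMixin):
        def __init__(self, x, y):
            self.x, self.y = x, y

    p = Point(1, 2)
    print(repr(p))                    # Point({'x': 1, 'y': 2})
    print(isinstance(p, ReprMixin))   # True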

Attribute Access

ArcPy

This example uses a Search Cursor from the Data Access (da) module of ArcPy.

Do not confuse arcpy.da.SearchCursor syntax with the earlier and slower arcpy.SearchCursor().

The Data Access module (arcpy.da) has only been available since ArcGIS 10.1 for Desktop.

Python Anti-Patterns

Plugin and Extension Classes

Websockets

Immutable datatypes(int, float, str, tuple and frozensets)

String representations of class instances: __str__ and __repr__ methods

A note about implementing both methods

When both methods are implemented, it's somewhat common to have a __str__ method that returns a human-friendly representation (e.g. "Ace of Spades") and __repr__ return an eval-friendly representation.

In fact, the Python docs for repr() note exactly this:

For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.

What that means is that __str__ might be implemented to return something like "Ace of Spades" as shown previously, while __repr__ might be implemented to instead return Card('Spades', 1)

This string could be passed directly back into eval in somewhat of a "round-trip":

object -> string -> object

An example of an implementation of such a method might be:

def __repr__(self):
    # %r quotes the suit string, so the result can be passed back to eval()
    return "Card(%r, %d)" % (self.suit, self.pips)

Notes

[1] This output is implementation specific. The string displayed is from cpython.

[2] You may have already seen the result of this str()/repr() divide and not known it. When strings containing special characters such as backslashes are converted to strings via str() the backslashes appear as-is (they appear once). When they're converted to strings via repr() (for example, as elements of a list being displayed), the backslashes are escaped and thus appear twice.

Arrays

Operator Precedence

From the Python documentation:

The following table summarizes the operator precedences in Python, from lowest precedence (least binding) to highest precedence (most binding). Operators in the same box have the same precedence. Unless the syntax is explicitly given, operators are binary. Operators in the same box group left to right (except for comparisons, including tests, which all have the same precedence and chain from left to right and exponentiation, which groups from right to left).

Operator                                                                Description
lambda                                                                  Lambda expression
if – else                                                               Conditional expression
or                                                                      Boolean OR
and                                                                     Boolean AND
not x                                                                   Boolean NOT
in, not in, is, is not, <, <=, >, >=, <>, !=, ==                        Comparisons, including membership tests and identity tests
|                                                                       Bitwise OR
^                                                                       Bitwise XOR
&                                                                       Bitwise AND
<<, >>                                                                  Shifts
+, -                                                                    Addition and subtraction
*, /, //, %                                                             Multiplication, division, remainder
+x, -x, ~x                                                              Positive, negative, bitwise NOT
**                                                                      Exponentiation
x[index], x[index:index], x(arguments...), x.attribute                  Subscription, slicing, call, attribute reference
(expressions...), [expressions...], {key: value...}, `expressions...`   Binding or tuple display, list display, dictionary display, string conversion

Polymorphism

Non-official Python implementations

List Comprehensions

List comprehensions were outlined in PEP 202 and introduced in Python 2.0.

Web Server Gateway Interface (WSGI)

2to3 tool

Abstract syntax tree

Abstract Base Classes (abc)

Unicode

Secure Shell Connection in Python

Python Serial Communication (pyserial)

Neo4j and Cypher using Py2Neo

Basic Curses with Python

curses is a basic terminal (or character display) handling module for Python. It can be used to create terminal-based user interfaces (TUIs).

It is a Python wrapper around the more popular C library 'ncurses'.

Performance optimization

When attempting to improve the performance of a Python script, first and foremost you should be able to find the bottleneck of your script, and note that no optimization can compensate for a poor choice of data structures or a flaw in your algorithm design. Identifying performance bottlenecks can be done by profiling your script. Second, do not try to optimize too early in your coding process at the expense of readability/design/quality. Donald Knuth made the following statement on optimization:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

Templates in python

Pillow

The pass statement

Why would you ever want to tell the interpreter to explicitly do nothing? Python has the syntactical requirement that code blocks (after if, except, def, class etc.) cannot be empty.

But sometimes an empty code block is useful in itself. An empty class block can define a new, distinct class, such as an exception that can be caught. An empty except block can be the simplest way to express “ask for forgiveness later” if there was nothing to ask forgiveness for. If an iterator does all the heavy lifting, an empty for loop to just run the iterator can be useful.

Therefore, if nothing is supposed to happen in a code block, a pass is needed for such a block not to produce an IndentationError. Alternatively, any statement (including just a term to be evaluated, like the Ellipsis literal ... or a string, most often a docstring) can be used, but pass makes clear that indeed nothing is supposed to happen, and does not need to be actually evaluated and (at least temporarily) stored in memory. Here is a small annotated collection of the most frequent uses of pass that crossed my way – together with some comments on good and bad practice.

  • Ignoring (all or) a certain type of Exception (example from xml):

     try:
         self.version = "Expat %d.%d.%d" % expat.version_info
     except AttributeError:
         pass # unknown
    

    Note: Ignoring all types of exceptions, as in the following example from pandas, is generally considered bad practice, because it also catches exceptions that should probably be passed on to the caller, e.g. KeyboardInterrupt or SystemExit (or even HardwareIsOnFireError – how do you know you aren't running on a custom box with specific errors defined, which some calling application would want to know about?).

     try:
         os.unlink(filename_larry)
     except:
         pass
    

    Instead, using at least except Error: or in this case preferably except OSError: is considered much better practice. A quick analysis of all the Python modules I have installed showed that more than 10% of all except ...: pass statements catch all exceptions, so it's still a frequent pattern in Python programming.

  • Deriving an exception class that does not add new behaviour (e.g. in scipy):

     class CompileError(Exception):
         pass
    

    Similarly, classes intended as abstract base classes often have an explicit empty __init__ or other methods that subclasses are supposed to override. (e.g. pebl)

     class _BaseSubmittingController(_BaseController):
         def submit(self, tasks): pass
         def retrieve(self, deferred_results): pass
    
  • Testing that code runs properly for a few test values, without caring about the results (from mpmath):

     for x, error in MDNewton(mp, f, (1,-2), verbose=0,
                              norm=lambda x: norm(x, inf)):
         pass
    
  • In class or function definitions, often a docstring is already in place as the obligatory statement to be executed as the only thing in the block. In such cases, the block may contain pass in addition to the docstring in order to say “This is indeed intended to do nothing.”, for example in pebl:

     class ParsingError(Exception): 
         """Error encountered while parsing an ill-formed datafile."""
         pass
    
  • In some cases, pass is used as a placeholder to say “This method/class/if-block/... has not been implemented yet, but this will be the place to do it”, although I personally prefer the Ellipsis literal ... (NOTE: python-3 only) in order to strictly differentiate between this and the intentional “no-op” in the previous example. For example, if I write a model in broad strokes, I might write

     def update_agent(agent):
         ... 
    

    where others might have

     def update_agent(agent):
         pass
    

    before

     def time_step(agents):
         for agent in agents:
             update_agent(agent)
    

    as a reminder to fill in the update_agent function at a later point, but run some tests already to see if the rest of the code behaves as intended. (A third option for this case is raise NotImplementedError. This is useful in particular for two cases: Either “This abstract method should be implemented by every subclass, there is no generic way to define it in this base class”, or “This function, with this name, is not yet implemented in this release, but this is what its signature will look like”)

Linked List Node

py.test

Date Formatting

Heapq

tkinter

CLI subcommands with precise help output

Different ways to create subcommands like in hg or svn with the command line interface shown in the help message:

usage: sub <command>

commands:

  status -  show status
  list   -  print list

Defining functions with list arguments

Sqlite3 Module

Python Persistence

Turtle Graphics

Connecting Python to SQL Server

Design Patterns

Multidimensional arrays

Audio

Pyglet

Queue Module

ijson

Webbrowser Module

The following table lists predefined browser types. The left column contains names that can be passed to the webbrowser.get() function, and the right column lists the class names for each browser type.

Type Name            Class Name
'mozilla'            Mozilla('mozilla')
'firefox'            Mozilla('mozilla')
'netscape'           Mozilla('netscape')
'galeon'             Galeon('galeon')
'epiphany'           Galeon('epiphany')
'skipstone'          BackgroundBrowser('skipstone')
'kfmclient'          Konqueror()
'konqueror'          Konqueror()
'kfm'                Konqueror()
'mosaic'             BackgroundBrowser('mosaic')
'opera'              Opera()
'grail'              Grail()
'links'              GenericBrowser('links')
'elinks'             Elinks('elinks')
'lynx'               GenericBrowser('lynx')
'w3m'                GenericBrowser('w3m')
'windows-default'    WindowsDefault
'macosx'             MacOSX('default')
'safari'             MacOSX('safari')
'google-chrome'      Chrome('google-chrome')
'chrome'             Chrome('chrome')
'chromium'           Chromium('chromium')
'chromium-browser'   Chromium('chromium-browser')
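
For example, requesting one of the browser types from the table and falling back to the platform default if it is not available:

    import webbrowser

    try:
        browser = webbrowser.get('firefox')   # a type name from the table above
    except webbrowser.Error:
        browser = webbrowser.get()            # the platform default browser
    browser.open('https://www.python.org')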

The base64 Module

Flask

groupby()

Sockets And Message Encryption/Decryption Between Client and Server

Language Used: Python 2.7 (Download Link: https://www.python.org/downloads/ )

Library Used:

  • PyCrypto (Download Link: https://pypi.python.org/pypi/pycrypto )

  • PyCryptoPlus (Download Link: https://github.com/doegox/python-cryptoplus )

Library Installation:

PyCrypto: Unzip the file. Go to the directory and open a terminal (Ctrl+Alt+T on Linux) or a command prompt (Shift+right click → "Open command window here" on Windows). Then run python setup.py install (make sure the Python environment is set up properly on Windows).

PyCryptoPlus: Same as the last library.

Tasks Implementation: The task is separated into two parts. One is the handshake process and the other is the communication process. Socket Setup:

  • Having created the public and private keys and hashed the public key, we now need to set up the socket. For setting up the socket, we need to import another module with “import socket” and connect (for client) or bind (for server) the IP address and the port with the socket obtained from the user.

    ----------Client Side----------

      server = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
      host = raw_input("Server Address To Be Connected -> ")
      port = int(raw_input("Port of The Server -> "))
      server.connect((host, port))
    

    ----------Server Side---------

      try:
          #setting up socket
          server = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
          server.bind((host,port))
          server.listen(5)
      except BaseException:
          print "-----Check Server Address or Port-----"
    

    “socket.AF_INET, socket.SOCK_STREAM” will allow us to use the accept() function and messaging fundamentals. Instead, we could use “socket.AF_INET, socket.SOCK_DGRAM”, but in that case we would have to use setblocking(value).

Handshake Process:

  • (CLIENT) The first task is to create the public and private keys. To create them, we have to import some modules: from Crypto import Random and from Crypto.PublicKey import RSA. To create the keys, we have to write a few simple lines of code:

    random_generator = Random.new().read
    key = RSA.generate(1024, random_generator)
    public = key.publickey().exportKey()

random_generator is derived from the “from Crypto import Random” module. key is created by RSA.generate(1024, random_generator), which produces a 1024-bit private key using the random generator. public exports the public key from the previously generated private key.

  • (CLIENT) After creating the public and private keys, we have to hash the public key to send over to the server using the SHA-1 hash. To use the SHA-1 hash we need to import another module by writing “import hashlib”. To hash the public key we have to write two lines of code:

      hash_object = hashlib.sha1(public) 
      hex_digest = hash_object.hexdigest()
    

Here hash_object and hex_digest are our variables. After this, the client sends hex_digest and public to the server, and the server verifies them by comparing the hash received from the client with a new hash of the public key. If the new hash and the hash from the client match, it moves to the next procedure. As the public key sent from the client is in the form of a string, it cannot be used directly as a key on the server side. To convert the string public key to an RSA public key, we need to write server_public_key = RSA.importKey(getpbk), where getpbk is the public key received from the client.

  • (SERVER) The next step is to create a session key. Here, I have used the “os” module to create a random key “key = os.urandom(16)”, which gives us a 16-byte (128-bit) key. After that I have encrypted that key in “AES.MODE_CTR” and hashed it again with SHA-1:

     #encrypt CTR MODE session key
     en = AES.new(key_128, AES.MODE_CTR, counter=lambda: key_128)
     encrypto = en.encrypt(key_128)
     #hashing sha1
     en_object = hashlib.sha1(encrypto)
     en_digest = en_object.hexdigest()
    

So the en_digest will be our session key.

  • (SERVER) The final part of the handshake process is to encrypt the public key received from the client and the session key created on the server side.

     #encrypting session key and public key
     E = server_public_key.encrypt(encrypto,16)
    

After encrypting, server will send the key to the client as string.

  • (CLIENT) After getting the encrypted string of (public and session key) from the server, the client will decrypt it using the private key which was created earlier along with the public key. As the encrypted (public and session key) was in the form of a string, we now have to get it back as a key by using eval(). Once the decryption is done, the handshake process is complete, as both sides have confirmed that they are using the same keys. To decrypt:

     en = eval(msg)
     decrypt = key.decrypt(en)
     # hashing sha1
     en_object = hashlib.sha1(decrypt)
     en_digest = en_object.hexdigest()
    

I have used the SHA-1 here so that it will be readable in the output.

Communication Process:

For the communication process, we have to use the session key from both sides as the KEY for IDEA encryption in MODE_CTR. Both sides will encrypt and decrypt messages with IDEA.MODE_CTR using the session key.

  • (Encryption) For IDEA encryption, we need a 16-byte key and a counter, which must be callable. The counter is mandatory in MODE_CTR. The session key that we encrypted and hashed is now 40 characters long, which exceeds the key size limit of IDEA. Hence, we need to reduce the size of the session key. For reducing it, we can use Python's normal slicing, string[value:value], where the values can be whatever the user chooses. In our case, I have done “key[:16]”, which takes the first 16 characters of the key. This conversion could be done in many ways, like key[1:17] or key[16:]. The next part is to create a new IDEA encryption object by writing IDEA.new(), which takes 3 arguments for processing. The first argument is the KEY, the second argument is the mode of the IDEA encryption (in our case, IDEA.MODE_CTR) and the third argument is counter=, which must be a callable function; it must return a string of a suitable length. To define counter=, we must use reasonable values; in this case, I have reused the KEY by defining a lambda. Instead of using a lambda, we could use Counter.Util, which generates a random value for counter=. To use Counter.Util, we need to import the Counter module from Crypto. Hence, the code will be:

      ideaEncrypt = IDEA.new(key, IDEA.MODE_CTR, counter=lambda : key)
    

Having defined “ideaEncrypt” as our IDEA encryption variable, we can use the built-in encrypt function to encrypt any message.

eMsg = ideaEncrypt.encrypt(whole)
#converting the encrypted message to HEXADECIMAL to make it readable
eMsg = eMsg.encode("hex").upper()

In this code segment, whole is the message to be encrypted and eMsg is the encrypted message. After encrypting the message, I have converted it into hexadecimal to make it readable, and upper() is the built-in function to make the characters uppercase. After that, this encrypted message is sent to the opposite station for decryption.

  • (Decryption)

To decrypt the encrypted messages, we will need to create another encryption variable by using the same arguments and the same key, but this time the variable will decrypt the encrypted messages. The code for this is the same as last time. However, before decrypting the messages, we need to decode the message from hexadecimal, because in our encryption part we encoded the encrypted message in hexadecimal to make it readable. Hence, the whole code will be:

decoded = newmess.decode("hex")
ideaDecrypt = IDEA.new(key, IDEA.MODE_CTR, counter=lambda: key) 
dMsg = ideaDecrypt.decrypt(decoded)

These processes are done on both the server and client sides for encrypting and decrypting.

pygame

Input, Subset and Output External Data Files using Pandas

hashlib

Getting started with GZip

Django

ctypes

Creating a Windows service using Python

Python Server Sent Events

Mutable vs Immutable (and Hashable) in Python

Python speed of program

configparser

All return values from ConfigParser.ConfigParser().get are strings. They can be converted to more common types thanks to eval, though the dedicated getter methods are safer (a sketch follows).
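
A small sketch using Python 3's configparser (the [server] section and its keys are hypothetical); getint(), getfloat() and getboolean() perform the conversion without eval:

    import configparser

    config = configparser.ConfigParser()
    config.read_string("[server]\nport = 8080\ndebug = true\n")

    port = config.get('server', 'port')            # '8080' -- always a string
    port = config.getint('server', 'port')         # 8080
    debug = config.getboolean('server', 'debug')   # True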

Linked lists

Commonwealth Exceptions

Optical Character Recognition

Python Data Types

Partial functions

pyautogui module

graph-tool

Unzipping Files

Functional Programming in Python

Python Virtual Environment - virtualenv

sys

virtual environment with virtualenvwrapper

Create virtual environment with virtualenvwrapper in windows

Python Requests Post

Plotting with Matplotlib

Python Lex-Yacc

ChemPy - python package

pyaudio

shelve

Note: Do not rely on the shelf being closed automatically; always call close() explicitly when you don’t need it any more, or use shelve.open() as a context manager:

with shelve.open('spam') as db:
    db['eggs'] = 'eggs'

Warning:

Because the shelve module is backed by pickle, it is insecure to load a shelf from an untrusted source. Like with pickle, loading a shelf can execute arbitrary code.

Restrictions

1. The choice of which database package will be used (such as dbm.ndbm or dbm.gnu) depends on which interface is available. Therefore it is not safe to open the database directly using dbm. The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small, and in rare cases key collisions may cause the database to refuse updates.

2. The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used.

Usage of "pip" module: PyPI Package Manager

IoT Programming with Python and Raspberry PI

Code blocks, execution frames, and namespaces

kivy - Cross-platform Python Framework for NUI Development

Call Python from C#

Note that in the example above, data is serialized using the MongoDB.Bson library, which can be installed via the NuGet manager.

Otherwise, you can use any JSON serialization library of your choice.


Below are the inter-process communication implementation steps:

[Diagram: inter-process communication between C# and Python]

I am using the inter-process communication between C# and Python scripts in one of my projects that allows calling Python scripts directly from Excel spreadsheets.

The project utilizes ExcelDNA add-in for C# - Excel binding.

The source-code is stored in the GitHub repository.

Below are links to wiki pages that provide an overview of the project and help to get started in 4 easy steps.

I hope you find the example and the project useful.


Similarities in syntax, Differences in meaning: Python vs. JavaScript

Writing to CSV from String or List

open(path, "wb")

"wb" - Write mode.

The b parameter in "wb" is necessary only if you want to open the file in binary mode, which is needed on some operating systems, such as Windows.

csv.writer(csv_file, delimiter=',')

Here the delimiter we have used is ',', because we want each cell of data in a row to contain the first name, last name, and age respectively. Since our list is also split along the ',', it proves rather convenient for us.
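
Putting these pieces together, a minimal sketch (Python 2, matching the "wb" mode above; the file name and rows are hypothetical):

    import csv

    data = [["John", "Smith", 40], ["Jane", "Doe", 33]]

    with open("people.csv", "wb") as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for row in data:
            writer.writerow(row)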

Raise Custom Errors / Exceptions

Using loops within functions

Pandas Transform: Perform operations on groups and concatenate the results