Python is a widely used programming language. It is:
High-level: Python automates low-level operations such as memory management. This leaves the programmer with a bit less control, but has many benefits, including code readability and concise expressions.
General-purpose: Python is built to be used in all contexts and environments. An example for a non-general-purpose language is PHP: it is designed specifically as a server-side web-development scripting language. In contrast, Python can be used for server-side web-development, but also for building desktop applications.
Dynamically typed: Every variable in Python can reference any type of data. A single expression may evaluate to data of different types at different times. Due to that, the following code is possible:
if something:
    x = 1
else:
    x = 'this is a string'
print(x)
Strongly typed: During program execution, you are not allowed to do anything that's incompatible with the type of data you're working with. For example, there are no hidden conversions from strings to numbers; a string made out of digits will never be treated as a number unless you convert it explicitly:
1 + '1' # raises an error
1 + int('1') # results with 2
Beginner friendly :): Python's syntax and structure are very intuitive. It is high level and provides constructs intended to enable writing clear programs on both a small and large scale. Python supports multiple programming paradigms, including object-oriented, imperative, functional and procedural styles. It has a large, comprehensive standard library and many easy-to-install third-party libraries.
Its design principles are outlined in The Zen of Python.
Currently, there are two major release branches of Python which have some significant differences. Python 2.x is the legacy version though it still sees widespread use. Python 3.x makes a set of backwards-incompatible changes which aim to reduce feature duplication. For help deciding which version is best for you, see this article.
The official Python documentation is also a comprehensive and useful resource, containing documentation for all versions of Python as well as tutorials to help get you started.
There is one official implementation of the language supplied by Python.org, generally referred to as CPython, and several alternative implementations of the language on other runtime platforms. These include IronPython (running Python on the .NET platform), Jython (on the Java runtime) and PyPy (implementing Python in a subset of itself).
Comprehensions are syntactic constructs for defining data structures or expressions. Proper use of comprehensions turns these into easily understood expressions. As expressions, they can be used inline:
[print(x) for x in range(10)]
In most cases a comprehension or generator expression is more readable, more powerful and more efficient than filter() or ifilter().
list is a particular type of iterable, but it is not the only one that exists in Python. Sometimes it will be better to use set, tuple, or dictionary.
list is the name given in Python to dynamic arrays (similar to vector<void*> from C++ or Java's ArrayList<Object>). It is not a linked list.
Accessing elements is done in constant time and is very fast. Appending elements to the end of the list is amortized constant time, but once in a while it might involve allocation and copying of the whole list.
List comprehensions are related to lists.
5 basic things you can do with functions:
Assign functions to variables
def f():
    print(20)
y = f
y()
# Output: 20
Define functions within other functions (nested functions)
def f(a, b, y):
    def inner_add(a, b):  # inner_add is hidden from outer code
        return a + b
    return inner_add(a, b) ** y
Functions can return other functions
def f(y):
    def nth_power(x):
        return x ** y
    return nth_power  # returns a function
squareOf = f(2) # function that returns the square of a number
cubeOf = f(3) # function that returns the cube of a number
squareOf(3) # Output: 9
cubeOf(2) # Output: 8
Functions can be passed as parameters to other functions
def a(x, y):
    print(x, y)

def b(fun, str):  # b has two arguments: a function and a string
    fun('Hello', str)
b(a, 'Sophia') # Output: Hello Sophia
Inner functions have access to the enclosing scope (closure)
def outer_fun(name):
    def inner_fun():  # the variable name is available to the inner function
        return "Hello " + name + "!"
    return inner_fun
greet = outer_fun("Sophia")
print(greet()) # Output: Hello Sophia!
Importing a module makes Python evaluate all of its top-level code, so that it learns all the functions, classes, and variables that the module contains. When you want a module of yours to be imported somewhere else, be careful with your top-level code: encapsulate it in an if __name__ == '__main__': block if you don't want it to be executed when the module gets imported.
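For example, a minimal sketch (the module and function names here are hypothetical):

# my_module.py
def useful_function():
    return 42

if __name__ == '__main__':
    # runs only when the file is executed directly,
    # not when it is imported with "import my_module"
    print(useful_function())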
When using Python's built-in open(), it is best practice to always pass the encoding argument if you intend your code to run cross-platform. The reason for this is that a system's default encoding differs from platform to platform. While Linux systems do indeed use utf-8 by default, this is not necessarily true for macOS and Windows.
To check a system's default encoding, try this:
import sys
sys.getdefaultencoding()
from any Python interpreter.
Hence, it is wise to always specify an encoding, to make sure the strings you're working with are encoded as what you think they are, ensuring cross-platform compatibility.
with open('somefile.txt', 'r', encoding='UTF-8') as f:
    for line in f:
        print(line)
For full documentation including version-specific functionality, please check the official documentation.
The json module will handle encoding and decoding of the below types by default:
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true, false | True, False |
null | None |
The json module also understands NaN, Infinity, and -Infinity as their corresponding float values, which is outside the JSON spec.
Python | JSON |
---|---|
dict | object |
list, tuple | array |
str | string |
int, float, (int/float)-derived Enums | number |
True | true |
False | false |
None | null |
To disallow encoding of NaN, Infinity, and -Infinity you must encode with allow_nan=False. This will then raise a ValueError if you attempt to encode these values.
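For example:

import json

json.dumps(float('nan'))                   # 'NaN' -- accepted, but outside the JSON spec
json.dumps(float('nan'), allow_nan=False)  # raises ValueError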
There are various hooks which allow you to handle data that needs to be represented differently. Use of functools.partial allows you to partially apply the relevant parameters to these functions for convenience.
You can provide a function that operates on objects before they are serialised like so:
# my_json module
import json
from functools import partial
def serialise_object(obj):
    # Do something to produce json-serialisable data
    return dict_obj
dump = partial(json.dump, default=serialise_object)
dumps = partial(json.dumps, default=serialise_object)
There are various hooks that are handled by the json functions, such as object_hook and parse_float. For an exhaustive list for your version of Python, see here.
# my_json module
import json
from functools import partial
def deserialise_object(dict_obj):
    # Do something custom
    return obj

def deserialise_float(str_obj):
    # Do something custom
    return obj
load = partial(json.load, object_hook=deserialise_object, parse_float=deserialise_float)
loads = partial(json.loads, object_hook=deserialise_object, parse_float=deserialise_float)
The json module also allows for extension/substitution of the json.JSONEncoder and json.JSONDecoder to handle miscellaneous types. The hooks documented above can be added as defaults by creating an equivalently named method. To use these, simply pass the class as the cls parameter to the relevant function. Use of functools.partial allows you to partially apply the cls parameter to these functions for convenience, e.g.
# my_json module
import json
from functools import partial
class MyEncoder(json.JSONEncoder):
    # Do something custom
    pass

class MyDecoder(json.JSONDecoder):
    # Do something custom
    pass
dump = partial(json.dump, cls=MyEncoder)
dumps = partial(json.dumps, cls=MyEncoder)
load = partial(json.load, cls=MyDecoder)
loads = partial(json.loads, cls=MyDecoder)
String objects are immutable, meaning that they can't be modified in place the way a list can. Because of this, methods on the built-in type str always return a new str object, which contains the result of the method call.
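For example:

s = "hello"
t = s.upper()  # returns a new str object
print(s)       # hello -- the original string is unchanged
print(t)       # HELLO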
When designing your architecture, consider that many things which can be accomplished with metaclasses can also be accomplished using simpler semantics, such as __init_subclass__(), which allows a class to partake in the creation of its subclasses.

You can unify the concept of slicing strings with that of slicing other sequences by viewing strings as an immutable collection of characters, with the caveat that a unicode character is represented by a string of length 1.
In mathematical notation you can consider slicing to use a half-open interval of [start, end), that is to say that the start is included but the end is not. The half-open nature of the interval has the advantage that len(x[:n]) == n where len(x) >= n, while the interval being closed at the start has the advantage that x[n:n+1] == [x[n]] where x is a list with len(x) > n, thus keeping consistency between indexing and slicing notation.
The numbers module contains the abstract base classes for the numerical types:
subclasses | numbers.Number | numbers.Integral | numbers.Rational | numbers.Real | numbers.Complex |
---|---|---|---|---|---|
bool | ✓ | ✓ | ✓ | ✓ | ✓ |
int | ✓ | ✓ | ✓ | ✓ | ✓ |
fractions.Fraction | ✓ | ― | ✓ | ✓ | ✓ |
float | ✓ | ― | ― | ✓ | ✓ |
complex | ✓ | ― | ― | ― | ✓ |
decimal.Decimal | ✓ | ― | ― | ― | ― |
reduce might not always be the most efficient function. For some types there are equivalent functions or methods:
sum() for the sum of a sequence containing addable elements (not strings):
sum([1,2,3]) # = 6
str.join for the concatenation of strings:
''.join(['Hello', ',', ' World']) # = 'Hello, World'
next together with a generator could be a short-circuit variant compared to reduce:
# First falsy item:
next((i for i in [100, [], 20, 0] if not i)) # = []
Everything that can be done with map can also be done with comprehensions:
list(map(abs, [-1,-2,-3])) # [1, 2, 3]
[abs(i) for i in [-1,-2,-3]] # [1, 2, 3]
Though you would need zip if you have multiple iterables:
import operator
alist = [1,2,3]
list(map(operator.add, alist, alist)) # [2, 4, 6]
[i + j for i, j in zip(alist, alist)] # [2, 4, 6]
List comprehensions are efficient and can be faster than map in many cases, so time both approaches if speed is important to you.
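A minimal sketch of such a timing comparison using timeit (the data and repeat counts are arbitrary):

import timeit

setup = 'data = list(range(-1000, 1000))'
print(timeit.timeit('list(map(abs, data))', setup=setup, number=1000))
print(timeit.timeit('[abs(i) for i in data]', setup=setup, number=1000))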
All searching algorithms on iterables containing n elements have O(n) complexity. Only specialized algorithms like bisect.bisect_left() can be faster, with O(log n) complexity.
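For example, bisect.bisect_left() finds an insertion point in an already sorted list:

import bisect

data = [1, 3, 5, 7, 9]              # must already be sorted
print(bisect.bisect_left(data, 5))  # 2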
Helpful items to remember when creating a dictionary:
Every key must be hashable (it can be processed by the built-in hash function; otherwise a TypeError will be thrown).

Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module's API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output.
Python provides both builtin methods and external libraries for creating, modifying, parsing, and manipulating dates and times.
Sets are unordered and have very fast lookup time (amortized O(1) if you want to get technical). They are great to use when you have a collection of things, the order doesn't matter, and you'll be looking up items by value a lot. If it makes more sense to look up items by an index number, consider using a list instead. If order matters, consider a list as well.
Sets are mutable and thus cannot be hashed, so you cannot use them as dictionary keys, put them in other sets, or use them anywhere else that requires hashable types. In such cases, you can use an immutable frozenset.
The elements of a set must be hashable. This means that they have a correct __hash__ method that is consistent with __eq__. In general, mutable types such as list or set are not hashable and cannot be put in a set. If you encounter this problem, consider using dict and immutable keys.
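For example:

fs = frozenset([1, 2, 3])       # immutable, therefore hashable
d = {fs: 'value'}               # usable as a dictionary key
nested = {fs, frozenset('ab')}  # and as an element of another set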
There are three other types available in the collections module, namely UserDict, UserList, and UserString. They each act as a wrapper around the corresponding built-in type, e.g., UserDict acts as a wrapper around a dict object. In each case, the class simulates its named type. The instance's contents are kept in a regular object of that type, which is accessible via the data attribute of the wrapper instance. In each of these three cases, the need for these types has been partially supplanted by the ability to subclass directly from the basic type; however, the wrapper class can be easier to work with because the underlying object is accessible as an attribute.
Due to the GIL (Global Interpreter Lock), only one thread at a time can execute Python bytecode in a single process. So in general, using multi-threading only improves IO-bound computations, not CPU-bound ones. The multiprocessing module is recommended if you wish to parallelise CPU-bound tasks.
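A minimal sketch of parallelising a CPU-bound function with multiprocessing (the function and pool size here are arbitrary):

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(4)  # four worker processes
    print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]
    pool.close()
    pool.join()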
The GIL applies to CPython, the most popular implementation of Python, as well as to PyPy. Other implementations such as Jython and IronPython have no GIL.
There are several unit testing tools for Python. This documentation topic describes the basic unittest module. Other testing tools include py.test and nosetests. This Python documentation about testing compares several of these tools without going into depth.
There are currently two supported versions of Python: 2.7 (Python 2) and 3.6 (Python 3). Additionally versions 3.3 and 3.4 receive security updates in source format.
Python 2.7 is backwards-compatible with most earlier versions of Python, and can run Python code from most 1.x and 2.x versions of Python unchanged. It is broadly available, with an extensive collection of packages. It is also considered deprecated by the CPython developers, and receives only security and bug-fix development. The CPython developers intend to abandon this version of the language in 2020.
According to Python Enhancement Proposal 373 there are no planned future releases of Python 2 after 25 June 2016, but bug fixes and security updates will be supported until 2020. (It doesn't specify what exact date in 2020 will be the sunset date of Python 2.)
Python 3 intentionally broke backwards-compatibility, to address concerns the language developers had with the core of the language. Python 3 receives new development and new features. It is the version of the language that the language developers intend to move forward with.
Over the time between the initial release of Python 3.0 and the current version, some features of Python 3 were back-ported into Python 2.6, and other parts of Python 3 were extended to have syntax compatible with Python 2. Therefore it is possible to write Python that will work on both Python 2 and Python 3, by using future imports and special modules (like six).
Future imports have to be at the beginning of your module:
from __future__ import print_function
# other imports and instructions go after __future__
print('Hello world')
For further information on the __future__ module, see the relevant page in the Python documentation.
The 2to3 tool is a Python program that converts Python 2.x code to Python 3.x code, see also the Python documentation.
The package six provides utilities for Python 2/3 compatibility:
A reference for differences between Python 2 and Python 3 can be found here.
Virtual environments are sufficiently useful that they probably should be used for every project. In particular, virtual environments allow you to:
Parentheses are only needed for empty tuples or when used in a function call.
A tuple is a sequence of values. The values can be any type, and they are indexed by integers, so in that respect tuples are a lot like lists. The important difference is that tuples are immutable and hashable, so they can be used in sets and maps.
Context managers are defined in PEP 343. They are intended to be used as a more succinct mechanism for resource management than try ... finally constructs. The formal definition is as follows.
In this PEP, context managers provide __enter__() and __exit__() methods that are invoked on entry to and exit from the body of the with statement.
It then goes on to define the with statement as follows.
with EXPR as VAR:
    BLOCK
The translation of the above statement is:
mgr = (EXPR)
exit = type(mgr).__exit__  # Not calling it yet
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value  # Only if "as VAR" is present
        BLOCK
    except:
        # The exceptional case is handled here
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
        # The exception is swallowed if exit() returns true
finally:
    # The normal and non-local-goto cases are handled here
    if exc:
        exit(mgr, None, None, None)
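A minimal sketch of a class implementing this protocol (the Resource class is hypothetical):

class Resource(object):
    def __enter__(self):
        print('Acquiring resource')
        return self  # bound to VAR in "with Resource() as VAR"

    def __exit__(self, exc_type, exc_value, traceback):
        print('Releasing resource')
        return False  # do not swallow exceptions

with Resource() as r:
    print('Inside the block')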
Enums were added to Python in version 3.4 by PEP 435.
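For example:

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

print(Color.RED)        # Color.RED
print(Color.RED.value)  # 1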
The Python special variable __name__ is set to the name of the containing module. At the top level (such as in the interactive interpreter, or in the main file) it is set to '__main__'. This can be used to run a block of statements if a module is being run directly rather than being imported.
The related special attribute obj.__name__ is found on classes, imported modules and functions (including methods), and gives the name of the object when defined.
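For example:

import math

def my_function():
    pass

print(my_function.__name__)  # my_function
print(math.__name__)         # math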
The PyPA sample project contains a complete, easily modifiable template setup.py that demonstrates a large range of capabilities setuptools has to offer.
For further information on python packaging see:
For writing official packages there is a packaging user guide.
lst[::-1] gives you a reversed copy of the list.

start or end may be a negative number, which means it counts from the end of the array instead of the beginning. So:

a[-1]   # last item in the array
a[-2:]  # last two items in the array
a[:-2]  # everything except the last two items
Recursion needs a stop condition (stopCondition) in order to exit the recursion. The original variable must be passed on to the recursive function so that its state is preserved across calls.
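A minimal sketch (here the comparison n <= 1 plays the role of the stop condition, and the shrinking value is passed on through the argument n):

def factorial(n):
    if n <= 1:  # stop condition
        return 1
    return n * factorial(n - 1)

factorial(5)  # 120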
Type hinting, as specified in PEP 484, is a formalized solution to statically indicate the type of a value in Python code. Together with the typing module, type hints offer Python users the capability to annotate their code, thereby assisting type checkers while, indirectly, documenting their code with more information.
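A minimal sketch of an annotated function using the typing module:

from typing import List

def average(values: List[float]) -> float:
    """Return the arithmetic mean of the given values."""
    return sum(values) / len(values)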
Sometimes, pip will perform a manual compilation of native code. On Linux, Python will automatically choose an available C compiler on your system. Refer to the table below for the required Visual Studio/Visual C++ version on Windows (newer versions will not work).
Python Version | Visual Studio Version | Visual C++ Version |
---|---|---|
2.6 - 3.2 | Visual Studio 2008 | Visual C++ 9.0 |
3.3 - 3.4 | Visual Studio 2010 | Visual C++ 10.0 |
3.5 | Visual Studio 2015 | Visual C++ 14.0 |
requests
A simple, but powerful package for making HTTP requests.
requests-cache
Caching for requests; caching data is very useful. In development, it means you can avoid hitting a site unnecessarily. While running a real collection, it means that if your scraper crashes for some reason (maybe you didn't handle some unusual content on the site...? maybe the site went down...?) you can repeat the collection very quickly from where you left off.
scrapy
Useful for building web crawlers, where you need something more powerful than using requests and iterating through pages.
selenium
Python bindings for Selenium WebDriver, for browser automation. Using requests to make HTTP requests directly is often simpler for retrieving webpages. However, this remains a useful tool when it is not possible to replicate the desired behaviour of a site using requests alone, particularly when JavaScript is required to render elements on a page.
BeautifulSoup
Query HTML and XML documents, using a number of different parsers (Python's built-in HTML parser, html5lib, lxml or lxml.html).
lxml
Processes HTML and XML. Can be used to query and select content from HTML documents via CSS selectors and XPath.
This class is useful when you need an object similar to a list that allows fast append and pop operations from either side (the name deque stands for “double-ended queue”).
The methods provided are indeed very similar, except that some like pop, append, or extend can be suffixed with left. The deque data structure should be preferred to a list if one needs to frequently insert and delete elements at both ends, because it allows doing so in constant time O(1).
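For example:

from collections import deque

d = deque([1, 2, 3])
d.append(4)      # add to the right end
d.appendleft(0)  # add to the left end
d.pop()          # 4, removed from the right end
d.popleft()      # 0, removed from the left end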
Note: In Python 2, make sure that your class inherits from object (making it a new-style class) in order for all features of properties to be available.
In exec, if globals is locals (i.e. they refer to the same object), the code is executed as if it were on the module level. If globals and locals are distinct objects, the code is executed as if it were in a class body.
If the globals object is passed in but doesn't specify a __builtins__ key, then Python built-in functions and names are automatically added to the global scope. To suppress the availability of functions such as print or isinstance in the executed scope, let globals have the key __builtins__ mapped to the value None. However, this is not a security feature.
The Python 2-specific syntax shouldn't be used; the Python 3 syntax will work in Python 2. Thus the following forms are deprecated:
exec object
exec object in globals
exec object in globals, locals
PyInstaller is a module used to bundle Python apps in a single package along with all their dependencies. The user can then run the packaged app without a Python interpreter or any modules. It correctly bundles many major packages like numpy, Django, OpenCV and others.
Some important points to remember:
There are a few things to note:
The names args and kwargs are used by convention; they are not part of the language specification. Thus, these are equivalent:
def func(*args, **kwargs):
    print(args)
    print(kwargs)

def func(*a, **b):
    print(a)
    print(b)
You may not have more than one args or more than one kwargs parameter (however, they are not required):
def test(*args1, *args2):
# File "<stdin>", line 1
# def test(*args1, *args2):
# ^
# SyntaxError: invalid syntax
def test(**kwargs1, **kwargs2):
# File "<stdin>", line 1
# def test(**kwargs1, **kwargs2):
# ^
# SyntaxError: invalid syntax
If any parameters follow *args in the function definition, they are keyword-only arguments that can only be passed by name. A single star may be used instead of *args to force values to be keyword arguments without providing a variadic parameter list. Keyword-only parameter lists are only available in Python 3.
def func(a, b, *args, x, y):
    print(a, b, args, x, y)
func(1, 2, 3, 4, x=5, y=6)
#>>> 1, 2, (3, 4), 5, 6
def func(a, b, *, x, y):
    print(a, b, x, y)
func(1, 2, x=5, y=6)
#>>> 1, 2, 5, 6
**kwargs must come last in the parameter list.
def test(**kwargs, *args):
# File "<stdin>", line 1
# def test(**kwargs, *args):
# ^
# SyntaxError: invalid syntax
At its core, Python's garbage collector (as of 3.5) is a simple reference counting implementation. Every time you make a reference to an object (for example, a = myobject), the reference count on that object (myobject) is incremented. Every time a reference gets removed, the reference count is decremented, and once the reference count reaches 0, we know that nothing holds a reference to that object and we can deallocate it!
One common misunderstanding about how Python memory management works is that the del keyword frees an object's memory. This is not true. What actually happens is that the del keyword merely decrements the object's refcount; the object may only be garbage collected once enough references have been removed for the refcount to reach zero, and it will not be collected while references to it still exist elsewhere in your code.
Python aggressively creates or cleans up objects the first time it needs them. If I perform the assignment a = object(), the memory for the object is allocated at that time (CPython will sometimes reuse certain types of objects, e.g. lists, under the hood, but mostly it doesn't keep a free object pool and will perform allocation when you need it). Similarly, as soon as the refcount is decremented to 0, GC cleans it up.
In the 1960s John McCarthy discovered a fatal flaw in refcounting garbage collection when he implemented the refcounting algorithm used by Lisp: what happens if two objects refer to each other in a cyclic reference? How can you ever garbage collect those two objects, even if there are no external references to them, if they will always refer to each other? This problem also extends to any cyclic data structure, such as ring buffers or any two consecutive entries in a doubly linked list. Python attempts to fix this problem using a slightly interesting twist on another garbage collection algorithm called Generational Garbage Collection.
In essence, any time you create an object in Python it adds it to the end of a doubly linked list. On occasion Python loops through this list, checks what objects the objects in the list refer to, and if they're also in the list (we'll see why they might not be in a moment), further decrements their refcounts. At this point (actually, there are some heuristics that determine when things get moved, but let's assume it's after a single collection to keep things simple) anything that still has a refcount greater than 0 gets promoted to another linked list called "Generation 1" (this is why all objects aren't always in the generation 0 list) which has this loop applied to it less often. This is where the generational garbage collection comes in. There are 3 generations by default in Python (three linked lists of objects): the first list (generation 0) contains all new objects; if a GC cycle happens and the objects are not collected, they get moved to the second list (generation 1), and if a GC cycle happens on the second list and they are still not collected they get moved to the third list (generation 2). The third generation list (called "generation 2", since we're zero indexing) is garbage collected much less often than the first two, the idea being that if your object is long lived it's not as likely to be GCed, and may never be GCed during the lifetime of your application, so there's no point in wasting time checking it on every single GC run. Furthermore, it's observed that most objects are garbage collected relatively quickly. From now on, we'll call these "good objects" since they die young. This is called the "weak generational hypothesis" and was also first observed in the 60s.
A quick aside: unlike the first two generations, the long lived third generation list is not garbage collected on a regular schedule. It is checked when the ratio of long lived pending objects (those that are in the third generation list, but haven't actually had a GC cycle yet) to the total long lived objects in the list is greater than 25%. This is because the third list is unbounded (things are never moved off of it to another list, so they only go away when they're actually garbage collected), meaning that for applications where you are creating lots of long lived objects, GC cycles on the third list can get quite long. By using a ratio we achieve "amortized linear performance in the total number of objects"; aka, the longer the list, the longer GC takes, but the less often we perform GC (here's the original 2008 proposal for this heuristic by Martin von Löwis for further reading). The act of performing a garbage collection on the third generation or "mature" list is called "full garbage collection".
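You can inspect these generations and trigger a collection yourself via the gc module:

import gc

print(gc.get_threshold())  # e.g. (700, 10, 10) -- per-generation collection thresholds
gc.collect()               # force a full (generation 2) collection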
So the generational garbage collection speeds things up tremendously by not requiring that we scan over objects that aren't likely to need GC all the time, but how does it help us break cyclic references? Probably not very well, it turns out. The function for actually breaking these reference cycles starts out like this:
/* Break reference cycles by clearing the containers involved. This is
* tricky business as the lists can be changing and we don't know which
* objects may be freed. It is possible I screwed something up here.
*/
static void
delete_garbage(PyGC_Head *collectable, PyGC_Head *old)
The reason generational garbage collection helps with this is that we can keep the length of the list as a separate count; each time we add a new object to the generation we increment this count, and any time we move an object to another generation or dealloc it we decrement the count. Theoretically, at the end of a GC cycle this count (for the first two generations anyway) should always be 0. If it's not, anything in the list that's left over is some form of circular reference and we can drop it. However, there's one more problem here: what if the leftover objects have Python's magic method __del__ on them? __del__ is called any time a Python object is destroyed. However, if two objects in a circular reference have __del__ methods, we can't be sure that destroying one won't break the other's __del__ method. For a contrived example, imagine we wrote the following:
class A(object):
    def __init__(self, b=None):
        self.b = b

    def __del__(self):
        print("We're deleting an instance of A containing:", self.b)

class B(object):
    def __init__(self, a=None):
        self.a = a

    def __del__(self):
        print("We're deleting an instance of B containing:", self.a)
and we set an instance of A and an instance of B to point to one another, and then they end up in the same garbage collection cycle? Let's say we pick one at random and dealloc our instance of A first; A's __del__ method will be called, it will print, then A will be freed. Next we come to B, we call its __del__ method, and oops! Segfault! A no longer exists. We could fix this by calling everything that's left over's __del__ methods first, then doing another pass to actually dealloc everything. However, this introduces another issue: what if one object's __del__ method saves a reference to the other object that's about to be GCed, and has a reference to us somewhere else? We still have a reference cycle, but now it's not possible to actually GC either object, even if they're no longer in use. Note that even if an object is not part of a circular data structure, it could revive itself in its own __del__ method; Python does have a check for this and will stop GCing if an object's refcount has increased after its __del__ method has been called.
CPython deals with this by sticking those un-GC-able objects (anything with some form of circular reference and a __del__ method) onto a global list of uncollectable garbage and then leaving them there for all eternity:
/* list of uncollectable objects */
static PyObject *garbage = NULL;
Many of the methods in hashlib will require you to pass values interpretable as buffers of bytes, rather than strings. This is the case for hashlib.new().update() as well as hashlib.pbkdf2_hmac. If you have a string, you can convert it to a byte buffer by prepending the character b to the start of the string:
"This is a string"
b"This is a buffer of bytes"
The following objects are picklable:

- None, True, and False
- tuples, lists, sets, and dicts containing only picklable objects
- objects whose __dict__ or the result of calling __getstate__() is picklable (see the official docs for details)

Based on the official Python documentation.
pickle and security

The pickle module is not secure. It should not be used when receiving the serialized data from an untrusted party, such as over the Internet.
It is possible to put a Python package in a ZIP file, and use it that way if you add these lines to the beginning of your script:
import sys
sys.path.append("package.zip")
Why use JSON?
Why not use JSON?
Why Pickle?
Why not Pickle?
The Python developers made sure that the API between threading and multiprocessing is similar, so that switching between the two variants is easier for programmers.
If you try to open a file that is not a ZIP file, the exception zipfile.BadZipFile is raised. In Python 2.7, this was spelled zipfile.BadZipfile, and this old name is retained alongside the new one in Python 3.2+.
From Wikipedia:
In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the most recently added element that was not yet removed.
Due to the way their elements are accessed, stacks are also known as Last-In, First-Out (LIFO) stacks.
In Python one can use lists as stacks, with append() as the push and pop() as the pop operation. Both operations run in amortized constant time O(1).
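For example:

stack = []
stack.append('a')  # push
stack.append('b')  # push
top = stack.pop()  # pop -> 'b'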
Python's deque data structure can also be used as a stack. Compared to lists, deques allow push and pop operations with constant time complexity from both ends.
Short version: the GIL ensures that no matter how many processors and threads you have, only one thread of a python interpreter will run at one time.
This has a lot of ease-of-use benefits, but it also has a number of significant drawbacks.
Note that a GIL is not a requirement of the Python language. Consequently, you can't access the GIL directly from standard Python code. Not all implementations of Python use a GIL.
Interpreters that have a GIL: CPython, PyPy, Cython (but you can disable the GIL with nogil)
Interpreters that do not have a GIL: Jython, IronPython
CPython automatically releases the GIL when a thread performs an I/O operation. Image processing libraries and numpy number crunching operations release the GIL before doing their processing.
https://wiki.python.org/moin/GlobalInterpreterLock - quick summary of what it does, fine details on all the benefits
http://programmers.stackexchange.com/questions/186889/why-was-python-written-with-the-gil - clearly written summary
http://www.dabeaz.com/python/UnderstandingGIL.pdf - how the GIL works and why it slows down on multiple cores
http://www.dabeaz.com/GIL/gilvis/index.html - visualization of the data showing how the GIL locks up threads
http://jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/ - simple to understand history of the GIL problem
https://jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/ - details on ways to work around the GIL's limitations
Developers should follow the PEP 257 Docstring Conventions guidelines. In some cases, style guides (such as Google's) or third-party documentation renderers (such as Sphinx) may detail additional conventions for docstrings.
Python can handle many different types of databases. For each of these types a different API exists. To encourage similarity between those different APIs, PEP 249 has been introduced.
This API has been defined to encourage similarity between the Python modules that are used to access databases. By doing this, we hope to achieve a consistency leading to more easily understood modules, code that is generally more portable across databases, and a broader reach of database connectivity from Python. PEP-249
There is NO switch statement in Python; this is a language design choice. There has been a PEP (PEP 3103) covering the topic, and it was rejected.

You can find many lists of recipes for writing your own switch statements in Python; here I'm trying to suggest the most sensible options (see the dispatch-table sketch below). Here are a few places to check:
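One frequently suggested option is a dict of callables used as a dispatch table; a minimal sketch (all names here are hypothetical):

def case_a():
    return 'handled a'

def case_b():
    return 'handled b'

# the dict maps each "case" value to the function handling it
switch = {'a': case_a, 'b': case_b}
result = switch.get('a', lambda: 'default case')()  # 'handled a'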
Adding a mixin to a class looks a lot like adding a superclass, because it pretty much is just that. An object of a class with the mixin Foo will also be an instance of Foo, and isinstance(instance, Foo) will return True.
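A minimal sketch (the mixin and class names here are hypothetical):

class GreetMixin(object):
    def greet(self):
        return 'Hello!'

class Person(GreetMixin):  # GreetMixin acts as the mixin here
    pass

p = Person()
print(p.greet())                  # Hello!
print(isinstance(p, GreetMixin))  # True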
This example uses a Search Cursor from the Data Access (da) module of ArcPy.
Do not confuse arcpy.da.SearchCursor syntax with the earlier and slower arcpy.SearchCursor().
The Data Access module (arcpy.da) has only been available since ArcGIS 10.1 for Desktop.
When both methods are implemented, it's somewhat common to have a __str__ method that returns a human-friendly representation (e.g. "Ace of Spades") and __repr__ return an eval-friendly representation.
In fact, the Python docs for repr() note exactly this:
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.
What that means is that __str__ might be implemented to return something like "Ace of Spades" as shown previously, while __repr__ might be implemented to instead return Card('Spades', 1). This string could be passed directly back into eval in somewhat of a "round-trip":

object -> string -> object
An example of an implementation of such a method might be:
def __repr__(self):
    return "Card(%r, %d)" % (self.suit, self.pips)
[1] This output is implementation specific. The string displayed is from cpython.
[2] You may have already seen the result of this str()/repr() divide and not known it. When strings containing special characters such as backslashes are converted to strings via str(), the backslashes appear as-is (they appear once). When they're converted to strings via repr() (for example, as elements of a list being displayed), the backslashes are escaped and thus appear twice.
From the Python documentation:
The following table summarizes the operator precedences in Python, from lowest precedence (least binding) to highest precedence (most binding). Operators in the same box have the same precedence. Unless the syntax is explicitly given, operators are binary. Operators in the same box group left to right (except for comparisons, including tests, which all have the same precedence and chain from left to right and exponentiation, which groups from right to left).
Operator | Description |
---|---|
lambda | Lambda expression |
if – else | Conditional expression |
or | Boolean OR |
and | Boolean AND |
not x | Boolean NOT |
in, not in, is, is not, <, <=, >, >=, <>, !=, == | Comparisons, including membership tests and identity tests |
| | Bitwise OR |
^ | Bitwise XOR |
& | Bitwise AND |
<<, >> | Shifts |
+, - | Addition and subtraction |
*, /, //, % | Multiplication, division, remainder [8] |
+x, -x, ~x | Positive, negative, bitwise NOT |
** | Exponentiation [9] |
x[index], x[index:index], x(arguments...), x.attribute | Subscription, slicing, call, attribute reference |
(expressions...), [expressions...], {key: value...}, expressions... | Binding or tuple display, list display, dictionary display, string conversion |
List comprehensions were outlined in PEP 202 and introduced in Python 2.0.
The 2to3 tool is a Python program which is used to convert code written in Python 2.x to Python 3.x code. The tool reads Python 2.x source code and applies a series of fixers to transform it into valid Python 3.x code.
The 2to3 tool is available in the standard library as lib2to3 which contains a rich set of fixers that will handle almost all code. Since lib2to3 is a generic library, it is possible to write your own fixers for 2to3.
For more details check out the pySerial documentation.
Curses is a basic terminal (or character display) handling module from Python. This can be used to create terminal-based user interfaces, or TUIs. It is a Python port of the more popular C library 'ncurses'.
When attempting to improve the performance of a Python script, first and foremost you should be able to find the bottleneck of your script, and note that no optimization can compensate for a poor choice of data structures or a flaw in your algorithm design. Identifying performance bottlenecks can be done by profiling your script. Second, do not try to optimize too early in your coding process at the expense of readability/design/quality. Donald Knuth made the following statement on optimization:
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
Why would you ever want to tell the interpreter to explicitly do nothing?
Python has the syntactical requirement that code blocks (after if, except, def, class etc.) cannot be empty.
But sometimes an empty code block is useful in itself. An empty class block can define a new, different class, such as an exception that can be caught. An empty except block can be the simplest way to express "ask for forgiveness later" if there was nothing to ask forgiveness for. If an iterator does all the heavy lifting, an empty for loop to just run the iterator can be useful.
Therefore, if nothing is supposed to happen in a code block, a pass is needed for such a block not to produce an IndentationError. Alternatively, any statement (including just a term to be evaluated, like the Ellipsis literal ... or a string, most often a docstring) can be used, but the pass makes clear that indeed nothing is supposed to happen, and does not need to be actually evaluated and (at least temporarily) stored in memory. Here is a small annotated collection of the most frequent uses of pass that crossed my way – together with some comments on good and bad practice.
Ignoring (all or) a certain type of Exception (example from xml):
try:
    self.version = "Expat %d.%d.%d" % expat.version_info
except AttributeError:
    pass  # unknown
Note: Ignoring all types of raises, as in the following example from pandas, is generally considered bad practice, because it also catches exceptions that should probably be passed on to the caller, e.g. KeyboardInterrupt or SystemExit (or even HardwareIsOnFireError – how do you know you aren't running on a custom box with specific errors defined, which some calling application would want to know about?).
try:
    os.unlink(filename_larry)
except:
    pass
Instead, using at least except Error: or in this case preferably except OSError: is considered much better practice. A quick analysis of all Python modules I have installed showed that more than 10% of all except ...: pass statements catch all exceptions, so it's still a frequent pattern in Python programming.
Deriving an exception class that does not add new behaviour (e.g. in scipy):
class CompileError(Exception):
    pass
Similarly, classes intended as abstract base classes often have an explicit empty __init__ or other methods that subclasses are supposed to override (e.g. pebl):
class _BaseSubmittingController(_BaseController):
    def submit(self, tasks): pass
    def retrieve(self, deferred_results): pass
Testing that code runs properly for a few test values, without caring about the results (from mpmath):
for x, error in MDNewton(mp, f, (1,-2), verbose=0,
                         norm=lambda x: norm(x, inf)):
    pass
In class or function definitions, often a docstring is already in place as the obligatory statement to be executed as the only thing in the block. In such cases, the block may contain pass in addition to the docstring in order to say "This is indeed intended to do nothing.", for example in pebl:
class ParsingError(Exception):
    """Error encountered while parsing an ill-formed datafile."""
    pass
In some cases, pass is used as a placeholder to say "This method/class/if-block/... has not been implemented yet, but this will be the place to do it", although I personally prefer the Ellipsis literal ... (NOTE: Python 3 only) in order to strictly differentiate between this and the intentional "no-op" in the previous example.
For example, if I write a model in broad strokes, I might write
def update_agent(agent):
    ...
where others might have
def update_agent(agent):
    pass
before
def time_step(agents):
    for agent in agents:
        update_agent(agent)
as a reminder to fill in the update_agent function at a later point, but run some tests already to see if the rest of the code behaves as intended. (A third option for this case is raise NotImplementedError. This is useful in particular for two cases: either "This abstract method should be implemented by every subclass; there is no generic way to define it in this base class", or "This function, with this name, is not yet implemented in this release, but this is what its signature will look like".)
The capitalization of the tkinter module is different between Python 2 and 3. For Python 2 use the following:
from Tkinter import * # Capitalized
For Python 3 use the following:
from tkinter import * # Lowercase
For code that works with both Python 2 and 3, you can either do
try:
    from Tkinter import *
except ImportError:
    from tkinter import *
or
from sys import version_info
if version_info.major == 2:
    from Tkinter import *
elif version_info.major == 3:
    from tkinter import *
See the tkinter documentation for more details.
Different ways to create subcommands like in hg or svn, with the command line interface shown in the help message:
usage: sub <command>
commands:
status - show status
list - print list
The following table lists predefined browser types. The left column are names that can be passed into the webbrowser.get()
method and the right column lists the class names for each browser type.
Type Name | Class Name |
---|---|
'mozilla' | Mozilla('mozilla') |
'firefox' | Mozilla('mozilla') |
'netscape' | Mozilla('netscape') |
'galeon' | Galeon('galeon') |
'epiphany' | Galeon('epiphany') |
'skipstone' | BackgroundBrowser('skipstone') |
'kfmclient' | Konqueror() |
'konqueror' | Konqueror() |
'kfm' | Konqueror() |
'mosaic' | BackgroundBrowser('mosaic') |
'opera' | Opera() |
'grail' | Grail() |
'links' | GenericBrowser('links') |
'elinks' | Elinks('elinks') |
'lynx' | GenericBrowser('lynx') |
'w3m' | GenericBrowser('w3m') |
'windows-default' | WindowsDefault |
'macosx' | MacOSX('default') |
'safari' | MacOSX('safari') |
'google-chrome' | Chrome('google-chrome') |
'chrome' | Chrome('chrome') |
'chromium' | Chromium('chromium') |
'chromium-browser' | Chromium('chromium-browser') |
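For example, a minimal sketch (it assumes Firefox is installed; webbrowser.get() raises webbrowser.Error if the requested browser is unavailable):

import webbrowser

browser = webbrowser.get('firefox')
browser.open('https://www.python.org')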
Up until Python 3.4 came out, base64 encoding and decoding functions only worked with bytes or bytearray types. Now these functions accept any bytes-like object.
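For example:

import base64

encoded = base64.b64encode(b'hello')  # bytes in, bytes out
print(encoded)                        # b'aGVsbG8='
print(base64.b64decode(encoded))      # b'hello'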
groupby() is tricky but a general rule to keep in mind when using it is this:
Always sort the items you want to group with the same key you want to use for grouping
It is recommended that the reader take a look at the documentation here and see how it is explained using a class definition.
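A minimal sketch of the rule above (grouping words by their first letter):

from itertools import groupby

words = ['blueberry', 'apple', 'banana', 'avocado', 'cherry']
keyfunc = lambda word: word[0]

words.sort(key=keyfunc)  # sort with the same key used for grouping
for letter, group in groupby(words, key=keyfunc):
    print(letter, list(group))
# a ['apple', 'avocado']
# b ['banana', 'blueberry']
# c ['cherry']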
Language Used: Python 2.7 (Download Link: https://www.python.org/downloads/ )
Library Used:
* PyCrypto (Download Link: https://pypi.python.org/pypi/pycrypto )
* PyCryptoPlus (Download Link: https://github.com/doegox/python-cryptoplus )
Library Installation:
PyCrypto: Unzip the file. Go to the directory and open a terminal on Linux (Alt+Ctrl+T) or a command prompt on Windows (Shift+right click → "Open command prompt here"). After that, run python setup.py install (make sure the Python environment is set properly on Windows).
PyCryptoPlus: Same as the last library.
Tasks Implementation: The task is separated into two parts. One is the handshake process and the other is the communication process.

Socket Setup: Having created the public and private keys and hashed the public key, we now need to set up the socket. For setting up the socket, we need to import another module with "import socket" and connect (for the client) or bind (for the server) the IP address and the port obtained from the user to the socket.
----------Client Side----------
server = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
host = raw_input("Server Address To Be Connected -> ")
port = int(input("Port of The Server -> "))
server.connect((host, port))
----------Server Side---------
try:
    # setting up socket
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen(5)
except BaseException:
    print "-----Check Server Address or Port-----"
“socket.AF_INET, socket.SOCK_STREAM” will allow us to use the accept() function and messaging fundamentals. Instead, we could also use “socket.AF_INET, socket.SOCK_DGRAM”, but then we would have to use setblocking(value).
Handshake Process:
random_generator = Random.new().read
key = RSA.generate(1024,random_generator)
public = key.publickey().exportKey()
random_generator is derived from the "from Crypto import Random" module. key is derived from "from Crypto.PublicKey import RSA", which creates a private key of size 1024 bits by generating random characters. public exports the public key from the previously generated private key.
(CLIENT) After creating the public and private keys, we have to hash the public key to send over to the server using the SHA-1 hash. To use the SHA-1 hash we need to import another module by writing "import hashlib". To hash the public key we have to write two lines of code:
hash_object = hashlib.sha1(public)
hex_digest = hash_object.hexdigest()
Here hash_object and hex_digest are our variables. After this, the client will send hex_digest and public to the server, and the server will verify them by comparing the hash received from the client with a new hash of the public key. If the new hash and the hash from the client match, it will move to the next procedure. As the public key sent from the client is in the form of a string, it will not be usable as a key on the server side. To convert the string public key back to an RSA public key, we need to write server_public_key = RSA.importKey(getpbk), where getpbk is the public key from the client.
(SERVER) The next step is to create a session key. Here, I have used the "os" module to create a random key with "key = os.urandom(16)", which gives us a 16-byte key. After that I have encrypted that key in "AES.MODE_CTR" and hashed it again with SHA-1:
# encrypt CTR MODE session key
en = AES.new(key_128, AES.MODE_CTR, counter=lambda: key_128)
encrypto = en.encrypt(key_128)

# hashing sha1
en_object = hashlib.sha1(encrypto)
en_digest = en_object.hexdigest()
So the en_digest will be our session key.
(SERVER) The final part of the handshake process is to encrypt the public key received from the client and the session key created on the server side.
#encrypting session key and public key
E = server_public_key.encrypt(encrypto,16)
After encrypting, the server will send the key to the client as a string.
(CLIENT) After getting the encrypted string of (public and session key) from the server, the client will decrypt it using the private key which was created earlier along with the public key. As the encrypted (public and session key) was in the form of a string, we now have to get it back as a key by using eval(). Once the decryption is done, the handshake process is complete, as both sides have confirmed that they are using the same keys. To decrypt:
en = eval(msg)
decrypt = key.decrypt(en)

# hashing sha1
en_object = hashlib.sha1(decrypt)
en_digest = en_object.hexdigest()
I have used the SHA-1 here so that it will be readable in the output.
Communication Process:
For the communication process, we have to use the session key on both sides as the KEY for IDEA encryption in MODE_CTR. Both sides will encrypt and decrypt messages with IDEA.MODE_CTR using the session key.
(Encryption) For IDEA encryption, we need a key of 16 bytes in size and a counter that must be callable. The counter is mandatory in MODE_CTR. The session key that we encrypted and hashed is now of size 40, which exceeds the key size limit of the IDEA encryption. Hence, we need to reduce the size of the session key. For reducing it, we can use normal Python built-in slicing, string[value:value], where the values can be anything according to the user's choice. In our case, I have done "key[:16]", which takes the first 16 characters of the key. This conversion could be done in many ways, like key[1:17] or key[16:]. The next part is to create a new IDEA encryption function by writing IDEA.new(), which takes 3 arguments for processing. The first argument is the KEY, the second argument is the mode of the IDEA encryption (in our case, IDEA.MODE_CTR) and the third argument is counter=, which must be a callable function. The counter= returns a string of the required size. To define counter=, we must use reasonable values; in this case, I have used the size of the KEY by defining it with a lambda. Instead of using a lambda, we could use Counter.Util, which generates a random value for counter=. To use Counter.Util, we need to import the Counter module from Crypto. Hence, the code will be:
ideaEncrypt = IDEA.new(key, IDEA.MODE_CTR, counter=lambda : key)
Once "ideaEncrypt" is defined as our IDEA encryption variable, we can use the built-in encrypt function to encrypt any message.
eMsg = ideaEncrypt.encrypt(whole)
# converting the encrypted message to hexadecimal to make it readable
eMsg = eMsg.encode("hex").upper()
In this code segment, whole is the message to be encrypted and eMsg is the encrypted message. After encrypting the message, I have converted it into hexadecimal to make it readable, and upper() is the built-in function to make the characters uppercase. After that, this encrypted message will be sent to the opposite station for decryption.
To decrypt the encrypted messages, we will need to create another encryption variable using the same arguments and the same key, but this time the variable will decrypt the encrypted messages. The code for this is the same as last time. However, before decrypting the messages, we need to decode the message from hexadecimal, because in our encryption part we encoded the encrypted message in hexadecimal to make it readable. Hence, the whole code will be:
decoded = newmess.decode("hex")
ideaDecrypt = IDEA.new(key, IDEA.MODE_CTR, counter=lambda: key)
dMsg = ideaDecrypt.decrypt(decoded)
These processes will be done on both the server and client sides for encrypting and decrypting.
All return values from ConfigParser.ConfigParser().get are strings. They can be converted to more common types thanks to eval.
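A minimal sketch (the file name, section and option are hypothetical; eval should only be used on trusted input):

import ConfigParser  # configparser in Python 3

config = ConfigParser.ConfigParser()
config.read('settings.ini')  # assumed to contain a [Section] with threshold = 0.5

raw = config.get('Section', 'threshold')  # always a string, e.g. '0.5'
threshold = eval(raw)                     # 0.5 -- trusted input only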
As stated in the Python docs for functools.partial:
Return a new partial object which when called will behave like func called with the positional arguments args and keyword arguments keywords. If more arguments are supplied to the call, they are appended to args. If additional keyword arguments are supplied, they extend and override keywords.
Check this link to see how partial can be implemented.
For details on all sys module members, refer to the official documentation.
Additional links:
Note: stream_callback is called in a separate thread (from the main thread). Exceptions that occur in the stream_callback will:
1. print a traceback on standard error to aid debugging,
2. queue the exception to be thrown (at some point) in the main thread, and
3. return paAbort to PortAudio to stop the stream.
Note: Do not call Stream.read() or Stream.write() if using non-blocking operation.
See: PortAudio’s callback signature for additional details :
http://portaudio.com/docs/v19-doxydocs/portaudio_8h.html#a8a60fb2a5ec9cbade3f54a9c978e2710
Note: Do not rely on the shelf being closed automatically; always call close() explicitly when you don't need it any more, or use shelve.open() as a context manager:
with shelve.open('spam') as db:
    db['eggs'] = 'eggs'
Because the shelve module is backed by pickle, it is insecure to load a shelf from an untrusted source. As with pickle, loading a shelf can execute arbitrary code.
1. The choice of which database package will be used (such as dbm.ndbm or dbm.gnu) depends on which interface is available. Therefore it is not safe to open the database directly using dbm. The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small, and in rare cases key collisions may cause the database to refuse updates.
2. The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used.
Note that in the example above, data is serialized using the MongoDB.Bson library, which can be installed via the NuGet manager.
Otherwise, you can use any JSON serialization library of your choice.
Below are inter-process communication implementation steps:
Input arguments are serialized into JSON string and saved in a temporary text file:
BsonDocument argsBson = BsonDocument.Parse("{ 'x' : '1', 'y' : '2' }");
string argsFile = string.Format("{0}\\{1}.txt", Path.GetDirectoryName(pyScriptPath), Guid.NewGuid());
Python interpreter python.exe runs the python script that reads JSON string from a temporary text file and backs-out input arguments:
filename = sys.argv[1]
with open(filename) as data_file:
    input_args = json.loads(data_file.read())
x, y = [float(input_args.get(key)) for key in ['x', 'y']]
Python script is executed and output dictionary is serialized into JSON string and printed to the command window:
print json.dumps( { 'sum' : x + y , 'subtract' : x - y } )
Read output JSON string from C# application:
using (StreamReader myStreamReader = process.StandardOutput)
{
    outputString = myStreamReader.ReadLine();
    process.WaitForExit();
}
I am using the inter-process communication between C# and Python scripts in one of my projects that allows calling Python scripts directly from Excel spreadsheets.
The project utilizes ExcelDNA add-in for C# - Excel binding.
The source-code is stored in the GitHub repository.
Below are links to wiki pages that provide an overview of the project and help to get started in 4 easy steps.
I hope you find the example and the project useful.
open(path, "wb")

"wb" - write mode. The b parameter in "wb" is necessary only if you want to open the file in binary mode, which is needed only on some operating systems like Windows.
csv.writer(csv_file, delimiter=',')

Here the delimiter we have used is the comma (,), because we want each cell of data in a row to contain the first name, last name, and age respectively. Since our list is split along the comma too, it proves rather convenient for us.
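Putting it together, a minimal sketch (the file name and rows are hypothetical; on Python 3 use open(path, 'w', newline='') instead of 'wb'):

import csv

with open('people.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(('John', 'Smith', 40))
    writer.writerow(('Jane', 'Doe', 35))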