Getting started with Python LanguageList comprehensionsFilterListFunctionsDecoratorsMath ModuleLoopsRandom moduleComparisonsImporting modulesSorting, Minimum and MaximumOperator moduleVariable Scope and BindingBasic Input and OutputFiles & Folders I/OJSON ModuleString MethodsMetaclassesIndexing and SlicingGeneratorsSimple Mathematical OperatorsReduceMap FunctionExponentiationSearchingDictionaryClassesCountingManipulating XMLDate and TimeSetCollections moduleParallel computationMultithreadingWriting extensionsUnit TestingRegular Expressions (Regex)Bitwise OperatorsIncompatibilities moving from Python 2 to Python 3Virtual environmentsCopying dataTupleContext Managers (“with” Statement)Hidden FeaturesEnumString FormattingConditionalsComplex mathUnicode and bytesThe __name__ special variableChecking Path Existence and PermissionsPython NetworkingAsyncio ModuleThe Print Functionos.pathCreating Python packagesParsing Command Line argumentsHTML ParsingSubprocess Librarysetup.pyList slicing (selecting parts of lists)SocketsItertools ModuleRecursionBoolean OperatorsThe dis moduleType Hintspip: PyPI Package ManagerThe locale ModuleExceptionsWeb scraping with PythonDeque ModuleDistributionProperty ObjectsOverloadingDebuggingReading and Writing CSVDynamic code execution with `exec` and `eval`PyInstaller - Distributing Python CodeIterables and IteratorsData Visualization with PythonThe Interpreter (Command Line Console)*args and **kwargsFunctools ModuleGarbage CollectionIndentationSecurity and CryptographyPickle data serialisationurllibBinary DataPython and ExcelIdiomsMethod OverridingDifference between Module and PackageData SerializationPython concurrencyIntroduction to RabbitMQ using AMQPStormPostgreSQLDescriptorCommon PitfallsMultiprocessingtempfile NamedTemporaryFileWorking with ZIP archivesStackProfilingUser-Defined MethodsWorking around the Global Interpreter Lock (GIL)DeploymentLoggingProcesses and ThreadsThe os ModuleComments and DocumentationDatabase AccessPython HTTP ServerAlternatives to switch statement from other languagesList destructuring (aka packing and unpacking)Accessing Python source code and bytecodeMixinsAttribute AccessArcPyPython Anti-PatternsPlugin and Extension ClassesWebsocketsImmutable datatypes(int, float, str, tuple and frozensets)String representations of class instances: __str__ and __repr__ methodsArraysOperator PrecedencePolymorphismNon-official Python implementationsList ComprehensionsWeb Server Gateway Interface (WSGI)2to3 toolAbstract syntax treeAbstract Base Classes (abc)UnicodeSecure Shell Connection in PythonPython Serial Communication (pyserial)Neo4j and Cypher using Py2NeoBasic Curses with PythonPerformance optimizationTemplates in pythonPillowThe pass statementLinked List Nodepy.testDate FormattingHeapqtkinterCLI subcommands with precise help outputDefining functions with list argumentsSqlite3 ModulePython PersistenceTurtle GraphicsConnecting Python to SQL ServerDesign PatternsMultidimensional arraysAudioPygletQueue ModuleijsonWebbrowser ModuleThe base64 ModuleFlaskgroupby()Sockets And Message Encryption/Decryption Between Client and ServerpygameInput, Subset and Output External Data Files using Pandashashlibgetting start with GZipDjangoctypesCreating a Windows service using PythonPython Server Sent EventsMutable vs Immutable (and Hashable) in PythonPython speed of programconfigparserLinked listsCommonwealth ExceptionsOptical Character RecognitionPython Data TypesPartial functionspyautogui modulegraph-toolUnzipping FilesFunctional Programming in PythonPython Virtual Environment - virtualenvsysvirtual environment with virtualenvwrapperCreate virtual environment with virtualenvwrapper in windowsPython Requests PostPlotting with MatplotlibPython Lex-YaccChemPy - python packagepyaudioshelveUsage of "pip" module: PyPI Package ManagerIoT Programming with Python and Raspberry PICode blocks, execution frames, and namespaceskivy - Cross-platform Python Framework for NUI DevelopmentCall Python from C#Similarities in syntax, Differences in meaning: Python vs. JavaScriptWriting to CSV from String or ListRaise Custom Errors / ExceptionsUsing loops within functionsPandas Transform: Preform operations on groups and concatenate the results

Unicode and bytes

Other topics

Basics

In Python 3 str is the type for unicode-enabled strings, while bytes is the type for sequences of raw bytes.

type("f") == type(u"f")  # True, <class 'str'>
type(b"f")               # <class 'bytes'>

In Python 2 a casual string was a sequence of raw bytes by default and the unicode string was every string with "u" prefix.

type("f") == type(b"f")  # True, <type 'str'>
type(u"f")               # <type 'unicode'>

Unicode to bytes

Unicode strings can be converted to bytes with .encode(encoding).

Python 3

>>> "£13.55".encode('utf8')
b'\xc2\xa313.55'
>>> "£13.55".encode('utf16')
b'\xff\xfe\xa3\x001\x003\x00.\x005\x005\x00'

Python 2

in py2 the default console encoding is sys.getdefaultencoding() == 'ascii' and not utf-8 as in py3, therefore printing it as in the previous example is not directly possible.

>>> print type(u"£13.55".encode('utf8'))
<type 'str'>
>>> print u"£13.55".encode('utf8')
SyntaxError: Non-ASCII character '\xc2' in...

# with encoding set inside a file

# -*- coding: utf-8 -*-
>>> print u"£13.55".encode('utf8')
£13.55

If the encoding can't handle the string, a `UnicodeEncodeError` is raised:
>>> "£13.55".encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)

Bytes to unicode

Bytes can be converted to unicode strings with .decode(encoding).

A sequence of bytes can only be converted into a unicode string via the appropriate encoding!

>>> b'\xc2\xa313.55'.decode('utf8')
'£13.55'

If the encoding can't handle the string, a UnicodeDecodeError is raised:

>>> b'\xc2\xa313.55'.decode('utf16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/csaftoiu/csaftoiu-github/yahoo-groups-backup/.virtualenv/bin/../lib/python3.5/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x35 in position 6: truncated data

Encoding/decoding error handling

.encode and .decode both have error modes.

The default is 'strict', which raises exceptions on error. Other modes are more forgiving.

Encoding

>>> "£13.55".encode('ascii', errors='replace')
b'?13.55'
>>> "£13.55".encode('ascii', errors='ignore')
b'13.55'
>>> "£13.55".encode('ascii', errors='namereplace')
b'\\N{POUND SIGN}13.55'
>>> "£13.55".encode('ascii', errors='xmlcharrefreplace')
b'&#163;13.55'
>>> "£13.55".encode('ascii', errors='backslashreplace')
b'\\xa313.55'

Decoding

>>> b = "£13.55".encode('utf8')
>>> b.decode('ascii', errors='replace')
'��13.55'
>>> b.decode('ascii', errors='ignore')
'13.55'
>>> b.decode('ascii', errors='backslashreplace')
'\\xc2\\xa313.55'

Morale

It is clear from the above that it is vital to keep your encodings straight when dealing with unicode and bytes.

File I/O

Files opened in a non-binary mode (e.g. 'r' or 'w') deal with strings. The deafult encoding is 'utf8'.

open(fn, mode='r')                    # opens file for reading in utf8
open(fn, mode='r', encoding='utf16')  # opens file for reading utf16

# ERROR: cannot write bytes when a string is expected:
open("foo.txt", "w").write(b"foo")

Files opened in a binary mode (e.g. 'rb' or 'wb') deal with bytes. No encoding argument can be specified as there is no encoding.

open(fn, mode='wb')  # open file for writing bytes

# ERROR: cannot write string when bytes is expected:
open(fn, mode='wb').write("hi")

Syntax:

  • str.encode(encoding, errors='strict')
  • bytes.decode(encoding, errors='strict')
  • open(filename, mode, encoding=None)

Parameters:

ParameterDetails
encodingThe encoding to use, e.g. 'ascii', 'utf8', etc...
errorsThe errors mode, e.g. 'replace' to replace bad characters with question marks, 'ignore' to ignore bad characters, etc...

Contributors

Topic Id: 1216

Example Ids: 3988,3989,3990

This site is not affiliated with any of the contributors.