James's Ramblings

Python for Devops

Created: April 20, 2020

Notes written when reading a Python for Devops book. (On-going.)

Basic Math

  • Integer division operator: //. 5//2 = 2.
  • Modulo operator: %. 5%2 = 1.
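
A quick sketch of both operators. divmod is a built-in that returns the quotient and remainder together; the variable names are only illustrative.

```python
# Floor division discards the fractional part; modulo gives the remainder.
quotient = 5 // 2        # 2
remainder = 5 % 2        # 1

# divmod() returns both results as a tuple.
both = divmod(5, 2)      # (2, 1)

# The two results always recombine to the original number.
assert quotient * 2 + remainder == 5

# Floor division rounds towards negative infinity, not towards zero.
negative = -5 // 2       # -3
print(quotient, remainder, both, negative)
```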

Comments

# one line

"""
Multi line
"""
'''
Also multi line
'''

Range

>>> range(10) 
range(0, 10) 
>>> list(range(10)) 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 
>>> list(range(5, 10))
[5, 6, 7, 8, 9] 
>>> list(range(5, 10, 3)) 
[5, 8]
>>> list(range(15,10,-2))
[15, 13, 11]
  • Range is technically a type representing a sequence of numbers, not a function.
  • With three arguments: the last value is the step. The step can be negative.

if/elif/else

if [CONDITION1]:
	[DO_SOMETHING]
elif [CONDITION2]:
	[DO_SOMETHING_ELSE]
else:
	[DO_A_THIRD_THING]

for loop

for i in range(INT):
	[DO_SOMETHING]

while loop

while [CONDITION]:
	[DO_SOMETHING]

break and continue

  • continue skips the rest of the commands in the current iteration of a loop.

  • break terminates the loop. In a while loop it can be used as an alternative control statement:

    while True:
    	if [SOMETHING]:
    		break
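
A small runnable sketch of both keywords in one loop. The task (collecting the first three odd numbers) is made up for illustration.

```python
# Collect the first three odd numbers.
odds = []
n = 0
while True:
    n += 1
    if n % 2 == 0:
        continue          # even: skip the rest of this iteration
    odds.append(n)
    if len(odds) == 3:
        break             # enough collected: leave the loop entirely
print(odds)
```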
    

Exceptions

try:
	[DO_SOMETHING]
except [ERROR_TYPE] as [IDENTIFIER]:
	[DO_SOMETHING_ELSE]
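
A concrete version of the template above. to_int is a hypothetical helper, not something from the book; it just shows a specific error type being caught and bound to a name.

```python
def to_int(text, default=0):
    """Convert text to an int, falling back to default on bad input."""
    try:
        return int(text)
    except ValueError as error:
        # error holds the exception instance, useful for logging.
        print(f"Could not convert {text!r}: {error}")
        return default

good = to_int("42")
bad = to_int("forty-two")
```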

Type function

  • type(OBJECT) returns the type of an object, e.g. type(5) is <class 'int'>.

Classes and functions

>>> class FancyCar():
...     wheels = 4
...     def driveFast(self):
...         print("Driving so fast")
...
>>> my_car = FancyCar()
>>> my_car.wheels
4
>>> my_car.driveFast()
Driving so fast

Functions

def <FUNCTION NAME>(<PARAMETERS>):
	'''A doc string.
	'''
	<CODE BLOCK>
  • If a string using multiline syntax is provided first in the indented block, it acts as documentation.

  • Arguments can be passed by keyword as well as by position. In a call, every argument that follows a keyword argument must also be a keyword argument.

  • This allows default values to be specified and the values to be passed in any order.

    >>> def keywords(first=1, second=2):
    ...     print(f"first: {first}") 
    ...     print(f"second: {second}")
    >>> keywords(0) 
    first: 0 
    second: 2
    >>> keywords(second='one', first='two') 
    first: two 
    second: one
    
  • All functions return a value. The return keyword is used to set this value. If not set from a function definition, the function returns None.

  • Functions are objects. They can be passed around, or stored in data structures.

    >>> def double(input): 
    ...     return input*2 
    ... 
    >>> double 
    <function double at 0x107d34ae8> 
    >>> type(double) 
    <class 'function'> 
    >>> def triple(input): 
    ...     return input*3 
    ... 
    >>> functions = [double, triple] 
    >>> for function in functions: 
    ...     print(function(3))
    ... 
    ... 
    6 
    9
    
  • lambda functions are unnamed (anonymous) functions. lambda <PARAM>: <RETURN EXPRESSION>

  • These functions should be very short and should usually only be used as an argument when calling another function.

    >>> items = [[0, 'a', 2], [5, 'b', 0], [2, 'c', 1]]
    >>> sorted(items, key=lambda item: item[1]) # sort by the second list value
    [[0, 'a', 2], [5, 'b', 0], [2, 'c', 1]]
    >>> sorted(items, key=lambda item: item[2]) # sort by the third list value
    [[5, 'b', 0], [2, 'c', 1], [0, 'a', 2]]
    

Sequences

  • list, tuple, range, string and binary types.

Lists

  • Ordered collection of items of any type.
  • Items can be of different types.
  • Square brackets indicate a list.
>>> list() 
[] 
>>> list(range(10)) 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 
>>> list("Henry Miller") 
['H', 'e', 'n', 'r', 'y', ' ', 'M','i', 'l', 'l', 'e', 'r']

List methods

  • list.append([SOMETHING]) to add an item to the end of a list.
  • list.insert([INDEX],[SOMETHING]) to add an item at a specific index.
  • list1.extend(list2) append list2 to the end of list1.
  • list.pop() pop the last item from a list.
  • list.pop(1) pop the item at index 1 from a list. Inefficient for long lists, since every later item has to shift down by one.
  • list.remove([SOMETHING]) remove the first occurrence of an item from a list.
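
The methods above, run in sequence on one small list. The values are arbitrary and the comments show the list after each call.

```python
items = ['b', 'c']
items.append('d')          # ['b', 'c', 'd']
items.insert(0, 'a')       # ['a', 'b', 'c', 'd']
items.extend(['c', 'e'])   # ['a', 'b', 'c', 'd', 'c', 'e']
last = items.pop()         # removes and returns 'e'
second = items.pop(1)      # removes and returns 'b'
items.remove('c')          # removes only the first 'c'
print(items)
```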

List comprehension

  • Populate a list in a concise manner.
  • The expression that would be the body of the equivalent for loop is written first, followed by the for clause.
  • Filtering can be done using if statements.
>>> squares = [i*i for i in range(10)]
>>> squares 
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> squares = [i*i for i in range(10) if i%2==0] 
>>> squares 
[0, 4, 16, 36, 64]

Strings

  • Strings are sequences of Unicode characters; UTF-8 is the default encoding when converting to and from bytes.

  • Strings are immutable.

  • str() creates an empty string.

  • Single or double quotes.

  • Use str(OBJECT) to turn an object into a string.

  • Use triple quotes for multi-line strings.

    >>> multi_line = """This is a 
    ... multi-line string, 
    ... which includes linebreaks. 
    ... """
    
  • string.strip() removes whitespace from the beginning and end of a string.

  • string.rstrip() removes whitespace from the end of a string.

  • string.lstrip() removes whitespace from the beginning of a string.

  • string.ljust(x) if the string’s length is less than x, adds whitespace to the end of the string, to bring the length to x.

  • string.rjust(y,[CONSTANT]) if the string’s length is less than y, adds a constant to the beginning of the string, to bring the length to y.

  • string.split() split a string into a list. By default it splits on runs of whitespace. An alternative delimiter can be specified inside the parentheses.

    >>> text = "Mary had a little lamb" 
    >>> text.split() 
    ['Mary', 'had', 'a', 'little', 'lamb']
    >>> url = "gt.motomomo.io/v2/api/asset/143" 
    >>> url.split('/') 
    ['gt.motomomo.io', 'v2', 'api', 'asset', '143']
    
  • string1.join(sequence) can be used to create a new string, where string1 is the delimiter placed between the items of the sequence.

    >>> items = ['cow', 'milk', 'bread', 'butter'] 
    >>> " and ".join(items) 
    'cow and milk and bread and butter'
    
  • string.capitalize() capitalise the first letter.

  • string.upper() all characters to uppercase.

  • string.title() the first character of every word to uppercase.

  • string.swapcase() toggle the case of every character.

  • string.lower() all characters to lowercase.

  • string1.startswith(string2) returns True if string1 starts with string2.

  • string1.endswith(string2) returns True if string1 ends with string2.

  • string.isalnum() returns True if a string contains only alphanumeric characters.

  • string.isalpha() returns True if a string contains only alphabetic characters.

  • string.isnumeric() returns True if a string contains only numeric characters.

  • string.istitle() returns True if the first character in every word is capitalised.

  • string.islower() returns True if all alphabetic characters in the string are lowercase.

  • string.isupper() returns True if all alphabetic characters in the string are uppercase.

  • The old printf equivalent:

    >>> "%s + %s = %s" % (1, 2, "Three") 
    '1 + 2 = Three'
    >>> "%.3f" % 1.234567 
    '1.235'
    
    • Can cause errors, so not recommended.
  • Recommended alternative is string.format:

    >>> '{} comes before {}'.format('first', 'second')
    'first comes before second'
    >>> '{1} comes after {0}, but {1} comes before {2}'.format('first', 'second', 'third')
    'second comes after first, but second comes before third'
    
  • Keyword arguments can also be used, and a dict can be unpacked into them with **:

    >>> '''{country} is an island.
    ... {country} is off of the coast of
    ... {continent} in the {ocean}'''.format(ocean='Indian Ocean',
    ...                                      continent='Africa',
    ...                                      country='Madagascar')
    'Madagascar is an island.\nMadagascar is off of the coast of\nAfrica in the Indian Ocean'
      
    >>> values = {'first': 'Bill','last': 'Bailey'}
    >>> "Won't you come home {first} {last}?".format(**values) 
    "Won't you come home Bill Bailey?"
    
  • Format specifications are done using the format specification mini-language:

    >>> text = "|{0:>22}||{0:<22}|" 
    >>> text.format('O','O') 
    '|                     O||O                     |' 
    >>> text = "|{0:<>22}||{0:><22}|" 
    >>> text.format('O','O') 
    '|<<<<<<<<<<<<<<<<<<<<<O||O>>>>>>>>>>>>>>>>>>>>>|'
    
  • Python f-strings use the same formatting language as the format method, but offer a more straightforward and intuitive mechanism for using them.

    >>> a = 1 
    >>> b = 2 
    >>> f"a is {a}, b is {b}. Adding them results in {a + b}"
    'a is 1, b is 2. Adding them results in 3'
      
    >>> count = 43 
    >>> f"|{count:5d}" 
    '|   43'
      
    >>> padding = 10 
    >>> f"|{count:{padding}d}" 
    '|        43'
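
The whitespace, padding, case and test methods listed earlier in this section, in one runnable sketch. The sample strings are made up and the comments show each result.

```python
raw = "  hello world  "
stripped = raw.strip()             # 'hello world'
left = raw.lstrip()                # 'hello world  '
right = raw.rstrip()               # '  hello world'

padded = "42".rjust(5, "0")        # '00042'
label = "id".ljust(4) + "|"        # 'id  |'

title = stripped.title()           # 'Hello World'
swapped = title.swapcase()         # 'hELLO wORLD'

checks = (
    "abc123".isalnum(),            # True
    "abc123".isalpha(),            # False, contains digits
    "Hello World".istitle(),       # True
    stripped.startswith("hello"),  # True
)
```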
    

Dictionaries

  • It’s possible to convert a nested list to a dict:

    >>> kv_list = [['key-1', 'value-1'], ['key-2', 'value-2']]
    >>> dict(kv_list)
    {'key-1': 'value-1', 'key-2': 'value-2'}

  • dict.values() returns all dict values.

  • dict.keys() returns all keys.

  • Dict comprehension:

    >>> letters = 'abcde'
    >>> # mapping individual letters to their upper-case representations
    >>> cap_map = {x: x.upper() for x in letters} 
    >>> cap_map['b'] 
    'B'
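
A short sketch that ties the dict operations above together. items(), which yields (key, value) pairs, is not mentioned in the notes; it is added here as the usual companion to keys() and values().

```python
kv_list = [['key-1', 'value-1'], ['key-2', 'value-2']]
mapping = dict(kv_list)

keys = list(mapping.keys())       # ['key-1', 'key-2']
values = list(mapping.values())   # ['value-1', 'value-2']

# items() yields (key, value) pairs, handy in a dict comprehension.
inverted = {value: key for key, value in mapping.items()}
```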
    

Tuples

  • Tuples are ordered and immutable.
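  • Tuples are usually written with parentheses, although it is the commas that make the tuple: a one-item tuple needs a trailing comma, (1,).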
  • An empty tuple can be created with () or tuple().
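
A minimal sketch of tuple behaviour. The names are illustrative; the point is that item assignment fails while reading and unpacking work as for any sequence.

```python
point = (3, 4)
single = (1,)            # the trailing comma makes this a tuple
not_a_tuple = (1)        # just the integer 1 in parentheses

try:
    point[0] = 10        # tuples do not support item assignment
except TypeError as error:
    message = str(error)

x, y = point             # unpacking
```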

Sequence operations

in/not in operators

>>> 2 in [1,2,3] 
True 
>>> 'a' not in 'cat' 
False 
>>> 10 in range(12) 
True 
>>> 10 not in range(2, 4)
True

Referencing a sequence

  • Square brackets and an integer, like most other languages.
  • 0 is the first item.
  • -1 is the last item.
  • -2 is the second to last item.
>>> my_sequence = "Bill Cheatham" 
>>> my_sequence[-1]
'm'
>>> my_sequence[-2]
'a'
>>> my_sequence[-13]
'B'

index method

  • Searches a sequence for the first occurrence of an item and returns its index.
  • The second and third arguments define a sub-range to search.
>>> my_sequence = "Bill Cheatham" 
>>> my_sequence.index('C') 
5
>>> my_sequence.index('a',9, 12)
11

Slicing

  • the_sequence[start:stop:step]
  • If values aren’t specified, defaults are used.
  • Defaults are: [0:len(the_sequence):1]
>>> my_sequence = ['a', 'b', 'c', 'd', 'e', 'f', 'g'] 
>>> my_sequence[2:5] 
['c', 'd', 'e'] 
>>> my_sequence[:5] 
['a', 'b', 'c', 'd', 'e'] 
>>> my_sequence[3:] 
['d', 'e', 'f', 'g']
  • Negative numbers can be used to index backwards:
>>> my_sequence[-6:]
['b', 'c', 'd', 'e', 'f', 'g']
>>> my_sequence[3:-1]
['d', 'e', 'f']
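
The examples above leave out the step value. A short sketch of it on the same list; a negative step walks the sequence backwards.

```python
my_sequence = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

every_other = my_sequence[::2]     # ['a', 'c', 'e', 'g']
middle_step = my_sequence[1:6:2]   # ['b', 'd', 'f']
reversed_copy = my_sequence[::-1]  # ['g', 'f', 'e', 'd', 'c', 'b', 'a']
```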

length, min, max

  • min and max only work on sequences whose items can be compared with each other.
  • len(my_sequence)
  • min(my_sequence)
  • max(my_sequence)
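
A sketch of the three built-ins, including what happens when the items cannot be compared. The sample data is made up.

```python
numbers = [3, 1, 4, 1, 5]
summary = (len(numbers), min(numbers), max(numbers))   # (5, 1, 5)

# Strings compare lexicographically, so min/max work on them too.
first_word = min(['pear', 'apple', 'fig'])              # 'apple'

# Mixing types that cannot be ordered raises TypeError.
comparable = True
try:
    max([1, 'two', 3])
except TypeError:
    comparable = False
```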

Regular expressions

  • import re

  • re.search(r'regex',[WHAT_TO_SEARCH]) searches for text matching a regular expression

    >>> import re
    >>> re.search(r'Rostam', cc_list)
    <re.Match object; span=(32, 38), match='Rostam'>
    
  • re.search can be used as a condition in an if statement – to test for whether a value was matched.

  • Standard regex syntax:
    • [RB] – R or B. A comma inside the brackets would be matched as a literal comma.
    • [iy] – i or y.
    • [a-z] any single lowercase alphabetic character.
    • [A-Za-z] any single alphabetic character.
    • + quantifier for one or more of the preceding item.
    • \ is the escape character.
    • \w is the equivalent of [a-zA-Z0-9_] (for ASCII text).
    • \d is the equivalent of [0-9] (for ASCII text).
  • Groups can be defined with parentheses and accessed using the match object's group(INT) method. group(0) is the whole match.

    >>> matched = re.search(r'(\w+)\@(\w+)\.(\w+)', cc_list)
    >>> matched.group(0)
    'ekoenig@vpwk.com'
    >>> matched.group(1)
    'ekoenig'
    >>> matched.group(2)
    'vpwk'
    >>> matched.group(3)
    'com'
    
  • Names can be supplied for the groups by adding ?P<NAME> at the start of the group definition. Groups can then be accessed by name instead of a number:

    >>> matched = re.search(r'(?P<name>\w+)\@(?P<SLD>\w+)\.(?P<TLD>\w+)', cc_list)
    >>> matched.group('name')
    'ekoenig'
    >>> print(f'''name: {matched.group("name")}
    ... Secondary Level Domain: {matched.group("SLD")}
    ... Top Level Domain: {matched.group("TLD")}''')
    name: ekoenig
    Secondary Level Domain: vpwk
    Top Level Domain: com
    
  • findall can be used to return all of the matches as a list of strings:

    >>> matched = re.findall(r'\w+\@\w+\.\w+', cc_list)
    >>> matched
    ['ekoenig@vpwk.com', 'rostam@vpwk.com', 'ctomson@vpwk.com', 'cbaio@vpwk.com']
    >>> matched = re.findall(r'(\w+)\@(\w+)\.(\w+)', cc_list)
    >>> matched
    [('ekoenig', 'vpwk', 'com'), ('rostam', 'vpwk', 'com'), ('ctomson', 'vpwk', 'com'), ('cbaio', 'vpwk', 'com')]
    >>> names = [x[0] for x in matched]
    >>> names
    ['ekoenig', 'rostam', 'ctomson', 'cbaio']
    
  • For dealing with large input, such as logs, finditer is often a better fit: it returns an iterator that yields match objects one at a time instead of building the whole list up front:

    >>> matched = re.finditer(r'\w+\@\w+\.\w+', cc_list) 
    >>> matched
    <callable_iterator object at 0x108e68748>
    >>> next(matched) 
    <re.Match object; span=(13, 29), match='ekoenig@vpwk.com'> 
    >>> next(matched) 
    <re.Match object; span=(51, 66), match='rostam@vpwk.com'> 
    >>> next(matched) 
    <re.Match object; span=(83, 99), match='ctomson@vpwk.com'>
    

    The iterator object, matched, can be used in a for loop as well:

    >>> matched = re.finditer(r"(?P<name>\w+)\@(?P<SLD>\w+)\.(?P<TLD>\w+)", cc_list)
    >>> for m in matched: 
    ...     print(m.groupdict()) 
    ... 
    ... 
    {'name': 'ekoenig', 'SLD': 'vpwk', 'TLD': 'com'} 
    {'name': 'rostam', 'SLD': 'vpwk', 'TLD': 'com'} 
    {'name': 'ctomson', 'SLD': 'vpwk', 'TLD': 'com'} 
    {'name': 'cbaio', 'SLD': 'vpwk', 'TLD': 'com'}
    
  • Regex can also be used to substitute:

    >>> re.sub(r"\d", "#", "The passcode you entered was  09876")
    'The passcode you entered was  #####'
    >>> users = re.sub(r"(?P<name>\w+)\@(?P<SLD>\w+)\.(?P<TLD>\w+)",
    ...                r"\g<TLD>.\g<SLD>.\g<name>", cc_list)
    >>> print(users) 
    Ezra Koenig <com.vpwk.ekoenig>, 
    Rostam Batmanglij <com.vpwk.rostam>, 
    Chris Tomson <com.vpwk.ctomson, 
    Chris Baio <com.vpwk.cbaio
    
  • If the same match is going to happen many times, re.compile can be used to compile the regex, which is more efficient for continuous use:

    >>> regex = re.compile(r'\w+\@\w+\.\w+')
    >>> regex.search(cc_list) 
    <re.Match object; span=(13, 29), match='ekoenig@vpwk.com'>
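
A self-contained sketch pulling the pieces of this section together. cc_list is never defined in these notes, so the value below is an assumption, reconstructed from the spans and substitution output shown above. The names email, found, names and missing are illustrative.

```python
import re

# Assumed input, reconstructed from the outputs shown in this section.
cc_list = '''Ezra Koenig <ekoenig@vpwk.com>,
Rostam Batmanglij <rostam@vpwk.com>,
Chris Tomson <ctomson@vpwk.com,
Chris Baio <cbaio@vpwk.com'''

# Compile once, reuse for every search below.
email = re.compile(r'(?P<name>\w+)@(?P<SLD>\w+)\.(?P<TLD>\w+)')

# search returns None when nothing matches, so it works as a condition.
if email.search(cc_list):
    found = True
else:
    found = False

# finditer yields match objects; group('name') reads a named group.
names = [m.group('name') for m in email.finditer(cc_list)]
missing = re.search(r'Zappa', cc_list)
```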
    

Lazy Evaluation

  • Lazy evaluation is the idea that, especially when dealing with large amounts of data, you do not want to process all of the data before using the results.
  • You have already seen this with the range type, where the memory footprint is the same, even for one representing a large group of numbers.
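
A rough way to see this for the range type, assuming CPython, where sys.getsizeof reports the size of the range object itself. The exact byte count varies by version, so only the comparison is shown.

```python
import sys

small = range(10)
huge = range(10**12)

# A range stores only start, stop and step, so both objects are the
# same size no matter how many numbers they represent.
same_size = sys.getsizeof(small) == sys.getsizeof(huge)

# Values are computed on demand; nothing is materialised up front.
last = huge[-1]
contains = 5 * 10**11 in huge
```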

Generators

  • You can use generators in a similar way as range objects. They perform some operation on data in chunks as requested. They pause their state in between calls. This means that you can store variables that are needed to calculate output, and they are accessed every time the generator is asked for its next value.

  • To write a generator function, use the yield keyword rather than a return statement. Every time next() is called on the generator, it returns the value specified by yield and then pauses its state until the next call.

    >>> def count():
    ...     n = 0
    ...     while True:
    ...         n += 1
    ...         yield n
    ...
    >>> counter = count()
    >>> counter
    <generator object count at 0x10e8509a8>
    >>> next(counter) 
    1 
    >>> next(counter) 
    2 
    >>> next(counter) 
    3
    
  • Note that the generator keeps track of the value of n.

  • Generators can also be used in for loops:

    >>> def fib(): 
    ...     first = 0 
    ...     last = 1 
    ...     while True:
    ...         first, last = last, first + last 
    ...         yield first 
    ... 
    >>> f = fib() 
    >>> for x in f: 
    ...     print(x) 
    ...     if x > 12: 
    ...         break 
    ... 
    1 
    1 
    2 
    3 
    5 
    8 
    13
    

Generator Comprehensions

  • We can use generator comprehensions to create one-line generators.

  • They are created using a syntax similar to list comprehensions, but parentheses are used rather than square brackets:

    >>> list_o_nums = [x for x in range(100)] 
    >>> gen_o_nums = (x for x in range(100)) 
    >>> list_o_nums 
    [0, 1, 2, 3, ...  97, 98, 99] 
    >>> gen_o_nums 
    <generator object <genexpr> at 0x10ea14408>
    # Memory consumption:
    >>> import sys 
    >>> sys.getsizeof(list_o_nums)
    912 
    >>> sys.getsizeof(gen_o_nums) 
    120
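
One practical consequence, sketched below with made-up names: a generator can be passed straight to a function such as sum, but it can only be consumed once.

```python
squares = (x * x for x in range(5))

total = sum(squares)          # consumes the generator: 0+1+4+9+16 = 30
leftover = list(squares)      # [] because it is already exhausted
```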
    

iPython

  • To run a shell command prepend an exclamation mark: ls = !ls.
  • The output of the command is assigned to a Python variable ls.

iPython grep, fields and sort

  • The type of this variable is IPython.utils.text.SList. The SList type converts a regular shell command into an object that has three main methods: fields, grep, and sort.

    In [6]: df = !df 
    In [7]: df.sort(3, nums = True)
    In [10]: ls = !ls -l /usr/bin 
    In [11]: ls.grep("kill")
    

Magic commands

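  • Magic commands are prefixed with percent signs: a single % for line magics, which take the rest of the line as their argument, and %% for cell magics, which act on the whole cell. They are often used to run something external within the iPython shell.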

  • Writing a quick bash script:

    In [13]: %%bash     
    ...: uname -a     
    ...:     
    ...: 
    Darwin nogibjj.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar ...
    
  • Creating a python script file:

    In [16]: %%writefile print_time.py     
    ...: #!/usr/bin/env python     
    ...: import datetime     
    ...: print(datetime.datetime.now().time())     
    ...:     
    ...:     
    ...:
    Writing print_time.py
    In [18]: !python print_time.py 
    19:06:00.594914
    
  • Checking what is loaded into memory:

    In [20]: %who 
    df     ls     var_ls
    

Automating Files and the Filesystem

open

  • You can use the open function to create a file object that can read and write files.

  • It takes two arguments, the path of the file and the mode. The mode is optional and defaults to reading text.

  • You use the mode to indicate, among other things, if you want to read or write a file and if it is text or binary data.

  • You can open a text file using the mode r to read its contents.

  • The file object has a read method that returns the contents of the file as a string:

    In [1]: file_path = 'bookofdreams.txt' 
    In [2]: open_file = open(file_path, 'r') 
    In [3]: text = open_file.read() 
    In [4]: len(text) 
    Out[4]: 476909  
    In [5]: text[56] 
    Out[5]: 's'  
    In [6]: open_file 
    Out[6]: <_io.TextIOWrapper name='bookofdreams.txt' mode='r' encoding='UTF-8'>  
    In [7]: open_file.close()
    
  • It is good practice to close a file when you finish with it. Python closes a file when the file object is garbage collected (in CPython, typically once nothing references it any more), but until then the file consumes resources and may prevent other processes from opening it.

readlines

  • You can also read a file using the readlines method.

  • This method reads the file and splits its contents on newline characters. It returns a list of strings. Each string is one line of the original text:

    In [8]: open_file = open(file_path, 'r') 
    In [9]: text = open_file.readlines() 
    In [10]: len(text) 
    Out[10]: 8796  
    In [11]: text[100] 
    Out[11]: 'science, when it admits the possibility of occasional hallucinations\n'  
    In [12]: open_file.close()
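
A self-contained sketch of readlines next to plain iteration over the file object, which yields one line at a time and so avoids loading a large file in one go. The temporary file and its contents are stand-ins for the book's text file, and try/finally is used here so the file is closed even if reading fails.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    file_path = tmp.name

open_file = open(file_path, 'r')
try:
    all_lines = open_file.readlines()   # whole file as a list
finally:
    open_file.close()

# Iterating over the file object yields one line at a time instead.
line_count = 0
open_file = open(file_path, 'r')
try:
    for line in open_file:
        line_count += 1
finally:
    open_file.close()

os.remove(file_path)
```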
    

with

  • A handy way of opening files is to use with statements.

  • You do not need to close a file explicitly in this case. Python closes it and releases the file resource at the end of the indented block:

    In [13]: with open(file_path, 'r') as open_file:
    ...:     text = open_file.readlines()     
    ...:
    In [14]: text[101]
    Out[14]: 'in the sane and healthy, also admits, of course, the existence of\n'  
    In [15]: open_file.closed 
    Out[15]: True
    

Windows vs Unix line breaks

  • Different operating systems use different escaped characters to represent line endings.

  • Unix systems use \n and Windows systems use \r\n.

  • Python converts these to \n when you open a file as text. If you are opening a binary file, such as a .jpeg image, you are likely to corrupt the data by this conversion if you open it as text.

  • You can, however, read binary files by appending a b to mode:

    In [15]: file_path = 'bookofdreamsghos00lang.pdf' 
    In [16]: with open(file_path, 'rb') as open_file:     
    ...:     btext = open_file.read()     
    ...:  
      
    In [17]: btext[0] 
    Out[17]: 37  
    In [18]: btext[:25] 
    Out[18]: b'%PDF-1.5\n%\xec\xf5\xf2\xe1\xe4\xef\xe3\xf5\xed\xe5\xee\xf4\n18'
    
  • Adding this opens the file without any line-ending conversion.
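
A small sketch of the conversion, using a temporary file with Windows-style endings written as raw bytes. The file name and contents are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'endings.txt')

    # Write Windows-style line endings as raw bytes.
    with open(file_path, 'wb') as open_file:
        open_file.write(b'one\r\ntwo\r\n')

    # Text mode translates \r\n to \n while reading.
    with open(file_path, 'r') as open_file:
        as_text = open_file.read()

    # Binary mode returns the bytes exactly as stored.
    with open(file_path, 'rb') as open_file:
        as_bytes = open_file.read()
```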

write mode

  • To write to a file, use the write mode, represented as the argument w.

  • The tool direnv is used to automatically set up some development environments.

  • You can define environment variables and application runtimes in a file named .envrc; direnv uses it to set these things up when you enter the directory with the file.

  • You can set the environment variable STAGE to PROD and TABLE_ID to token-storage-1234 in such a file in Python by using open with the write flag:

    In [19]: text = '''export STAGE=PROD     
    ...: export TABLE_ID=token-storage-1234'''  
      
    In [20]: with open('.envrc', 'w') as opened_file:     
    ...:     opened_file.write(text)     
    ...:  
      
    In [21]: !cat .envrc 
    export STAGE=PROD 
    export TABLE_ID=token-storage-1234
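
One thing worth knowing about w: it truncates an existing file before writing. The sketch below shows that, along with the append mode a, which the notes above do not cover. It writes to a temporary directory rather than a real .envrc, and the DEV value is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, '.envrc')

    with open(env_path, 'w') as opened_file:
        opened_file.write('export STAGE=DEV\n')

    # Opening with 'w' again truncates the file before writing.
    with open(env_path, 'w') as opened_file:
        written = opened_file.write('export STAGE=PROD\n')

    # Mode 'a' appends to the existing contents instead.
    with open(env_path, 'a') as opened_file:
        opened_file.write('export TABLE_ID=token-storage-1234\n')

    with open(env_path) as opened_file:
        contents = opened_file.read()
```

write returns the number of characters written, which is what written holds here.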