Some notes from PyGotham (corrections welcome)

Getting Rich with Comparison Methods 

Matt Story

“the truth of x == y does not imply that x != y is false” […] (straight from the Python data model docs). Mind == blown and mind != blown.

pro tip: when defining ‘__eq__’, also define its counterpart ‘__ne__’ (Python 2 does not derive one from the other)

operators like ‘<‘ have reflected counterparts: if the left operand’s ‘__lt__’ returns NotImplemented, Python tries the right operand’s ‘__gt__’
right-side reflection method for the left-side method, sketched below
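a minimal sketch of the reflection (class name invented): only ‘__gt__’ is defined, yet ‘<‘ still works because Python tries the right operand’s reflected method

class OnlyGt(object):
    """Defines __gt__ only; '<' works via reflection."""
    def __init__(self, n):
        self.n = n
    def __gt__(self, other):
        if not isinstance(other, OnlyGt):
            return NotImplemented   # let the other operand try
        return self.n > other.n

a, b = OnlyGt(1), OnlyGt(2)
print(a < b)   # True: no __lt__ on a, so Python calls b.__gt__(a)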

how much of life is wasted by not reading documentation first?


python has NotImplemented, a singleton constant (distinct from the NotImplementedError exception)

MRO –> method resolution order –> the more specific (subclass) operand’s method wins, regardless of which side of the comparison it is on
this is the reason to use mix-ins as opposed to normal inheritance

test ALL the things
import the operator module to test all the cases for bitwise and arithmetic operators

eq, ne, lt, le, gt, ge – check for existence, return the thing or NotImplemented

@functools.total_ordering – may drastically reduce the code
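a hedged sketch (class invented): define ‘__eq__’ plus one ordering method, and total_ordering fills in the rest

import functools

@functools.total_ordering
class Version(object):
    def __init__(self, num):
        self.num = num
    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.num == other.num
    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.num < other.num

print(Version(1) <= Version(2))   # True: __le__ was derived for us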

comparison methods are well documented. the docs give a fairly robust methodology for learning which method gets used on the right/left side. it’s complex, so test.

comparison methods do not need to return bools, so you can do whatever you want. harness asymmetrical and non-boolean comparisons… useful for lazy-loaded filtering and iteration
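a minimal sketch of the ORM-style trick (names invented): ‘==’ returns a description of a filter instead of a bool, to be evaluated lazily later

class Field(object):
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        # deliberately not a bool: a lazy filter expression
        return ('eq', self.name, other)

age = Field('age')
print(age == 30)   # ('eq', 'age', 30)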


Customizable and SaaSy REST APIs

Juan Gutierrez

Django rest framework – routers and ViewSets: separate core concerns and client customizations

want to implement client code in core?

put thousands of if statements? not pythonic, don’t. yuck
try/except client imports – try the client-specific import first, then fall back into the core (sketched below)
custom URLs – platform fragmentation, who is hitting what… a thousand different ways. gross to manage in python
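a sketch of the try/except import pattern (module and class names invented):

try:
    # client-specific override, if this deployment ships one
    from client_acme.views import OrderViewSet
except ImportError:
    # otherwise fall back to the core implementation
    from core.views import OrderViewSet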

wouldn’t it be nice if URLs were consistent? methods/classes, filenames/filesystem paths
REST! extend the core code with client-specific code, selected by URL. the base core gets slightly tweaked by whatever is in the client extensions

core defines the ViewSets, actions, permissions and serializers.

how well are concerns separated?
single responsibility principle

highly dependent on the quality of core classes

@link/@action are useful because they are not limited to CRUD methods
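a hedged sketch of a non-CRUD endpoint on a ViewSet (Order and OrderSerializer are assumed to exist; current DRF spells @link/@action as @action(detail=...)):

from rest_framework import viewsets
from rest_framework.decorators import action
from rest_framework.response import Response

class OrderViewSet(viewsets.ModelViewSet):
    queryset = Order.objects.all()          # assumed model
    serializer_class = OrderSerializer      # assumed serializer

    @action(detail=True, methods=['post'])  # routed to /orders/{pk}/cancel/
    def cancel(self, request, pk=None):
        order = self.get_object()
        order.status = 'cancelled'          # illustrative only
        order.save()
        return Response({'status': 'cancelled'})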


Python Wats: Uncovering Odd Behavior (link)

Amy Hanlon

identity, mutability and scope —> trivia, the why, and tips. the Python interpreter can have weird behavior.
ints between -5 and 256 are stored in an integer array at startup; names for them point to a location in that array (a memory pointer)
‘is’ —> same object in memory
‘==’ for testing values; reserve ‘is’ for None and other identity tests
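for example (a CPython implementation detail; results can differ by context):

a = 256
b = 256
print(a is b)   # True: 256 is in the cached range

c = 257
d = 257
print(c is d)   # typically False at the REPL: 257 is outside the cache
print(c == d)   # True: == compares values, which is what you usually want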

new_array = old_array creates a second reference to the same object; for mutable objects it does not create a copy.
copies using [:] slice or list()  —> shallow copy
shallow copy does not copy nested mutable objects [[“cats”, “dragons”]]
copy.deepcopy —> copies recursively
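for example:

import copy

pets = [["cats", "dragons"]]

shallow = pets[:]             # new outer list, same inner list
shallow[0].append("owls")
print(pets)                   # [['cats', 'dragons', 'owls']] -- mutated!

deep = copy.deepcopy(pets)    # recursive copy
deep[0].append("unicorns")
print(pets)                   # unchanged this time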

mutable default arguments live on the function object: f.func_defaults (f.__defaults__ in Python 3)
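the classic gotcha, for example:

def remember(thing, memo=[]):   # the default list is created once, at def time
    memo.append(thing)
    return memo

print(remember("cat"))          # ['cat']
print(remember("dog"))          # ['cat', 'dog'] -- same list both times
print(remember.__defaults__)    # (['cat', 'dog'],)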

namespace lookup order: locals() >> globals() >> __builtins__


Bytes in the machine: inside the CPython interpreter (link)

Allison Kaptur

python has many layers; “the interpreter” can refer to several of them.

Byterun: python interpreter written in python.

interpreter and virtual machine —> the interpreter is a VM: software that runs bytecode the way hardware runs machine code
bytecode is internal representation of program in the interpreter
dis – the disassembler: renders machine-oriented bytecode in a form a person can consume

dis returns —> source line number, index into the bytecode, human-readable instruction name, the bytes encoding each instruction’s arg, and a hint about the arg
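for example (exact opcodes vary by CPython version):

import dis

def greet(name):
    return "Hello, " + name

dis.dis(greet)
#   2           0 LOAD_CONST               1 ('Hello, ')
#               2 LOAD_FAST                0 (name)
#               4 BINARY_ADD
#               6 RETURN_VALUE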

LOAD_FAST most common in cpython
dynamic – a lot happens at runtime, very little at compile-time (which allows for generic objects), so the compiler doesn’t have to know much ahead of time

1,500 line switch statement !!

“Dynamic” – you can overload ‘__mod__’
‘%s’ uses mod on strings

>>> from operator import mod
>>> mod("%s%s", ("Py", "Gotham"))
'PyGotham'

in the general absence of type information, almost every instruction must be treated as INVOKE_ARBITRARY_METHOD – Russell Power and Alex Rubinsteyn “How Fast Can We Make Interpreted Python?”

one data stack for entire VM vs one data stack per frame
a generator pauses its frame and resumes it later; each paused frame keeps its own data stack
with one shared stack, a paused frame could not keep its values, so generators would not work

ordinary iteration is the same as sending None into a generator. the iteration control loop yields a value and right away pops it off the top of the stack
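for example:

def gen():
    yield "Py"
    yield "Gotham"

g = gen()
print(next(g))        # Py
print(g.send(None))   # Gotham: plain iteration is equivalent to send(None)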


Failing With Grace

Sean O’Connor

Async Queues – NSQ and the beauty of pub sub messaging
Timeouts – Don’t get stuck.
Smart Retries – Not giving up while not making things worse.
Immutable Data – Everything is easier when nothing changes.
Monitoring – It’s broken, you just don’t know it yet

Timeouts are useful for everything
Retries! (preferably on a different host; see the backoff sketch after this list)
Immutable, idempotent transactions: use event-based descriptions, not just commands
Monitoring, always
Logging – centrally, all the important things —> you do not want to be pulling data from scattered per-host logs. know what level of detail is useful
Load balancing on various layers of the application
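a hedged sketch of a smart retry, exponential backoff plus jitter so retrying doesn’t pile onto a struggling service (do_work is an assumed callable):

import random
import time

def call_with_retries(do_work, attempts=5, base_delay=0.5):
    for attempt in range(attempts):
        try:
            return do_work()
        except Exception:
            if attempt == attempts - 1:
                raise   # out of attempts: give up loudly
            # exponential backoff, scaled by random jitter
            time.sleep(base_delay * (2 ** attempt) * random.random())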

Tradeoffs in what’s important to protect: dropping some link-usage data at bitly vs transactions at a serious financial company


Incremental Non-Design: Fighting Entropy like Sisyphus (link)

Paul Winkler

Agile – emergent design, solve things as needed —> software can always be better (Build > Refactor > Repeat)

“We must imagine Sisyphus happy” (this title is from Camus) http://slinkp.com/sisyphus_pygotham_2014/#/step-5

how do things get worse over time? maybe overuse of inheritance
DRY: don’t repeat yourself == make MOAR base classes?

why is too much inheritance bad? a big inheritance graph plus mix-ins turns into class explosion! every new concern multiplies the number of classes

yo-yo trouble-shooting —> bouncing up and down the inheritance tree, trying to find where things are defined and/or overridden
multiple inheritance is a pinball problem: who is “self” anyway?
poor separation of concerns, conceptually unrelated objects get mixed-in (violates single responsibility principle)

Favor composition over inheritance: use pattern that separates concerns

‘has-a’ or ‘uses-a’ relationships (composition) are usually better than ‘is-a’ (inheritance)

hard to fight inertia of existing design, especially when all methods are referring to each other

refactor! proxy object
build a wrapper, treat it separately, the underlying class doesn’t need to be aware of it

# Do not make SharkWithLasers(Shark, LaserMixin)

class Animal(object):
    pass

class Shark(Animal):
    def __init__(self, weapon=None):
        self.weapon = weapon    # 'has-a' weapon via composition

class Laser(object):
    def fire(self):
        print("pew pew")

class Armored(object):
    """Wrapper: the armor 'has-a' wearer; Shark never learns about armor."""
    def __init__(self, wearer):
        self.wearer = wearer

shark_with_armor = Armored(wearer=Shark())
shark_with_laser = Shark(weapon=Laser())

# Shark 'has' or 'uses' a laser, rather than 'is' a laser

pro-tip: refactoring needs good test coverage before starting


Decorators 101: A Gentle Introduction to Functional Programming

Jillian Munson

how to learn decorators: usability testing, falling down the user hole
functions are first-class objects: they support all the operations generally available to other entities. we can pass functions to other functions as parameters —> higher-order functions

‘@’ – decorator syntax: python executes the decorator, passing in the decorated function

what if we want our decorator to work with functions with various signatures? use (*args, **kwargs)
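a minimal sketch (decorator name invented):

import functools

def log_calls(func):
    @functools.wraps(func)            # preserve the original name/docstring
    def wrapper(*args, **kwargs):     # accepts any signature
        print("calling %s" % func.__name__)
        result = func(*args, **kwargs)
        print("done")
        return result
    return wrapper

@log_calls
def add(a, b):
    return a + b

add(1, 2)   # prints "calling add" and "done", returns 3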

Use-cases for decorators
- add behavior that executes before/after the function
- validate input
- format output

closures —> what makes decorators work: the wrapper closes over the decorated function

a decorator returns a function definition; it is a meta-function and must itself be executable. it replaces the original function


Enough Machine Learning to Make Hacker News Readable Again (link)

Ned Jackson Lovely

I can machine learn and you can too!
machine learning is transforming from a scientists-only discipline into engineering, with highly applicable toolkits

don’t need to know how to make an i-beam to make a bridge, don’t need to know advanced mathematics to understand machine learning

Everything is AI until we understand it, then it becomes CS

ML: applying statistics to piles of data

the last two steps are mostly libraries:
1. get data – scrape the web? find a source
2. engineer the data, format to fit into models
3. train and tune models
4. apply model

scikit-learn can do real awesome machine learning right there
google things, don’t be afraid of crazy math. you don’t need to understand the proofs. get a feel for it, learn enough terminology to recognize when something is a parameter in your problem. learn which things matter, what to pay attention to… be an autodidact

which things to google: scikit-learn algorithm cheat-sheet http://peekaboo-vision.blogspot.com/2013/01/machine-learning-cheat-sheet-for-scikit.html

supervised learning: given a lump of data, can I predict some other thing?
based on email, is it spam?

unsupervised: I have a pile of data, can i learn something interesting from it?

scikit patterns:
parallel arrays – X, y (X is input, y is output)
set aside a validation set

Machine learn in these two easy steps
1. build pipelines
2. optimize parameters
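a hedged sketch of both steps with scikit-learn (texts and labels are assumed data; newer scikit-learn keeps GridSearchCV in sklearn.model_selection):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),   # text -> numbers
    ('clf', LinearSVC()),                               # the classifier
])

params = {'clf__C': [0.1, 1, 10]}            # hyper-parameters to optimize
search = GridSearchCV(pipeline, params, cv=5)
search.fit(texts, labels)                    # texts: list of str, labels: 0/1
print(search.best_params_)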

support vector machines: magic to classify stuff, trains itself, makes predictions
kernel trick – twisted planes or something?

tweak the hyper-parameters based on googling

transform() – pass in a value, turn it into a more useful value >> e.g. turn text into numbers
fit() – trains based on input data and expected output
predict() – make predictions with the trained model

Standardize the distribution: mean of 0 with standard deviation of 1, turn into numpy arrays, etc

“Time flies like an arrow; fruit flies like a banana”

bag of words – count what’s there. T/F on whether things appear. n-grams (bi-grams, whatever) to glob words together
normalized form —> stemming heuristic
stop words – English-language glue words

feature extraction: term frequency, inverse document frequency: word that shows up a lot in a doc, but not in a lot of docs overall

engineering features >> pulling out the relevant text, e.g. with the readability module

live on the internets: hn.njl.us


Statistics and Linear Regression Models with Python

Aaron Hall

examining linear models in Python: it’s fast to develop applications in, easy to read/write
analysis of data with linear models and diagnostics to get an optimal model

assumption of linear regression model: data is linear in the parameters

data needs to be properly transformed for a good fit

Box-Cox determines whether a transformation would provide a better fit, picking the transformation with the highest likelihood. based on the resulting graph, it indicates whether the data should get a log or other power transformation, which can yield a better confidence interval
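a hedged sketch with scipy (scipy.stats.boxcox picks the lambda with the highest likelihood; the data must be strictly positive):

import numpy as np
from scipy import stats

y = np.random.lognormal(size=100)        # skewed, positive data
y_transformed, lmbda = stats.boxcox(y)
print(lmbda)   # near 0 suggests a log transform; near 1, leave the data alone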


How to write actually object-oriented python (link)

Per Fagrell

object orientation vs procedural programming

teachings of Uncle Bob Martin

procedural programs: step by step, recipe, script-style. clear thread of control, easy to follow along –> like a line of ducks walking in a row

object orientation uses classes as blueprints and interaction instructions in memory –> like everyone talking to each other in a crowd

OO maps your mental model to code, better than any other current paradigm, words and concepts map to code

nothing in python forces consistency. you need to apply discipline; it’s worth the extra energy: better maintainability, simpler testing, simpler communication with yourself and with others reading your code

design principles: apply at any level, from function, to module, to program

be responsible

“Be Responsible” Sam Howzit (CC BY 2.0)

Single Responsibility Principle – code should have one and only one reason to change
one class shouldn’t have tons of methods doing too many things: bigger codebase, harder to test, more points of failure

Modem() does everything (bad) vs splitting the functionality into Connection() and Data() (sketched below)

objects shouldn’t do persistence and business rules in the same place
instead of maintaining both concerns inside one class, combine them at runtime via a mix-in or a wrapper
create an orchestration class separate from the IO being controlled
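a minimal sketch of the split (method names invented):

class Connection(object):
    """Only knows how to manage the line."""
    def dial(self, number):
        pass
    def hangup(self):
        pass

class DataChannel(object):
    """Only knows how to move bytes."""
    def send(self, payload):
        pass
    def receive(self):
        pass

class Modem(object):
    """Orchestrates; each collaborator has one reason to change."""
    def __init__(self, connection, channel):
        self.connection = connection
        self.channel = channel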

Open/Closed Principle – open to extension, closed to modification
shouldn’t have to modify old code to add features/tools
layer in between to handle things that are subject to change
write code so that it doesn’t have to be rewritten

Liskov Substitution Principle – think duck typing
anywhere you’re using a base class, you should be able to use a subclass without knowing
same methods, same interface, same behavior. subclass shouldn’t suddenly require extra work

Interface Segregation Principle – don’t force clients to use interfaces they don’t need
how many parts of an object are used at a given time? how tightly coupled is the code?

Dependency Inversion Principle – high-level modules shouldn’t rely on low-level modules, both should rely on abstractions
music player shouldn’t have speaker interface. it should be abstracted, so other things like streaming can extend music player. easier to test each piece
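a hedged sketch (names invented): the player depends on an abstraction, so speakers and streaming are interchangeable and each piece tests in isolation

import abc

class AudioOutput(abc.ABC):
    @abc.abstractmethod
    def play(self, samples):
        ...

class Speaker(AudioOutput):
    def play(self, samples):
        print("playing through hardware")

class StreamingSink(AudioOutput):
    def play(self, samples):
        print("pushing to the network")

class MusicPlayer(object):
    def __init__(self, output):   # any AudioOutput will do
        self.output = output
    def play_song(self, song):
        self.output.play(song)

MusicPlayer(StreamingSink()).play_song("some samples")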

tell, don’t ask. tell objects to do the work. don’t ask for the data. tell butcher to give you a sausage. don’t ask for casings and meat
if the code for an object is littered all over the place, that’s bad. objects are meant to keep related things in one place

think ‘objects’ any time you write code: if there are 80 parameters, if data is being passed around, if tests become unmanageable… you probably should refactor


Confidence in the Lasso (link)

Jared Lander

The lasso is one of the most significant machine learning algorithms of the past 15 years. Conceived by Tibshirani at Stanford (of Hastie, Tibshirani and Friedman fame), the lasso performs dimension reduction and variable selection, making it well suited to the high dimensionality of today’s datasets. In this talk we will go over some of the math behind the lasso and discuss some recent advancements in performing inference on lasso-fitted models.

Lasso is an extension of regression. regression is used to fit a line through some points.

Solve for Beta: B = (X^T X)^-1 X^T Y
which variables matter?
if you have hundreds of variables, how do you know which ones to keep?
if you have overfitting, confidence intervals get too wide

P-values suck. (sorry, old-fashioned statisticians) They’re arbitrary thresholds that served a purpose before there were computers.

Cannot look at one confidence interval without influencing the others
forward-step or backwards-elimination to find a happy point
stepwise: forward and back. add a variable then take away, back and forth.

All subsets: P-choose-1, P-choose-2… —> 2^P − 1 subsets in total. with 10 variables you end up with a shitload of combinations (about a thousand); 100 variables goes past trillions. Don’t do this by hand.

Fit a model with 5 variables, fit a model with 6, compare.
AIC, BIC, CIC, DIC, Deviance

what do you do with thousands of variables? the curse of dimensionality: when you have too many, things go awry. you can’t just throw everything at the machine and expect it to sort things out.
multi-dimensional hypercube: imagine the cube is your data space, with an inner cube holding some dimension of the data. the data becomes sparsely distributed, which gets problematic in high dimensions.

shrinkage and regularization. reduce coefficients.

Ridge (alpha set to 0 in the elastic-net parameterization). Lasso (alpha = 1)…

minimize the normal regression loss while keeping the coefficients small. Ridge is an L2 penalty; Lasso is an L1 penalty. Lasso is great for variable selection: give it a thousand variables and it’ll come back with the 10 or so that matter. Lasso can work when you have more columns than rows of data: 20 rows, 5,000 columns >> still works.

Elastic-Net blends both penalty terms, doing a bit of both. the alpha tuning parameter lets you fiddle with the blend. lots of different types of regression can be done with this technique.
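a hedged sketch with scikit-learn (X and y are assumed arrays; note the naming flip versus the glmnet-style terms above: sklearn’s l1_ratio plays the role of alpha, and sklearn’s alpha plays the role of lambda):

from sklearn.linear_model import LassoCV, ElasticNetCV

lasso = LassoCV(cv=5).fit(X, y)   # lambda picked by cross-validation
print(lasso.coef_)                # most coefficients shrink exactly to 0

enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)   # blend of L1 and L2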

Good: Lasso is fast. handles p >> N. does variable selection. lambda is chosen by cross-validation

Difficult to get standard error bars (confidence intervals) for the lasso, and these things matter
Can implement bootstrap confidence intervals, though the lasso is a biased estimator
Covariance Test Statistic – compare coefficients at lambda_(k+1): the coefficients now vs the coefficients with one more variable


Python and Julia. Why do we need another language? (link)

Dwight J. Browne

slow computers required efficient languages (FORTRAN, C)
compressed delivery times, bloated enterprise apps, computers kept getting faster

things were “good enough” for a while… filled a void, but didn’t solve any problem: like reality TV, 300 channels but nothing to watch. bloated apps created lots of technical debt

big applications, complex dependencies. ECLE over and over again, processing speeds are increasing slowly

need for speed: dynamic language. multi paradigm. slowness will be forgiven for fast delivery…
but not always

if you need to write a performant application, you need something with C-like performance.
Julia has C interoperability as well as Python interoperability

multiple dispatch is powerful – Julia is incredibly fast thanks to type inference, dispatching to code optimized for the types involved
full IPython integration

LLVM compiles and executes specialized code based on the dispatch

note on profiling: YMMV
