Negated Index and Machine Learning

Weeks five and six of the Hacker School experiment:

  1. Negated index
  2. URL shortener
  3. Graph traversal
  4. Machine learning

Trying to implement a new {idx::NegatedIndex} type for the Julia language

After solving 12 Project Euler problems, I decided to dive into the deep end and work on a complicated real-world Julia problem.  One of my goals for this Hacker School batch is to contribute to an open source project, so…

What is a negated index?  A proposed new type for the Julia language, such that the negated index [!x] strips the element whose position matches the index [x], producing a vector slice with that element removed.

This involves defining methods for the ! operator to negate indices (Int, range, vector…) by converting them to a NegatedIndex type, as well as methods for getindex (and perhaps setindex!, in, maximum, minimum…) to handle this new type (idx::NegatedIndex).
For example: a[!idx] will return a vector of all elements that are not in idx.

a = [1,2,5,7]
a[!1] = [2,5,7]
a[!(2:3)] = [1,7]
a[![1,7]] = [2,5]

Details: https://github.com/JuliaLang/julia/issues/1032
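
Here is a rough sketch of how I imagine it could look (just my own simplification, not the actual implementation discussed in the issue): wrap the index in a NegatedIndex type and teach getindex to drop those positions.

# A NegatedIndex wrapper plus a getindex method that drops the wrapped positions
struct NegatedIndex{T}
    idx::T
end

# The proposal overloads ! on indices; redefining Base's ! on integer indices
# like this is only for illustration
Base.:!(i::Union{Integer,AbstractRange,AbstractVector{<:Integer}}) = NegatedIndex(i)

# Return a slice of a with the negated positions removed
function Base.getindex(a::AbstractVector, n::NegatedIndex)
    keep = setdiff(eachindex(a), n.idx)
    return a[keep]
end

a = [1,2,5,7]
a[!1]        # [2, 5, 7]
a[!(2:3)]    # [1, 7]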

Yet another URL shortener clone

http://pogiralt.appspot.com/shorten – yay, finally something I finished in an afternoon!  Nothing flashy, but it works.

Building Graphs

Friday’s algorithm problem involved creating and searching through a graph.

  1. Here’s a graph. Find a path from 1 to 10.
    1----2----3    11
    |    |         |
    4    5----6----10
    |    |    |
    7    8    9----12
    
  2. Here’s a graph. Find the shortest path from 1 to 12.
    1----2----3----11
    |    |         |
    4----5----6----10
    |    |    |
    7    8    9----12
    |    |
    13---14

My code is not perfect, but I think it works (code reviews & feedback welcome).  Since I got a D in CS112 back in college, it feels great to finally get the hang of building data structures. Thanks Tom and Alan for running these weekly workshops.
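
For anyone curious, one minimal way to do the shortest-path part is a breadth-first search over an adjacency list (this is a sketch of the general approach, not my actual workshop code):

# BFS from the start node: in an unweighted graph, the first time we reach
# the goal we have found a shortest path.
function shortest_path(adj, start, goal)
    parent = Dict(start => start)
    queue = [start]
    while !isempty(queue)
        node = popfirst!(queue)
        node == goal && break
        for nbr in get(adj, node, Int[])
            if !haskey(parent, nbr)
                parent[nbr] = node        # remember how we got here
                push!(queue, nbr)
            end
        end
    end
    haskey(parent, goal) || return nothing    # no path exists
    path = [goal]
    while path[end] != start
        push!(path, parent[path[end]])        # walk back to the start
    end
    return reverse(path)
end

# The second graph above, as an adjacency list
adj = Dict(
    1 => [2, 4],  2 => [1, 3, 5],  3 => [2, 11],  4 => [1, 5, 7],
    5 => [2, 4, 6, 8],  6 => [5, 9, 10],  7 => [4, 13],  8 => [5, 14],
    9 => [6, 12],  10 => [6, 11],  11 => [3, 10],  12 => [9],
    13 => [7, 14],  14 => [8, 13])

shortest_path(adj, 1, 12)   # [1, 2, 5, 6, 9, 12]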

Machine learning: Gradient Descent for linear and logistic regression

gradientDescentMulti.m implements the cost function and gradient descent for linear regression with multiple variables; the algorithm learns theta by minimizing the cost function.
theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters)
updates theta by taking num_iters gradient steps with learning rate alpha.

m = length(y);      % number of training examples

% one batch gradient step per iteration
for iter = 1:num_iters

    % evaluate the hypothesis vector for all m examples in X
    hypothesis = X * theta;

    % calculate the gradient vector from the errors (hypothesis - y)
    delta = (1/m) * (X' * (hypothesis - y));

    % simultaneous update for the vector theta
    theta = theta - (alpha * delta);

    % save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);

end

Next Steps:

For logistic regression, the cost function has to predict a class (y=0 or y=1) instead of the continuous value of linear regression.  This lets us learn decision boundaries between groups of data and then classify new points by which side of a boundary they fall on.  The boundaries don’t even have to be linear!
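
As a rough sketch of what that cost function looks like (my own translation into Julia, not the course's Octave code): the sigmoid squashes X*theta into (0,1), and the log-loss penalizes confident wrong predictions.

# logistic (sigmoid) hypothesis and cost for binary labels y in {0, 1}
sigmoid(z) = 1 ./ (1 .+ exp.(-z))

function logistic_cost(X, y, theta)
    m = length(y)
    h = sigmoid(X * theta)      # predicted probability that y = 1 for each example
    return (1/m) * sum(-y .* log.(h) .- (1 .- y) .* log.(1 .- h))
end

Nicely, the gradient ends up in the same vectorized form as the linear case, (1/m) * X' * (h - y), just with the sigmoid hypothesis.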

(Plots: decision boundaries; predicting clusters of data)

Looking forward to implementing some of these machine learning algorithms on real data sets.

Also, I really hope I can figure out this new Julia type.

My batch of Hacker School may be half over, but I’m planning to “never graduate”.
