For a recent project I needed a method to determine which colour palettes were present in a large dataset of artworks, ideally without any input from me! This task lends itself to the unsupervised technique of clustering and can be divided into two steps:
Both of these can be accomplished with K-Means clustering, which simply clusters points by the distance between points and cluster centers.
To use K-Means effectively to extract colour combinations, it is important that the Euclidean distance between colour data points be interpretable as…
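As a minimal sketch of how K-Means groups points by their distance to cluster centers, the snippet below runs the plain Lloyd's iteration on a handful of synthetic pixels. The `kmeans` helper, the toy data, and the use of raw RGB values are all assumptions made for illustration; they are not taken from the project, and the colour space the project actually uses is not assumed here.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every point to every center: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    # Final assignment against the converged centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Synthetic "pixels": 50 reddish and 50 bluish colours (assumed RGB, 0-255).
rng = np.random.default_rng(42)
reds = np.array([200, 30, 30]) + rng.normal(0, 10, size=(50, 3))
blues = np.array([30, 30, 200]) + rng.normal(0, 10, size=(50, 3))
pixels = np.vstack([reds, blues])

centers, labels = kmeans(pixels, k=2)
print(np.round(centers))  # one center per colour group
```

Each row of `centers` can be read as one representative colour, and `labels` records which group each pixel was assigned to.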
Pandemics, bushfires and economic crises. Not only the menu for the last year, but also a small subset of the phenomena that can be modelled under the framework of agent-based modelling (ABM).
ABM is well suited to understanding how complex behavior emerges in systems based on simple interactions between the system’s individual participants, known as agents. ABMs have shown particular strength in explaining general observations, referred to as ‘stylized facts’, such as observed neutron scattering patterns or the distribution of asset returns. These agents could be anything from people or companies to animals in an ecosystem or atoms in a gas. …
The range and accessibility of basic services such as supermarkets and banks varies along with average income between neighboring postcodes in Victoria, Australia.
However, unlike the US, where “banking deserts” are leaving low-income communities without access to financial services, and western Sydney, where “food deserts” are limiting access to groceries, Victoria seems to demonstrate the opposite pattern.
Lower-income communities have significantly more schools, banks, supermarkets and health vendors within walking distance than their high-income counterparts. Conversely, high-income suburbs do have significantly more options within a larger driving distance.
In the previous story, I developed a model to encode the art style of an image into a vector in a high-dimensional space, where the Euclidean distance between vectors represents how visually similar the images are.
For an image recommendation system to use these embeddings, the vector for every image in the data set needs to be stored on disk, and the distance between two or more vectors will need to be calculated at some point. The more dimensions each vector has, the greater the storage and computational needs of the system, which increases costs and search times.
Due…
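To make the storage argument concrete, here is a rough back-of-the-envelope sketch. The image count, the two vector sizes, and the 4-byte float assumption are hypothetical numbers chosen for illustration, not figures from the story.

```python
def embedding_storage_bytes(n_images, n_dims, bytes_per_value=4):
    """Raw storage for one float vector per image (assumes 32-bit floats)."""
    return n_images * n_dims * bytes_per_value

# Hypothetical collection of 100,000 images at two embedding sizes.
for dims in (2048, 128):
    megabytes = embedding_storage_bytes(100_000, dims) / 1e6
    print(f"{dims} dims: {megabytes:.1f} MB")
# 2048 dims: 819.2 MB
# 128 dims: 51.2 MB
```

The cost of a single Euclidean distance calculation also grows linearly with the number of dimensions, so a brute-force search over the collection scales in the same way.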
Document Classification: The task of assigning labels to large bodies of text. In this case the task is to classify news articles into different labels, such as sport or politics. The data set used wasn’t ideally suited for deep learning, having only low thousands of examples, but this is far from an unrealistic case outside larger firms.
Now normally this type of technical article would run through a few models, before concluding with a comparison of results and an overall evaluation, but today I thought I’d save you a scroll and start off with the unexpected results.
Simple models worked…
The aim of this exercise is to find a function that transforms images into embedding vectors, where the Euclidean distance between vectors represents how visually similar the images are. This allows a nearest neighbors search on one image’s embedding to return images that are visually similar, enabling image recommendation and clustering.
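A nearest neighbors search of this kind can be sketched as a brute-force scan over the embeddings. The `nearest_neighbours` helper and the tiny two-dimensional vectors below are assumptions for illustration only; real embeddings would come from the model and have many more dimensions.

```python
import numpy as np

def nearest_neighbours(embeddings, query_index, k=3):
    """Indices of the k embeddings closest to the query by Euclidean distance."""
    query = embeddings[query_index]
    dists = np.linalg.norm(embeddings - query, axis=1)
    dists[query_index] = np.inf  # never return the query image itself
    return np.argsort(dists)[:k]

# Toy embeddings: images 0-2 sit close together, images 3-4 form another group.
embeddings = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.2],
    [5.0, 5.0],
    [5.1, 5.0],
])

print(nearest_neighbours(embeddings, query_index=0, k=2))  # [1 2]
```

A scan like this compares the query against every stored vector, which is fine for a sketch but becomes slow as the collection grows.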
Document Classification: The task of assigning labels to large bodies of text. In this case the task is to classify BBC news articles into one of five different labels, such as sport or tech. The data set used wasn’t ideally suited for deep learning, having only low thousands of examples, but this is far from an unrealistic case outside large firms.
Now normally this type of technical article would run through a few models, before concluding with a comparison of results and an overall evaluation, but today I thought I’d save you a scroll and start off with the unexpected…
Don’t get me wrong, I love Google Maps. It’s a borderline miraculous service that has given me superb driving directions, public transport timetables, and walking routes. All over the world. For free.
However, when I tried to get cycling directions to a store a short distance across town, I encountered some issues. I plugged in my headphones, put away my phone, and was promptly directed to take a left onto a four-lane highway.