Office Hours

and why you should care…

Photo by Vladimir Mokry on Unsplash

As organisations become more data driven, machine learning models are being used in more and more applications. Despite their usefulness and value, the use of such models represents a new source of risk: poorly built models can lead to bad business decisions and bad customer experiences, and can even cause an organisation to break the law.

Model Governance helps to minimise that risk by establishing procedures to ensure model quality.

Open Risk Manual defines Model Governance as:

An internal framework of a firm or organization that controls the processes for model development, validation and usage, assign responsibilities and roles…

Plus sample questions to use in an interview

Photo by Glenn Carstens-Peters on Unsplash

Finding good Data Scientists can be a tricky task. Googling “Data Science Skills Gap” shows that in many countries around the world, companies are struggling to find suitable candidates for their ever growing Data Science needs.

A bad hiring choice can be very expensive, and as a result, it’s important to be able to vet candidates effectively, to make sure they are a good fit for the position and are going to be effective in introducing/expanding Data Science at your company.

Understanding the source of the data one has available, along with any Data Quality issues, is imperative to producing…

Using AI to do AI

Photo by Morning Brew on Unsplash

Automation has transformed many industries around the world. From self-service checkouts in supermarkets to car-building robots, technological solutions are constantly encroaching on the areas of work once the exclusive domain of humans.

As Data Scientists, we are not immune from this. Every day new products are being developed to automate parts of the Data Science life-cycle.

Data wrangling

They say that Data Science is 80% preparation and 20% analysis and modelling. But new tools are eating into that 80%, allowing us to spend more time on the high value work at the end of the Data Science process.

Automunge makes the process…

At what point should you stop chasing percentage points and label your model “done”?

Photo by John Matychuk on Unsplash

In predictive analytics, it can be a tricky thing to know when to stop.

Unlike many of life’s activities, there’s no definitive finishing line, after which you can say “tick, I’m done”. The possibility always remains that a little more work can yield an improvement to your model. With so many variables to tweak, it’s easy to end up obsessing over tenths of a percentage point, pouring huge amounts of effort into the details before looking up and wondering “Where did the time go?”.

Iterating your model via feature engineering, model selection and hyper-parameter tuning is a key skill of…

How this counter-intuitive statistical “paradox” relates to satellite collisions, DNA evidence and other coincidences

If you have a group of people in a room, how many do you need for it to be more likely than not that two or more will share a birthday?

Photo by S O C I A L . C U T on Unsplash

Theoretically, the chances of two people having the same birthday are 1 in 365 (not accounting for leap years and the uneven distribution of birthdays across the year), and so odds are you’ll only meet a handful of people in your life who enjoy the same birthday as you. This leads many people to intuitively guess around 180.

The correct answer is just 23.

That means in…
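The figure of 23 can be checked with a few lines of Python. This is a minimal sketch under the same simplifications as the text: 365 equally likely birthdays and no leap years. The function names `prob_shared_birthday` and `smallest_group` are illustrative only and do not come from the article. The idea is to compute the chance that everyone's birthday is different, then subtract it from 1.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability exceeds threshold."""
    n = 1
    while prob_shared_birthday(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group())                      # 23
    print(round(prob_shared_birthday(22), 3))    # 0.476
    print(round(prob_shared_birthday(23), 3))    # 0.507
```

The probability climbs quickly because each new person is compared against everyone already in the room, so the number of possible pairs grows much faster than the number of people.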

Plus other unintended consequences of autonomous vehicles.

Göbekli Tepe in Anatolia, Turkey, is the world’s oldest known human settlement. At 11,500 years old, its founding marked the beginning of our species’ transition from small groups of nomadic hunter-gatherers to complex societies within ever-growing communities.

Göbekli Tepe — Source: Wikipedia

From there, it’s been a one-way street.

By some accounts, Rome became the first city in history to have over 1 million inhabitants in 133 BC. By 1500, 1 in 25 people lived in towns and cities. Then, the industrial revolution sent this process into overdrive, bringing huge numbers of people to the cities in search of manufacturing jobs. …

How to thrive as the first data-soldier on the ground

More and more businesses are taking their first steps into Data Science, and many are hoping to build that capability in-house. This gives Data Scientists ample opportunities to get in on the ground floor and play a part in guiding an organisation towards a data savvy future.

I’ve been lucky/masochistic enough to do this with two different companies — a web design agency and a law firm. In the process I’ve learned a lot about how my field can better serve those working in other areas and bring value to a company. It hasn’t been easy, and there have been…

Big Retail + Big Data

Supermarkets are big business and they use data on a big scale. Originating in the US in the 1930s, supermarkets have since gradually taken over a bigger and bigger share of the retail and grocery market. Giants like Wal-Mart, Aldi and Carrefour are among the largest retailers in the world, with revenues approaching the hundreds of billions. As such, many have invested heavily in big data, with analytics and data science forming a core part of their decision making.

Photo by nrd on Unsplash

Every product purchased, along with its price, is recorded in gargantuan databases, with tables exceeding hundreds of billions of rows. Loyalty…

Software development tools for staying organised and keeping quality high

There are many online lists of the software and packages used in Data Science. Pandas, Numpy and Matplotlib are always featured, as are machine learning libraries Scikit-learn and Tensorflow.

However, just as important are some less DS-specific software development tools that should be part of your workflow on every project.

Photo by Todd Quackenbush on Unsplash

Version control is a necessity on any coding project, with Data Science being no exception. Keeping track of who did what when, and having a comprehensive history of working, tested code is invaluable in projects of any scale, but especially when collaborating with others.

Git keeps track of changes made…

Don’t start with the data, start with the person

Photo by Josh Calabrese on Unsplash

We data scientists are required to have many strings to our bows. Extracting, wrangling, cleansing, transforming, querying, analysing, modelling, visualising, deploying — all in a day’s work. A “Full Stack” data scientist needs to be competent across myriad packages, frameworks and technologies, or at least have the ability to quickly up-skill where necessary. The diverse technical challenges are part of why we love the field so much.

But despite our predispositions, there is one skill that we should all be working to perfect.


The OED describes empathy as…

The power of projecting one’s personality into (and so fully comprehending)…

Richard Farnworth

Data scientist, computer programmer and all-round geek with 10 years of using data in finance, retail and legal industries. Based in Adelaide, Australia.
