Cats and Dogs

Published on May 25, 2017

Last week I had an amazing opportunity to present on machine learning and innovation to 110 people leaders at my company. It was a mixed audience of technical and non-technical people.

My message is that machine learning needs to be understood by everyone in the business, not just the data scientists. Combining domain knowledge with machine learning is what will really enable successful data projects.

Here are my slides and some notes.

1

A few months ago I followed a tutorial on https://course.fast.ai and entered a Kaggle.com competition. The competition problem was to label 20,000 images of cats and dogs using computer vision. There were 1300 entries from around the world.

I downloaded an existing, freely available model, VGG16, and slightly modified it to get 87% accuracy.

I thought this was a great result. It’s REALLY hard for computers to tell what is in these pictures. 5 years ago a team of scientists got 57% accuracy with this same dataset.

2

However, it turns out I didn’t do very well :) I came 600th! The winners got closer to 97% accuracy.

We’ll never all be data scientists, but the technology is at a point where anyone new to machine learning can download great solutions and start solving these problems. You, as domain experts, are in the best place to see these opportunities and start experimenting.

3

What is machine learning and why is it different to what we do now? This is very simplistic, but with traditional computing we tell the computer exactly what result we want for a given set of inputs. With machine learning we give the computer a large amount of information and ask it to give us insights into the data.

We don’t write explicit programs. The ‘program’ is an output from the data and will change based on the data.
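As a rough illustration of that difference, here is a minimal Python sketch. Everything in it is hypothetical: the weights, labels and function names are made up for this example and have nothing to do with the Kaggle images. The first function is a rule a person wrote by hand. The second works the rule out from labelled examples, so the ‘program’ changes when the data changes.

```python
# Hypothetical toy data: (weight in kg, label). Made up for illustration only.
EXAMPLES = [
    (3.0, "cat"), (4.5, "cat"), (5.0, "cat"),
    (9.0, "dog"), (20.0, "dog"), (30.0, "dog"),
]


def classify_by_rule(weight_kg):
    """Traditional computing: a person chooses the cut-off and writes it down."""
    return "dog" if weight_kg > 8.0 else "cat"


def learn_threshold(examples):
    """Machine learning in miniature: pick the cut-off that makes the
    fewest mistakes on the labelled examples."""
    weights = sorted(w for w, _ in examples)
    # Candidate cut-offs are the midpoints between neighbouring weights.
    candidates = [(a + b) / 2 for a, b in zip(weights, weights[1:])]

    def mistakes(threshold):
        # Count examples where "heavier than the cut-off" disagrees with the label.
        return sum((w > threshold) != (label == "dog") for w, label in examples)

    return min(candidates, key=mistakes)


def classify_learned(weight_kg, threshold):
    """Apply whatever cut-off the data produced."""
    return "dog" if weight_kg > threshold else "cat"


threshold = learn_threshold(EXAMPLES)
print(threshold)
print(classify_learned(6.0, threshold))

# Adding one new labelled example changes the learned cut-off.
print(learn_threshold(EXAMPLES + [(7.5, "dog")]))
```

Real models such as the one I used for cats and dogs learn millions of numbers rather than one cut-off, but the idea is the same.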

4

It does this using some well-known and well-studied mathematics. Data scientists even have a cheat sheet for which algorithm to use. For more difficult aspects of machine learning, like deep learning, there are some very good models available for free online. I downloaded one of these for cats and dogs.

But we’re not here to learn the cheat sheet so forget about the detail.

5

Just remember that the algorithms are well known for a given problem.

6

What gives companies an advantage in machine learning is their data.

We have an incredible set of users here. And they’re giving us some great data. Crunching all this data costs money.

7

One of the reasons you hear a lot about machine learning recently is that computing power has gotten very cheap. I spent just $150 for a few hours of computing from Amazon for cats and dogs.

Why now? Exponential innovation…

8

Every few years for the past 100 years, the amount of computing power you can buy for $1000 has doubled. We are just at the tail end of the most recent technology advance: semiconductors. This pattern means that right now, for roughly $1000, you can purchase the same amount of computing power as a mouse’s brain.

9

If this trajectory continues, then by 2024 that same $1000 will buy you the same amount of computing power as the human brain.

Now this is a wacky idea and I don’t believe it myself. But that’s perfectly normal! Humans are really bad at thinking exponentially.

If I ask you to walk 30 steps linearly, that’s easy to picture: 30 meters. However, if I ask you to walk 30 steps exponentially, doubling every step (1m, 2m, 4m, 8m), then by the 30th step you will have covered more than a billion meters. That is enough to take you 26 times around the world!
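The arithmetic behind the stepping example can be checked in a few lines of Python. This is a sketch that assumes the first step is 1 meter, each step is double the previous one, and the Earth’s circumference is roughly 40,075 km (my figure, not one from the talk). It prints the length of the 30th step on its own and the total distance covered after all 30 steps.

```python
# Assumed value: the Earth's circumference is roughly 40,075 km at the equator.
EARTH_CIRCUMFERENCE_M = 40_075_000


def step_lengths(steps):
    """Length of each step in meters when the first is 1 m and each one doubles."""
    return [2 ** i for i in range(steps)]


lengths = step_lengths(30)
linear_total = 30 * 1             # 30 ordinary steps of 1 m each
last_step = lengths[-1]           # the 30th step on its own
exponential_total = sum(lengths)  # all 30 doubling steps added together

print(f"Linear walk: {linear_total} m")
print(f"30th step alone: {last_step:,} m")
print(f"Total distance: {exponential_total:,} m")
print(f"Times around the world: {exponential_total / EARTH_CIRCUMFERENCE_M:.1f}")
```

The total comes to a little over a billion meters, which is where the figure of about 26 trips around the world comes from.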

We can’t think this way but this is how fast and how cheap computers are becoming.

There are three things we can do to help accelerate adoption of machine learning throughout the business.

Realize that machine learning is absolutely accessible and it’s not magic once you know what types of problem can be solved.

There are 5 major types of problem…

Classification

Regression

Clustering

Ranking

Anomaly Detection

Think about problems in your part of the business that can be phrased this way.
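To make one of these problem types concrete, here is a minimal sketch of anomaly detection in Python. The daily sign-up numbers are invented for illustration, and the rule used (flag anything more than two standard deviations from the average) is just one simple approach, not a recommendation for any particular data set.

```python
import statistics

# Hypothetical daily sign-up counts, made up for this example.
DAILY_SIGNUPS = [100, 104, 98, 101, 97, 103, 250, 99]


def find_anomalies(values, max_deviations=2.0):
    """Return the values that sit unusually far from the average."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > max_deviations * spread]


print(find_anomalies(DAILY_SIGNUPS))
```

A question like "which days looked unusual?" is an anomaly detection problem. "Will this customer leave?" is classification, and "how many sign-ups will we get next month?" is regression.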

We need to collect better data, not just more data. We need to collect relevant data and this is where your domain knowledge is vital.

You are also in the perfect position to identify gaps in our current data. We should find these as soon as possible and start plugging them.

We need to identify any possible external sources of data, council data for example.

You should identify areas of the business where we are making subjective decisions. If we can eliminate ambiguity and subjective decisions from the business we can make better decisions.

Collaborate with data scientists: your domain knowledge combined with the skills of our data scientists is what will produce the best results.

Don’t silo data. Ask your data team where you should push data so everyone in the business can access it.

Don’t be afraid of sensitive data. We can anonymise the data and still get great insights from it.
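One common first step is pseudonymisation: replacing an identifier with a keyed hash, so records can still be grouped per user without showing who the user is. The sketch below is only an assumption about how that could look. The key, the field names and the records are all hypothetical, and on its own this is not full anonymisation, because other fields in a record can still identify a person.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would live in a secrets store.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymise(identifier, key=SECRET_KEY):
    """Replace an identifier with a keyed hash that is the same every time."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up records for illustration.
records = [
    {"user": "alice@example.com", "purchases": 3},
    {"user": "bob@example.com", "purchases": 1},
    {"user": "alice@example.com", "purchases": 2},
]

safe_records = [
    {"user": pseudonymise(r["user"]), "purchases": r["purchases"]}
    for r in records
]

# Purchases can still be totalled per user without knowing who the user is.
totals = {}
for r in safe_records:
    totals[r["user"]] = totals.get(r["user"], 0) + r["purchases"]

print(totals)
```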

We have some Slack channels. Anyone can join them. We are all learning, and these are a safe space for any level of knowledge in machine learning.

We will be running classes ranging from this type of overview information all the way to implementing real solutions.

So don’t be afraid of machine learning. Here we have a huge, loyal user base generating amazing data, and we have a group of the top technical and business talent in the country.

But our industry is changing faster than we can imagine and we need to use every tool available to keep our advantage in the future.

Think of machine learning as another technology or tool like Word, Excel or Photoshop. Learn about it. Get involved.

Please get in touch if you would like more information.

Some images are from this amazing article on Wait But Why: https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html

The slides and the idea of exponential innovation come from this great talk by Kaila Colbin: https://www.youtube.com/watch?v=XwxwVSJcOGU

Course: https://course.fast.ai

Darragh ORiordan


