Advice for Aspiring Data Scientists

Jan 20, 2019 · 1017 words · 5 minutes read data science

Sometimes people ask me for advice about how to start a career in data science. I’ve written before about how I became a data scientist, which gives a bit of my personal journey. I’ve told my story because people often can only see where someone is now and not always where they’ve been or the path taken to get there.

Recently an aspiring data scientist reached out to me with some questions, and I thought I’d share my response here so others can benefit.

Excerpts from the e-mail and my responses are below:

I recently completed a course on Python as well as one on data mining and machine learning algorithms at a local University. I absolutely adored data mining and I want to work towards starting a career in it. I attached a copy of my resume I made based on what I learned from the classes. What are your thoughts on my current marketability? Is there anything you would change for the Data Science Skills section?

These questions are fairly common and I like to rephrase them as “how should I showcase my data science skills?”

Resumes are important but there are also a lot of ways to demonstrate your data science skills. @drob recently gave a talk, The unreasonable effectiveness of public work, that highlights more ways to make your work (and yourself) visible such as writing blog posts and giving talks at local Meetups. Rather than trying to include an exhaustive list of skills and buzzwords on your resume, it can actually be better to simply redirect recruiters to a website or GitHub profile. Being able to see your work via a website or portfolio communicates much more about what you can do – which is usually what employers care about. Of course, recruiters love buzzwords so you can’t remove them completely.

You should still communicate your skills on your resume but there are different ways to do so. One option is to list a few projects with a brief description that includes the tools and methods you used. When you don’t have much work experience, it’s perfectly fine to list projects you’ve done as a student or as part of a class. It’s good to frame things in a way that illustrates what the problem was and how you solved it.

I am looking to add more to my resume and, most importantly, learn the right things. I have signed up to a coursera course on machine learning, as well as a few on udemy for Scrapy, TensorFlow, and MySQL. What are your thoughts on these courses/subjects? What would you add? Are there any courses or resources you would recommend?

The “right things” are the things that you’re interested in. The “right things” are also the things you need to know to solve the problem you’re working on at the time. You can’t learn everything and then solve problems – you have to get used to learning things as you go.

The machine learning course looks good to me (based solely on the outline). Web scraping can be a great way to get data you’re interested in playing around with. SQL is generally quick to learn and is one of the languages data scientists use most. It probably makes sense to wait to learn TensorFlow until you’ve reached the neural networks section of the machine learning course – or until you have a problem that requires it.

Finally, a few spit fire questions: is it worth starting fresh and taking a computer science major from scratch,

It’s almost definitely not worth restarting with a computer science degree. Most computer science programs focus on theory and won’t make you a better coder or data scientist.

or is learning specific subject and languages a viable option?

Languages don’t really matter. Some are better suited to certain tasks than others but “knowing” a language doesn’t make you a good data scientist or software engineer. People like what they like – and they also like to argue about it. Once you know one language it’s a lot easier to learn others when you need to and then you can judge for yourself which ones you like.

All of that being said, here are my thoughts on a few different languages:

  • R: one of the most popular languages for data science and statistics so if you don’t know R then it’s definitely worth learning some of the basics.
  • SQL: an absolute requirement for any data scientist. It’s fairly easy to master the basics, and you shouldn’t worry about becoming a DBA; you just need to know enough to get by.
  • Python: another popular language for data science but also general programming. If you’re also interested in general programming then Python is a good choice.
  • JavaScript: supposedly the most popular language, so it might be worth checking out if you’re going to work with any web developers. Otherwise it’s not used much for data science, so I wouldn’t spend much time learning it.

What jobs titles should I be looking out for as an entry job?

Data scientist, data analyst, statistician, research scientist, business intelligence analyst, or some variation on any of these.

Should I be cold calling companies?

You shouldn’t cold call companies but you can reach out to people. Don’t send people random LinkedIn requests, but instead ask people about their experience and for their advice. It’s better to go to Meetups and events to make connections.

What type of company should I be looking to apply to?

I don’t think I can really answer this question because people want different things out of their work. Do you want to work for a small, medium, or large company? Larger companies tend to be more likely than smaller companies to have data, but sometimes it can be more difficult to get access to that data. Larger companies are also more likely to have more experienced managers and senior employees (simply because they have more people). But smaller companies can give you the chance to wear many different hats and work on many different types of projects.