Advanced Python for Data Science: A Review (the GOOD part 2)

TL;DR: Python is not that hard as long as someone explains it well.

Summary of the GOOD parts of Advanced Python for Data Science

  • Software carpentry is a lawful good

  • Get taught a mature approach to design

  • Disciplined environment/package management

  • Focus on continuous integration

  • Use of git, conventional commits, and facilitating collaboration

  • Amazing peer review from other students

  • Generous marking of assignments

  • Modern, popular libraries used: pandas, numpy, Django, Luigi, pytorch

(Continued from part 1)

I write these in parts because I have a rule about writing posts - I force myself to publish whatever I’ve written by the end of the day. My experience is that if you don’t publish on the day, you probably won’t publish at all!

So there are a lot of good things about Advanced Python for Data Science.

Software Carpentry 

I’ve already talked about it in the previous article. If you’re not incorporating some software carpentry (in some form or another) into your coding journey then I believe you’re setting yourself up for some serious pain down the track. Both for yourself and your boss.

I don’t think you should start with software carpentry, because it’s pretty boring. In fact, I think you should start with a Jupyter notebook running on Google Colab. Or, even better, a little practice coding tool on something like edX or Codecademy.

But you do need to learn it early. I did not. That’s why I used to store my array rows as individual text files and load them into memory one by one.

Hannibal Barca said that he would cross the Alps because he “would either find a way or make one!”

This is better than my approach: “Just make one!”

Get taught a mature approach to design

In Melbourne we have something called the Moomba Birdman Rally. Locals build light aircraft and try to achieve flight by leaping off a ledge into the river. They are not successful. There are extremely well established principles that govern flight. My colleague Lachie Price tells me that while it is not fully understood why these principles work, they are quite reliable. In my mind, those are the best type of principles (empirical ones).

There are principles to code that are important to abide by. They are often old and well established. They tend to work well. If everyone abides by them, we all get along.

Advanced Python uses a package called Cookiecutter, which generates new project directories from templates. Cookiecutter templates are amazing. They are a great example of the nudge concept:

A nudge, as we will use the term, is any aspect of the choice architecture that alters people's behavior in a predictable way without forbidding any options or significantly changing their economic incentives. To count as a mere nudge, the intervention must be easy and cheap to avoid. Nudges are not mandates. Putting fruit at eye level counts as a nudge. Banning junk food does not. - Wikipedia

In Advanced Python, you can do things in almost any way you want. But do you really want to?

Two roads diverged in a wood, and I—

I took the one less traveled by,

and the app was unsupported and buggy.
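For anyone who hasn't seen a project template in action, here is a toy sketch of the idea. It is not Cookiecutter's API: the real thing uses Jinja2 placeholders and far richer templates, whereas this stand-in uses the standard library's string.Template. The names PROJECT_TEMPLATE and render_project are made up for illustration.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template: relative path -> file contents.
# Both paths and contents may contain $placeholders.
PROJECT_TEMPLATE = {
    "README.md": "# $project_name\n",
    "$package_name/__init__.py": "",
    "tests/test_$package_name.py": "import $package_name\n",
}


def render_project(target: Path, context: dict) -> list:
    """Write every templated file under target and return the created paths."""
    created = []
    for raw_path, raw_body in PROJECT_TEMPLATE.items():
        # Fill in the placeholders in the path, then in the file body.
        path = target / Template(raw_path).substitute(context)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(Template(raw_body).substitute(context))
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        context = {"project_name": "Demo", "package_name": "demo"}
        for path in render_project(Path(tmp), context):
            print(path.relative_to(tmp))
```

The nudge is in the template: every new project starts with a README and a tests folder, so the easy path is also the tidy one.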

Disciplined environment/package management

Speaking of mature design, there is a big problem with Python where the working environment gets way too messy (no, I’m not going to show the bloody cartoon!). Moreover, freshly installed packages will sometimes conflict with each other, and when you are importing heaps of packages, this can cause massive problems. For instance, NumPy 1.19.4 causes some bugs on Windows, so you’d better hope that your packages don’t settle on that version of NumPy as their dependency, because your app will suddenly spit the dummy.

Scott had us creating a new virtual environment for every project, which can be activated seamlessly and populated with packages from scratch so it’s easy to deploy. In hindsight, I don’t know why I didn’t do this for every project I’ve worked on.

Oh yeah, because I would have had to use the command line... 

Me, using the command line to do literally anything in 2019.
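If the command line is what's holding you back, a per-project environment can also be created from Python itself. This is a minimal sketch using the standard library's venv module as a stand-in; the course's own tooling may differ, and the .venv folder name and the create_project_env helper are my assumptions.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create a fresh virtual environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this sketch quick; a real project would want with_pip=True.
    # clear=True wipes any existing environment so we always start from scratch.
    venv.EnvBuilder(with_pip=False, clear=True).create(env_dir)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_project_env(Path(tmp) / "my_project")
        # pyvenv.cfg is the marker file that identifies a virtual environment.
        print((env / "pyvenv.cfg").exists())
```

You still have to activate the environment and install your packages into it, so the command line wins in the end.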

Use of git, conventional commits, and facilitating collaboration

Ok, the use of git in this subject is heavy. My use of Git is pretty straightforward, but I am strict about it. If keeping good git commits is like cleaning your room, then call me Jordan Peterson. 

Oops, sorry, mask-off for a second there.
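If you haven't met conventional commits before, the header of each commit message follows the shape type(scope): description, with the scope optional. Here is a rough checker, assuming a commonly used list of types (only feat and fix come from the specification itself; your team may allow a different set). The names COMMIT_TYPES, HEADER_PATTERN and is_conventional are made up for this sketch.

```python
import re

# Assumption: a commonly used type list, not an official or exhaustive one.
COMMIT_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"  # e.g. feat, fix
    r"(\((?P<scope>[^()\s]+)\))?"                  # optional (scope)
    r"(?P<breaking>!)?"                            # optional breaking-change marker
    r": (?P<description>\S.*)$"                    # colon, space, description
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_PATTERN.match(header) is not None


if __name__ == "__main__":
    for msg in ("feat(parser): add csv support", "fixed stuff"):
        print(msg, "->", is_conventional(msg))
```

It only looks at the header line, which is the part a marker skimming your git log will actually see.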

Focus on continuous integration

So, since we’re doing virtual environments, it’s only a hop, skip, and a jump to deploying, which we test via Travis builds. Scott had us sticking a Travis build badge on all our repositories to show that our build is at least setting itself up successfully. We also ran pytest and coverage on Travis to show that our testing is adequate.

Letting that build badge go red (failing build) is a little like towing your car to a car show.

Like, yeah, looks good.

But did it ever run?

It’s also turned me into a real hardarse on my own projects. I’m so thirsty for that working build I’ll often tell collaborators to create a dummy function just to get the module to run. At the end of the day, writing is easy, deploying is hard.

So we make deploying a priority.

Amazing peer review from other students

As part of Advanced Python you mark others’ work and they mark yours.

Now I don’t know what kind of Silicon Valley float tanks the HES admissions advisors unsealed to recruit for this subject, but whoever they dragged out of that Epsom-salt-rich plasma is way too smart to be studying in this course.

I was consistently amazed by what I was seeing when I had to mark others’ projects. I was also very happy to learn from their methods and pick up a few tips and tricks along the way. The crazy thing was that even if I wasn’t so crash hot on the content taught by Scott in the lectures, the other students would often show me in their code how they interpreted the demands of the project. So many times I thought to myself,

Oh, that’s what Scott wanted me to do…. 

Plus, it’s great getting encouragement, especially if I had been working on a project that I didn’t finish. Classmates would leave comments like “This was really hard. I can see you were almost on the right track. Don’t give up”.

Oh, alright then…

Generous marking of assignments

I loved how generous the marking was for these assignments. First of all, a lot of marks are given for generic elements of the project like python quality, git quality, and testing.

A few quick bits of advice for you if you take this course

  1. Use conventional commits and manage commits via an IDE git plugin. This means that you select which file changes go together with which commits. So if you accidentally do a heap of work, and then don’t want to do one big commit, you can view all changed files and group them together in individual commits. When you write your commit message, write it in the style of conventional commits. Having this little discipline helped me pick up heaps of marks for ‘git quality’. Also, never push to main/master.

  2. Always run the Black plugin to style your Python code correctly. Your code should look neat. Comment religiously. You will pick up heaps of marks for ‘python quality’.

  3. Testing starts small. First, write a test for just one function. (If you can’t find a small function to test, you’ve got bigger things to worry about. Go refactor some code into a function.) Then copy that test and modify it to test another function. Put the tests together in the directory with those functions. Now you have established tests going and something to report on pytest coverage. Halfway there.

    Ok, so you don’t know how to write tests for everything else? The project is due in one hour? Go through and create the python test files, create the unit tests, add assert True is False to all of them (we’re not faking success here) and then comment on those tests what your strategy would have been if you had more time.

    In reality, given enough time, you can figure out how to test almost anything. 

    In practice, if you don’t have time, at least show you had dreams.

I would have tested that whole module. No doubt. No doubt in my mind.

  4. If you are really stuck on a project, then DO THIS: Whenever you put down your work and have not made the progress you sought to make, ASK SOMEONE for help.

    Either ask the forums (the question and answer may already be there waiting for you), or ask your TA. Don’t ever feel like you have to wait until you’ve done ‘enough struggle’ to ask. Ask early. Ask intelligently. You will not just get answers, but also friends, and grades for participation.

And finally…

Modern, popular libraries used: pandas, numpy, Django, Luigi, pytorch, cookiecutter, awscli

(hmmm… notice how I did not mention pipenv here)

It is important that the lecturer uses good modern libraries in the course. It’s how we know he’s a hip young dude and not an old fogey who wishes we would learn how to do punch cards.

Hokey pointers and ancient languages are no match for a good import at your side, kid

So do I know for sure that the libraries we’re using are good? 

No! 

But do they seem to work well? 

Also no. 

But is that my fault? 

Probably!

At the end of the day, it’s nice learning how to use the packages that everyone’s talking about on Reddit (I think that’s generally a good sign). There’s one package in particular, pipenv, which I believe should be sent straight to the gulag, but I will talk about that in the BAD article.

It’s actually a pleasure being able to quickly whip up a prototype at work using libraries released in the last few years. It is a great way of adding value to a team.

After all, if you can’t be the best, at least be the first.

Now, you may notice in this article that I haven’t really talked about what happened during the subject.

In fact, all of these good things were sort of… best-laid plans.

A little bit like Karl Marx sitting down to write The Communist Manifesto; there was probably a good intention and not much understanding of where this was all going…

In the end, I did leave the subject with these skills, and that’s good!

But… that’s a little bit like saying John King got a great watch out of the Burke and Wills expedition.

So, before we get to the BAD, I will give you my verdict. This subject is… good!

Now,

onto the bad. (COMING SOON)

I’m sorry, the old Michael can’t come to the phone right now.
