Advanced Python for Data Science: A Review (the GOOD part 1)
TL;DR: Great content, delivered in an unintentionally chaotic way
This article is…
Summary of the GOOD parts of Advanced Python for Data Science
Software Carpentry is lawful good
You're taught a mature approach to design
Disciplined environment/package management
Focus on continuous integration
Use of git, conventional commits, and facilitating collaboration
Amazing peer review from other students
Generous marking of assignments
Modern, popular libraries used: pandas, NumPy, Django, Luigi, PyTorch
I was recently asked to audit a few machine learning projects.
“Do you have knowledge in statistics?” the project lead asked.
“I... think I have knowledge in statistics,” I answered. Everyone laughed.
“But seriously though, are you qualified to assess these projects?” I knew I had to stop joking around.
“Yes, I am. The last project I did was with a small dataset. I was under a lot of pressure to produce a ‘good’ result. I experimented with a lot of methods that I now know aren’t legitimate. Whatever tricks someone can pull to produce artificially good results, I’ve used them.”
(I saw a few eyebrows rise.)
“Unintentionally, of course! I’m 100% about best practices now.”
I got the gig.
But I did reflect on some of my own bad habits…
When I was 11 years old, my Karate teacher, John Tzinis, gave me advice about ageing.
“You have to stretch all throughout life,” he said. “Because by the time you get old, if you haven’t stretched, it will be too late. Stretch like a cat!”
I was bored by this advice. Stretching involved bending over, sticking your bottom out, and touching your toes. Being an 11-year-old boy, I knew not to stick my bottom out. A stuck-out bottom could get slapped, smacked, poked, kicked, dacked (that’s Australian for ‘pantsed’), and laughed at, and other 11-year-old boys were the worst offenders.
No reward in old age is worth that.
Plus, I had realised that many of the techniques he taught me were redundant. I knew how to fight. I would grab my opponent with my left hand and punch them in the face repeatedly with my right until they gave up.
My technique?
Learned from Don Frye and Yoshihiro Takayama.
I took the same approach to my machine learning projects. I punched away at the data again and again until I got a working result. To hell with best practices. Forget repeatability. I had Jupyter notebooks, GPUs, and fastai.
Call me Kid Cuda.
In peer review, when asked about my models, I knew exactly which word to drop: “it’s an ensemble” (yeah, an “ensemble” of methods I didn’t really understand).
In my other article, I talked about knowing that my code was not at a professional level.
So it was time to start stretching in Python.
I was on the hunt for a course that would really teach Python, which was a lot harder to find than you might think. Almost every MOOC was geared towards entry-level practitioners (why not go for the low-hanging fruit?). So I did what any self-respecting fan of The Social Network would do. I went to Harvard.
Can’t leave Australia, of course, or stop working as a junior doctor. Also can’t actually get into Harvard as a regular student.
Can’t get out of the country, can’t get into the country club!
However, thanks to an opium-smoking philanthropic backpacker from the 1800s named John Lowell Jr., I could study online via Harvard Extension School, an institution designed to reduce unequal access to education.
This is me, by the way, and this was my school uniform.
So I signed up for Advanced Python for Data Science, taught by Professor Scott Gorlin (Hi Scott!).
Within a day of the course starting I had a message in my Canvas (online learning management system) inbox:
FROM Hamza Mahjoubi, TO Woody Woodburn
Advanced Python for Data Science
August 31, 2020 at 9:05am
START MESSAGE
“Hey there! I think you might have missed my previous message. I texted you previously on being study buddies and I was thinking maybe I can help you to double check your work on private basis, so I am reaching out to you. I got a few seniors who took this course in Spring and they don’t mind sharing tips on how to ace the course thru their experience in the homework, assignments, problem set and exams. If you want to know more, you can add me on WhatsApp. My number is +1 (747) 999-8708, or just click wa.me/17479998708 to add my Whatsapp directly. H.”
END MESSAGE
Oh dear… I’m already being asked to cheat.
This is the problem with online education. On the internet, everyone’s a scumbag.
(Except me. Feel free to shoot me a message as I’m very friendly)
I knew I had to fire back hard.
FROM Woody Woodburn, TO Hamza Mahjoubi
START MESSAGE
Thanks but I'm paying 3000USD to get help from the TAs.
I'm not interested in paying you to help me get answers from 'seniors'.
But I'm more than happy to let Scott know that you're offering these services to his students.
Perhaps he'll encourage one of his TAs to take you up on the offer to see what it's all about.
Who knows? The next person who accepts your invitation might actually be Scott himself!
END MESSAGE
That person was promptly let go from the course. I was somewhat disheartened. Was I now going to be competing against cheaters for a grade in this course?
I never had this problem with the MicroMasters at MIT…
Fortunately, I saw no evidence either way as to whether anyone else was buying their answers.
And so the great race to a B began (a B was necessary to get into the degree program).
I didn’t realise at the time that the B was short for “Be ready to give up your weekends”.
I have to say, straight off the bat, it was great being forced to install an IDE. When I used to wander around Harvard libraries asking young folk what they were working on, I noticed that they all had their IDEs tuned. We’re talking SSH, extensions, git hooks, and dark mode (which goes without saying). So now I did too. And by the first assignment, mine was humming.
Now this is coding!
Suddenly, all the little things began to make sense. Oh, that’s getting a red underline because the relative import needs an additional period. Oh, I forgot an __init__.py? Thanks, PyCharm. Code looks like crap? Run black every now and again: the code is still dodgy, but it no longer looks dodgy.
I began doing the pre-reading, called Software Carpentry. My god, this is what I missed out on at MITx. Good fundamentals, like using the command line and changing directories, came first. Yeah, it was presented like wet laundry, but the content was good. Like fish and chips wrapped in newspaper.
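If you haven't met that red underline yet, here is a minimal sketch of why the extra period matters. The package layout (`mypkg`, `models/train.py`, `models/broken.py`, `utils.py`, `helper`) is entirely hypothetical, invented for illustration, and is built in a temporary directory so the snippet runs on its own. Each directory gets an `__init__.py`, which marks it as a regular package (Python 3 can also treat a bare directory as a namespace package, but IDEs and many tools expect the file to be there).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway, hypothetical package layout in a temp directory:
#   mypkg/__init__.py
#   mypkg/utils.py
#   mypkg/models/__init__.py
#   mypkg/models/train.py   (two-period relative import)
#   mypkg/models/broken.py  (one-period relative import)
root = Path(tempfile.mkdtemp())
models = root / "mypkg" / "models"
models.mkdir(parents=True)
(root / "mypkg" / "__init__.py").write_text("")
(models / "__init__.py").write_text("")
(root / "mypkg" / "utils.py").write_text("def helper():\n    return 'helped'\n")

# '..' climbs from mypkg.models up to mypkg, where utils.py lives.
(models / "train.py").write_text(
    "from ..utils import helper\n\ndef run():\n    return helper()\n"
)
# '.' stays inside mypkg.models, which has no utils module.
(models / "broken.py").write_text("from .utils import helper\n")

# Make the temp directory importable and clear stale finder caches.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

train = importlib.import_module("mypkg.models.train")
print(train.run())  # helped

broken_error = None
try:
    importlib.import_module("mypkg.models.broken")
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    broken_error = exc
    print("One period is not enough:", exc)
```

Each leading period in a relative import moves one package level up from the current module, which is exactly the kind of mistake an IDE flags before you ever run the code.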
I finally felt like I could show my code with pride.
Yep, this baby can hold the weight of my massive ambitions.
I looked up to Scott like a demigod.
Our father, who art in Boston, hallowed be thy name.
When Scott popped up in the forums, I was a happy man. At work I’d tell people, “My professor said this, my professor said that,” like I was a disciple of some exclusive American cult.
For a while, it was good. Very good. Things worked. Sometimes an engine light would go on, and I’d have to push-start my vehicle, but at the end of the day, I got to where I was going.
What could possibly go wrong?
You're gonna need a bigger build
The GOOD continues in Part 2.