A case for machine-learning infrastructure automation to keep Higher Ed competitive

Posted by John Mathon on Mar 10, 2019, 6:00:00 AM

Every school’s goal is to give students an advantage in the professional world. No matter what subject they teach, university professors squeeze as much information as possible into a short semester to create intellectual value. For computer science, this is most obvious in the global race for artificial intelligence (AI) talent. According to Forbes, “The countries pushing AI forward have ready access to qualified professionals. They have also developed university programs and AI curriculum to develop more talent. When it comes to emerging technologies, intellectual capital is a huge strategic advantage.”

When it comes to machine learning and artificial intelligence, the world is moving so fast that educators and students alike struggle to stay current. “I spoke with a computer science chair who mentioned one of his key priorities was to rewrite their AI class labs off of Lisp,” says John Morada, vice president of business development. John says he also used the Lisp programming language years ago when he studied AI at The American University. Schools need to modernize their curricula, and that requires investment and modern infrastructure. The private sector knows this, which is why 11 of the top 100 AI companies in the world have raised $6.254 billion in capital.

For students, the challenges are orders of magnitude greater. Not only do they need to learn the fundamentals of data science, but they also need to complement that knowledge with pragmatic labs and tooling. Here’s the problem: most machine-learning labs start with several class sessions dedicated to installing software on students’ laptops, with the expectation that they’ll then use their personal machines’ computing power to analyze data. Debugging the installation process and training students to use each piece of software on their laptops takes additional time. And nothing about this process resembles the environment a student would experience in the professional world.

The Call for Agility

When students spend time installing and configuring software on their laptops, there is a huge opportunity cost. Education at this level should not be about installing bits and bytes; it should focus on learning data science. What students need, and should expect, are agile lab environments. Agility means the following:

  • Compute infrastructure that scales automatically;
  • Adaptable, best-in-class software stacks.

In my white paper titled A Case to Unite Cloud + DevOps for the Agile Digital Business, I describe how businesses must build agile systems to keep current with the trends in digital transformation. Universities feed talent into the commercial and government sectors, so we need to start by training tomorrow’s workforce to use agile systems and labs.

Easy Scaling for Big Workloads

Being able to use the tools for modeling, analysis and visualization is an important skill, but even more important is time spent on the fundamentals of data science and on real-world experience using machine-learning technology to analyze real data sets. For example, training a model with PyCharm + Anaconda on a MacBook Pro using 250 megabytes of United States Census data is likely to finish in a couple of hours. The student can then move on to running simulations to validate that the model was trained properly and behaves as expected. Any insightful student will have to wonder whether this sample was enough. Does the model reflect the entire national reality of the U.S. Census, or just the sample given? Was there bias in the student’s selection of only 250 megabytes of data?
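
To make the sampling question concrete, here is a minimal Python sketch on a synthetic, hypothetical population (not real Census data). It compares the mean of a simple random sample and of a convenience sample restricted to one region against the population mean. The regions, income levels and sample sizes are invented for illustration.

```python
import random
import statistics


def synthetic_population(n: int, seed: int = 0) -> list[tuple[str, float]]:
    """Build a hypothetical population of (region, income) records.

    Stand-in for a full Census file; the regions and income levels are invented.
    """
    rng = random.Random(seed)
    region_means = {"urban": 70_000, "suburban": 60_000, "rural": 45_000}
    records = []
    for _ in range(n):
        region = rng.choice(list(region_means))
        records.append((region, rng.gauss(region_means[region], 10_000)))
    return records


def mean_income(records) -> float:
    """Average income over a list of (region, income) records."""
    return statistics.fmean(income for _, income in records)


population = synthetic_population(100_000)
rng = random.Random(1)

# Simple random sample: every record is equally likely to be chosen.
random_sample = rng.sample(population, 1_000)

# Convenience sample: the first 1,000 urban records, mimicking a student who
# grabs whichever slice of the data is easiest to obtain.
urban_only = [r for r in population if r[0] == "urban"][:1_000]

true_mean = mean_income(population)
print(f"population mean:    {true_mean:,.0f}")
print(f"random-sample mean: {mean_income(random_sample):,.0f}")
print(f"urban-only mean:    {mean_income(urban_only):,.0f}")
```

The random sample's mean should land close to the population mean, while the urban-only slice is skewed upward. That gap is the kind of selection bias a student should check for before trusting a model trained on a subset.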

In the business world, the best practice is generally to feed the model as much data as possible to reduce the risk of data bias. The compute needed to train models on petabytes of data is orders of magnitude greater than a single MacBook Pro can handle. Even most university data centers would be overloaded by the large datasets generated by commercial big data, which are exactly the datasets employers are interested in. It’s costly for universities to procure servers for machine-learning courses, and the long lead time on server procurement makes it impossible to increase compute resources mid-semester.

Limited compute capacity means that students can’t run algorithms on large data sets. This confines their classroom experience to simple data sets that generally won’t resemble the data generated by Big Data and IoT systems, which is the experience most employers are looking for.

The Agile Stacks Machine Learning Stack runs seamlessly on Amazon Web Services (AWS). One of the major advantages of using AWS IaaS is AWS Auto Scaling groups, which automatically add or remove Amazon Elastic Compute Cloud (EC2) instances so that machine-learning labs get more capacity when students need it. Very little manual effort is required once the scaling policies are set. Remember when teaching assistants needed to triage a lab exercise that ran out of memory on a student’s laptop? In most cases, those memory errors disappear on a highly scalable public cloud infrastructure, because students get the compute, storage and memory they need to finish their homework without writing a single command in the AWS Command Line Interface (CLI).
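
The article does not show how the scaling policies are defined. As one hedged illustration, the Python sketch below builds the arguments for a target-tracking policy and sends them with boto3's Auto Scaling `put_scaling_policy` call, which asks AWS to keep the group's average CPU utilization near a target by adding or removing EC2 instances. The group name `ml-lab-workers` and the 60% target are hypothetical values, not Agile Stacks settings, and the sketch assumes the Auto Scaling group already exists. This is a one-time setup task for an administrator, not something students would run.

```python
"""Sketch: attach a target-tracking scaling policy to an existing EC2 Auto Scaling group.

Assumptions (hypothetical): the group name "ml-lab-workers" and the 60% CPU
target are illustrative values only.
"""


def target_tracking_policy(group_name: str, target_cpu_percent: float = 60.0) -> dict:
    """Build keyword arguments for the Auto Scaling PutScalingPolicy request."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def apply_policy(group_name: str, target_cpu_percent: float = 60.0) -> dict:
    """Send the policy to AWS; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("autoscaling")
    return client.put_scaling_policy(**target_tracking_policy(group_name, target_cpu_percent))


if __name__ == "__main__":
    # Print the request instead of sending it, so nothing touches an AWS account.
    print(target_tracking_policy("ml-lab-workers"))
```

Keeping the request builder separate from the network call makes the policy easy to review or test before anything reaches AWS; `apply_policy` is the only function that needs boto3 and credentials.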

A Modern Toolchain for the Modern Data Scientist

If you’re trying to understand how deforestation in Brazil has affected the climate worldwide, you aren’t going to analyze that data on your laptop using Anaconda. Climate records and patterns alone will run into the petabyte range. You’ll be using detailed, worldwide data and running hundreds of permutations, so you’ll need the computing power of a public cloud as well as software-based tools to analyze the data.

We offer a selection of software-based, best-of-breed open source tools: Kubeflow, Kubernetes, TensorFlow, Keras, Seldon, and many others. With our platform, we now offer end-to-end machine-learning pipeline templates that can be used from a Jupyter Notebook to create a model architecture and then, from the same notebook, scale it for training on a large amount of data. This approach suits climate analysis, which needs infrastructure to prepare massive amounts of video data and feed it into models for training.
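
Kubeflow and TensorFlow have their own distributed-training APIs, and the article does not show them. As a framework-free illustration of the idea behind scaling the same model to more data, here is a Python sketch of synchronous data-parallel gradient descent: the dataset is split into shards, each worker computes the gradient for its shard, and the combined gradient updates one shared model. The linear model, the toy data and all function names are hypothetical; threads stand in for what would be separate machines or pods on a real cluster, so this sketch shows the structure, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(data, n_shards):
    """Split data into at most n_shards contiguous slices of roughly equal size."""
    size = -(-len(data) // n_shards)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def shard_gradient(params, rows):
    """Summed gradient of squared error for the model y ≈ w*x + b over one shard."""
    w, b = params
    grad_w = grad_b = 0.0
    for x, y in rows:
        err = (w * x + b) - y
        grad_w += 2 * err * x
        grad_b += 2 * err
    return grad_w, grad_b


def train(data, n_workers=1, lr=0.5, epochs=500):
    """Synchronous data-parallel gradient descent on mean squared error.

    Each worker computes the gradient for its own shard; the per-shard sums are
    added and divided by the total row count, which equals the full-batch gradient.
    """
    if not data:
        raise ValueError("data must not be empty")
    params = (0.0, 0.0)
    shards = shard(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(epochs):
            grads = list(pool.map(shard_gradient, [params] * len(shards), shards))
            grad_w = sum(g[0] for g in grads) / len(data)
            grad_b = sum(g[1] for g in grads) / len(data)
            params = (params[0] - lr * grad_w, params[1] - lr * grad_b)
    return params


if __name__ == "__main__":
    # Toy data on the line y = 3x + 2 (hypothetical, noise-free).
    data = [(i / 100, 3 * (i / 100) + 2) for i in range(100)]
    print(train(data, n_workers=1))  # one worker: the "laptop" run
    print(train(data, n_workers=4))  # four workers: same model, data split four ways
```

Both runs should recover parameters close to w = 3 and b = 2, because summing per-shard gradients reproduces the full-batch gradient. Adding workers changes where the arithmetic runs, not the model code or the result.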

Just as importantly, everything is cloud-based, which means there’s no software to install on anyone’s laptop, and students can be up and running with industry-standard tools in minutes. Agile Stacks automates every part of the machine-learning pipeline that is best accomplished through automation, which allows students to focus on learning how to become true data scientists and to use those skills in ways that will be useful to potential employers.

Focusing on Human Intelligence

Whether you’re designing a machine-learning curriculum or running a machine-learning program at a university, it’s important to differentiate between the things that are best done by machines through automation and the tasks that require a human being. When it comes to teaching data science, this is particularly essential. You don’t want to get so carried away with teaching the tools used to analyze data that you forget to teach students how to apply AI tools in the real world.

Here are some of the ways humans can’t be replaced in a machine learning context:

  • Designing the machine-learning program based on the available data and the desired results/insights
  • Finding and selecting appropriate training data
  • Analyzing the AI results for bias and/or tainted data
  • Using the information from the data analysis to make decisions—for example, applying the information learned from the data to introduce new products or services

Creating Trusted Professionals

In the machine-learning and AI world, trust is one of the hardest things to cultivate. People don’t trust the data, and they don’t trust the people crunching the data, either. A well-designed machine-learning curriculum, one that gives students robust real-world experience running complex machine-learning programs on large, realistic data sets with multiple permutations, allows students to speak with authority about machine learning and how it is used to solve concrete, real-world problems.

This kind of expertise can only be developed if students get a strong foundation in data science fundamentals combined with robust experience in designing, deploying and running machine-learning algorithms. Fitting that much material into a semester, or even an academic year, is only possible if you use the right tools to keep students focused on the most important skills.

Want to see how Agile Stacks can get your faculty and students up and running with machine learning in a fraction of the time possible otherwise? Contact us to schedule a demo now.

Topics: AI, CTO, DevOps Automation, Machine Generated Code, Machine Learning, Higher Education
