The three software design principles I learned as an open-source contributor
Deep learning frameworks are extremely transitory. If you compare the deep learning frameworks people use today with what they used eight years ago, you will find the landscape is completely different. There were Theano, Caffe2, and MXNet, which have all become obsolete. Today's most popular frameworks, like TensorFlow and PyTorch, had only just been released to the public.
Through all these years, Keras has survived as a high-level, user-facing library supporting different backends, including TensorFlow, PyTorch, and JAX. As a contributor to Keras, I learned how much the team cares about the user experience of the software and how they ensure a good user experience by following a few simple yet powerful principles in their design process.
In this article, I will share the three most important software design principles I learned by contributing to Keras over the past few years. They are generalizable to all types of software and can help you make an impact in the open-source community with your own.
Why user experience is important for open-source software
Before we dive into the main content, let's quickly discuss why user experience is so important. We can learn this from the PyTorch vs. TensorFlow case.
They were developed by two tech giants, Meta and Google, which have quite different cultural strengths. Meta is great at product, while Google is great at engineering. As a result, Google's frameworks, like TensorFlow and JAX, are the fastest to run and technically superior to PyTorch, as they support sparse tensors and distributed training well. Nevertheless, PyTorch still took half of the market share away from TensorFlow because it prioritizes user experience over other aspects of the software.
Better user experience wins over the research scientists who build the models, and the preference propagates to the engineers who take the models from them, since they don't always want to convert the models they receive from the research scientists to another framework. They build new software around PyTorch to smooth their workflow, which establishes a software ecosystem around PyTorch.
TensorFlow also made a few blunders that caused it to lose users. TensorFlow's general user experience is good. However, its installation guide for GPU support was broken for years before it was fixed in 2022. TensorFlow 2 broke backward compatibility, which cost its users millions of dollars to migrate.
So, the lesson here is that, despite technical superiority, user experience decides which software open-source users will choose.
All deep learning frameworks invest heavily in user experience
All the deep learning frameworks—TensorFlow, PyTorch, and JAX—invest heavily in user experience. Good evidence is that they all have a relatively high Python share in their codebases.
All the core logic of deep learning frameworks, including tensor operations, automatic differentiation, compilation, and distribution, is implemented in C++. Why would they want to expose a set of Python APIs to users at all? Simply because users love Python, and the frameworks want to polish their user experience.
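As a quick illustration (a minimal sketch using PyTorch; any of the frameworks would do), the Python call below is only a thin front end, and the actual computation is dispatched to an optimized C++ or CUDA kernel underneath.

import torch

# Users write plain Python...
a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)

# ...while the matrix multiplication itself runs in a compiled C++/CUDA kernel.
c = a @ b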
Investing in user experience has a high ROI
Think about how much engineering effort it takes to make your deep learning framework a little bit faster than the others. A lot.
However, a better user experience can be achieved simply by following a certain design process and a few principles. For attracting more users, your user experience is as important as the computing efficiency of your framework. So, investing in user experience has a high return on investment (ROI).
The three principles
I will share the three important software design principles I learned by contributing to Keras, each with good and bad code examples from different frameworks.
Principle 1: Design end-to-end workflows
When we think about designing the APIs of a piece of software, it may look like this.
class Model:
    def __call__(self, input):
        """The forward call of the model.

        Args:
            input: A tensor. The input to the model.
        """
        pass
Define the class and add the documentation. Now, we know all the class names, method names, and arguments. However, this does not tell us much about the user experience.
What we should do instead is something like this.
input = keras.Input(shape=(10,))
x = layers.Dense(32, activation='relu')(input)
output = layers.Dense(10, activation='softmax')(x)
model = keras.models.Model(inputs=input, outputs=output)
model.compile(
    optimizer="adam", loss="categorical_crossentropy"
)
We want to write out the entire user workflow of using the software. Ideally, it should read like a tutorial on how to use the software. It provides much more information about the user experience and helps us spot many more UX problems during the design phase compared with just writing out the classes and methods.
Let's look at another example. This is how I discovered a user experience problem by following this principle when implementing KerasTuner.
When using KerasTuner, users can use this RandomSearch class to select the best model. We have the metrics and the objective in the arguments. By default, the objective is the validation loss. So, it helps us find the model with the smallest validation loss.
class RandomSearch:
    def __init__(self, ..., metrics, objective="val_loss", ...):
        """The initializer.

        Args:
            metrics: A list of Keras metrics.
            objective: String or a custom metric function. The
                name of the metric we want to minimize.
        """
        pass
Again, it doesn't provide much information about the user experience. Everything looks OK so far.
However, if we write an end-to-end workflow like the following, it exposes many more problems. The user is trying to define a custom metric function named custom_metric. The objective is no longer so simple to use. What should we pass to the objective argument now?
tuner = RandomSearch(
    ...,
    metrics=[custom_metric],
    objective="val_???",
)
It should simply be "val_custom_metric": the "val_" prefix plus the name of the metric function. That is not intuitive enough. We want to make it better instead of forcing the user to learn this. We just spotted a user experience problem by writing out this workflow.
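In other words, the finished call would have to look like the following (a sketch of the snippet above with the objective filled in).

tuner = RandomSearch(
    ...,
    metrics=[custom_metric],
    # The user has to learn the '"val_" + metric function name' convention.
    objective="val_custom_metric",
)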
If you write the design out even more comprehensively by including the implementation of the custom_metric function, you will find you also need to learn how to write a Keras custom metric. You have to follow the function signature to make it work, as shown in the following code snippet.
def custom_metric(y_true, y_pred):
    squared_diff = ops.square(y_true - y_pred)
    return ops.mean(squared_diff, axis=-1)
After discovering this problem, we designed a better workflow for custom metrics. You only need to override HyperModel.fit() to compute your custom metric and return it. No strings to name the objective. No function signature to follow. Just a return value. The user experience is much better now.
class MyHyperModel(HyperModel):
    def fit(self, trial, model, validation_data):
        x_val, y_true = validation_data
        y_pred = model(x_val)
        return custom_metric(y_true, y_pred)

tuner = RandomSearch(MyHyperModel(), max_trials=20)
One more thing to remember: we should always start from the user experience. The designed workflows backpropagate to the implementation.
Principle 2: Minimize cognitive load
Don't force the user to learn anything unless it is really necessary. Let's see some good examples.
The Keras modeling API, shown in the following code snippet, is a good example. Model builders already have these concepts in mind: a model is a stack of layers; it needs a loss function; we can fit it with data or make it predict on data.
model = keras.Sequential([
    layers.Dense(10, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(loss="categorical_crossentropy")
model.fit(...)
model.predict(...)
So basically, no new concepts need to be learned to use Keras.
Another good example is PyTorch modeling. The code executes just like ordinary Python code. All tensors are real tensors with real values. You can depend on the value of a tensor to choose your branch with plain Python code.
class MyModel(nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return self.path_a(x)
        return self.path_b(x)
You can also do this with Keras on the TensorFlow or JAX backend, but it needs to be written differently. All the if conditions need to be written with the ops.cond function, as shown in the following code snippet.
class MyModel(keras.Model):
    def call(self, inputs):
        return ops.cond(
            ops.sum(inputs) > 0,
            lambda: self.path_a(inputs),
            lambda: self.path_b(inputs),
        )
This is teaching the user a new op instead of letting them use the if-else clause they are familiar with, which is bad. In compensation, it brings a significant improvement in training speed.
Here is the catch of PyTorch's flexibility. If you ever need to optimize the memory and speed of your model, you have to do it yourself using the following APIs and new concepts, including the inplace arguments for the ops, the parallel op APIs, and explicit device placement. This introduces a fairly steep learning curve for users.
torch.nn.functional.relu(x, inplace=True)  # inplace argument
x = torch._foreach_add(x, y)               # parallel op API
torch._foreach_add_(x, y)                  # inplace parallel op
x = x.cuda()                               # explicit device placement
Some other good examples are keras.ops, tensorflow.numpy, and jax.numpy. They are simply reimplementations of the numpy API. When you do have to introduce some cognitive load, reuse what people already know. Every framework has to offer some low-level ops. Instead of making people learn a new set of APIs, which may contain a hundred functions, they reuse the most popular existing API for it. The numpy APIs are well-documented and have tons of Stack Overflow questions and answers related to them.
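For example (a small sketch; the exact function set varies by framework version), a familiar numpy call maps almost one-to-one onto jax.numpy and keras.ops.

import numpy as np
import jax.numpy as jnp
from keras import ops

x = np.array([[1.0, 2.0], [3.0, 4.0]])

# The same numpy-style reduction in each API.
np.mean(x, axis=-1)                           # numpy
jnp.mean(jnp.asarray(x), axis=-1)             # jax.numpy
ops.mean(ops.convert_to_tensor(x), axis=-1)   # keras.ops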
The worst thing you can do for the user experience is to trick users into believing your API is something they are familiar with when it is not. I will give two examples. One is from PyTorch. The other is from TensorFlow.
What should we pass as the pad argument to the F.pad() function if we want to pad an input tensor of shape (100, 3, 32, 32) to (100, 3, 1+32+1, 2+32+2), that is, (100, 3, 34, 36)?
import torch
import torch.nn.functional as F

# pad the 32x32 images to (1+32+1)x(2+32+2)
# (100, 3, 32, 32) to (100, 3, 34, 36)
out = F.pad(
    torch.empty(100, 3, 32, 32),
    pad=???,
)
My first intuition is that it should be ((0, 0), (0, 0), (1, 1), (2, 2)), where each sub-tuple corresponds to one of the four dimensions, and the two numbers are the padding sizes before and after the existing values. My guess originates from the numpy API.
However, the correct answer is (2, 2, 1, 1). There are no sub-tuples, just one plain tuple. Moreover, the dimensions are reversed: the last dimension goes first.
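So the working call ends up looking like this (the snippet above with the answer filled in).

import torch
import torch.nn.functional as F

# pad is ordered from the last dimension backwards:
# (left, right) for the width axis, then (top, bottom) for the height axis.
out = F.pad(
    torch.empty(100, 3, 32, 32),
    pad=(2, 2, 1, 1),
)
print(out.shape)  # torch.Size([100, 3, 34, 36])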
The following is a bad example from TensorFlow. Can you guess the output of the following code snippet?
value = True

@tf.function
def get_value():
    return value

value = False
print(get_value())
Without the tf.function decorator, the output would be False, which is pretty straightforward. However, with the decorator, the output is True. This is because TensorFlow compiles the function, and any Python variable is compiled into a new constant. Changing the old variable's value afterwards does not affect the created constant.
It tricks the user into believing it is the Python code they are familiar with, but actually, it is not.
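If you do need mutable state that a compiled function can see, the usual way (not part of the example above, just a sketch) is a tf.Variable, which is captured by reference rather than baked in as a constant.

import tensorflow as tf

value = tf.Variable(True)

@tf.function
def get_value():
    # Reading the variable reflects its current value, even after tracing.
    return tf.identity(value)

value.assign(False)
print(get_value())  # tf.Tensor(False, shape=(), dtype=bool)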
Principle 3: Interaction over documentation
No one likes to read long documentation if they can figure things out just by running some example code and tweaking it themselves. So, we try to make the user workflow of the software follow the same logic.
Here is a good example, shown in the following code snippet. In PyTorch, all methods whose names end with an underscore are inplace ops, while the ones without are not. From an interaction perspective, this is good: it is easy to follow, and users don't need to check the docs every time they want the inplace version of a method. However, of course, it introduces some cognitive load. Users need to know what inplace means and when to use it.
x = x.add(y)
x.add_(y)
x = x.mul(y)
x.mul_(y)
Another good example is the Keras layers. They strictly follow the same naming convention, as shown in the following code snippet. With a clear naming convention, users can easily remember layer names without checking the documentation.
from keras import layers

layers.MaxPooling2D()
layers.GlobalMaxPooling1D()
layers.GlobalAveragePooling3D()
Another important part of the interaction between the user and the software is the error message. You can't expect the user to write everything correctly the very first time. We should always do the necessary checks in the code and try to print helpful error messages.
Let's look at the two examples shown in the following code snippet. The first one carries little information; it just says the tensor shapes mismatch. The second one contains much more useful information for the user to find the bug. It not only tells you the error is due to a tensor shape mismatch, but it also shows the expected shape and the incorrect shape it received. If you didn't mean to pass that shape, you now have a better idea of where the bug is.
# Bad example:
raise ValueError("Tensor shape mismatch.")

# Good example:
raise ValueError(
    "Tensor shape mismatch. "
    "Expected: (batch, num_features). "
    f"Received: {x.shape}"
)
The best error messages even point the user directly to the fix. The following code snippet shows a classic Python error message. It guessed what was wrong with the code and pointed the user directly to the fix.
import math

math.sqr(4)
# AttributeError: module 'math' has no attribute 'sqr'. Did you mean: 'sqrt'?
Final words
So far, we have covered the three most valuable software design principles I learned while contributing to the deep learning frameworks. First, write end-to-end workflows to discover more user experience problems. Second, reduce cognitive load and don't teach the user anything unless it is necessary. Third, follow consistent logic in your API design and throw meaningful error messages, so that users can learn your software by interacting with it instead of constantly checking the documentation.
However, there are many more principles to follow if you want to make your software even better. You can refer to the Keras API design guidelines as a complete API design guide.