Introduction to Deep Learning Using Keras and TensorFlow — Part 1

Rhnyewale
7 min read · Nov 7, 2020

We’ll cover some key concepts of deep learning with a hands-on example:

What is Deep Learning?

https://rhnyewale.medium.com/introduction-to-deep-learning-bcbd0214224e

1) Single Neuron — Linear Unit

So let’s begin with the fundamental component of a neural network: the individual neuron. As a diagram, a neuron (or unit) with one input looks like this:

The input is x. Its connection to the neuron has a weight which is w. Whenever a value flows through a connection, you multiply the value by the connection’s weight. For the input x, what reaches the neuron is w * x. A neural network “learns” by modifying its weights.

The b is a special kind of weight we call the bias. The bias doesn’t have any input data associated with it; instead, we put a 1 in the diagram so that the value that reaches the neuron is just b (since 1 * b = b). The bias enables the neuron to modify the output independently of its inputs.

The y is the value the neuron ultimately outputs. To get the output, the neuron sums up all the values it receives through its connections. This neuron’s activation is therefore y = w * x + b, or, written as a formula, y = wx + b.

It’s the equation of a line! It’s the slope-intercept form, where w is the slope and b is the y-intercept.

Example — The Linear Unit as a Model

Though individual neurons will usually only function as part of a larger network, it’s often useful to start with a single neuron model as a baseline. Single neuron models are linear models.

Let’s think about how this might work on a dataset like 80 Cereals. Training a model with ‘sugars’ (grams of sugars per serving) as input and ‘calories’ (calories per serving) as output, we might find the bias is b=90 and the weight is w=2.5. We could estimate the calorie content of cereal with 5 grams of sugar per serving like this:
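(The weight and bias below are just the illustrative values above, not numbers learned from the real dataset.)

```python
# Hypothetical linear unit: calories = w * sugars + b
w = 2.5   # weight on the 'sugars' input
b = 90    # bias

sugars = 5  # grams of sugar per serving
calories = w * sugars + b
print(calories)  # 2.5 * 5 + 90 = 102.5
```

So a cereal with 5 grams of sugar per serving would come out to about 102.5 calories.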

Multiple Inputs

The 80 Cereals dataset has many more features than just ‘sugars’. What if we wanted to expand our model to include things like fiber or protein content? That’s easy enough. We can just add more input connections to the neuron, one for each additional feature. To find the output, we multiply each input by its connection weight and then add them all together.

The formula for this neuron would be y = w₀x₀ + w₁x₁ + w₂x₂ + b. A linear unit with two inputs will fit a plane, and a unit with more inputs than that will fit a hyperplane.
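As a quick sketch of that weighted sum in plain NumPy (the weights, bias, and inputs below are made up purely for illustration):

```python
import numpy as np

w = np.array([2.5, -1.0, 0.5])   # hypothetical weights for sugars, fiber, protein
b = 90.0                         # hypothetical bias
x = np.array([5.0, 2.0, 3.0])    # one example: [sugars, fiber, protein]

# y = w0*x0 + w1*x1 + w2*x2 + b
y = np.dot(w, x) + b
print(y)  # 2.5*5 + (-1.0)*2 + 0.5*3 + 90 = 102.0
```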

Define a linear model in Keras

The easiest way to create a model in Keras is through keras.Sequential, which creates a neural network as a stack of layers. We can create models like those above using a dense layer (which we’ll learn more about in the next lesson).

We could define a linear model accepting three input features (‘sugars’, ‘fiber’, and ‘protein’) and producing a single output (‘calories’) like so:
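(A minimal sketch, assuming TensorFlow 2.x, where Keras is available as tensorflow.keras.)

```python
from tensorflow import keras
from tensorflow.keras import layers

# A linear model: a single dense unit fed by three input features
model = keras.Sequential([
    layers.Dense(units=1, input_shape=[3])
])
```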

With the first argument, units, we define how many outputs we want. In this case, we are just predicting ‘calories’, so we’ll use units=1.

With the second argument, input_shape, we tell Keras the dimensions of the inputs. Setting input_shape=[3] ensures the model will accept three features as input (‘sugars’, ‘fiber’, and ‘protein’).

Weights

Internally, Keras represents the weights of a neural network with tensors. Tensors are basically TensorFlow’s version of a NumPy array, with a few differences that make them better suited to deep learning. One of the most important is that tensors are compatible with GPU and TPU accelerators. TPUs, in fact, are designed specifically for tensor computations.
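A tiny sketch of the analogy (the values here are arbitrary):

```python
import numpy as np
import tensorflow as tf

a = np.array([1.0, 2.0, 3.0])     # a NumPy array
t = tf.constant([1.0, 2.0, 3.0])  # a TensorFlow tensor

print(t.numpy())  # tensors convert back to NumPy arrays easily
```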

A model’s weights are kept in its weights attribute as a list of tensors. Let’s get the weights of the model we defined above.
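Continuing with the model sketched earlier, one way to pull them out:

```python
# The single dense layer has one weight tensor and one bias tensor
w, b = model.weights
```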

(If you want, you could display the weights with something like: print("Weights\n{}\n\nBias\n{}".format(w, b))).

Do you see how there’s one weight for each input (and a bias)? Notice though that there doesn’t seem to be any pattern to the values the weights have. Before the model is trained, the weights are set to random numbers (and the bias to 0.0). A neural network learns by finding better values for its weights.

(By the way, Keras represents weights as tensors but also uses tensors to represent data. When you set the input_shape argument, you are telling Keras the dimensions of the array it should expect for each example in the training data. Setting input_shape=[3] would create a network accepting vectors of length 3, like [0.2, 0.4, 0.6].)

Let’s see how we can build neural networks capable of learning the complex kinds of relationships deep neural nets are famous for.

The key idea here is modularity, building up a complex network from simpler functional units. We’ve seen how a linear unit computes a linear function — now we’ll see how to combine and modify these single units to model more complex relationships.

2) Deep Neural Networks

Layers

Neural networks typically organize their neurons into layers. When we collect together linear units having a common set of inputs, we get a dense layer.

You could think of each layer in a neural network as performing some kind of relatively simple transformation. Through a deep stack of layers, a neural network can transform its inputs in more and more complex ways. In a well-trained neural network, each layer is a transformation getting us a little bit closer to a solution.

Many Kinds of Layers

A “layer” in Keras is a very general kind of thing. A layer can be, essentially, any kind of data transformation. Many layers, like the convolutional and recurrent layers, transform data through the use of neurons and differ primarily in the pattern of connections they form. Others, though, are used for feature engineering or just simple arithmetic.

The Activation Function

It turns out, however, that two dense layers with nothing in between are no better than a single dense layer by itself. Dense layers by themselves can never move us out of the world of lines and planes. What we need is something nonlinear. What we need are activation functions.

An activation function is simply some function we apply to each of a layer’s outputs (its activations). The most common is the rectifier function, max(0, x).

When we attach the rectifier to a linear unit, we get a rectified linear unit or ReLU. (For this reason, it’s common to call the rectifier function the “ReLU function”.) Applying a ReLU activation to a linear unit means the output becomes max(0, w * x + b), which we might draw in a diagram like:
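Expressed in code rather than a diagram (with made-up values for w, b, and the inputs), the rectified linear unit looks like this:

```python
import tensorflow as tf

w, b = 2.0, -1.0
x = tf.constant([-2.0, 0.0, 3.0])

# Linear unit followed by the rectifier: max(0, w*x + b)
y = tf.nn.relu(w * x + b)
print(y.numpy())  # [0. 0. 5.]
```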

Stacking Dense Layers

Now that we have some nonlinearity, let’s see how we can stack layers to get complex data transformations.

The layers before the output layer are sometimes called hidden, since we never see their outputs directly. And though we haven’t shown them in this diagram, each of these neurons would also be receiving a bias (one bias for each neuron).

Now, notice that the final (output) layer is a linear unit (meaning, no activation function). That makes this network appropriate to a regression task, where we are trying to predict some arbitrary numeric value. Other tasks (like classification) might require an activation function on the output.
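Putting these pieces together, a small regression network might be sketched like this (the layer sizes and input_shape here are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer (no activation) for regression
    layers.Dense(units=1),
])
```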

Define a Model with Hidden Layers and Activation Function

In the Concrete dataset, your task is to predict the compressive strength of concrete manufactured according to various recipes.

The target for this task is the column ‘CompressiveStrength’. The remaining columns are the features we’ll use as inputs.

We’ll create a model with three hidden layers, each having 512 units and the ReLU activation. Be sure to include an output layer of one unit and no activation, and also input_shape as an argument to the first layer.
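Here is one way that model might look. I’m assuming the Concrete dataset has eight feature columns, so input_shape=[8]; adjust it to match however many features you actually feed in:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # three hidden layers of 512 units with ReLU activations
    layers.Dense(512, activation='relu', input_shape=[8]),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    # the output layer: a single unit, no activation
    layers.Dense(1),
])
```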

We learned how to build fully-connected networks out of stacks of dense layers. When first created, all of the network’s weights are set randomly — the network doesn’t “know” anything yet.

Next we’re going to see how to train a neural network; we’re going to see how neural networks learn.
