PyTorch works with tensors, which are similar to NumPy arrays.

import numpy as np
import torch
import torchvision
ts = torch.tensor([[1,2,3],[4,5,6]]) # Tensor

ts.shape
> torch.Size([2,3])

type(ts)
> <class 'torch.Tensor'>

Converting between NumPy arrays and tensors

# Conversion
np_array = np.array([1, 2, 3])
np_to_ts = torch.from_numpy(np_array) # Convert a numpy array to a Tensor

ts_to_np = ts.numpy() # Tensor to numpy
# Numpy and Tensor share the same memory, so a write to one changes the other
ts_to_np[1] = -1 # also sets row 1 of ts to -1
np_to_ts[1] = -1 # also sets np_array[1] to -1
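
The shared-memory behaviour is easy to check directly. This is a minimal sketch, assuming CPU tensors. The variable names are only illustrative, and the last part contrasts it with torch.tensor, which makes a copy:

```python
import numpy as np
import torch

# from_numpy wraps the existing NumPy buffer instead of copying it
np_array = np.array([1, 2, 3])
np_to_ts = torch.from_numpy(np_array)
np_array[1] = -1                      # write through the NumPy side
shared_value = np_to_ts[1].item()     # the tensor sees the change

# .numpy() also returns a view onto the tensor's memory (CPU tensors only)
ts = torch.tensor([[1, 2, 3], [4, 5, 6]])
ts_view = ts.numpy()
ts_view[0, 0] = 100                   # write through the NumPy side
viewed_value = ts[0, 0].item()        # the tensor sees the change

# torch.tensor(...) copies the data, so later writes do not reach the copy
independent = torch.tensor(np_array)
np_array[0] = 7
copied_value = independent[0].item()
```

If you need an independent array or tensor, copy explicitly (for example with torch.tensor(...) or ts.clone()).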

Assignment and slicing work the same as in NumPy, and so does masking:

ts = torch.tensor([[1,2,3],[4,5,6]])
mask = ts > 1
new_array = ts[mask]
print(new_array)
> tensor([2, 3, 4, 5, 6])
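
Here is a slightly longer sketch of NumPy-style indexing on tensors, covering slicing, boolean masks, and masked assignment. The names clipped and selected are illustrative only:

```python
import torch

ts = torch.tensor([[1, 2, 3], [4, 5, 6]])

first_row = ts[0]          # row 0
last_column = ts[:, -1]    # last element of every row

mask = ts > 1              # boolean tensor with the same shape as ts
selected = ts[mask]        # 1-D tensor of the elements where mask is True

# Masked assignment modifies the tensor in place, so work on a clone
clipped = ts.clone()
clipped[clipped > 4] = 0
```

Note that ts[mask] always flattens the selection into a 1-D tensor, just as NumPy does.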

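The operators and the torch functions mean the same thing with different syntax. The + operator goes through the tensor's __add__ dunder method, which performs the same operation as torch.add.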

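# x and y are tensors; result_add is a preallocated tensor that receives the output
x + y   # same as torch.add(x, y) or torch.add(x, y, out=result_add)
x - y   # same as torch.sub(x, y)
x * y   # same as torch.mul(x, y)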
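
A runnable sketch of the equivalences listed above. The example values for x and y are made up, and result_add is preallocated with torch.empty so that out= has somewhere to write:

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# Operator form and function form produce the same values
same_add = torch.equal(x + y, torch.add(x, y))
same_sub = torch.equal(x - y, torch.sub(x, y))
same_mul = torch.equal(x * y, torch.mul(x, y))

# out= writes the result into an existing tensor instead of allocating a new one
result_add = torch.empty(3, dtype=x.dtype)
torch.add(x, y, out=result_add)
```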

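In older PyTorch releases, dividing integer tensors with / returned ints (NOT floats like with numpy). Since PyTorch 1.7, / performs true division and returns floats, just like numpy. Use // if you want integer (floor) division.

a = torch.tensor([4,4])
b = torch.tensor([3,3])
a / b
> tensor([1.3333, 1.3333])
a // b
> tensor([1, 1])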
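
Because the behaviour of / on integer tensors depends on the PyTorch version, a version-independent sketch is to state the intent explicitly: use // when you want integers, and convert to float first when you want decimals. The variable names are illustrative:

```python
import torch

a = torch.tensor([4, 4])
b = torch.tensor([3, 3])

int_result = a // b                     # floor division keeps the integer dtype
float_result = a.float() / b.float()    # explicit floats give decimal results
```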

PyTorch has its own DataLoader (thank goodness, writing one is a pain). Its default collate function stacks samples into batches and converts NumPy arrays and Python numbers into PyTorch tensors.

from torch.utils.data import DataLoader

pytorch_dataloader = DataLoader(our_csv_dataset, batch_size=batch_size)

# We can use the exact same way to iterate over samples
for i, item in enumerate(pytorch_dataloader):
    print('Starting item {}'.format(i))
    print('item contains')
    for key in item:
        print(key)
        print(type(item[key]))
        print(item[key].shape)
    break  # only look at the first batch

## Output ##
> Starting item 0
> item contains
> features                  # key
> <class 'torch.Tensor'>    # type of item
> torch.Size([4, 2])        # shape of item
> target                    # key
> <class 'torch.Tensor'>    # type of item
> torch.Size([4, 1])        # shape of item
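
To try the loop without the CSV file, here is a minimal sketch with a hypothetical ToyDictDataset standing in for our_csv_dataset. The class name, the random data, and the sample count are assumptions for illustration, not the original dataset. Each sample is a dict of NumPy arrays, and the DataLoader's default collation turns a batch of them into a dict of tensors:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDictDataset(Dataset):
    """Hypothetical stand-in for our_csv_dataset: each sample is a dict of NumPy arrays."""

    def __init__(self, n_samples=10):
        rng = np.random.default_rng(0)
        self.features = rng.random((n_samples, 2)).astype(np.float32)
        self.target = rng.random((n_samples, 1)).astype(np.float32)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # One sample: 2 feature values and 1 target value
        return {"features": self.features[idx], "target": self.target[idx]}


loader = DataLoader(ToyDictDataset(), batch_size=4)
first_batch = next(iter(loader))
for key, value in first_batch.items():
    print(key, type(value), value.shape)
```

With batch_size=4, the two feature values per sample become a 4 x 2 tensor and the single target becomes a 4 x 1 tensor, matching the shapes in the output above.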

Torchvision

Torchvision comes with ready-made dataset classes for common datasets such as ImageNet and FashionMNIST. That way you don’t end up having to write boilerplate loading code.

  • transforms.Compose lets you chain a series of transformations in a row.

  • transforms.ToTensor converts a PIL image or numpy.ndarray
    (H × W × C) in the range [0, 255] to a torch.FloatTensor of shape
    (C × H × W) in the range [0.0, 1.0].

  • transforms.Normalize normalizes a tensor image, channel by channel, with a given mean and standard deviation.

  • datasets.FashionMNIST loads the Fashion-MNIST dataset (downloading it when download=True) and applies the given transform. Set train=True to get the training set, or train=False to get the test set.

  • torch.utils.data.DataLoader takes our training or test data along with the batch_size and shuffle parameters. batch_size defines how many samples to load per batch; shuffle=True reshuffles the data at every epoch.

The following helper is useful for looking at one or a few images in the dataset.

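import matplotlib.pyplot as plt
import numpy as np

fashion_mnist_dataloader = DataLoader(fashion_mnist_dataset, batch_size=8)
def imshow(img):
    img = img / 2 + 0.5 # unnormalize (assumes mean 0.5 and std 0.5 were used)
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0))) # C x H x W -> H x W x C
    plt.show()

# get the first batch of training images (pass shuffle=True to the DataLoader for random ones)
dataiter = iter(fashion_mnist_dataloader)
images, labels = next(dataiter) # built-in next(); recent DataLoader iterators have no .next()
# show images
imshow(torchvision.utils.make_grid(images))
# print labels (classes is the list of label names, defined elsewhere)
print(' '.join('%5s' % classes[labels[j]] for j in range(8)))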