
Dataset Handling

Dataset Class

There are two general kinds of Dataset: map-style datasets that model a list (supporting __getitem__ and __len__), and iterable-style datasets that model an iterator (supporting __iter__).
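A minimal sketch of the two kinds (the class names here are made up for illustration):

from torch.utils.data import Dataset, IterableDataset

class MyMapDataset(Dataset):
    """Map-style: supports random access by index and has a length."""
    def __init__(self, data, labels):
        self.data, self.labels = data, labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

class MyStreamDataset(IterableDataset):
    """Iterable-style: yields samples in order, useful for data streams."""
    def __init__(self, generator_fn):
        self.generator_fn = generator_fn

    def __iter__(self):
        return iter(self.generator_fn())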

Data Splitting

import torch
from torchvision.datasets import MNIST

# transform is assumed defined as in the augmentation section below.
dataset = MNIST(root='./data', train=True, transform=transform, download=True)
# MNIST has 60,000 training examples: hold out 10,000 for validation.
train_set, val_set = torch.utils.data.random_split(dataset, [50000, 10000])

Data Augmentation and Transform

# Commonly used CIFAR-10 per-channel statistics.
rgb_mean = (0.4914, 0.4822, 0.4465)
rgb_std = (0.2470, 0.2435, 0.2616)

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(rgb_mean, rgb_std),
])

kwargs = {'num_workers': 2, 'pin_memory': True} if args.cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR10(root=data_root, train=True, transform=transform_train, download=False),
    batch_size=128, **kwargs)

Loss Function Design

Softmax (i.e. cross-entropy loss) is suitable for single-label class prediction. If each sample can carry multiple positive labels, use a per-class sigmoid with binary cross-entropy instead.
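A short sketch of the two cases (tensor shapes chosen for illustration):

import torch
import torch.nn as nn

logits = torch.randn(8, 10)  # batch of 8, 10 classes

# Single-label classification: CrossEntropyLoss applies log-softmax internally.
targets = torch.randint(0, 10, (8,))
loss_single = nn.CrossEntropyLoss()(logits, targets)

# Multi-label classification: per-class sigmoid + binary cross-entropy.
multi_targets = torch.randint(0, 2, (8, 10)).float()
loss_multi = nn.BCEWithLogitsLoss()(logits, multi_targets)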

TensorBoard Recording

import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir=outdir, flush_secs=180)
for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)

Note: on Windows, passing multiple runs with --logdir does not work, so use --logdir_spec instead (e.g. tensorboard --logdir_spec=run1:./logs/run1,run2:./logs/run2)!

Recording the inner state of the network requires detaching the tensors from the computational graph and cloning their values, so later in-place updates don't overwrite what you recorded.
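For instance, a sketch of logging a hidden activation (hidden, writer, and n_iter are placeholders for your own variables):

# Detach from the graph and clone so later in-place ops don't change the record.
snapshot = hidden.detach().clone().cpu()
writer.add_histogram('hidden/layer1', snapshot, n_iter)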

Visualization

TensorBoard supports figures, so making pretty visualizations of the inner structure of the network and recording them throughout training is very useful.
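A minimal sketch using writer.add_figure (model.conv1 and n_iter are assumptions for illustration):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.imshow(model.conv1.weight[0, 0].detach().cpu(), cmap='viridis')
ax.set_title('conv1 filter 0')
writer.add_figure('filters/conv1', fig, global_step=n_iter)
plt.close(fig)  # free the figure once it's logged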

Note: if you want to visualize every single image you generated, set the --samples_per_plugin flag, e.g. --samples_per_plugin images=0 to keep all samples instead of TensorBoard's downsampled default.

Learning Rate Scheduling

The learning rate schedule is important for training! There are several ways to do it; PyTorch ships a collection in torch.optim.lr_scheduler.
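For example, a step decay schedule via torch.optim.lr_scheduler (the toy model and values are illustrative):

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)  # multiply lr by 0.1 every 30 epochs

for epoch in range(100):
    optimizer.step()   # stand-in for one epoch of training
    scheduler.step()   # advance the schedule once per epoch
    print(epoch, scheduler.get_last_lr())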

Gradient and Weight Inspection

Inspecting and analyzing gradients in PyTorch is super easy: each variable in the computational graph stores its gradient in t.grad, and this value is not erased until you call optimizer.zero_grad(). So you can inspect gradient statistics during the validation phase.

One useful trick: by inspecting the magnitude of the gradient relative to the current weight, you get a sense of the proper scale for the learning rate.

def weight_grad_stats(model):
    """Print mean |weight| and mean |grad| for each parameter, two module levels deep."""
    for name, submod in model.named_children():
        print(name, submod._get_name())
        for subname, subsubmod in submod.named_children():
            print('\t', subname, subsubmod._get_name())
            for param in subsubmod.parameters():
                if param.grad is not None:
                    print("\t\t%s : w %.4E, grad %.4E" % (
                        str(list(param.shape)), param.abs().mean().item(), param.grad.abs().mean().item()))
                else:
                    # Gradient never flowed to this parameter in the last backward pass.
                    print("\t\t%s : w %.4E, grad not reached" % (
                        str(list(param.shape)), param.abs().mean().item()))

# Example: one forward/backward pass through PWC-Net, then inspect the stats.
pwc = PWC_Net(model_path='models/chairs-things.pytorch').cuda()
optimizer = optim.SGD(pwc.parameters(), lr=1e-5)  # weight_decay=0.0004
alpha_w = [None, 0.005, 0.01, 0.02, 0.08, 0.32]
optimizer.zero_grad()
im1, im2, trflow = SintelClean[101]  # one sample pair plus ground-truth flow
trflow_pyr = resize_pyramid(trflow.unsqueeze(0).cuda())
predflow_pyr = pwc(im1.unsqueeze(0).cuda(), im2.unsqueeze(0).cuda())
loss = loss_fun(FLOW_SCALE * predflow_pyr[5] - trflow_pyr[5 - 1])  # loss at one pyramid level
loss.backward()
weight_grad_stats(pwc)

Model Summary

https://github.com/Animadversio/pytorch-summary

A simple personal version of a model summary:

import numpy as np

def model_size_summary(model):
    """Print each parameter's shape and the total count / size, assuming 4-byte float32."""
    param_num = 0
    for param in model.parameters():
        n = int(np.prod(list(param.shape)))
        param_num += n
        print(param.shape, "num %d" % n)
    print(param_num, "in total, %.2f MB" % (param_num * 4 / 1024**2))

model_size_summary(mt)

Initialization

Weight initialization and reinitialization can be important.

https://pytorch.org/docs/stable/nn.init.html
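A minimal sketch of reinitializing weights with model.apply (model is a placeholder for your own network):

import torch.nn as nn

def reinit_weights(m):
    # Kaiming init for conv/linear weights, zeros for biases.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(reinit_weights)  # recursively visits every submodule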

Multiplexing the Forward Function

The forward function of a module can be multiplexed by flags, like train/eval mode and so on. You can add your own flag and have forward return some internal representation when the flag is on, as sketched below.
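A sketch of this pattern (a made-up two-layer net with a return_hidden flag):

import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x, return_hidden=False):
        h = torch.relu(self.fc1(x))
        out = self.fc2(h)
        if return_hidden:
            return out, h  # expose the internal representation when the flag is on
        return out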

Command Line Application

In many cases you may want to use your model on a cluster or GPU server. Those headless machines are interacted with through command-line scripts, so many developers build a command-line parser into their model.

import argparse

def get_argument_parser(description=None):
    parser = argparse.ArgumentParser(description)
    parser.add_argument("-m", "--model_dir", type=str, required=True,
            help="The directory where the trained model is saved.")
    parser.add_argument("-c", "--conf", dest="configs", default=[], nargs="*",
            help="A list of configuration items. "
                 "An item is a file path or a 'key=value' formatted string. "
                 "The type of a value is determined by applying int(), float(), and str() "
                 "to it sequentially.")
    return parser

parser = custom.get_argument_parser()
args = parser.parse_args()

Experimental Note

Just like experiments in any kind of science, it’s advisable to keep track of hyperparameters and record all of them in a table.

One way is simply to encode the experiment time and settings into the folder name, and save the TensorBoard log into it.

import datetime
from torch.utils.tensorboard import SummaryWriter

current_time = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
train_log_dir = 'logs/' + config.experiment + '/' + current_time + '/train'
eval_log_dir = 'logs/' + config.experiment + '/' + current_time + '/eval'

train_summary_writer = SummaryWriter(train_log_dir)
eval_summary_writer = SummaryWriter(eval_log_dir)
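Another option is TensorBoard's built-in hyperparameter table via add_hparams (a sketch; the hyperparameter names and metric values here are made up):

train_summary_writer.add_hparams(
    {'lr': 0.01, 'batch_size': 128, 'optimizer': 'SGD'},     # hyperparameters
    {'hparam/val_accuracy': 0.92, 'hparam/val_loss': 0.31},  # resulting metrics
)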