DeepLabCut Trouble Shooting
@(Ponce Lab)
Install DLC
Windows machine, follow the steps in install tutorial to establish the whole conda environment in the machine.
Fail at first step
Many of us just fail at first step, some error message like
Solving environment: failed
ResolvePackageNotFound:
- msvc_runtime
According to some reference, you can just hack the YAML file and move the error related lines down to pip:
part. It will look like this
name: dlc-windowsGPU
dependencies:
- python=3.6
# - msvc_runtime
- tensorflow-gpu==1.13.1
- cudnn=7
- wxpython
- jupyter
- pytables==3.4.4
- pip:
- deeplabcut
- msvc_runtime
For other errors like conflicting requirements, you should do
Error message to be added
This
Runtime TroubleShooting
Jupyter notebook environment checking
Note DeepLabCut
lives in a conda
environment so we have to check if Jupyter notebook is using python and packages from the correct environment! Or the import will be problematic.
So normally we activate the environment first and then start the notebook.
conda activate dlc-windowsGPU
D:
jupyter notebook
First we would like to install the extension for notebook conda install nb_conda
so that you will have link in jupyter to start a notebook in the environment
Check Training progress
Change into the log directory (which contains all the events.out.tfevents
files) and use tensorboard
to check the log file.
cd D:\MacaqueFaceRecogition\MacaqueFace-DLC2\Head-Free-Viewing-CRP-2019-07-01\dlc-models\iteration-0\Head-Free-ViewingJul1-trainset95shuffle1\train\log
tensorboard --logdir=.
DLC installation issue
If there is a previous version of DLC reside in somewhere in conda, you’d better uninstall it totally. Use
import deeplabcut as dlc
dlc.__file__
To check you are importing dlc
from the right place. If not you may encounter an error here. And you’d better reinstall the deeplabcut and the environment.
Pytables Issue
When running the label_frames
or
ImportError Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\dlc-windowsGPU\lib\site-packages\pandas\io\pytables.py in __init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
444 try:
--> 445 import tables # noqa
446 except ImportError as ex: # pragma: no cover
C:\Users\ponce\AppData\Roaming\Python\Python36\site-packages\tables\__init__.py in <module>()
92 # Necessary imports to get versions stored on the cython extension
---> 93 from .utilsextension import (
94 get_pytables_version, get_hdf5_version, blosc_compressor_list,
ImportError: DLL load failed: The specified procedure could not be found.
The result is it will save your points coordinates in a csv
file but cannot save it in the h5
file which is needed for training and evaluation.
It’s majorly a pytables
version problem.
Current solution
conda uninstall pytables
conda install -c conda-forge pytables=3.4.3
pip install --upgrade deeplabcut
And don’t panic if you encounter this bug, your label coordinate does not lost but is stored in the csv
file. So using dlc.convertcsv2h5(path_config_file)
will rescue the csv
file and generate h5
file. You may debug the pytables
problem and then run this command.
CUDA Issue
When running train_network
you may encounter the error
InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
https://blog.kovalevskyi.com/multiple-version-of-cuda-libraries-on-the-same-machine-b9502d50ae77
Solution: Install CUDA 10.x and most recent GPU driver and restart your system
Additional Note: How to specify CUDA version for the environment
In some system, we may need to set the environment variable to notify Tensorflow of the correct CUDA to use in the current environment.
#!/bin/sh
export LD_LIBRARY_PATH=$ORIGINAL_LD_LIBRARY_PATH
unset ORIGINAL_LD_LIBRARY_PATH
set ORIGINAL_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
set export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:/lib/nccl/cuda-9:$LD_LIBRARY_PATH
https://stackoverflow.com/questions/31598963/how-to-set-specific-environment-variables-when-activating-conda-environment
Test GPU availability to Tensorflow
To test the the GPU acceleration for Tensorflow is working I recommend this scipt.
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.DEBUG)
with tf.device('/gpu:0'):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
with tf.Session() as sess:
print (sess.run(c))
Then you will see things in the notebook or the console like this, showing that the GPU is utilized and visible to tensorflow.
2019-07-02 12:54:46.169115: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-02 12:54:46.328165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-07-02 12:54:46.332011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-02 12:54:46.780035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-02 12:54:46.782437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-02 12:54:46.784090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-02 12:54:46.785529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4716 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Similarly using this line can also tell you what device is available
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Output like
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5904775635136544004, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4945621811
locality {
bus_id: 1
links {
}
}
incarnation: 6075035587783531498
physical_device_desc: "device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1"]
Show debugging information from Tensorflow
If you cannot see the debugging information either in notebook or in the console, you may need to add this line to the front of the code.
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.DEBUG)
We should do this line before importing the deeplabcut
(DLC imported tensorflow inside the package, so import it again will not change the settings.). Thus we will see the debugging information in the console!