Abstract: Docker keeps coming up in machine learning discussions, and I finally got the chance to learn it and set it up. This post covers what Docker is and how to use it.
In short, Docker has two concepts that are easy to confuse: the image and the container.
- An Image is a snapshot of your environment. It is fixed unless you rebuild or update the image. It is like a Class()
- A Container is a run-time instance of an Image, just like a = Class(). It carries all the run-time configuration, such as the IP address or port numbers.
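The class/instance analogy can be spelled out in a few lines of Python. This is only an illustration of the analogy: the `Image` and `Container` classes, their fields, and the IP addresses are made up for this sketch and have nothing to do with Docker's real API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for a Docker image: a fixed snapshot."""
    name: str
    packages: tuple  # what is baked into the snapshot


@dataclass
class Container:
    """Hypothetical stand-in for a container: one running instance of an Image."""
    image: Image
    ip_address: str  # run-time configuration lives on the instance
    ports: dict = field(default_factory=dict)


# One image, like one class definition.
pytorch_image = Image(name="pytorch/pytorch", packages=("torch", "caffe2"))

# Many containers, like many instances, each with its own run-time settings.
# The addresses and ports below are invented example values.
a = Container(image=pytorch_image, ip_address="172.17.0.2", ports={8888: 8888})
b = Container(image=pytorch_image, ip_address="172.17.0.3")

print(a.image is b.image)  # both containers share the same snapshot
print(a.ip_address, b.ip_address)
```

The frozen dataclass mirrors the "fixed unless you update it" property: changing the snapshot means building a new `Image`, not editing the old one.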
Using Docker for app deployment is easy, but using it for code development is HARD.
This is because any code changes go away once the container is deleted (which is what happens when you call exit in the terminal of a container started with --rm). You can press
ctrl-p ctrl-q to detach from a terminal, but this won't make a big difference. Use
docker attach <container_name> to get back into the environment.
The way I found to make this work is to create an image that only mounts your code folders as an external drive path inside that image, and then run an instance of this image alongside your environment containers.
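A simpler variant of the same idea, which is not the dedicated code container described above, is to bind-mount the code folder from the host when the container starts, using Docker's `-v host_path:container_path` option. The sketch below only assembles the command line and does not run it. The host folder `~/code`, the mount point `/workspace`, and the helper name are assumptions for illustration; the image name is the one pulled later in this post.

```python
import os
import shlex


def docker_run_with_code(image, host_dir, container_dir, name="mytorch"):
    """Build a `docker run` command that bind-mounts host_dir into the container.

    Files edited under container_dir are really stored in host_dir,
    so they are still there after the container is removed.
    """
    host_dir = os.path.abspath(os.path.expanduser(host_dir))
    return [
        "docker", "run",
        "--name", name,
        "--rm", "-it",
        "-v", f"{host_dir}:{container_dir}",  # host path : path inside container
        image,
    ]


# Hypothetical paths; adjust them to your own layout.
cmd = docker_run_with_code(
    image="pytorch/pytorch:nightly-devel-cuda9.2-cudnn7",
    host_dir="~/code",
    container_dir="/workspace",
)
print(shlex.join(cmd))
```

For GPU access you would swap `docker` for `nvidia-docker`, as in the run command used later in this post.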
The official site is pretty good. Make sure you install the CE version.
And do not overdo this:
Solve the Docker Permission Problem
To avoid prefixing every docker command with 'sudo', run this in your favourite shell, then log out of your account completely and log back in (or exit your SSH session and reconnect). If in doubt, reboot the computer you are trying to run Docker on:
sudo usermod -a -G docker $USER
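After logging back in, you can confirm that the group change took effect in your session. The sketch below uses Python's standard `grp` and `os` modules (Unix only); the helper names are made up for this example.

```python
import grp
import os


def group_names_for_current_process():
    """Return the names of all groups the current process belongs to."""
    names = set()
    for gid in os.getgroups() + [os.getgid()]:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid with no entry in the group database; skip it.
            pass
    return names


def can_skip_sudo(group_names, group="docker"):
    """True if the given set of group names contains the docker group."""
    return group in group_names


groups = group_names_for_current_process()
if can_skip_sudo(groups):
    print("docker group is active in this session")
else:
    print("docker group not active yet: log out and back in, or reboot")
```

The check looks at the groups of the running process rather than at the group database, because `usermod` updates the database immediately while an already open session keeps its old group list. That is why the re-login is needed.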
Install PyTorch: The examples
In this case, I want to use Facebook's Detectron as an example.
First things first: I don't have a fully configured Docker image at the end of this post. If I make one in the future, I will link it here.
Also, if your GPU has little memory, you won't be able to run the test at the end. You will get an error like this:
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:415: out of memory
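To see how much memory your GPU has before going through the whole setup, you can ask `nvidia-smi`, which ships with the NVIDIA driver. The sketch below is a helper of my own, not part of Detectron: it parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one value in MiB per GPU. How much memory the final test actually needs was not measured in this post, so no threshold is hard-coded.

```python
import shutil
import subprocess


def parse_memory_mib(smi_output):
    """Parse one integer (MiB) per line, skipping blank or non-numeric lines."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def gpu_memory_mib():
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


for index, mib in enumerate(gpu_memory_mib()):
    print(f"GPU {index}: {mib / 1024:.1f} GiB")
```

On a machine without the NVIDIA driver the loop simply prints nothing.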
This is Detectron:
The Dockerfile it contains does not work on my machine, so I found another tutorial.
What you need:
- Docker CE version (click to see instructions)
- A PyTorch GPU-enabled docker container
- An NVIDIA GPU pass-through docker container
- A Peng's Code pass-through docker container
Once you have Docker CE installed, move on to PyTorch, but be careful about the version:
This version has Caffe2 bundled and uses CUDA 9 (because CUDA 10 breaks things):
sudo docker pull pytorch/pytorch:nightly-devel-cuda9.2-cudnn7
Once you have the PyTorch image, go here and install the NVIDIA GPU pass-through (nvidia-docker).
After that, you can create the environment container with:
sudo nvidia-docker run --name mytorch --rm -it pytorch/pytorch:nightly-devel-cuda9.2-cudnn7
We can then verify PyTorch is correctly installed and works fine with the GPU.
python -c "import torch;print(torch.cuda.get_device_name(0))"
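If the one-liner fails, a slightly longer check can tell you which part is missing. This sketch uses only `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`; the function name `describe_gpu` is made up here.

```python
def describe_gpu():
    """Return a short, human-readable status of the PyTorch GPU setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if not torch.cuda.is_available():
        return "PyTorch is installed, but it cannot see a CUDA GPU"

    # Collect the name of every GPU that PyTorch can see.
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return "PyTorch sees {} GPU(s): {}".format(len(names), ", ".join(names))


print(describe_gpu())
```

If it reports that no CUDA GPU is visible, one thing to check is whether the container was started through nvidia-docker rather than plain docker.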
To verify Caffe2 is installed correctly:
pip install protobuf
pip install future
python -c "from caffe2.python import workspace; print(workspace.NumCudaDevices())"
Detectron has no real installation step; you just clone it. But you want to clone it into the code container, otherwise it will be gone after you exit the container. (The code container will come in the future.)
git clone https://github.com/facebookresearch/Detectron
pip install -r ./Detectron/requirements.txt
cd Detectron && make
Fix the missing dependency and verify OpenCV is fine.
Credit goes to: https://www.kaggle.com/c/inclusive-images-challenge/discussion/70226
Note: you need to type a lower-case 'y' at the end, not 'Y' as the prompt asks.
apt-get update
apt-get install libgtk2.0-dev
python -c "import cv2; print(cv2.__version__)"
Install this testing helper (the COCO API):
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make install
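Before running the test, it can save time to confirm that every Python package installed so far can be found. The sketch below uses the standard `importlib`; the module list reflects the steps in this post (`pycocotools` is the package that the cocoapi `make install` provides), and the helper name is made up.

```python
import importlib.util


def find_missing(module_names):
    """Return the modules from module_names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, when a parent package (such as caffe2)
            # is itself missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


required = ["torch", "caffe2.python", "cv2", "pycocotools"]
missing = find_missing(required)
print("missing:", ", ".join(missing) if missing else "nothing")
```

This only checks that each package can be located. A package can still fail at import time because of a missing system library, so the `python -c "import cv2; ..."` style checks remain the real test.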
Test Detectron with an online database
This download (the file passed to --wts in the command below) is about 500 MB, so you may want to download it manually first.
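One way to do the manual download is a small script that fetches the file once and reuses it afterwards. This is a sketch using the standard `urllib`; the destination path is an assumption, and the URL is the `--wts` value from the command in this section. It relies on `--wts` also accepting a local file path, which as far as I know it does.

```python
import os
import urllib.request

# The --wts URL from the command in this section.
WEIGHTS_URL = (
    "https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/"
    "e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/"
    "coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl"
)


def download_once(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    if os.path.exists(dest):
        return dest
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    partial = dest + ".part"
    urllib.request.urlretrieve(url, partial)  # write to a temporary name first
    os.replace(partial, dest)                 # rename only after a full download
    return dest


# Usage (left commented out because it fetches about 500 MB).
# The destination below is a hypothetical path:
# weights = download_once(WEIGHTS_URL, "/tmp/detectron-weights/model_final.pkl")
# Then pass `--wts /tmp/detectron-weights/model_final.pkl` to infer_simple.py.
```

Writing to a `.part` file and renaming at the end means an interrupted download is never mistaken for a complete one on the next run.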
Make sure you are at
python tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo