TX1入门教程软件篇-安装TensorFlow(1.0.1)

TX1入门教程软件篇-安装TensorFlow(1.0.1)

说明：

介绍如何在TX1上安装TensorFlow 1.0.1版本，1.0版本以上可以支持更多功能实现。

准备：

利用Jetpack安装如下：
- L4T 24.2.1 an Ubuntu 16.04 64-bit variant (aarch64)
- CUDA 8.0
- cuDNN 5.1.5
- TensorFlow安装需要用到CUDA和cuDNN
- TensorFlow占用比较多空间，TX1通常空间不足，最好增加64G+的U盘作为root分区启动，增加交换分区大小为8G+

安装：

安装Java：

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

安装依赖，(使用Python 2.7)

sudo apt-get install zip unzip autoconf automake libtool curl zlib1g-dev maven -y
sudo apt-get install python-numpy swig python-dev python-pip python-wheel -y

安装Bazel(0.5.0版本)
- 下载 bazel-0.4.5-dist.zip

wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip

解压，进入

cd bazel-0.4.5-dist

修改：

vim src/main/java/com/google/devtools/build/lib/util/CPU.java

其中28行：ARM("arm", ImmutableSet.of("arm","armv7l"))
修改为：ARM("arm", ImmutableSet.of("aarch64", "arm","armv7l"))

编译：

./compile.sh

sudo cp output/bazel /usr/local/bin

创建swap文件
- 创建8G swap文件：

fallocate -l 8G swapfile

修改权限

chmod 600 swapfile

创建swap区

mkswap swapfile

激活

swapon swapfile

确认

swapon -s

安装TensorFlow
- 克隆

git clone https://github.com/tensorflow/tensorflow.git

checkout 新版本

cd tensorflow
git checkout v1.0.1

修改：tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
找到：static int TryToReadNumaNode(conststring &pci_bus_id,intdevice_ordinal)
添加：#if defined(__APPLE__)

#ifdef __aarch64__
    LOG(INFO) << "ARM64 does not support NUMA - returning NUMA node zero";
    return 0;
#elif defined(__APPLE__)

增加头文件

sudo cp /usr/include/cudnn.h /usr/lib/aarch64-linux-gnu/include/cudnn.h

编译：

./configure

配置：

ubuntu@tegra-ubuntu:~/tensorflow$ ./configure 
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
Please specify optimization flags to use during compilation [Default is -march=native]: 
Do you wish to use jemalloc as the malloc implementation? (Linux only) [Y/n] y
jemalloc enabled on Linux
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] y
XLA JIT support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] n
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
Extracting Bazel installation...
.......................
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.......................
INFO: All external dependencies fetched successfully.
Configuration finished

bazel 编译：

bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel生成whl文件

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

保存whl文件

mv /tmp/tensorflow_pkg/tensorflow-1.0.1-cp27-cp27mu-linux_aarch64.whl $HOME/

安装

sudo pip install $HOME/tensorflow-1.0.1-cp27-cp27mu-linux_aarch64.whl

重启

sudo reboot

测试：

ubuntu@tegra-ubuntu:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>>> x = tf.constant(1.0)
>>> y = tf.constant(2.0)
>>> z = x + y
>>> with tf.Session() as sess:
...     print z.eval()
... 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM has no NUMA node, hardcoding to return zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 6.79GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 6.45G (6929413888 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.81G (6236472320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.23G (5612825088 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.70G (5051542528 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.23G (4546387968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.81G (4091749120 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): GP10B, Compute Capability 6.2
3.0

问题：

错误：

tensorflow/stream_executor/BUILD:39:1: C++ compilation of rule '//tensorflow/stream_executor:cuda_platform' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command

解决：

https://github.com/tensorflow/tensorflow/issues/2559
https://github.com/tensorflow/tensorflow/issues/2556

修改：tensorflow/stream_executor/cuda/cuda_blas.cc, 在

#if CUDA_VERSION >= 7050
#define EIGEN_HAS_CUDA_FP16
#endif

增加定义：

#if CUDA_VERSION >= 8000
#define CUBLAS_DATA_HALF CUDA_R_16F
#endif

获取最新文章: 扫一扫右上角的二维码加入“创客智造”公众号