跳转至

从源码编译安装

知乎地址:

https://zhuanlan.zhihu.com/p/572757265?

源码安装地址

GAS
https://tensorflow.juejin.im/install/install_sources.html

有所支持的显卡的机器上安装tensorflow和CUDA。

这里我使用的源代码是r2.10,不同的源代码需要使用的pythoncmake,bazel的版本可能不同。

Python
commit 359c3cdfc5fabac82b3c70b3b6de2b0a8c16874f (HEAD -> r2.10, tag: v2.10.0, origin/r2.10)
Merge: 203b3338195 724308fadba
Author: Mihai Maruseac <mihaimaruseac@google.com>
Date:   Fri Sep 2 15:59:55 2022 -0700

    Merge pull request #57609 from tensorflow/vinila21-patch-6

    Update estimator and keras version in TF 2.10 branch for 2.10.0.

https://cloud-atlas.readthedocs.io/zh_CN/latest/machine_learning/tensorflow/build_tensorflow.html

我使用的安装环境是8核,8G内存,swap 16G内存。

1.编译

先把packaging装上,最后装包时要用。

GAS
pip3 install packaging
pip2 install packaging

1.1 configure

Python
./configure
# 没有cuda的话,不要编译cuda

1.2 Build(看最后我用的命令行)

仅支持 CPU 的情况下,调用下面的命令建立一个 TenslorFlow pip 包:

GAS
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

支持 GPU 的情况下,调用下面的命令建立一个 TensorFlow pip 包:

GAS
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package 

1.3 生成whl安装文件

出现以下信息表示build成功

GAS
INFO: Found applicable config definition build:dynamic_kernels in file /home/ken/workspace/tensorflow/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 5.072s, Critical Path: 4.12s
INFO: 3 processes: 1 internal, 2 local.
INFO: Build completed successfully, 3 total actions

bazel build 命令执行一个叫 build_pip_package 的脚本,该脚本在第5行所有的位置。执行这个脚本将会在 /tmp/tensorflow_pkg 目录下构建一个 .whl 文件。

Python
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg  // 可以改成~/tensorflow_pkg

# 以下是我执行该命令后的输出
~/workspace/tensorflow/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow ~/workspace/tensorflow/tensorflow
~/workspace/tensorflow/tensorflow
/tmp/tmp.4uUFP30gwO/tensorflow/include ~/workspace/tensorflow/tensorflow
~/workspace/tensorflow/tensorflow
2022 10 12 星期三 18:12:49 CST : === Building wheel
/usr/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'long_description_content_type'
  warnings.warn(msg)
warning: no files found matching 'README'
warning: no files found matching '*.pyd' under directory '*'
warning: no files found matching '*.pyi' under directory '*'
warning: no files found matching '*.pd' under directory '*'
warning: no files found matching '*.dylib' under directory '*'
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.csv' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*.proto' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
2022 10 12 星期三 18:13:25 CST : === Output wheel file is in: /home/ken/tensorflow_pkg

1.4 安装whl文件(这里最好是使用virtualenv进行安装)

Python
pip3 --user install /tmp/tensorflow_pkg/tensorflow-1.8.0-py3-none-any.whl
pip2 --user install /tmp/tensorflow_pkg/tensorflow-1.8.0-py2-none-any.whl

如果编译时用的是python3,名字中会有py3,使用pip3命令。

如果编译时用的是python2,名字中会有py2,使用pip命令。

1.5 验证是否安装成功

按照以下说明验证 TensorFlow 是否安装成功:

启动终端。

GAS
python

在 python 交互式 shell 中输入以下代码:

TF1.x hello world:

Python
import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(msg))

TF2.x hello world:

Python
import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
tf.print(msg)

如果系统输出以下内容,你就可以开始编写 TensorFlow 程序了:

GAS
Hello, TensorFlow!

2.问题解决

https://blog.csdn.net/surtol/article/details/97638399

这里是以r2.10为例。

2.1 bazel版本错误

需要安装指定版本的bazel,根据报错信息选择指定版本,r2.10需要使用bazel-5.1.1

GAS
https://github.com/bazelbuild/bazel/releases?page=3
下载指定版本的 bazel-5.1.1-installer-linux-x86_64.sh

即可安装完成

安装bazel

GAS
chmod a+x *.sh
sudo ./bazel****.sh

2.2 依赖无法下载

这是在安装tensorflow中最大的问题。很多依赖都是从githubgoogle下载的,国内有gfw导致无法下载而失败。

依赖的解决有三种方式,前两种方式是首选,我用前两种方式解决的所有依赖问题,第三种方式没试过

https://stackoverflow.com/questions/42918662/bazel-error-building-tensorflow-with-local-llvm-repository

1 --over_repository方式

先将所需有依赖根据报错的链接用wget下下来,下载指定的报错的commit版本,一般会有google链接和github链接。如果没办法下载的话,通过下载源码然后使用git reset commit --hard 的方式。

下载好之后,解压到某个位置,然后用--over_repository进行指定。如tf_runtime

GAS
--override_repository=tf_runtime=/home/ken/workspace/tf_dep/runtime-6ca793b5d862ef6c50f242d77a811f06cce9b60a

2. 自己搭建http服务器

  • 下载所需的依赖打包文件。放到某个位置。

  • 搭建http服务器

GAS
sudo python3 -m http.server

这里,执行该命令的目录为home目录,可以在网页上输入http://localhost:8000进行测试。

最好直接在自己的$HOME目录执行该操作。这里我将依赖文件放于/home/ken/workspace/tf_dep/0538e5431afdb1fa05bdcedf70ee502ccfcd112a.tar.gz

  • 将链接加到对应的workspace.bzl文件中
Python
用以下命令清空bazel build cache
bazel clean --expunge

~/workspace/tensorflow/tensorflow$ git diff third_party/llvm/workspace.bzl
diff --git a/third_party/llvm/workspace.bzl b/third_party/llvm/workspace.bzl
index c9760c3ab38..75308a26ec4 100644
--- a/third_party/llvm/workspace.bzl
+++ b/third_party/llvm/workspace.bzl
@@ -13,6 +13,7 @@ def repo(name):
         strip_prefix = "llvm-project-{commit}".format(commit = LLVM_COMMIT),
         urls = [
             "https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/{commit}.tar.gz".format(commit = LLVM_COMMIT),
+            "http://localhost:8000/workspace/tf_dep/0538e5431afdb1fa05bdcedf70ee502ccfcd112a.tar.gz",
             "https://github.com/llvm/llvm-project/archive/{commit}.tar.gz".format(commit = LLVM_COMMIT),
       ],
         build_file = "//third_party/llvm:llvm.BUILD",

这样,会自动下载该依赖。

3. 使用native.local_repository

该方法还没研究清楚具体怎么用,没用过,不保证是对的

Python
Changed the temp_workaround_http_archive rule in //tensorflow/workspace.bzl to:

  native.local_repository (
    name = "llvm",
    path = "/git/llvm/",
  )

In /git/llvm I added the file WORKSPACE containing:

  workspace( name = "llvm" )

2.3 mlir依赖报错

Python
DIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/ken/workspace/tensorflow/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/ken/.cache/bazel/_bazel_ken/7958fc4ffdc3fda57f47c66f5611f525/external/llvm-project/mlir/BUILD.bazel:6592:10: Compiling mlir/tools/mlir-tblgen/DirectiveCommonGen.cpp failed: undeclared inclusion(s) in rule '@llvm-project//mlir:mlir-tblgen':
this rule is missing dependency declarations for the following files included by 'mlir/tools/mlir-tblgen/DirectiveCommonGen.cpp':
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm-project/llvm/Demangle.cppmap'
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_terminfo/terminfo.cppmap'
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_zlib/zlib.cppmap'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 136.286s, Critical Path: 29.42s
INFO: 487 processes: 199 internal, 288 local.
FAILED: Build did NOT complete successfully

此种情况可能是使用了clang/clang++进行编译,换成gcc/g++后,该问题得到解决。

2.4 gcc: internal compiler error: Killed (program cc1plus)

GAS
 Configuration: 3ae23968b857c69df50fb9f45895d8d26e5e5626ba667099462d25a3ebb9874f
# Execution platform: @local_execution_config_platform//:platform
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build

可能是资源不够用。

使用free -m或者htop查看资源使用情况

如果是swap分区不够用,如果是ubuntu的话,用下面的方法对swap分区进行扩容。

https://blog.csdn.net/AlexWang30/article/details/90341172

2.5 SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA

Python
2022-10-12 17:29:57.576173: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

这不是个错误 ,只是告诉你你可以加AVX等相关的选项去提高性能。

2.6 packaing not found

Python
from packaging import version as packaging_version  # pylint: disable=g-bad-import-order
ModuleNotFoundError: No module named 'packaging'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /home/ken/workspace/tensorflow/tensorflow/tensorflow/lite/python/BUILD:68:10 Middleman _middlemen/tensorflow_Slite_Spython_Stflite_Uconvert-runfiles failed: (Exit 1): bash failed: error executing command
GAS
pip3 install packaging
pip2 install packaging

3 编译命令

下面是我用的编译命令,--override_repository是提前下好解决的文件。这里只针对r2.10版本,其它版本所需要的依赖commit可能是不同的。

使用4GB内存,4个核来编译。htop来看有多少核,我这里有8G内存,8个核,只用了一半ram,另外设置了16G的swapp空间。

--local_ram_resources=4096 --local_cpu_resources=6, 不设置,则所有资源可用,可能会异常中止.

如果使用clang来编译,需要使用,建议不要用clang

GAS
CC=clang bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package 
--verbose_failures出错的话,会显示详细的出错信息
GAS
CC=gcc CXX=g++ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package  --override_repository=tf_runtime=/home/ken/workspace/tf_dep/runtime-6ca793b5d862ef6c50f242d77a811f06cce9b60a --override_repository=upb=/home/ken/workspace/tf_dep/upb-9effcbcb27f0a665f9f345030188c0b291e32482 --override_repository=io_bazel_rules_go=/home/ken/workspace/tf_dep/rules_go-0.18.5 --override_repository=envoy_api=/home/ken/workspace/tf_dep/data-plane-api-c83ed7ea9eb5fb3b93d1ad52b59750f1958b8bde --override_repository=rules_java=/home/ken/workspace/tf_dep/rules_java-981f06c3d2bd10225e85209904090eb7b5fb26bd --override_repository=XNNPACK=/home/ken/workspace/tf_dep/XNNPACK-6b409ac0a3090ebe74d0cdfb494c4cd91485ad39 --override_repository=rules_cc=/home/ken/workspace/tf_dep/rules_cc-081771d4a0e9d7d3aa0eed2ef389fa4700dfb23e --local_ram_resources=4096 --local_cpu_resources=6  --verbose_failures