#!/usr/bin/env bash
# vim:ts=4:sts=4:sw=4:et
#
# Author: Hari Sekhon
# Date: 2019-11-27 16:09:34 +0000 (Wed, 27 Nov 2019)
#
# https://github.com/harisekhon/bash-tools
#
# License: see accompanying Hari Sekhon LICENSE file
#
# If you're using my code you're welcome to connect with me on LinkedIn and optionally send me feedback
#
# https://www.linkedin.com/in/harisekhon
#
set -euo pipefail
[ -n "${DEBUG:-}" ] && set -x
usage(){
    cat <<EOF
Recurses HDFS path arguments outputting portable CRC32 checksums for each file
(can be used for HDFS vs local comparisons, whereas default MD5-of-MD5 cannot)

Calls HDFS command which is assumed to be in \$PATH

Capture stdout > file.txt for comparisons

Make sure to kinit before running this if using a production Kerberized cluster

Caveats:

    This is slow because the HDFS command startup is slow and is called once per file path, so it doesn't scale well

    If you want to skip zero byte files to save time, since they always return the same checksum anyway,
    set environment variable \$SKIP_ZERO_BYTE_FILES to any value

See Also:

    hadoop_hdfs_files_native_checksums.jy

    from the adjacent GitHub repo (outputs MD5-of-MD5 not CRC32 though):

    https://github.com/HariSekhon/DevOps-Python-tools

I would have written this version in Python but the Snakebite library doesn't support checksum extraction

usage: ${0##*/} <file_or_directory_paths>

EOF
    exit 3
}
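
# print usage if the first argument looks like a switch, eg. -h / --help
if [[ "${1:-}" =~ ^- ]]; then
    usage
fi

# filter 'hdfs dfs -ls' output lines, dropping files whose size (5th column) is zero
# when SKIP_ZERO_BYTE_FILES is set, otherwise pass everything through unchanged
skip_zero_byte_files(){
    if [ -n "${SKIP_ZERO_BYTE_FILES:-}" ]; then
        awk '{if($5 != 0) print }'
    else
        cat
    fi
}
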
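hdfs dfs -ls -R "$@" |
# exclude directory entries, whose permissions column starts with 'd'
grep -v '^d' |
skip_zero_byte_files |
# strip the first 7 columns (permissions, replication, owner, group, size, date, time)
# leaving only the path, without collapsing runs of spaces inside file names
sed -E 's/^([^[:space:]]+[[:space:]]+){7}//' |
while IFS= read -r filepath; do
    hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum "$filepath"
done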