Adapt run_full_export to swh CLI conventions
This adapts the existing script to:
- use click, which auto-documents the CLI
- add default values for the less important parameters
- switch from print statements to logging
- allow providing another Docker image (defaults to bbaldassari/maven-index-exporter)
This also adapts the documentation about the script accordingly.
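
For reviewers unfamiliar with click, the changes listed above can be pictured roughly as follows. This is only a minimal sketch and not the actual content of run_full_export.py: the `--base-url` and `--docker-image` options and the default image name come from this description, while the `--work-dir` option, its default value (borrowed from the log output in the test plan) and the function body are assumptions made for illustration.

```python
import logging

import click
from click.testing import CliRunner

logger = logging.getLogger(__name__)

# Default image name as stated in the description above.
DEFAULT_IMAGE = "bbaldassari/maven-index-exporter"


@click.command()
@click.option("--base-url", required=True,
              help="Base URL of the Maven repository to export.")
@click.option("--work-dir", default="/tmp/maven-index-exporter/",
              show_default=True,
              help="Working directory (hypothetical option name).")
@click.option("--docker-image", default=DEFAULT_IMAGE, show_default=True,
              help="Docker image used to run the export.")
def main(base_url, work_dir, docker_image):
    """Illustrative stand-in for the export script entry point."""
    # basicConfig's default format yields lines such as "INFO:__main__:...".
    logging.basicConfig(level=logging.INFO)
    logger.info("* URL: %s", base_url)
    logger.info("* Working directory: %s", work_dir)
    logger.info("* Docker image: %s", docker_image)


# click builds the --help text from the decorators and the docstring.
# CliRunner invokes the command in-process, without a shell.
help_result = CliRunner().invoke(main, ["--help"])
print(help_result.output)
```

Only `--base-url` has to be given on the command line in this sketch; every other option falls back to its default, which is the behaviour the second bullet point describes.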
Related to swh/infra/sysadm-environment#3746 (closed)
Test Plan
- scripts/test_docker_image.sh is happy
- actual run of the script is happy too:
$ cd docker/
# build the image
$ docker build -f Dockerfile -t $USER/maven-index-exporter .
Sending build context to Docker daemon 23.55kB
Step 1/8 : FROM adoptopenjdk/openjdk11:alpine-jre
---> b9a979a572aa
Step 2/8 : ADD https://github.com/javasoze/clue/releases/download/release-6.2.0-1.0.0/clue-6.2.0-1.0.0.jar /opt/
Downloading [==================================================>] 18MB/18MB
---> Using cache
---> 9e3136d449b6
Step 3/8 : ADD https://repo1.maven.org/maven2/org/apache/maven/indexer/indexer-cli/6.0.0/indexer-cli-6.0.0.jar /opt/
Downloading [==================================================>] 14.91MB/14.91MB
---> Using cache
---> 5d0e575fb7bd
Step 4/8 : COPY extract_indexes.sh /opt/
---> Using cache
---> 777ca2fa6853
Step 5/8 : WORKDIR /work/
---> Using cache
---> 8e291c569bd1
Step 6/8 : RUN ls /opt/
---> Using cache
---> ef435da9603e
Step 7/8 : RUN ls -R /work/
---> Using cache
---> 5146a6df8a47
Step 8/8 : CMD ["sh", "/opt/extract_indexes.sh", "/work/nexus-maven-repository-index.gz"]
---> Using cache
---> 40af3ac1add7
Successfully built 40af3ac1add7
Successfully tagged tony/maven-index-exporter:latest
$ cd ../scripts
$ python3 run_full_export.py --base-url https://repo.maven.apache.org/maven2/ --docker-image $USER/maven-index-exporter
INFO:__main__:Script: run_full_export
INFO:__main__:Timestamp: 2022-03-22 18:00:52
INFO:__main__:* URL: https://repo.maven.apache.org/maven2/
INFO:__main__:* Working directory: /tmp/maven-index-exporter/
INFO:__main__:* Publish directory: /tmp/maven-index-exporter/publish/
INFO:__main__:Work_Dir /tmp/maven-index-exporter/ exists. Reusing it.
INFO:__main__:Downloading all required indexes
INFO:__main__: - Downloading /tmp/maven-index-exporter/nexus-maven-repository-index.properties.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.732.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.733.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.734.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.735.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.736.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.737.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.738.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.742.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.739.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.743.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.740.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.744.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.741.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.745.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.746.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.747.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.748.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.749.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.750.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.751.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.722.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.723.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.724.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.725.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.726.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.727.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.728.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.729.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.730.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.731.gz exists, skipping download.
INFO:__main__: - File /tmp/maven-index-exporter/nexus-maven-repository-index.gz exists, skipping download.
INFO:__main__:Docker: Found image <Image: 'maven-index-exporter:latest', 'tony/maven-index-exporter:latest'> locally, ID is sha256:40af3ac1add75d24738839ccc1971f338192261927cc287460eb6892a8812092.
INFO:__main__:Docker log:
Docker Script started on 2022-03-22 17:00:52.
# Checks..
* Content of /opt:
total 30469
-rw------- 1 root root 18000742 Dec 8 07:19 clue-6.2.0-1.0.0.jar
-rw-r--r-- 1 root root 2830 Feb 17 13:56 extract_indexes.sh
-rw------- 1 root root 14914610 Nov 28 2017 indexer-cli-6.0.0.jar
drwxr-xr-x 3 root root 3 Mar 20 19:38 java
* Content of /work:
total 1671296
drwxrwxrwx 2 root root 4096 Mar 22 16:33 export
drwxrwxrwx 2 root root 12288 Mar 22 16:11 indexes
-rw-r--r-- 1 1000 1000 5097757 Mar 22 15:52 nexus-maven-repository-index.722.gz
-rw-r--r-- 1 1000 1000 1836882 Mar 22 15:52 nexus-maven-repository-index.723.gz
-rw-r--r-- 1 1000 1000 3870944 Mar 22 15:53 nexus-maven-repository-index.724.gz
-rw-r--r-- 1 1000 1000 6913907 Mar 22 15:53 nexus-maven-repository-index.725.gz
-rw-r--r-- 1 1000 1000 6706671 Mar 22 15:53 nexus-maven-repository-index.726.gz
-rw-r--r-- 1 1000 1000 8135404 Mar 22 15:53 nexus-maven-repository-index.727.gz
-rw-r--r-- 1 1000 1000 10113194 Mar 22 15:53 nexus-maven-repository-index.728.gz
-rw-r--r-- 1 1000 1000 9004362 Mar 22 15:53 nexus-maven-repository-index.729.gz
-rw-r--r-- 1 1000 1000 8548614 Mar 22 15:53 nexus-maven-repository-index.730.gz
-rw-r--r-- 1 1000 1000 6347214 Mar 22 15:53 nexus-maven-repository-index.731.gz
-rw-r--r-- 1 1000 1000 6820245 Mar 22 15:52 nexus-maven-repository-index.732.gz
-rw-r--r-- 1 1000 1000 12821159 Mar 22 15:52 nexus-maven-repository-index.733.gz
-rw-r--r-- 1 1000 1000 7003185 Mar 22 15:52 nexus-maven-repository-index.734.gz
-rw-r--r-- 1 1000 1000 2413908 Mar 22 15:52 nexus-maven-repository-index.735.gz
-rw-r--r-- 1 1000 1000 6380653 Mar 22 15:52 nexus-maven-repository-index.736.gz
-rw-r--r-- 1 1000 1000 14646697 Mar 22 15:52 nexus-maven-repository-index.737.gz
-rw-r--r-- 1 1000 1000 13275279 Mar 22 15:52 nexus-maven-repository-index.738.gz
-rw-r--r-- 1 1000 1000 2210698 Mar 22 15:52 nexus-maven-repository-index.739.gz
-rw-r--r-- 1 1000 1000 14045180 Mar 22 15:52 nexus-maven-repository-index.740.gz
-rw-r--r-- 1 1000 1000 7083099 Mar 22 15:52 nexus-maven-repository-index.741.gz
-rw-r--r-- 1 1000 1000 11225591 Mar 22 15:52 nexus-maven-repository-index.742.gz
-rw-r--r-- 1 1000 1000 1419693 Mar 22 15:52 nexus-maven-repository-index.743.gz
-rw-r--r-- 1 1000 1000 4036562 Mar 22 15:52 nexus-maven-repository-index.744.gz
-rw-r--r-- 1 1000 1000 5782895 Mar 22 15:52 nexus-maven-repository-index.745.gz
-rw-r--r-- 1 1000 1000 7869621 Mar 22 15:52 nexus-maven-repository-index.746.gz
-rw-r--r-- 1 1000 1000 6477544 Mar 22 15:52 nexus-maven-repository-index.747.gz
-rw-r--r-- 1 1000 1000 6774157 Mar 22 15:52 nexus-maven-repository-index.748.gz
-rw-r--r-- 1 1000 1000 9752927 Mar 22 15:52 nexus-maven-repository-index.749.gz
-rw-r--r-- 1 1000 1000 11490402 Mar 22 15:52 nexus-maven-repository-index.750.gz
-rw-r--r-- 1 1000 1000 10019047 Mar 22 15:52 nexus-maven-repository-index.751.gz
-rw-r--r-- 1 1000 1000 1483183512 Mar 22 15:55 nexus-maven-repository-index.gz
-rw-r--r-- 1 1000 1000 1130 Mar 22 17:00 nexus-maven-repository-index.properties
drwxr-xr-x 2 1000 1000 4096 Mar 22 16:52 publish
* Will read files from [/work/nexus-maven-repository-index.gz].
* Found file [/work/nexus-maven-repository-index.gz].
* Found indexer [/opt/indexer-cli-6.0.0.jar].
* Found clue [/opt/clue-6.2.0-1.0.0.jar].
* Java version:.
openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
#############################
Found /work/indexes, skipping index generation.
6.6G /work/indexes/
Unpacking finished on 2022-03-22 17:00:52.
#############################
Found /work/export, skipping index export.
total 17G
-rwxrwxrwx 1 root root 16.7G Mar 22 16:18 _n.fld
-rwxrwxrwx 1 root root 0 Mar 22 16:11 write.lock
Exporting finished on 2022-03-22 17:00:52.
#############################
Cleaning useless files.
Size before cleaning:
16.7G /work/export
6.6G /work/indexes
4.9M /work/nexus-maven-repository-index.722.gz
1.8M /work/nexus-maven-repository-index.723.gz
3.7M /work/nexus-maven-repository-index.724.gz
6.6M /work/nexus-maven-repository-index.725.gz
6.4M /work/nexus-maven-repository-index.726.gz
7.8M /work/nexus-maven-repository-index.727.gz
9.6M /work/nexus-maven-repository-index.728.gz
8.6M /work/nexus-maven-repository-index.729.gz
8.2M /work/nexus-maven-repository-index.730.gz
6.1M /work/nexus-maven-repository-index.731.gz
6.5M /work/nexus-maven-repository-index.732.gz
12.2M /work/nexus-maven-repository-index.733.gz
6.7M /work/nexus-maven-repository-index.734.gz
2.3M /work/nexus-maven-repository-index.735.gz
6.1M /work/nexus-maven-repository-index.736.gz
14.0M /work/nexus-maven-repository-index.737.gz
12.7M /work/nexus-maven-repository-index.738.gz
2.1M /work/nexus-maven-repository-index.739.gz
13.4M /work/nexus-maven-repository-index.740.gz
6.8M /work/nexus-maven-repository-index.741.gz
10.7M /work/nexus-maven-repository-index.742.gz
1.4M /work/nexus-maven-repository-index.743.gz
3.9M /work/nexus-maven-repository-index.744.gz
5.5M /work/nexus-maven-repository-index.745.gz
7.5M /work/nexus-maven-repository-index.746.gz
6.2M /work/nexus-maven-repository-index.747.gz
6.5M /work/nexus-maven-repository-index.748.gz
9.3M /work/nexus-maven-repository-index.749.gz
11.0M /work/nexus-maven-repository-index.750.gz
9.6M /work/nexus-maven-repository-index.751.gz
1.4G /work/nexus-maven-repository-index.gz
4.0K /work/nexus-maven-repository-index.properties
16.7G /work/publish
* Removing useless exports.
Keeping only fld text extract.
Size after cleaning:
16.7G /work/export
6.6G /work/indexes
4.9M /work/nexus-maven-repository-index.722.gz
1.8M /work/nexus-maven-repository-index.723.gz
3.7M /work/nexus-maven-repository-index.724.gz
6.6M /work/nexus-maven-repository-index.725.gz
6.4M /work/nexus-maven-repository-index.726.gz
7.8M /work/nexus-maven-repository-index.727.gz
9.6M /work/nexus-maven-repository-index.728.gz
8.6M /work/nexus-maven-repository-index.729.gz
8.2M /work/nexus-maven-repository-index.730.gz
6.1M /work/nexus-maven-repository-index.731.gz
6.5M /work/nexus-maven-repository-index.732.gz
12.2M /work/nexus-maven-repository-index.733.gz
6.7M /work/nexus-maven-repository-index.734.gz
2.3M /work/nexus-maven-repository-index.735.gz
6.1M /work/nexus-maven-repository-index.736.gz
14.0M /work/nexus-maven-repository-index.737.gz
12.7M /work/nexus-maven-repository-index.738.gz
2.1M /work/nexus-maven-repository-index.739.gz
13.4M /work/nexus-maven-repository-index.740.gz
6.8M /work/nexus-maven-repository-index.741.gz
10.7M /work/nexus-maven-repository-index.742.gz
1.4M /work/nexus-maven-repository-index.743.gz
3.9M /work/nexus-maven-repository-index.744.gz
5.5M /work/nexus-maven-repository-index.745.gz
7.5M /work/nexus-maven-repository-index.746.gz
6.2M /work/nexus-maven-repository-index.747.gz
6.5M /work/nexus-maven-repository-index.748.gz
9.3M /work/nexus-maven-repository-index.749.gz
11.0M /work/nexus-maven-repository-index.750.gz
9.6M /work/nexus-maven-repository-index.751.gz
1.4G /work/nexus-maven-repository-index.gz
4.0K /work/nexus-maven-repository-index.properties
16.7G /work/publish
* Make files modifiable by the end-user.
Docker Script execution finished on 2022-03-22 17:00:52.
INFO:__main__:Export directory has the following files:
INFO:__main__: - write.lock size 0
INFO:__main__: - _n.fld size 17982862850
INFO:__main__:Found fld file: _n.fld
INFO:__main__:Copying files to /tmp/maven-index-exporter/publish/export.fld.
INFO:__main__:Script finished on 2022-03-22 18:01:12
# at the end of it all, the export.fld file exists with the massaged data
$ head -20 /tmp/maven-index-exporter/publish/export.fld
doc 0
field 0
name u
type string
value com.redhat.rhevm.api|rhevm-api-powershell-jaxrs|1.0-rc1.16|javadoc|jar
field 1
name m
type string
value 1321264789727
field 2
name i
type string
value jar|1320743675000|768291|2|2|1|jar
field 10
name n
type string
value RHEV-M API Powershell Wrapper Implementation JAX-RS
field 13
name 1
type string
...
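
To sanity-check the resulting file beyond `head`, the records can be read back with a few lines of Python. The sketch below assumes the layout is exactly what the excerpt above suggests (a `doc` line opening each record, followed by `name` / `type` / `value` lines per field); the function name `iter_docs` is hypothetical and the real file may contain keys that this sketch simply ignores. The `u` value looks like pipe-separated Maven coordinates, but that reading is an assumption as well.

```python
from typing import Dict, Iterable, Iterator

# Sample copied from the head of export.fld shown in the test plan.
SAMPLE = """\
doc 0
field 0
name u
type string
value com.redhat.rhevm.api|rhevm-api-powershell-jaxrs|1.0-rc1.16|javadoc|jar
field 1
name m
type string
value 1321264789727
field 10
name n
type string
value RHEV-M API Powershell Wrapper Implementation JAX-RS
"""


def iter_docs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {field name: value} dict per "doc" record."""
    doc = None   # record currently being filled, None before the first "doc"
    name = None  # name of the field whose value is expected next
    for raw in lines:
        # Split on the first space only, so values may contain spaces.
        key, _, rest = raw.strip().partition(" ")
        if key == "doc":
            if doc is not None:
                yield doc
            doc, name = {}, None
        elif doc is None:
            continue  # ignore anything before the first record
        elif key == "name":
            name = rest
        elif key == "value" and name is not None:
            doc[name] = rest
            name = None
    if doc is not None:
        yield doc


docs = list(iter_docs(SAMPLE.splitlines()))
for doc in docs:
    print(doc["u"].split("|"), doc.get("n"))
```

Because the function only needs an iterable of lines, it can be fed an open file object directly, which avoids loading the whole multi-gigabyte export into memory.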
Migrated from D7412 (view on Phabricator)