Adapt run_full_export to swh CLI conventions

This adapts the existing script to:

  • use click, which auto-documents the CLI
  • add default values for the less important parameters
  • use logging instead of print statements
  • allow providing another docker image (defaults to bbaldassari/maven-index-exporter)
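
For readers unfamiliar with the pattern, here is a minimal sketch of what such a click entry point could look like. It is not the actual script: `--base-url` and `--docker-image` are taken from the test run below, while the `--work-dir` option name, the choice of which options are required, and the help strings are assumptions made for illustration.

```python
import logging

import click

logger = logging.getLogger(__name__)


@click.command()
@click.option(
    "--base-url",
    required=True,  # assumption: treated as the one mandatory parameter
    help="Base URL of the Maven repository to export.",
)
@click.option(
    "--work-dir",  # hypothetical option name, for illustration only
    default="/tmp/maven-index-exporter/",
    show_default=True,
    help="Directory where index files are downloaded and processed.",
)
@click.option(
    "--docker-image",
    default="bbaldassari/maven-index-exporter",
    show_default=True,
    help="Docker image used to run the export.",
)
def main(base_url, work_dir, docker_image):
    """Sketch of a run_full_export-like entry point."""
    # basicConfig's default format gives the "INFO:__main__:..." style lines.
    logging.basicConfig(level=logging.INFO)
    logger.info("* URL: %s", base_url)
    logger.info("* Working directory: %s", work_dir)
    logger.info("* Docker image: %s", docker_image)


if __name__ == "__main__":
    main()
```

With this shape, `--help` is generated by click from the option declarations, and only the mandatory parameter has to be passed on the command line.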

The documentation about the script is updated accordingly.

Related to swh/infra/sysadm-environment#3746 (closed)

Test Plan

  • scripts/test_docker_image.sh is happy
  • actual run of the script is happy too:
$ cd docker/
# build the image
$ docker build -f Dockerfile -t $USER/maven-index-exporter .
Sending build context to Docker daemon  23.55kB
Step 1/8 : FROM adoptopenjdk/openjdk11:alpine-jre
 ---> b9a979a572aa
Step 2/8 : ADD https://github.com/javasoze/clue/releases/download/release-6.2.0-1.0.0/clue-6.2.0-1.0.0.jar /opt/
Downloading [==================================================>]     18MB/18MB

 ---> Using cache
 ---> 9e3136d449b6
Step 3/8 : ADD https://repo1.maven.org/maven2/org/apache/maven/indexer/indexer-cli/6.0.0/indexer-cli-6.0.0.jar /opt/
Downloading [==================================================>]  14.91MB/14.91MB

 ---> Using cache
 ---> 5d0e575fb7bd
Step 4/8 : COPY extract_indexes.sh /opt/
 ---> Using cache
 ---> 777ca2fa6853
Step 5/8 : WORKDIR /work/
 ---> Using cache
 ---> 8e291c569bd1
Step 6/8 : RUN ls /opt/
 ---> Using cache
 ---> ef435da9603e
Step 7/8 : RUN ls -R /work/
 ---> Using cache
 ---> 5146a6df8a47
Step 8/8 : CMD ["sh", "/opt/extract_indexes.sh", "/work/nexus-maven-repository-index.gz"]
 ---> Using cache
 ---> 40af3ac1add7
Successfully built 40af3ac1add7
Successfully tagged tony/maven-index-exporter:latest
$ cd ../scripts
$ python3 run_full_export.py --base-url https://repo.maven.apache.org/maven2/ --docker-image $USER/maven-index-exporter
INFO:__main__:Script: run_full_export
INFO:__main__:Timestamp: 2022-03-22 18:00:52
INFO:__main__:* URL: https://repo.maven.apache.org/maven2/
INFO:__main__:* Working directory: /tmp/maven-index-exporter/
INFO:__main__:* Publish directory: /tmp/maven-index-exporter/publish/
INFO:__main__:Work_Dir /tmp/maven-index-exporter/ exists. Reusing it.
INFO:__main__:Downloading all required indexes
INFO:__main__:  - Downloading /tmp/maven-index-exporter/nexus-maven-repository-index.properties.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.732.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.733.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.734.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.735.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.736.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.737.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.738.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.742.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.739.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.743.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.740.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.744.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.741.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.745.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.746.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.747.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.748.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.749.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.750.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.751.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.722.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.723.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.724.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.725.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.726.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.727.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.728.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.729.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.730.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.731.gz exists, skipping download.
INFO:__main__:  - File /tmp/maven-index-exporter/nexus-maven-repository-index.gz exists, skipping download.
INFO:__main__:Docker: Found image <Image: 'maven-index-exporter:latest', 'tony/maven-index-exporter:latest'> locally, ID is sha256:40af3ac1add75d24738839ccc1971f338192261927cc287460eb6892a8812092.
INFO:__main__:Docker log:
Docker Script started on 2022-03-22 17:00:52.
# Checks..
* Content of /opt:
total 30469
-rw-------    1 root     root      18000742 Dec  8 07:19 clue-6.2.0-1.0.0.jar
-rw-r--r--    1 root     root          2830 Feb 17 13:56 extract_indexes.sh
-rw-------    1 root     root      14914610 Nov 28  2017 indexer-cli-6.0.0.jar
drwxr-xr-x    3 root     root             3 Mar 20 19:38 java
* Content of /work:
total 1671296
drwxrwxrwx    2 root     root          4096 Mar 22 16:33 export
drwxrwxrwx    2 root     root         12288 Mar 22 16:11 indexes
-rw-r--r--    1 1000     1000       5097757 Mar 22 15:52 nexus-maven-repository-index.722.gz
-rw-r--r--    1 1000     1000       1836882 Mar 22 15:52 nexus-maven-repository-index.723.gz
-rw-r--r--    1 1000     1000       3870944 Mar 22 15:53 nexus-maven-repository-index.724.gz
-rw-r--r--    1 1000     1000       6913907 Mar 22 15:53 nexus-maven-repository-index.725.gz
-rw-r--r--    1 1000     1000       6706671 Mar 22 15:53 nexus-maven-repository-index.726.gz
-rw-r--r--    1 1000     1000       8135404 Mar 22 15:53 nexus-maven-repository-index.727.gz
-rw-r--r--    1 1000     1000      10113194 Mar 22 15:53 nexus-maven-repository-index.728.gz
-rw-r--r--    1 1000     1000       9004362 Mar 22 15:53 nexus-maven-repository-index.729.gz
-rw-r--r--    1 1000     1000       8548614 Mar 22 15:53 nexus-maven-repository-index.730.gz
-rw-r--r--    1 1000     1000       6347214 Mar 22 15:53 nexus-maven-repository-index.731.gz
-rw-r--r--    1 1000     1000       6820245 Mar 22 15:52 nexus-maven-repository-index.732.gz
-rw-r--r--    1 1000     1000      12821159 Mar 22 15:52 nexus-maven-repository-index.733.gz
-rw-r--r--    1 1000     1000       7003185 Mar 22 15:52 nexus-maven-repository-index.734.gz
-rw-r--r--    1 1000     1000       2413908 Mar 22 15:52 nexus-maven-repository-index.735.gz
-rw-r--r--    1 1000     1000       6380653 Mar 22 15:52 nexus-maven-repository-index.736.gz
-rw-r--r--    1 1000     1000      14646697 Mar 22 15:52 nexus-maven-repository-index.737.gz
-rw-r--r--    1 1000     1000      13275279 Mar 22 15:52 nexus-maven-repository-index.738.gz
-rw-r--r--    1 1000     1000       2210698 Mar 22 15:52 nexus-maven-repository-index.739.gz
-rw-r--r--    1 1000     1000      14045180 Mar 22 15:52 nexus-maven-repository-index.740.gz
-rw-r--r--    1 1000     1000       7083099 Mar 22 15:52 nexus-maven-repository-index.741.gz
-rw-r--r--    1 1000     1000      11225591 Mar 22 15:52 nexus-maven-repository-index.742.gz
-rw-r--r--    1 1000     1000       1419693 Mar 22 15:52 nexus-maven-repository-index.743.gz
-rw-r--r--    1 1000     1000       4036562 Mar 22 15:52 nexus-maven-repository-index.744.gz
-rw-r--r--    1 1000     1000       5782895 Mar 22 15:52 nexus-maven-repository-index.745.gz
-rw-r--r--    1 1000     1000       7869621 Mar 22 15:52 nexus-maven-repository-index.746.gz
-rw-r--r--    1 1000     1000       6477544 Mar 22 15:52 nexus-maven-repository-index.747.gz
-rw-r--r--    1 1000     1000       6774157 Mar 22 15:52 nexus-maven-repository-index.748.gz
-rw-r--r--    1 1000     1000       9752927 Mar 22 15:52 nexus-maven-repository-index.749.gz
-rw-r--r--    1 1000     1000      11490402 Mar 22 15:52 nexus-maven-repository-index.750.gz
-rw-r--r--    1 1000     1000      10019047 Mar 22 15:52 nexus-maven-repository-index.751.gz
-rw-r--r--    1 1000     1000     1483183512 Mar 22 15:55 nexus-maven-repository-index.gz
-rw-r--r--    1 1000     1000          1130 Mar 22 17:00 nexus-maven-repository-index.properties
drwxr-xr-x    2 1000     1000          4096 Mar 22 16:52 publish
* Will read files from [/work/nexus-maven-repository-index.gz].
*   Found file [/work/nexus-maven-repository-index.gz].
*   Found indexer [/opt/indexer-cli-6.0.0.jar].
*   Found clue [/opt/clue-6.2.0-1.0.0.jar].
* Java version:.
openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
#############################
Found /work/indexes, skipping index generation.
6.6G    /work/indexes/
Unpacking finished on 2022-03-22 17:00:52.
#############################
Found /work/export, skipping index export.
total 17G
-rwxrwxrwx    1 root     root       16.7G Mar 22 16:18 _n.fld
-rwxrwxrwx    1 root     root           0 Mar 22 16:11 write.lock
Exporting finished on 2022-03-22 17:00:52.
#############################
Cleaning useless files.
Size before cleaning:
16.7G   /work/export
6.6G    /work/indexes
4.9M    /work/nexus-maven-repository-index.722.gz
1.8M    /work/nexus-maven-repository-index.723.gz
3.7M    /work/nexus-maven-repository-index.724.gz
6.6M    /work/nexus-maven-repository-index.725.gz
6.4M    /work/nexus-maven-repository-index.726.gz
7.8M    /work/nexus-maven-repository-index.727.gz
9.6M    /work/nexus-maven-repository-index.728.gz
8.6M    /work/nexus-maven-repository-index.729.gz
8.2M    /work/nexus-maven-repository-index.730.gz
6.1M    /work/nexus-maven-repository-index.731.gz
6.5M    /work/nexus-maven-repository-index.732.gz
12.2M   /work/nexus-maven-repository-index.733.gz
6.7M    /work/nexus-maven-repository-index.734.gz
2.3M    /work/nexus-maven-repository-index.735.gz
6.1M    /work/nexus-maven-repository-index.736.gz
14.0M   /work/nexus-maven-repository-index.737.gz
12.7M   /work/nexus-maven-repository-index.738.gz
2.1M    /work/nexus-maven-repository-index.739.gz
13.4M   /work/nexus-maven-repository-index.740.gz
6.8M    /work/nexus-maven-repository-index.741.gz
10.7M   /work/nexus-maven-repository-index.742.gz
1.4M    /work/nexus-maven-repository-index.743.gz
3.9M    /work/nexus-maven-repository-index.744.gz
5.5M    /work/nexus-maven-repository-index.745.gz
7.5M    /work/nexus-maven-repository-index.746.gz
6.2M    /work/nexus-maven-repository-index.747.gz
6.5M    /work/nexus-maven-repository-index.748.gz
9.3M    /work/nexus-maven-repository-index.749.gz
11.0M   /work/nexus-maven-repository-index.750.gz
9.6M    /work/nexus-maven-repository-index.751.gz
1.4G    /work/nexus-maven-repository-index.gz
4.0K    /work/nexus-maven-repository-index.properties
16.7G   /work/publish
* Removing useless exports.
  Keeping only fld text extract.
  Size after cleaning:
16.7G   /work/export
6.6G    /work/indexes
4.9M    /work/nexus-maven-repository-index.722.gz
1.8M    /work/nexus-maven-repository-index.723.gz
3.7M    /work/nexus-maven-repository-index.724.gz
6.6M    /work/nexus-maven-repository-index.725.gz
6.4M    /work/nexus-maven-repository-index.726.gz
7.8M    /work/nexus-maven-repository-index.727.gz
9.6M    /work/nexus-maven-repository-index.728.gz
8.6M    /work/nexus-maven-repository-index.729.gz
8.2M    /work/nexus-maven-repository-index.730.gz
6.1M    /work/nexus-maven-repository-index.731.gz
6.5M    /work/nexus-maven-repository-index.732.gz
12.2M   /work/nexus-maven-repository-index.733.gz
6.7M    /work/nexus-maven-repository-index.734.gz
2.3M    /work/nexus-maven-repository-index.735.gz
6.1M    /work/nexus-maven-repository-index.736.gz
14.0M   /work/nexus-maven-repository-index.737.gz
12.7M   /work/nexus-maven-repository-index.738.gz
2.1M    /work/nexus-maven-repository-index.739.gz
13.4M   /work/nexus-maven-repository-index.740.gz
6.8M    /work/nexus-maven-repository-index.741.gz
10.7M   /work/nexus-maven-repository-index.742.gz
1.4M    /work/nexus-maven-repository-index.743.gz
3.9M    /work/nexus-maven-repository-index.744.gz
5.5M    /work/nexus-maven-repository-index.745.gz
7.5M    /work/nexus-maven-repository-index.746.gz
6.2M    /work/nexus-maven-repository-index.747.gz
6.5M    /work/nexus-maven-repository-index.748.gz
9.3M    /work/nexus-maven-repository-index.749.gz
11.0M   /work/nexus-maven-repository-index.750.gz
9.6M    /work/nexus-maven-repository-index.751.gz
1.4G    /work/nexus-maven-repository-index.gz
4.0K    /work/nexus-maven-repository-index.properties
16.7G   /work/publish
* Make files modifiable by the end-user.
Docker Script execution finished on 2022-03-22 17:00:52.

INFO:__main__:Export directory has the following files:
INFO:__main__:  - write.lock size 0
INFO:__main__:  - _n.fld size 17982862850
INFO:__main__:Found fld file: _n.fld
INFO:__main__:Copying files to /tmp/maven-index-exporter/publish/export.fld.
INFO:__main__:Script finished on 2022-03-22 18:01:12

# at the end of it all, the export.fld file exists with the massaged data
$ head -20 /tmp/maven-index-exporter/publish/export.fld
doc 0
  field 0
    name u
    type string
    value com.redhat.rhevm.api|rhevm-api-powershell-jaxrs|1.0-rc1.16|javadoc|jar
  field 1
    name m
    type string
    value 1321264789727
  field 2
    name i
    type string
    value jar|1320743675000|768291|2|2|1|jar
  field 10
    name n
    type string
    value RHEV-M API Powershell Wrapper Implementation JAX-RS
  field 13
    name 1
    type string
...
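
To give an idea of how the exported text can be consumed, here is a minimal sketch of a reader for the `doc` / `field` / `name` / `type` / `value` layout shown in the excerpt above. The function name `iter_docs` is hypothetical, and reading the first three `|`-separated parts of the `u` field as group, artifact and version is an assumption based on the sample value, not something this change defines.

```python
from typing import Dict, Iterable, Iterator


def iter_docs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {field name: value} dict per "doc" block."""
    doc = None
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("doc "):
            # A new document starts: emit the previous one, if any.
            if doc is not None:
                yield doc
            doc, name = {}, None
        elif doc is None:
            continue  # ignore anything before the first "doc" line
        elif line.startswith("field "):
            name = None  # reset until this field's "name" line is seen
        elif line.startswith("name "):
            name = line[len("name "):]
        elif line.startswith("value ") and name is not None:
            doc[name] = line[len("value "):]
    if doc is not None:
        yield doc


# Sample copied from the head of export.fld shown above.
sample = """\
doc 0
  field 0
    name u
    type string
    value com.redhat.rhevm.api|rhevm-api-powershell-jaxrs|1.0-rc1.16|javadoc|jar
  field 1
    name m
    type string
    value 1321264789727
""".splitlines()

for doc in iter_docs(sample):
    # Assumption: the "u" value starts with group|artifact|version.
    group_id, artifact_id, version = doc["u"].split("|")[:3]
    print(group_id, artifact_id, version)
```

Because the real file is around 17G, the generator is meant to be fed an open file object line by line rather than a list held in memory.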

Migrated from D7412 (view on Phabricator)
