Commits · debian/0.5.1-1_swh1_bpo10+1 · Jérémy Bobbio (Lunar) / swh-loader-cvs

Oct 17, 2022

Updated backport on buster-swh from debian/0.5.1-1_swh1 (unstable-swh) · 6e272157
Jenkins for Software Heritage authored 2 years ago

debian/0.5.1-1_swh1_bpo10+1

Merge tag 'debian/0.5.1-1_swh1' into debian/buster-swh · a17b458d
Jenkins for Software Heritage authored 2 years ago

Updated debian changelog for version 0.5.1 · 70f98d2f
Jenkins for Software Heritage authored 2 years ago

debian/0.5.1-1_swh1

Update upstream source from tag 'debian/upstream/0.5.1' · ec55779c
Jenkins for Software Heritage authored 2 years ago
```
Update to upstream version '0.5.1'
with Debian dir fba11b26a65fa5caefd6aa21e85f19e12c2e798a
```
New upstream version 0.5.1 · 3be7939d
Jenkins for Software Heritage authored 2 years ago

debian/upstream/0.5.1

test_cvsclient: Mock subprocess · 3bf543ba
Antoine Lambert authored 2 years ago
```
This fixes debian package builds.
```
v0.5.1

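The 3bf543ba message is terse, so here is a hedged sketch of the general technique its title names: patching `subprocess.run` with `unittest.mock` so a test never spawns a real process, which may be unavailable in an isolated package build. `fetch_remote_listing` and its ssh command line are hypothetical; the actual test_cvsclient code may patch a different call.

```python
import subprocess
from unittest import mock


def fetch_remote_listing(host: str) -> str:
    """Hypothetical helper that shells out to ssh, as a CVS client over ssh might."""
    result = subprocess.run(
        ["ssh", host, "cvs", "server"],  # illustrative command only
        capture_output=True,
        check=True,
    )
    return result.stdout.decode()


def test_fetch_remote_listing_without_network():
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout=b"ok\n", stderr=b"")
    # Patch subprocess.run so the test never spawns a real ssh process.
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert fetch_remote_listing("cvs.example.org") == "ok\n"
    run.assert_called_once()
```

Patching the module attribute works here because `fetch_remote_listing` looks up `subprocess.run` at call time.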
Updated debian changelog for version 0.5.0 · b58dfe6e
Jenkins for Software Heritage authored 2 years ago

debian/0.5.0-1_swh1

Update upstream source from tag 'debian/upstream/0.5.0' · 36309788
Jenkins for Software Heritage authored 2 years ago
```
Update to upstream version '0.5.0'
with Debian dir 126ab15041de3c0fe8c1af8d037d69f8bc060353
```
New upstream version 0.5.0 · d3520d31
Jenkins for Software Heritage authored 2 years ago

debian/upstream/0.5.0

debian/control: Bump python3-swh.model · 4c72bb7e
Antoine Lambert authored 2 years ago


loader: Yield only modified objects in process_cvs_changesets · c23d4250

Antoine Lambert authored 2 years ago

Previously, after each revision replay, all files and directories of the
CVS repository being loaded were collected and sent to the storage.
This is a real bottleneck for loading performance, as it leaves the
storage filtering proxy to find out which of those objects are new and
need to be archived.

As we know exactly the set of paths that have been modified in a CVS
revision, prefer to do that filtering on the loader side and only
send modified objects to storage instead of the whole set of contents
and directories from the reconstructed filesystem.

This should greatly improve loading performance for large repositories
and also reduce loader memory consumption.

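A minimal sketch of the loader-side filtering described in c23d4250. It assumes paths are bytes relative to the repository root. Only the changed files and their ancestor directories can get new hashes after a revision, so they are the only candidates worth sending. `paths_to_archive` is a hypothetical helper, not the loader's actual code.

```python
from typing import Iterable, Set


def paths_to_archive(modified_files: Iterable[bytes]) -> Set[bytes]:
    """Return the changed file paths plus every ancestor directory (root is b"")."""
    result: Set[bytes] = set()
    for path in modified_files:
        result.add(path)
        parts = path.split(b"/")
        # Walk up the tree: a/b/c.txt -> a/b -> a -> root
        for i in range(len(parts) - 1, -1, -1):
            result.add(b"/".join(parts[:i]))
    return result


print(sorted(paths_to_archive([b"src/main.c", b"README"])))
# [b'', b'README', b'src', b'src/main.c']
```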

loader: Reconstruct repo filesystem incrementally at each revision · b976aa6a

Antoine Lambert authored 2 years ago

Instead of creating a from_disk.Directory instance after each replayed
CVS revision by recursively scanning all directories of the repository,
prefer to keep a single one as a class member, synchronized with the
reconstructed filesystem after each revision replay.

This should improve loader performance, especially when
dealing with large repositories.

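To illustrate the incremental approach of b976aa6a, here is a sketch that uses a plain nested dict as a stand-in for the loader's `from_disk.Directory` (whose API is not shown here). `apply_change` is hypothetical. The point is that each change touches only the directories along its path instead of triggering a full rescan.

```python
from typing import Any, Dict, Optional

Tree = Dict[bytes, Any]  # directory name -> nested Tree, file name -> content bytes


def apply_change(root: Tree, path: bytes, content: Optional[bytes]) -> None:
    """Update the in-memory tree for one file touched by a replayed revision.

    Only the directories along `path` are visited, so the cost depends on the
    path depth rather than on the number of files in the repository.
    `content=None` models a file removed in that revision.
    """
    *dirs, name = path.split(b"/")
    node = root
    for d in dirs:
        node = node.setdefault(d, {})  # create missing directories on the way
    if content is None:
        node.pop(name, None)
    else:
        node[name] = content


tree: Tree = {}
apply_change(tree, b"src/lib/a.c", b"int x;\n")
apply_change(tree, b"README", b"hello\n")
apply_change(tree, b"src/lib/a.c", None)
print(tree)  # {b'src': {b'lib': {}}, b'README': b'hello\n'}
```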

rlog: Skip rlog entry with missing header in RlogConv.parse_rlog · 734207ba

Antoine Lambert authored 2 years ago

The CVS rlog for a given module sent by the server is a concatenation of
rlog entries. Each entry has a header containing the path to an
RCS file plus other info.

There are cases where an rlog entry header is empty, which makes the
rlog parsing fail.

So instead of stopping rlog parsing by raising an exception, prefer
to skip that entry and process the next one.

Closes T4629

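A hedged sketch of the skip-instead-of-raise behaviour from 734207ba. It assumes, as in standard rlog output, that each entry ends with a line of `=` characters and that a valid header contains an `RCS file:` line. `iter_rlog_entries` is a hypothetical name, not the `RlogConv.parse_rlog` implementation.

```python
from typing import Iterable, Iterator, List


def _has_header(entry: List[str]) -> bool:
    """A usable entry names its RCS file in the header."""
    return any(line.startswith("RCS file:") for line in entry)


def iter_rlog_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield rlog entries as lists of lines, skipping entries without a header."""
    entry: List[str] = []
    for line in lines:
        stripped = line.strip()
        if stripped and set(stripped) == {"="}:  # end-of-entry separator
            if _has_header(entry):
                yield entry
            # else: missing header, so skip the entry instead of raising
            entry = []
        else:
            entry.append(line)
    if _has_header(entry):  # trailing entry without a final separator
        yield entry
```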

Oct 14, 2022

loader, cvsclient: Read files line by line to reduce memory consumption · cfe7507a

Antoine Lambert authored 2 years ago

Instead of using the readlines method on file objects, which retrieves all
lines of a file and stores them in memory, prefer to read files line
by line by iterating lazily over file objects.

This significantly reduces loader memory consumption when processing
a large rlog output stored in a file.

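The difference described in cfe7507a, sketched in Python: iterating over a file object reads one line at a time, whereas `readlines()` first materializes a list of every line. `count_revision_lines` is a hypothetical example consumer.

```python
import io
from typing import IO


def count_revision_lines(fileobj: IO[str]) -> int:
    """Count rlog 'revision ...' lines without loading the whole file.

    Iterating over a file object yields one line at a time, whereas
    fileobj.readlines() would first build a list holding every line.
    """
    count = 0
    for line in fileobj:  # lazy iteration, no full list of lines in memory
        if line.startswith("revision "):
            count += 1
    return count


# io.StringIO stands in for a large rlog output file opened with open(path).
sample = io.StringIO("revision 1.2\ndate: 2022/10/14\nrevision 1.1\n")
print(count_revision_lines(sample))  # 2
```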

Oct 13, 2022

loader: Raise NotFound for missing CVS module when using pserver or ssh · 965c3de4
Antoine Lambert authored 2 years ago
```
That case was handled when using rsync protocol but not when using pserver
or ssh protocol.

Closes T4631
```

cvsclient: Handle error in fetch_rlog when path does not exist · 356dfa27

Antoine Lambert authored 2 years ago

When attempting to fetch the rlog for a path that does not exist in
the repository, the CVS server will respond with the following lines:

E cvs rlog: could not read RCS file for <path>
ok

That error case was not handled in fetch_rlog, so ensure it returns None
when encountering it.

The issue was spotted when the loader attempted to fetch more rlog data from
Attic directories. The paths of these Attic directories are computed from
those of the files in the repository, but there are cases where those
directories do not exist.

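A sketch of how a response parser might map the server error from 356dfa27 to `None`. It assumes rlog text arrives as `M ` lines and the exchange ends with `ok`, per the CVS client/server protocol. `parse_rlog_response` is hypothetical and simpler than the real `fetch_rlog`.

```python
from typing import Iterable, List, Optional


def parse_rlog_response(lines: Iterable[bytes]) -> Optional[List[bytes]]:
    """Collect rlog text from 'M ' lines of a CVS server response.

    Returns None when the server reports that the RCS file could not be read,
    which happens when the requested path does not exist in the repository.
    """
    rlog: List[bytes] = []
    for line in lines:
        if line.startswith(b"E ") and b"could not read RCS file" in line:
            return None
        if line.startswith(b"M "):
            rlog.append(line[2:])  # drop the 'M ' prefix
        elif line.rstrip(b"\n") == b"ok":
            break
    return rlog
```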

Sep 19, 2022
- Updated backport on buster-swh from debian/0.4.1-1_swh1 (unstable-swh) · 13105406
  Jenkins for Software Heritage authored 2 years ago
  
  debian/0.4.1-1_swh1_bpo10+1
  
- Merge tag 'debian/0.4.1-1_swh1' into debian/buster-swh · 8d654606
  Jenkins for Software Heritage authored 2 years ago
  
- Updated debian changelog for version 0.4.1 · e7b19a80
  Jenkins for Software Heritage authored 2 years ago
  
  debian/0.4.1-1_swh1
  
- Update upstream source from tag 'debian/upstream/0.4.1' · 46f1666c
  Jenkins for Software Heritage authored 2 years ago
  ```
  Update to upstream version '0.4.1'
  with Debian dir 5179ae3f511f7227d4200d0f28b138ea25210440
  ```
- New upstream version 0.4.1 · 4357596f
  Jenkins for Software Heritage authored 2 years ago
  
  debian/upstream/0.4.1
  
Sep 15, 2022
- setup.py: Fix debian unstable package build · 83fae9c7
  Antoine Lambert authored 2 years ago
  ```
  Since Python 3.10, using "#" format units (such as "s#") with
  PyArg_ParseTuple() requires the PY_SSIZE_T_CLEAN macro to be defined.
  ```
  v0.4.1
- Updated debian changelog for version 0.4.0 · 3d042ef8
  Jenkins for Software Heritage authored 2 years ago
  
  debian/0.4.0-1_swh1
  
- Update upstream source from tag 'debian/upstream/0.4.0' · 5064e50d
  Jenkins for Software Heritage authored 2 years ago
  ```
  Update to upstream version '0.4.0'
  with Debian dir 3a57a357af5c42f6cafbb304e6ecab6132ccfd05
  ```
- New upstream version 0.4.0 · e8736888
  Jenkins for Software Heritage authored 2 years ago
  
  debian/upstream/0.4.0
  
Jul 11, 2022

rlog, cvs2gitdump: Fix handling of revision number greater than one · 386a68f1

Antoine Lambert authored 2 years ago

There are CVS repositories where revision numbers greater than 1.x
are used to version files.

The previous loader implementation raised an error when encountering
such revisions, so ensure they are processed like the other ones.

Also fix the extraction of tag names from rlog output.

Related to T4043

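Why a 1.x assumption breaks, sketched with hypothetical helpers (not the actual rlog or cvs2gitdump code): parsing revision numbers into integer tuples handles any major number and orders `1.10` after `1.9`, and a trunk revision is recognized by having exactly two components.

```python
from typing import Tuple


def parse_revision(rev: str) -> Tuple[int, ...]:
    """Turn a revision string such as '1.4' or '2.3.2.1' into a tuple of ints."""
    return tuple(int(part) for part in rev.split("."))


def is_trunk_revision(rev: str) -> bool:
    """Trunk revisions have two components, whatever the major number.

    A check such as rev.startswith("1.") would wrongly reject '2.1'.
    """
    return len(parse_revision(rev)) == 2


print(is_trunk_revision("2.1"), is_trunk_revision("1.2.2.1"))  # True False
print(parse_revision("1.10") > parse_revision("1.9"))  # True
```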

Jul 08, 2022
- loader: Use utf-8 instead of ascii to decode rsync output · 5a79a325
  Antoine Lambert authored 2 years ago
  ```
  There are cases where rsync output is not ASCII-decodable, so prefer
  to use UTF-8 instead.
  ```
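A small illustration of 5a79a325: a UTF-8 encoded non-ASCII file name cannot be decoded as ASCII but decodes as UTF-8. `decode_rsync_output` and the sample line are hypothetical; bytes that are not valid UTF-8 would still raise with this strict decoding.

```python
def decode_rsync_output(raw: bytes) -> str:
    """Decode raw rsync output as UTF-8, which is a superset of ASCII."""
    return raw.decode("utf-8")


raw = "cvsroot/module/café.c\n".encode("utf-8")  # hypothetical listing line

try:
    raw.decode("ascii")
except UnicodeDecodeError:
    print("ASCII decoding fails")  # 'é' is encoded as two bytes above 0x7F

print(decode_rsync_output(raw), end="")  # cvsroot/module/café.c
```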
Jul 07, 2022
- loader: Ensure to strip trailing slash from origin URL · e5215cf1
  Antoine Lambert authored 2 years ago
  ```
  It makes the loading process fail otherwise.
  ```
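A sketch of the normalization in e5215cf1. The failure mode shown in the usage lines is an assumption for illustration: if a module name is derived from the last path component, a trailing slash yields an empty name. `normalize_origin_url` is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def normalize_origin_url(url: str) -> str:
    """Drop trailing slashes from a CVS origin URL."""
    return url.rstrip("/")


url = "rsync://cvs.example.org/cvsroot/module/"
# Illustrative failure mode: with the slash, the last path component is empty.
print(repr(posixpath.basename(urlparse(url).path)))  # ''
print(posixpath.basename(urlparse(normalize_origin_url(url)).path))  # module
```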
Jul 06, 2022

Fix loading of CVS repositories with non valid UTF-8 paths · d89f8d13

Antoine Lambert authored 2 years ago

Some CVS repositories have paths which are not valid UTF-8 (typically
ISO-8859-1 ones), but the loader implementation assumed all paths could
be safely encoded to UTF-8 and raised UnicodeEncodeError when
attempting to encode non UTF-8 paths.

This commit modifies the way CVS paths are handled by the loader by
using their raw bytes representation instead of their UTF-8 decoded
string representation.

The rcsparse.rcsfile constructor has also been modified to take a bytes path
as argument instead of a unicode one in order to be able to successfully
open non UTF-8 paths.

Such CVS repositories can now be successfully loaded, using either the rsync
or the pserver protocol.

Related to T3980

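An illustration of the problem fixed in d89f8d13, assuming a Latin-1 encoded file name. Decoding it with the "surrogateescape" handler (as `os.fsdecode` does with a UTF-8 filesystem encoding) and re-encoding to strict UTF-8 raises UnicodeEncodeError, while keeping raw bytes end to end works because Python's path functions accept `bytes`. The `/srv/cvsroot/module` path is made up for the example.

```python
import posixpath

latin1_name = "café.txt".encode("iso-8859-1")  # b'caf\xe9.txt'

# Undecodable bytes become lone surrogates with the surrogateescape handler.
as_text = latin1_name.decode("utf-8", errors="surrogateescape")  # 'caf\udce9.txt'

# Re-encoding that string to strict UTF-8 fails.
try:
    as_text.encode("utf-8")
except UnicodeEncodeError:
    print("UnicodeEncodeError")

# Keeping raw bytes end to end avoids the round trip entirely.
full_path = posixpath.join(b"/srv/cvsroot/module", latin1_name)
print(full_path)  # b'/srv/cvsroot/module/caf\xe9.txt'
```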

Jun 17, 2022

cvsclient: Retry pserver connection three times in case of failure · b35a9769

Antoine Lambert authored 2 years ago

Connecting to an existing pserver might sometimes fail (on SourceForge,
for instance); retrying the operation usually fixes the issue.

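A generic sketch of the retry policy from b35a9769: try up to three times, pausing between attempts, and re-raise the last error. `with_retries` is hypothetical and the loader may rely on a retry library instead. The commented call uses `socket.create_connection` and port 2401, the usual pserver port.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(connect: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call `connect` up to `attempts` times and return its first successful result.

    OSError covers ConnectionError and socket timeouts; the last error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)  # brief pause before reconnecting
    raise ValueError("attempts must be at least 1")


# Hypothetical usage:
# sock = with_retries(lambda: socket.create_connection(("cvs.example.org", 2401), timeout=10))
```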

cvsclient: Fix pserver error: "protocol error: <path> is not absolute" · 089e2fd0

Antoine Lambert authored 2 years ago

Some CVS servers (SourceForge and OSDN, for instance) return an error if
the path sent with the "Directory" pserver request is not absolute.

So fix that issue to ensure such CVS repositories can be loaded.

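A sketch of building a `Directory` request whose repository line is absolute. In the CVS client/server protocol, `Directory` is followed by the local directory and then, on the next line, the repository path. Prefixing `/` for relative inputs is an assumption of this sketch, and `directory_request` is hypothetical.

```python
import posixpath


def directory_request(cvsroot: str, module_path: str, local_dir: str = ".") -> bytes:
    """Build a CVS client 'Directory' request with an absolute repository path."""
    repository = posixpath.join(cvsroot, module_path.lstrip("/"))
    if not repository.startswith("/"):
        # Some servers reject a repository path that is not absolute.
        repository = "/" + repository
    return f"Directory {local_dir}\n{repository}\n".encode()


print(directory_request("/cvsroot/project", "module"))
# b'Directory .\n/cvsroot/project/module\n'
```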

cvsclient: Allow to connect to a pserver URL without password · e382aeb0

Antoine Lambert authored 2 years ago

The CVS client raised an error when trying to connect to a pserver
URL such as: pserver://anonymous@cvs.example.org/cvsroot/project/module

But numerous CVS pserver URLs found in the wild (notably on
SourceForge and OSDN) have that form.

So add support for that URL form in the CVS client.

Also remove the use of the external dependency urllib3.util.parse_url and
use urllib.parse.urlparse from the Python standard library instead.

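The standard-library parsing mentioned in e382aeb0, applied to the example URL: `urlparse` returns `None` for a missing password rather than failing. Defaulting to an empty password for anonymous access is an assumption of this sketch.

```python
from urllib.parse import urlparse

url = urlparse("pserver://anonymous@cvs.example.org/cvsroot/project/module")

# urlparse exposes credentials as attributes; password is None when absent.
user = url.username
password = url.password if url.password is not None else ""  # assumed default
print(url.hostname, url.port, user, repr(password), url.path)
# cvs.example.org None anonymous '' /cvsroot/project/module
```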

May 20, 2022
- cvs2gitdump: Fix local variable 'expkw' referenced before assignment · d52686be
  Antoine Lambert authored 2 years ago
  
May 11, 2022
- Updated backport on buster-swh from debian/0.3.0-1_swh1 (unstable-swh) · 6e73cb4e
  Jenkins for Software Heritage authored 2 years ago
  
  debian/0.3.0-1_swh1_bpo10+1
  
- Merge tag 'debian/0.3.0-1_swh1' into debian/buster-swh · 3c0dacd9
  Jenkins for Software Heritage authored 2 years ago
  
- Updated debian changelog for version 0.3.0 · 96be6aa5
  Jenkins for Software Heritage authored 2 years ago
  
  debian/0.3.0-1_swh1
  
- Update upstream source from tag 'debian/upstream/0.3.0' · 3e0e3a8b
  Jenkins for Software Heritage authored 2 years ago
  ```
  Update to upstream version '0.3.0'
  with Debian dir a838aee4a15705419ac1e8872dad306b4c52120d
  ```
- New upstream version 0.3.0 · 29fe5130
  Jenkins for Software Heritage authored 2 years ago
  
  debian/upstream/0.3.0
  
May 10, 2022
- cvs.loader: Decrease log level verbosity to debug · 3909cb79
  Antoine R. Dumont authored 2 years ago
  ```
  A change in the base loader will allow temporarily increasing it in
  debugging mode if needed.
  ```
  v0.3.0
May 09, 2022
- add strict asyncio_mode in pytest.ini · 5b391527
  Pratyush authored 2 years ago
  