tar.loader: Make the loader-tar able to download remote artifact
Also:
- Keep the legacy behavior (use the revision and snapshot's branch provided by the caller, e.g. the deposit loader depends on this)
- Increase code coverage (50 -> 93%)
- Remove dir loader inheritance (nightmare to maintain). We still need the dir loader for some functions but the main issue is resolved).
- Remove deprecated producer code (we use the scheduler now)
Related T1351 Related T1431 Depends on !9 (closed)
Use sample:
>>> url = 'https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz'
>>> origin = {'url': url, 'type': 'tar'}
>>> visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
>>> last_modified = '2016-04-22 16:35'
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.tar.tasks import LoadTarRepository
>>> l = LoadTarRepository()
>>> l.run_task(origin=origin, visit_date=visit_date,
... last_modified=last_modified)
DEBUG:swh.scheduler.task.LoadTarRepository:Creating tar origin for https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz
DEBUG:swh.scheduler.task.LoadTarRepository:Done creating tar origin for https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz
DEBUG:swh.scheduler.task.LoadTarRepository:Creating origin_visit for origin 3 at time Tue, 3 May 2017 17:16:32 +0200
DEBUG:swh.scheduler.task.LoadTarRepository:Done Creating origin_visit for origin 3 at time Tue, 3 May 2017 17:16:32 +0200
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ftp.gnu.org:443
DEBUG:urllib3.connectionpool:https://ftp.gnu.org:443 "GET /gnu/8sync/8sync-0.1.0.tar.gz HTTP/1.1" 200 221837
INFO:swh.scheduler.task.LoadTarRepository:Uncompress /tmp/swh.loader.tar._nh29j5e-18075/8sync-0.1.0.tar.gz to /tmp/swh.loader.tar._nh29j5e-18075/swh.loader.tar-7wox5_rx
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 28 contents
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 28 contents
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 7 directories
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 7 directories
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 1 revisions
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 1 revisions
DEBUG:swh.scheduler.task.LoadTarRepository:Updating origin_visit for origin 3 with status full
DEBUG:swh.scheduler.task.LoadTarRepository:Done updating origin_visit for origin 3 with status full
DEBUG:swh.scheduler.task.LoadTarRepository:Clean up /tmp/swh.loader.tar._nh29j5e-18075
{'status': 'eventful'}
Test Plan
tox
Migrated from D795 (view on Phabricator)
Merge request reports
Activity
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/49/ See console output for more information: https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/49/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/50/ See console output for more information: https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/50/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/51/ See console output for more information: https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/51/console
147 def cleanup(self): 148 """Clean up temporary disk folders used. 59 149 60 150 """ 61 # Shortcut super() as we use different arguments than the DirLoader. 62 return BufferedLoader.load(self, tar_path=tar_path, origin=origin, 63 visit_date=visit_date, revision=revision, 64 branch_name=branch_name) 151 if self.debug: 152 self.log.warn('%s Will not clean up temp dir %s' % ( 153 DEBUG_MODE, self.temp_directory 154 )) 155 return 156 if os.path.exists(self.temp_directory): 157 self.log.debug('Clean up %s' % self.temp_directory) 158 shutil.rmtree(self.temp_directory) Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/53/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/54/ for more details.
Some references in the commit message have been migrated:
- T1431 is now swh/meta#1431 (closed)
Some references in the commit message have been migrated:
- T1351 is now swh/meta#1351 (closed)
- T1431 is now swh/meta#1431 (closed)
48 'microseconds': 830834 68 49 }) 69 50 70 mock_os.lstat.assert_called_once_with('some/path') 71 72 @patch('swh.loader.tar.build.os') 73 def test_time_from_path_with_int(self, mock_os): 74 class MockStat: 75 st_mtime = 1445348286 76 77 mock_os.lstat.return_value = MockStat() 78 79 actual_time = build._time_from_path('some/path') 51 def test_time_from_last_modified_with_int(self): 52 actual_time = build._time_from_last_modified( 53 '2015-10-20T13:38:06+00:00') 56 41 57 @patch('swh.loader.tar.build.os') 58 def test_time_from_path_with_float(self, mock_os): 59 class MockStat: 60 st_mtime = 1445348286.8308342 61 mock_os.lstat.return_value = MockStat() 62 63 actual_time = build._time_from_path('some/path') 42 def test_time_from_last_modified_with_float(self): 43 actual_time = build._time_from_last_modified( 44 '2015-10-20T13:38:06.830834+00:00') 64 45 65 46 self.assertEqual(actual_time, { 66 47 'seconds': 1445348286, 67 'microseconds': 8308342 48 'microseconds': 830834 Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/55/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/56/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/57/ for more details.
154 )) 155 return 156 if os.path.exists(self.temp_directory): 157 self.log.debug('Clean up %s' % self.temp_directory) 158 shutil.rmtree(self.temp_directory) 65 159 66 160 def prepare_origin_visit(self, *, origin, visit_date=None, **kwargs): 161 """Prepare the origin visit information. 162 163 Args: 164 origin (dict): Dict with keys {url, type} 165 visit_date (str): Date representing the date of the 166 visit. None by default will make it the current time 167 during the loading process. 168 169 """ 124 @click.option('--archive-path', required=1, help='Archive path to load') 125 @click.option('--origin-url', required=1, help='Origin url to associate') 126 @click.option('--visit-date', default=None, 127 help='Visit date time override') 128 def main(archive_path, origin_url, visit_date): 129 """Loading archive tryout.""" 130 import datetime 131 origin = {'url': origin_url, 'type': 'tar'} 132 commit_time = int(datetime.datetime.now( 133 tz=datetime.timezone.utc).timestamp()) 134 swh_person = { 135 'name': 'Software Heritage', 136 'fullname': 'Software Heritage', 137 'email': 'robot@softwareheritage.org' 216 objects = self.objects 217 self.maybe_load_contents(objects['content'].values()) 253 """ 254 self.last_modified = last_modified 255 256 def get_tarball_url_to_retrieve(self): 257 return self.origin['url'] 258 259 def build_revision(self, filepath, nature, hashes): 260 """Build the revision with identifier 261 262 We use the `last_modified` date provided by the caller to 263 build the revision. 264 265 """ 266 return { 267 **compute_revision(filepath, self.last_modified), 268 'metadata': { 64 } 65 66 def download(self, url): 67 """Download the remote tarball url locally. 68 69 Args: 70 url (str): Url (file or http*) 71 72 Raises: 73 ValueError in case of failing to query 74 75 Returns: 76 Tuple of local (filepath, hashes of filepath) 77 78 """ 79 url_parsed = urlparse(url) mentioned in issue swh-deposit#1676 (closed)
- tar.loader: Rename method with clearer name
- tar.loader: Explicit the need for an implementation
- tar.loader: Fix typo
- tar.loader: Improve class name and use alias for retro-compatibility
- tar.loader: Improve class docstrings
- loader: Only explicit needed params & adapt docstring accordingly
- tar.loader: Improve docstring
Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/58/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/59/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLDTAR/job/tox/60/ for more details.