Skip to content
Snippets Groups Projects

tar.loader: Make the loader-tar able to download remote artifact

9 unresolved threads

Also:

  • Keep the legacy behavior (use the revision and snapshot's branch provided by the caller, e.g. the deposit loader depends on this)
  • Increase code coverage (50 -> 93%)
  • Remove dir loader inheritance (nightmare to maintain). We still need the dir loader for some functions but the main issue is resolved).
  • Remove deprecated producer code (we use the scheduler now)

Related T1351 Related T1431 Depends on !9 (closed)

Use sample:

>>> url = 'https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz'
>>> origin = {'url': url, 'type': 'tar'}
>>> visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
>>> last_modified = '2016-04-22 16:35'
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.tar.tasks import LoadTarRepository
>>> l = LoadTarRepository()
>>> l.run_task(origin=origin, visit_date=visit_date,
...            last_modified=last_modified)
DEBUG:swh.scheduler.task.LoadTarRepository:Creating tar origin for https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz
DEBUG:swh.scheduler.task.LoadTarRepository:Done creating tar origin for https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz
DEBUG:swh.scheduler.task.LoadTarRepository:Creating origin_visit for origin 3 at time Tue, 3 May 2017 17:16:32 +0200
DEBUG:swh.scheduler.task.LoadTarRepository:Done Creating origin_visit for origin 3 at time Tue, 3 May 2017 17:16:32 +0200
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ftp.gnu.org:443
DEBUG:urllib3.connectionpool:https://ftp.gnu.org:443 "GET /gnu/8sync/8sync-0.1.0.tar.gz HTTP/1.1" 200 221837
INFO:swh.scheduler.task.LoadTarRepository:Uncompress /tmp/swh.loader.tar._nh29j5e-18075/8sync-0.1.0.tar.gz to /tmp/swh.loader.tar._nh29j5e-18075/swh.loader.tar-7wox5_rx
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 28 contents
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 28 contents
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 7 directories
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 7 directories
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 1 revisions
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 1 revisions
DEBUG:swh.scheduler.task.LoadTarRepository:Updating origin_visit for origin 3 with status full
DEBUG:swh.scheduler.task.LoadTarRepository:Done updating origin_visit for origin 3 with status full
DEBUG:swh.scheduler.task.LoadTarRepository:Clean up /tmp/swh.loader.tar._nh29j5e-18075
{'status': 'eventful'}

Test Plan

tox


Migrated from D795 (view on Phabricator)

Merge request reports

Approved by

Closed by Phabricator Migration userPhabricator Migration user 6 years ago (Dec 10, 2018 11:23am UTC)

Merge details

  • The changes were not merged into generated-differential-D795-target.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
1 1 #!/usr/bin/make -f
2 2
3 3 export PYBUILD_NAME=swh.loader.tar
4 export export PYBUILD_TEST_ARGS=--with-doctest -sv -a !db,!fs
4 export PYBUILD_TEST_ARGS=-m 'not db and not fs'
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • 147 def cleanup(self):
    148 """Clean up temporary disk folders used.
    59 149
    60 150 """
    61 # Shortcut super() as we use different arguments than the DirLoader.
    62 return BufferedLoader.load(self, tar_path=tar_path, origin=origin,
    63 visit_date=visit_date, revision=revision,
    64 branch_name=branch_name)
    151 if self.debug:
    152 self.log.warn('%s Will not clean up temp dir %s' % (
    153 DEBUG_MODE, self.temp_directory
    154 ))
    155 return
    156 if os.path.exists(self.temp_directory):
    157 self.log.debug('Clean up %s' % self.temp_directory)
    158 shutil.rmtree(self.temp_directory)
  • Dependent diff landed so trigger the new build for this

    • tar.tasks: Improve docstring and test the task
    • README: Update documentation and remove obsolete documentation
  • Remove wrong comment

  • 48 'microseconds': 830834
    68 49 })
    69 50
    70 mock_os.lstat.assert_called_once_with('some/path')
    71
    72 @patch('swh.loader.tar.build.os')
    73 def test_time_from_path_with_int(self, mock_os):
    74 class MockStat:
    75 st_mtime = 1445348286
    76
    77 mock_os.lstat.return_value = MockStat()
    78
    79 actual_time = build._time_from_path('some/path')
    51 def test_time_from_last_modified_with_int(self):
    52 actual_time = build._time_from_last_modified(
    53 '2015-10-20T13:38:06+00:00')
  • 56 41
    57 @patch('swh.loader.tar.build.os')
    58 def test_time_from_path_with_float(self, mock_os):
    59 class MockStat:
    60 st_mtime = 1445348286.8308342
    61 mock_os.lstat.return_value = MockStat()
    62
    63 actual_time = build._time_from_path('some/path')
    42 def test_time_from_last_modified_with_float(self):
    43 actual_time = build._time_from_last_modified(
    44 '2015-10-20T13:38:06.830834+00:00')
    64 45
    65 46 self.assertEqual(actual_time, {
    66 47 'seconds': 1445348286,
    67 'microseconds': 8308342
    48 'microseconds': 830834
  • 64 69 """
    65 70 # given
    66 71 origin = {
    67 'url': 'file:///tmp/sample-folder',
    68 'type': 'dir'
    72 'url': self.repo_url,
    73 'type': 'tar'
    74 }
  • Heads up, it's cool and all but:

    • the diff is too big so i'll split this in multiple ones.
    • i need to update the diff (done locally and tests ok \m/) to add back the legacy behavior as deposit depends on it.
    • loader.tar: Add back the legacy behavior (deposit depends on it)
    • tar.loader: Fix typos
    • resources/producer: Remove dead configuration
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • 154 ))
    155 return
    156 if os.path.exists(self.temp_directory):
    157 self.log.debug('Clean up %s' % self.temp_directory)
    158 shutil.rmtree(self.temp_directory)
    65 159
    66 160 def prepare_origin_visit(self, *, origin, visit_date=None, **kwargs):
    161 """Prepare the origin visit information.
    162
    163 Args:
    164 origin (dict): Dict with keys {url, type}
    165 visit_date (str): Date representing the date of the
    166 visit. None by default will make it the current time
    167 during the loading process.
    168
    169 """
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • 124 @click.option('--archive-path', required=1, help='Archive path to load')
    125 @click.option('--origin-url', required=1, help='Origin url to associate')
    126 @click.option('--visit-date', default=None,
    127 help='Visit date time override')
    128 def main(archive_path, origin_url, visit_date):
    129 """Loading archive tryout."""
    130 import datetime
    131 origin = {'url': origin_url, 'type': 'tar'}
    132 commit_time = int(datetime.datetime.now(
    133 tz=datetime.timezone.utc).timestamp())
    134 swh_person = {
    135 'name': 'Software Heritage',
    136 'fullname': 'Software Heritage',
    137 'email': 'robot@softwareheritage.org'
    216 objects = self.objects
    217 self.maybe_load_contents(objects['content'].values())
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • 253 """
    254 self.last_modified = last_modified
    255
    256 def get_tarball_url_to_retrieve(self):
    257 return self.origin['url']
    258
    259 def build_revision(self, filepath, nature, hashes):
    260 """Build the revision with identifier
    261
    262 We use the `last_modified` date provided by the caller to
    263 build the revision.
    264
    265 """
    266 return {
    267 **compute_revision(filepath, self.last_modified),
    268 'metadata': {
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • 64 }
    65
    66 def download(self, url):
    67 """Download the remote tarball url locally.
    68
    69 Args:
    70 url (str): Url (file or http*)
    71
    72 Raises:
    73 ValueError in case of failing to query
    74
    75 Returns:
    76 Tuple of local (filepath, hashes of filepath)
    77
    78 """
    79 url_parsed = urlparse(url)
  • Merge request was accepted

  • vlorentz approved this merge request

    approved this merge request

    • tar.loader: Rename method with clearer name
    • tar.loader: Explicit the need for an implementation
    • tar.loader: Fix typo
    • tar.loader: Improve class name and use alias for retro-compatibility
    • tar.loader: Improve class docstrings
    • loader: Only explicit needed params & adapt docstring accordingly
    • tar.loader: Improve docstring
    • tar.loader: Use urllib.parse.urlparse to actually parse url
  • Rebase

  • Merge request was merged

  • Please register or sign in to reply
    Loading