Create a swh-based model (using a Hugging Face training dataset) to establish a baseline for later comparison against a model trained on an enriched dataset