
Workshop "Using Software Heritage to harvest institution’s entries" report.

Merged Elias Chetouane requested to merge chetouae/workshop-2024:main into main
@@ -2,21 +2,35 @@
Lead: Violaine Louvet and Elias Chetouane. Contributor: Valentin Lorentz
Name of the note taker: Elias
## List of attendees
## Context
This workshop is a follow-up to the "UGA Open research data monitor" project, which aims to retrieve the data published by all researchers affiliated
with the Université Grenoble Alpes. This project queries several data repositories (Zenodo, RechercheDataGouv...) and DataCite to obtain DOIs. The aim is to
analyze these results in order to adapt the data policy and to support UGA researchers in managing and publishing their data.
## Workshop description
The idea is to understand how we can harvest the SWH database in order to identify entries related to a specific institution.
This project is available on git:
https://gricad-gitlab.univ-grenoble-alpes.fr/mlarrieu/open-research-data-monitor-back
## 3 key messages for the report back
Having done this for data, the next step is to do the same for code. The question addressed at the workshop was "how to retrieve
research code produced by researchers at a particular institution" (UGA in our case).
### Idea #1
## Possible solutions
### Idea #2
- use the HAL API to search for code published on HAL for which at least one of the authors is affiliated with the institution you are looking for.
- use the SWH API to obtain code originating from an institutional software forge (for example, the UGA gitlab instance should contain only UGA
research code).
- search the metadata that SWH extracts from the "codemeta.json" file, which is present in some repositories. This metadata is not publicly
accessible from the API, so you need to ask SWH staff to update the query results regularly.
- for maximum coverage (but at a much higher computational cost and search time), it would be possible to retrieve the README.md files
of all the projects harvested by SWH and run a text-based metadata extraction program on them, in the hope of obtaining author affiliations.
One such program is SOMEF:
https://github.com/KnowledgeCaptureAndDiscovery/somef
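The first two ideas can be sketched as simple HTTP queries. The snippet below is a minimal sketch, not the code used in the project: it only builds the request URLs, assuming the public HAL search endpoint with the Solr fields `docType_s` and `structId_i`, and the SWH origin search endpoint `/api/1/origin/search/`. The structure identifier `12345` is a placeholder, not the real identifier of any institution, and the field names should be checked against the HAL and SWH API documentation before use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed public endpoints; verify against the HAL and SWH API docs.
HAL_SEARCH = "https://api.archives-ouvertes.fr/search/"
SWH_ORIGIN_SEARCH = "https://archive.softwareheritage.org/api/1/origin/search/"


def hal_software_query(struct_id, rows=100):
    """Build a HAL search URL for software deposits linked to one structure.

    struct_id is the HAL identifier of the institution (placeholder in the
    usage example below).
    """
    params = {
        "q": "*:*",
        # Filter queries: software deposits, affiliated with the structure.
        "fq": ["docType_s:SOFTWARE", f"structId_i:{struct_id}"],
        "fl": "halId_s,title_s,authFullName_s",
        "rows": rows,
        "wt": "json",
    }
    return HAL_SEARCH + "?" + urlencode(params, doseq=True)


def swh_origin_query(forge_pattern, limit=100):
    """Build an SWH URL listing origins whose URL contains forge_pattern."""
    return f"{SWH_ORIGIN_SEARCH}{quote(forge_pattern)}/?limit={limit}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Usage example: 12345 is a hypothetical structure identifier.
hal_url = hal_software_query(12345)
swh_url = swh_origin_query("gricad-gitlab.univ-grenoble-alpes.fr")
```

Calling `fetch_json(hal_url)` or `fetch_json(swh_url)` would then return the raw results. Both services paginate their results, so a real harvester would also need to follow the next pages.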
### Idea #3
## Conclusion
## Pictures (optional)
No pictures of people, but pictures of posters, printed material, etc. that can help to understand the work done.
These are the ideas that were discussed during the workshop, but other solutions could also be considered. None of these solutions on its own
makes it possible to obtain all the code produced by an institution. So it is best to implement several of them and to continue
encouraging best practices for disseminating research products, such as creating a codemeta.json file in the git repository,
archiving code on SWH and using institutional forges for software development.
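To illustrate why a codemeta.json file helps, here is a minimal sketch of how author affiliations could be read from such a file and matched against an institution name. It assumes the usual CodeMeta layout, in which `author` is a person or a list of persons and `affiliation` is an organization object with a `name` (or a plain string). The sample record, the author name and the helper names are hypothetical.

```python
import json


def affiliations(codemeta):
    """Return the set of affiliation names found in a CodeMeta record."""
    authors = codemeta.get("author", [])
    if isinstance(authors, dict):  # a single author given as an object
        authors = [authors]
    found = set()
    for author in authors:
        if not isinstance(author, dict):
            continue
        aff = author.get("affiliation")
        # affiliation may be one entry or a list of entries
        for entry in aff if isinstance(aff, list) else [aff]:
            name = entry.get("name") if isinstance(entry, dict) else entry
            if isinstance(name, str) and name.strip():
                found.add(name.strip())
    return found


def mentions_institution(codemeta, needles):
    """True if any affiliation contains one of the needles (case-insensitive)."""
    names = [name.lower() for name in affiliations(codemeta)]
    return any(needle.lower() in name for name in names for needle in needles)


# Hypothetical codemeta.json content, for illustration only.
sample = json.loads("""
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-tool",
  "author": [
    {
      "@type": "Person",
      "givenName": "Jane",
      "familyName": "Doe",
      "affiliation": {
        "@type": "Organization",
        "name": "Université Grenoble Alpes"
      }
    }
  ]
}
""")

is_uga = mentions_institution(sample, ["Université Grenoble Alpes", "UGA"])
```

Substring matching on names is deliberately naive. Affiliations are written in many different ways, so a real pipeline would probably need a list of name variants or persistent organization identifiers.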