Elias Chetouane · 7e3a61ef
--- a/notes-workshop-harvesting.md

+ 24

− 10
+++ b/notes-workshop-harvesting.md

 @@ -2,21 +2,35 @@

 Lead: Violaine Louvet and Elias Chetouane. Contributor: Valentin Lorentz

-Name of the note taker:
+Name of the note taker: Elias

-## List of attendees
+## Context

+This workshop is a follow-up to the "UGA Open research data monitor" project, which aims to retrieve the datasets published by all researchers affiliated
+with Université Grenoble Alpes (UGA). The project queries several data repositories (Zenodo, Recherche Data Gouv, etc.) and DataCite to collect DOIs. The aim is to
+analyze the results in order to adapt the data policy and to support UGA researchers in managing and publishing their data.

-## Workshop description 
-The idea is to understand how we can harvest SWH Database in order to identify entries related to a specific institution
+The project's code is available on GitLab:
+https://gricad-gitlab.univ-grenoble-alpes.fr/mlarrieu/open-research-data-monitor-back

-## 3 key messages for the report back
+With this done for data, the next step is to do the same for research software. The question addressed at the workshop was "how to retrieve
+the research code produced by researchers at a particular institution" (UGA in our case).

-### Idea #1
+## Possible solutions

-### Idea #2
+- Use the HAL API to search for software deposited on HAL for which at least one author is affiliated with the target institution.
+- Use the Software Heritage (SWH) API to list code originating from an institutional software forge (for example, the UGA GitLab instance should contain only UGA
+research code).
+- Search the metadata that SWH extracts from the "codemeta.json" file present in some repositories. This metadata is not publicly
+accessible through the API, so you need to ask SWH staff to rerun the query regularly to keep the results up to date.
+- For maximum coverage (but at a much higher computational cost and search time), retrieve all the README.md files
+from all the projects harvested by SWH and run a text-based metadata extraction program on them, in the hope of obtaining author affiliations.
+One such program is SOMEF:
+https://github.com/KnowledgeCaptureAndDiscovery/somef

-### Idea #3
+## Conclusion

-## Pictures (optional)
-No picture of people but picture of posters, any printed material, etc. that can help to understand the work done.
+These are the ideas discussed during the workshop, but other solutions could also be considered. None of these solutions on its own
+makes it possible to retrieve all the code produced by an institution. It is therefore best to combine several of them and to continue to
+encourage best practices for disseminating research outputs, such as adding a codemeta.json file to the git repository,
+archiving code on SWH and using institutional forges for software development.
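
The first two options listed under "Possible solutions" (querying HAL for software with an affiliated author, and querying SWH for origins hosted on an institutional forge) could be sketched roughly as follows. This is a minimal illustration, not the workshop's implementation: the HAL field names (`docType_s`, `structId_i`, `halId_s`, `title_s`, `uri_s`) are assumed from HAL's Solr-style search API and should be checked against its documentation, the structure identifier `12345` is a hypothetical placeholder, and the helper names are invented for this example. The block only builds the request URLs; `fetch_json` is the single function that touches the network.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Public endpoints; the query parameters used below are assumptions to verify.
HAL_SEARCH = "https://api.archives-ouvertes.fr/search/"
SWH_ORIGIN_SEARCH = "https://archive.softwareheritage.org/api/1/origin/search/"


def hal_software_url(struct_id: int, rows: int = 100) -> str:
    """Build a HAL search URL for software records tied to one structure.

    `struct_id` is the HAL identifier of the institution (hypothetical here).
    """
    params = [
        ("q", "*:*"),                       # match everything, then filter
        ("fq", "docType_s:SOFTWARE"),       # assumed document-type filter
        ("fq", f"structId_i:{struct_id}"),  # assumed affiliation filter
        ("fl", "halId_s,title_s,uri_s"),    # assumed fields to return
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return HAL_SEARCH + "?" + urlencode(params)


def swh_forge_url(forge_host: str, limit: int = 100) -> str:
    """Build an SWH origin-search URL matching origins on a given forge host."""
    pattern = quote(forge_host, safe="")
    return f"{SWH_ORIGIN_SEARCH}{pattern}/?{urlencode({'limit': limit})}"


def fetch_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(hal_software_url(12345))` or `fetch_json(swh_forge_url("gricad-gitlab.univ-grenoble-alpes.fr"))` would then return the decoded responses; pagination and rate limiting are left out of this sketch and would need handling for a full harvest.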