DANS
SATIFYD

Self-Assessment Tool to Improve the FAIRness of Your Dataset


Welcome to SATIFYD: the DANS Self-Assessment Tool to Improve the FAIRness of Your Dataset. This tool will show you how FAIR (Findable, Accessible, Interoperable, Reusable) your dataset is and will provide you with tips to score (even) higher on FAIRness. Ideally, you use this tool prior to depositing your dataset in EASY.

The 12 questions touch upon the FAIR data principles but do not strictly follow them. While answering the questions, the score per letter will be displayed underneath each letter. The more ‘blue’ the letters get, the more FAIR your dataset is. An overall score is provided at the end of the page.

Some questions are posed more than once (e.g. on metadata and data standards or usage licences), because the topics are relevant in more than one letter.

Want to know more? Please click here 

If you have any questions, please let us know by sending an e-mail 



Controlled vocabularies
Taxonomies (thesauri)
Ontologies
There are no standards for my discipline

Readme file
Versioning
Provenance

  I can't find this information in EASY







Persistent Identifier(s)
Reference to other datasets
Reference to publications
No contextual metadata

Origin of data
Citations for reused data
Workflow description for collecting data (machine readable)
Processing history of data
Version history of data



INTRODUCTION
SATIFYD stands for: Self-Assessment Tool to Improve the FAIRness of Your Dataset. It is designed for manual assessment by the user, prior to the deposit of a dataset.

12 questions

The 12 questions of the tool, divided into three questions per letter, are based on the FAIR data principles. Those aspects of the FAIR principles which are already covered by EASY (e.g. whether or not the dataset will have a Persistent Identifier assigned) are left out of this tool. Questions which typically concern machine readability are those about (meta)data standards such as controlled vocabularies, ontologies and taxonomies. Some questions are posed more than once (e.g. on metadata and data standards or usage licences), because the topics are relevant to more than one letter. Explanatory texts can be found with each question by clicking on the “i” symbol.

Scores

After answering the questions, the letters fill with blue color according to your score. An overall score is provided at the end of the page. The degree to which a dataset can be made FAIR differs per discipline: some disciplines have more standards available than others. In order to do justice to this difference, options to indicate that standards are not available have been added to the tool. Tips to improve the FAIRness of the dataset can be found at each letter by clicking on ‘Want to improve?’.

Background

A prototype of a FAIR data assessment tool was established and first published in the summer of 2017 (see blog by Emily Thomas). The tool gives a 5-star rating for the Findability, Accessibility, Interoperability and Reusability of a dataset as well as a score for its overall FAIRness. After receiving feedback from a broad range of users, it became clear that the step towards a full version of the FAIRdat tool meant reformulating the questions, providing more overview and reconsidering the level of FAIR assessment (repository, dataset, files).

In 2018, a simple checklist to ‘roughly’ evaluate the FAIRness of datasets was created to give researchers with little experience in sharing their research resources an idea of what FAIR data sharing means. The questions are kept simple, and the corresponding paragraphs containing explanations of terms and concepts are short and concise. The checklist, titled “FAIR enough? Checklist to evaluate FAIRness of data(sets)”, was first presented during the EOSC Stakeholder Forum in Vienna in November 2018 and is accessible here.

Credits

SATIFYD was established by DANS: Eliane Fankhauser, Jerry de Vries, Nina Westzaan, Vesa Åkerman in 2019. Its layout is based on and inspired by the ARDC FAIR self assessment tool. The initial idea and a first prototype were established by Peter Doorn, Eleftheria Tsoupra and Emily Thomas at DANS in 2017-2018.
Now that you have finished your research project, you are on the brink of depositing your research data in a trustworthy long-term repository. Findability is one of the four pillars of the FAIR Guiding Principles. If you take care of the findability of your data, you will enable search engines to find it and possibly also link it to related sources on the web. Moreover, you will improve the exposure of your research and help researchers to find and potentially reuse your data.

Findability generally comes down to giving a proper description of your dataset. This description can be divided into three elements:
Metadata is information that describes an object such as a dataset. It gives context to the research data, providing information about the URL where the dataset is stored, the creator, provenance, purpose, time, geographic locations, access conditions, and terms of use of a data collection. The extent to which metadata is provided for a dataset can vary greatly and affects how findable a dataset is. In EASY a minimum number of metadata fields is required in order to successfully deposit a dataset. This minimum metadata, however, is not comprehensive enough to fulfil the requirements of FAIR. The following list covers the items that should be addressed when aiming for sufficient metadata:

Many of the items on this list also relate to the accessibility, interoperability and reusability of the dataset. These aspects will be dealt with in the respective sections of this tool.

You can document your research on the metadata level and on the dataset level. In order to make your metadata interoperable and machine actionable, use standardised controlled vocabularies, thesauri or ontologies. On the dataset level you should provide a project description and a dataset description. For example, add a codebook to make your data understandable for other researchers, and add provenance information and a data/workflow process description. If you want to know more about standards, see the second question under Findable.
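As a rough illustration of what machine-actionable metadata looks like (the field values are invented, and the keys merely follow the Dublin Core naming style rather than the exact EASY fields), a metadata record is essentially a set of key-value pairs drawn from a shared schema:

```python
# Hypothetical sketch: dataset metadata as Dublin Core-style key-value pairs.
# The keys follow the Dublin Core element set; the values are made-up examples.
metadata = {
    "dc:title": "Excavation Records, Example Site 2019",
    "dc:creator": "J. Doe",
    "dc:subject": "archaeology",   # ideally a term from a controlled vocabulary
    "dc:language": "nl",           # ISO 639 language code
    "dc:date": "2019-05-01",       # ISO 8601 date
    "dc:format": "text/csv",       # MIME type of the data files
    "dc:rights": "CC0 1.0",
}

# Because the keys come from a shared standard, a machine can reliably
# pick out, say, the language of any dataset described this way.
print(metadata["dc:language"])
```

The benefit of a shared schema is exactly this predictability: a harvester does not need to guess which field holds the language or the date.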

Click here if you want to know more about the term metadata.
Click here if you want to know more about the term interoperability.
To make your (meta)data findable we encourage the use of controlled vocabularies, taxonomies and/or ontologies.

A controlled vocabulary is an organized and standardized list of terms that can be used to describe data. Controlled vocabularies are mostly discipline-specific and therefore very useful for describing your data. By using controlled vocabularies your metadata becomes much more understandable for machines and users, which improves the findability of your data.

A taxonomy is a classification of entities in an ordered system. A taxonomy is mostly domain-specific and is used to identify content/data by adding terms from the taxonomy to the content/data description. Identifying content in a structured way gives search engines the opportunity to optimize their search functionality. In this way more relevant data can be found based on a single search query. Therefore, adding taxonomy terms to your dataset description will improve the findability of your dataset.

An ontology is a formal description of knowledge. This knowledge is described as a set of concepts and relations between these concepts within a specific domain. Ontologies are created to organize information into data and knowledge. An ontology attempts to represent entities, ideas and events, with all their interdependent properties and relations, according to a system of categories. By applying existing ontologies to describe your data, your data becomes more understandable for machines and thus improves the findability of your data.

From ontologies it is a small step to linked open data. Making use of linked open data means that your data is interlinked with other data, that your data is openly accessible and that your data can be shared within the semantic web. In this way your data is published in a structured and understandable way. Linked (open) data is described as a set of triples, following the RDF data model. A triple is a basic statement consisting of a subject, a predicate and an object. For example, the subject is “grass”, its predicate “has color” and the object is “green”. By linking your data to other data, more knowledge, information and links to your data become available. This will help to increase the findability of your data.
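The grass example can be sketched in a few lines of Python, using plain tuples to stand in for a real RDF library (such as rdflib); real linked data would use URIs for each part of the triple:

```python
# A minimal sketch of RDF-style triples as (subject, predicate, object) tuples.
# Plain strings are used here to keep the idea visible.
triples = [
    ("grass", "has color", "green"),
    ("grass", "is a", "plant"),
    ("plant", "studied in", "botany"),
]

# Machines can traverse such statements: collect everything said about "grass".
about_grass = [(p, o) for (s, p, o) in triples if s == "grass"]
print(about_grass)  # [('has color', 'green'), ('is a', 'plant')]
```

Because every statement has the same three-part shape, queries like the one above work uniformly over any collection of triples, which is what makes linked data machine-traversable.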

It is true that standardized controlled vocabularies, taxonomies and ontologies are not equally developed across disciplines. For some disciplines a broad range of standards is available whilst others have none yet. There are, however, general standards, such as the Getty Thesaurus of Geographic Names, which can be used across disciplines.

In EASY some metadata sections already offer (domain-specific) controlled vocabularies. For instance, to describe the subject and the temporal coverage you can select terms from the “Archeologisch Basis Register” (ABR) and its newer version ABRplus. These vocabularies are maintained by the Dutch “Rijksdienst voor het Cultureel Erfgoed”.

For those interested in more technical background information: the EASY/DANS data vault offers standardized terms for creator and contributor, which are derived from DataCite. To specify a Relation, the standard specification of Dublin Core is used. For language the ISO 639 standard is used, and for date fields the ISO 8601 standard. To specify the format of the dataset you can make use of this standardised list.
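As a small aside, the ISO standards mentioned here are supported directly in most programming languages; this hypothetical Python snippet (the date and language values are invented) shows an ISO 8601 date and an ISO 639 language code:

```python
from datetime import date

# ISO 8601: unambiguous year-month-day ordering, sortable as plain text.
collected = date(2019, 5, 1)
print(collected.isoformat())  # 2019-05-01

# ISO 639 language codes are short standard strings; "nl" is Dutch.
metadata = {"language": "nl", "date": collected.isoformat()}
print(metadata)
```

Using these standard notations means a machine never has to guess whether 05-01 means May 1st or January 5th, or which language "Dutch" refers to.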

Click here if you want to know more about linked data.
Click here if you want to know more about the semantic web.
Click here if you want to know more about RDF data structure.
Additional information is information that helps users to assess the content and the relevance of the dataset they are viewing. The most important means to provide additional information is a so-called readme file, in which topics like the structure of the dataset are addressed. Questions such as: how many files does the dataset contain and how are they related to each other? Which software has to be used to assess the data? How many versions of the data are contained in the dataset? Answering these questions helps users to assess and contextualize the dataset. Other topics to address include, but are by no means limited to, the methodologies used, a detailed summary of the project in which the data was collected, and information about whether and how the data was cleaned. Information about the provenance and the versioning of your data, moreover, can be added in addition to the readme file. If you have covered most of the items on the metadata list (see explanatory text of Question 1), you already provide a satisfactory amount of additional information. Nevertheless, it is important to supplement your metadata with more contextual information.

This question also relates to the letter R (reusability) of FAIR.

Click here if you want to know more about readme files.
CONTRADICTION
You answered question 1 with “no metadata”. This won’t allow you to answer the following two questions in F.
Advice to improve Findability
You filled in all or almost all of the optional fields on the Content Description page. This makes your data findable for other researchers and users. Question 2 concerns the use of standards to describe your data which enables machines to find and interlink your data.
You filled in some information on the Content Description page. In order to make your dataset more findable for researchers and users, check again if you can fill in more of the optional fields on the page. The more metadata you provide, the more findable (and reusable) your data will be.
On the Content Description page, additional fields like Relations to projects, internet pages or researchers, Format types, Languages, Sources on which the dataset is based can be filled in. Adding additional, rich metadata to your dataset will help other researchers to find but also to reuse (see questions under letter R) your data.
Fill in the required fields on the primary information page. Then, go to the Content Description page and check which additional metadata you could add to make your dataset more findable. The more metadata you provide, the more findable (and reusable, see question under letter R) your data will be.
Fill in the required fields on the primary information page. Then, go to the Content Description page and check which additional metadata you could add to make your dataset more findable. The more metadata you provide, the more findable (and reusable, see question under letter R) your data will be.
Check whether there are standards in your domain or field or generic standards that you can use to describe your dataset. Use them in the description (metadata). It is possible that there are no standards available in your field. If that is the case, make use of generic standards.
Using ontologies and taxonomies will improve the automated findability of your dataset. To increase the findability of your dataset, you can also use domain specific ontologies and linked open data if they are available.
Be aware that there are generic and domain-specific controlled vocabularies, taxonomies and ontologies.
You included the most important standards to make your dataset findable. Be aware of the fact that, within your domain, there could be specific controlled vocabularies, taxonomies or ontologies.
Using controlled vocabularies and ontologies will improve the automated findability of your dataset. You can increase the findability of your dataset even more by also making use of taxonomies, if available for your specific domain.
Using domain-specific controlled vocabularies and ontologies will improve the automated findability of your dataset.
Using taxonomies and ontologies to describe your data will improve the automated findability of your dataset. You can also add terms from (domain-specific) controlled vocabularies to your data description to increase the findability of your dataset.
Using domain-specific controlled vocabularies and taxonomies will improve the automated findability of your dataset.
Adding documentation about the dataset will improve its findability. Think of a readme file, versioning, or the provenance of the data.
Next to the readme file, consider adding information about the provenance of the data and the versioning.
Consider also adding information about the provenance of your data.
You added rich and detailed information to your dataset by not only providing a readme file but also giving information about the provenance and the versioning of your data. If seen as an addition to rich and detailed metadata (question F2), it makes your dataset more findable and reusable.
Consider also adding information about the versioning.
Next to the versioning, consider adding information about the provenance of the data and a readme file.
Consider also adding a readme file to your dataset.
Next to the provenance, consider adding a readme file and information about the versioning.

The accessibility of a dataset and its corresponding metadata is essential for researchers to assess and potentially reuse a dataset. The questions that you will find under Accessibility concern the accessibility of the metadata over time, meaning that the repository guarantees that the metadata will be available even if the data itself is no longer available, and the usage license chosen for the dataset. The latter determines to what extent or under which circumstances the dataset can be accessed. EASY has a number of usage licenses from which to choose, depending on the content of the data. In the FAIR Principles, the automated accessibility of metadata and data by machines is also covered under Accessibility. As EASY makes use of the Dublin Core metadata schema and provides a Persistent Identifier (see Findability) for each dataset, the machine actionability of the metadata is covered. There is therefore no question about this technical aspect in this part.

Metadata as described in Question 1 is the description of your data. As such it is associated with your dataset. For the accessibility but also for the findability of your data it is essential that the metadata of the dataset remains accessible even if the data itself is not available anymore. It is the repository you deposit in, in our case EASY, which should ensure that this is the case. With this question we would like to encourage you to check whether the metadata in EASY is publicly accessible even if the dataset is no longer available.
The extent to which you can make your dataset openly available depends on whether your dataset contains personal data. If it contains personal data, it is clear that you will have to restrict the access to your dataset. In question six you can further specify which usage license you intend to choose. In the explanatory text of Question six, the different usage licenses provided by EASY are listed.
On the EASY page where detailed information is provided about how the data should be deposited (“During depositing”), there is a list with licenses that you can choose from. Depending on the data and on whether or not the data contains personal data (see question 5) you can choose:

CONTRADICTION
If your data contains personal data, you won’t be able to choose the CC0 licence for your dataset.
Advice to improve Accessibility
All metadata in EASY is made openly accessible and will be available even after the data(set) is no longer available. Adding information to the metadata on your affiliation at the time of your research, provides users with a contact point to consult if they would like to track the availability of your data.
Check the source of your information about the availability of metadata in EASY again. All metadata in EASY is made openly accessible and will be available even after the data(set) is no longer available. Adding information to the metadata on your affiliation at the time of your research, provides users with a contact point to consult if they would like to track the availability of your data.
All metadata in EASY is made openly accessible and will be available even after the data(set) is no longer available. Information about metadata in EASY can be found on the DANS page "During Depositing" where the information is given under 4. Adding information to the metadata on your affiliation at the time of your research, provides users with a contact point to consult if they would like to track the availability of your data.
You have chosen the right license if your dataset doesn't contain personal data. Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which together can lead to the identification of a particular person, also constitute personal data. Personal data that has been anonymised in such a way that the individual is not or no longer identifiable is no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible.
You have chosen the right license if your data contains personal data. According to EASY policy, data which contains personal, sensitive data can only be deposited with restricted access for users. Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which together can lead to the identification of a particular person, also constitute personal data. Personal data that has been de-identified, encrypted or pseudonymised but can be used to re-identify a person remains personal data and falls within the scope of the GDPR.
You have chosen the right license if your dataset cannot be accessed yet due to unpublished papers or an ongoing project. Be aware: EASY only allows this option if you have a good reason. EASY allows a maximum of 24 months for your dataset to be under embargo. Your dataset will score low under Accessibility, as users cannot access it.
You have chosen the right license if your dataset cannot be accessed yet due to unpublished papers or an ongoing project. Be aware: EASY only allows this option if you have a good reason for choosing it. The embargo can last a maximum of 24 months. Your dataset will score low under Accessibility, as users cannot access it.
Go to legal information and read the information about the different license types. Then assess which of these licenses applies to your type of data. See also the website of the European Commission for information about what personal data is: data protection.
Go to legal information and read the information about the different license types. Then assess which of these licenses applies to your type of data. See also the website of the European Commission for information about what personal data is: data protection.
If you want other researchers to reuse your data, it is important that your data can be integrated in other data(sets). This process of exchanging information between different information systems such as applications, storage or workflows is called interoperability. The following actions will improve the interoperability of your data:

Preferred formats not only give a higher certainty that your data can be read in the future, they will also help to increase the reusability and interoperability. Preferred formats are formats that are widely used and supported by the most commonly used software and tools. Using preferred formats enables data to be loaded directly into the software and tools used for data analysis. It makes it possible to easily integrate your data with other data using the same preferred format. The use of preferred formats will also help to transform the format to a newer one, in case a preferred format gets outdated.

Click here for the list of preferred formats in EASY.
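As a sketch of what depositing data in a preferred format can look like in practice (the file name and the data are invented; UTF-8 CSV is used here as an example of a widely supported tabular format):

```python
import csv

# Hypothetical example: write tabular research data as UTF-8 CSV,
# a widely supported, software-independent format.
rows = [
    {"site": "A1", "depth_cm": 30, "find": "pottery"},
    {"site": "A2", "depth_cm": 55, "find": "bone"},
]

with open("finds.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["site", "depth_cm", "find"])
    writer.writeheader()   # column names make the file self-describing
    writer.writerows(rows)

# Any common tool (spreadsheets, R, Python) can now read the data back.
with open("finds.csv", encoding="utf-8") as f:
    print(f.read().splitlines()[0])  # site,depth_cm,find
```

The same data stored in a proprietary binary format would depend on one specific piece of software surviving; plain UTF-8 CSV does not.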
The more interoperable your dataset is, the better it will be understood and processed by machines. Complementary information about your dataset can be stored in multiple other datasets. Therefore, it is essential to add context or contextual knowledge to your dataset by adding meaningful links to relevant resources. For instance, you should specify whether your dataset builds on any other dataset or whether other, external datasets are needed to complete your dataset. If available, use Persistent Identifiers (see Question 2) to link to this (meta)data that is available online.
In order to increase the interoperability of your dataset, you should enrich its contextual knowledge. Contextual knowledge is information about how your data(set) was created and how it is composed. You can describe the contextual knowledge by adding links to all other (meta)data you used when you collected your data. With the help of these links, other researchers will know which other datasets are needed in order to have the complete set of your data. It is also possible that complementary information is stored somewhere else or in another dataset. You need to describe all these scientific links by properly citing the related datasets. If these datasets have a unique Persistent Identifier, use it to link to them.
Advice to improve Interoperability
Using preferred formats does increase the interoperability of your data!
Before depositing your data, try to convert your data (if possible) to preferred formats. Not only will this increase the interoperability of your dataset but also the accessibility and reusability.
Before depositing your data, try to convert your data (if possible) to preferred formats. Not only will this increase the interoperability of your dataset but also the accessibility and reusability.
Preferred formats are file formats of which DANS is confident that they will be stable enough in the long term to ensure accessibility, interoperability and reusability.
To increase the interoperability of your dataset we advise using preferred formats.
Linking to other metadata will increase the interoperability of your dataset. If you link to other metadata that is available online, always make use of a Persistent Identifier (PID), if possible, to refer to this metadata.
It is advisable to link to other (meta)data even if these are not accessible online. You can add a description of the (meta)data you have linked to in your own dataset.
Linking your dataset with other metadata or datasets will increase the interoperability of your dataset. You can add a link via a Persistent Identifier (PID). Examples of PIDs are DOI, URN and ORCiD.
Linking your dataset with other metadata or datasets will increase the interoperability of your dataset. You can add a link via a Persistent Identifier (PID). Examples of PIDs are DOI, URN and ORCiD.
Adding contextual information to your dataset will increase its interoperability. You can think of adding references to related publications and/or datasets, including your own. Moreover, you can add links via Persistent Identifiers (PIDs). Examples of PIDs are DOI, URN and ORCiD.
It is useful to enrich your metadata with your ORCiD (a persistent digital identifier for people). Also, adding links to related publications, if possible with their Persistent Identifier (PID), will improve the quality of the contextual information. When referring to other publications, always use a PID where one is available.
To increase the interoperability of your dataset add references to relevant publications and related datasets. Also add Persistent Identifiers such as your ORCiD (persistent digital identifier for people).
Adding as much contextual information as possible will increase the interoperability of your dataset.
To increase the interoperability of your dataset add references to relevant related datasets. Also add Persistent Identifiers such as your ORCiD (persistent digital identifier for people).
It is useful to enrich your metadata with your ORCiD (persistent digital identifier for people). Also, adding links to related publications, if possible with their Persistent Identifier (PID), will improve the quality of the contextual information.
It is useful to enrich your metadata with your ORCiD (persistent digital identifier for people) and other Persistent Identifiers that relate to your research and your dataset.
It is highly recommended to refer to other publications. If possible, always make use of a Persistent Identifier (PID) to link to publications. You can also add ORCiDs (persistent digital identifier for people) to your dataset.
The ultimate goal in making data FAIR is to foster reusability. Whether or not datasets are reusable by other researchers depends on a number of aspects. One of the preconditions is that the dataset has a usage license which clarifies under which circumstances the data may be reused. Because of the importance of this aspect, the question about the licenses, which you already answered under Accessible, is repeated here. In order to gain insight into the process of data generation, it is important to describe the data and metadata in as much detail as possible. Think of questions like: under which circumstances did I/we collect the data? Where does the data come from? Moreover, similar to aspects in Findable, Accessible and Interoperable, it is important that you meet the standards in your discipline when describing your data and metadata.
To let other researchers make use of your dataset, it is essential to explain the origin of your data and the steps you have taken to produce the dataset. It is therefore very important to provide provenance information with your dataset. Provenance information can consist of, for instance, a description of the origin of the data: how did you collect your data? Did you reuse other data? In that case, add the right citations to your dataset. Did you create your own data? Then describe the workflow for the data creation and the processing of the data. If you have used any versioning in your data, add this versioning information to your dataset.
You already answered this question under Accessible. Nevertheless, we consider it important that choosing the right usage license is highlighted under Reusable too, as it is one of the key elements that may or may not allow other researchers to reuse a dataset. On the EASY page where detailed information is provided about how the data should be deposited (“During depositing”), there is a list with licenses that you can choose from. Depending on the data and on whether or not the data contains personal data (see question 5) you can choose:

It is more likely that other researchers reuse your data if the metadata contains domain-specific standards, i.e. (meta)data has the same type, is organised in a standardized way, follows a commonly used or community accepted template, etc. Within different communities and domains minimal standards have been described but, unfortunately, not every domain has standards yet. More generic standards that you could use if there are no domain-specific standards are described in Question 2. Most of the standards come with instructions on how to use them.
Advice to improve Reusability
Adding provenance information to your dataset will increase the reusability. The more provenance information you provide, the better. Information about provenance includes but is not limited to: Origin of data, citations for reused data, workflow description for collecting data (machine readable), processing and version history of data.
You have chosen the right license if your dataset does not contain personal data. Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which together can lead to the identification of a particular person, also constitute personal data. Personal data that has been anonymised in such a way that the individual is not or no longer identifiable is no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible.
You have chosen the right license if your data contains personal data. According to EASY policy, data which contains personal, sensitive data can only be deposited with restricted access for users. Personal data is any information that relates to an identified or identifiable living individual. Different pieces of information, which together can lead to the identification of a particular person, also constitute personal data. Personal data that has been de-identified, encrypted or pseudonymised but can be used to re-identify a person remains personal data and falls within the scope of the GDPR.
You have chosen the right license if your dataset cannot be accessed yet due to unpublished papers or an ongoing project. Be aware: EASY only allows this option if you have a good reason. EASY allows a maximum of 24 months for your dataset to be under embargo. Your dataset will score low under Accessibility, as users cannot access it.
You have chosen the right license if your dataset cannot be accessed yet due to unpublished papers or an ongoing project. Be aware: EASY only allows this option if you have a good reason for choosing it. The embargo can last a maximum of 24 months. Your dataset will score low under Accessibility, as users cannot access it.
Go here to read information about the different license types. Then assess which of these licenses apply to your type of data. See also the website of the European Commission for information about what personal data is.
Go here to read information about the different license types. Then assess which of these licenses apply to your type of data. See also the website of the European Commission for information about what personal data is.
Not only has it become easier to find your data (see under ‘F’), your metadata now also meets the requirements for proper and correct reuse. This is an important step for your data to become more FAIR.
Your data meets domain standards to a certain extent. FAIR data should at least meet minimal information standards, so try to look for ways to improve. The more your data and metadata are organized in a standardized way, the better they are suited for reuse! Always try to keep the user perspective in mind.
Generic metadata standards are widely adopted. Domain standards, however, are much richer in vocabulary and structure and therefore will help researchers within your discipline to reuse your data. Check whether your domain has specific metadata standards.
FAIR data should at least meet minimal information standards. Did you check whether there are metadata standards available for your domain? The more your metadata and data are organized in a standardized way, the better they are suited for reuse. Always try to keep the user perspective in mind.
FAIR data should at least meet minimal information standards. Did you check whether there are metadata standards available for your domain? The more your metadata and data are organized in a standardized way, the better they are suited for reuse. Always try to keep the user perspective in mind.