Preserving your data
Data that provides the underlying evidence for a publication, or that is an individual output of the project, should be preserved. To ensure future access, either by the owners of the data or by others who wish to reuse it, this data must be prepared for long-term storage. This involves:
- selecting data for preservation.
- formatting the data appropriately for reuse, for example using open rather than proprietary formats.
- storing the data in an appropriate location, either through a managed service within the institution or through an external third-party service.
- describing the data by creating a record in Elements and uploading this record (with or without the data files) to QMRO.
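As an illustration of the formatting step above, the following is a minimal sketch of writing tabular data to CSV, an open plain-text format, using only the Python standard library. The function name, column names, and sample values are hypothetical and are not taken from any Queen Mary system; adapt them to your own data.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dict rows to CSV text (an open, plain-text format)."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps line endings consistent across platforms
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only
sample_rows = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 22.1},
]
csv_text = rows_to_csv(sample_rows, ["sample_id", "temperature_c"])
```

The resulting text can be saved with a .csv extension and opened without proprietary software.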
What to preserve:
- Data that supports published findings
Datasets supporting published findings in the scholarly literature should be described and stored in such a way that they can be easily found through an online search. This does not mean that the dataset must be openly accessible, only that a description of the data, how it was created, and how it can be accessed is provided.
Other data created as part of the research may not specifically provide evidence for a publication, but it should still be considered for long-term storage. Appraise this data by asking:
- Does your funding body require you to keep all or some of your data for a fixed period?
- Does the data have potential for reuse by yourself or others at a later stage?
- Does the data constitute a vital component of a project that would help give context to the research and its findings?
- Do documentation and appropriate metadata (descriptive information) exist, or can they be created, so that others can understand what the data is, how it was created or collected, and what it represents?
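The last question above concerns descriptive metadata. As a rough sketch, a minimal descriptive record could be kept alongside the data as a JSON file. The field names below are assumptions chosen for illustration; they are not the Elements or QMRO schema, which should be consulted directly.

```python
import json


def build_metadata_record(title, creators, description, collection_method, access_terms):
    """Build a simple descriptive record and return it as JSON text.

    The field names are hypothetical, not an official repository schema.
    """
    record = {
        "title": title,
        "creators": creators,
        "description": description,
        "collection_method": collection_method,
        "access_terms": access_terms,
    }
    # Refuse to produce a record with empty fields, since an incomplete
    # description makes the data harder for others to understand.
    missing = [name for name, value in record.items() if not value]
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical example values, for illustration only
record_json = build_metadata_record(
    title="Example survey dataset",
    creators=["A. Researcher"],
    description="Responses to a hypothetical survey.",
    collection_method="Online questionnaire",
    access_terms="Available on request",
)
```

Storing such a file with the dataset gives later users a description of what the data is and how it can be accessed, even when the data itself is restricted.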
Where to deposit data
Where preserved data is stored depends on several factors, including how sensitive the data is, whether it can be easily shared, and the volume and type of data.
Queen Mary Research Online (QMRO) is an institutional research repository where datasets can be stored for long-term access and sharing. A metadata (descriptive) record of the dataset and its access terms is provided, along with a service to mediate access to restricted datasets. Datasets do not have to be fully open access to be stored here.
There are many external data storage services; some examples follow, with details of their criteria for data deposit and their terms of service. For more data services, including disciplinary or subject data repositories, try the Registry of Research Data Repositories.
Dryad accepts datasets that support publications in the life science disciplines, with particular emphasis on the scientific and medical fields. There is a limit of 10 GB of data per publication, but more storage can be arranged if required (charges apply). More information on their collections policy is available online.
Figshare is a data repository for datasets of any kind. It offers unlimited storage for open datasets, and a maximum of 1 GB of private data storage. Open datasets are published under a Creative Commons CC0 license, allowing the widest possible use and reuse without attribution. Other output types (such as posters, images, and other media objects) are published under a CC BY license. Storage is provided by Amazon Web Services and hosted in the United States.