
Data Management

This guide walks you through developing a data management plan and helps you identify and address research funders' requirements.

Managing Files

File management refers to how digital files are organized, and file format refers to how the information in a file is encoded so that computer software can read and process it. The formats and software used to organize research data are typically determined by how researchers choose to collect and analyze their data or by the norms practiced within a discipline.

Why is file structure important?  After completing a long project, have you ever asked yourself the following:

Now where did I put that?  Why can't I open that document?

File structure, or organization, is important because it helps you keep track of the many formats and versions that accumulate over time.  Spending time now on good file structure will help you avoid losing data, files, or other work in the future, and it will save you time.  It will also make it easier to share data with others.

There is no single way to do this, so you need to establish and document a structure that works for you and that others can also understand.  Keep in mind the 5 Cs: be Clear, Concise, Consistent, Correct and Conformant.

There are two basic ways to organize digital materials: Hierarchical and Tag-Based.

In a Hierarchical system, items are organized into folders and sub-folders.  This is a very common form of arrangement of digital files.

In a Tag-Based system, items are organized by assigning tags and using the tags to form categories.  MIT Libraries has an excellent guide to Tagging and Finding Your Files at: http://libguides.mit.edu/metadataTools.

Remember that you need to find a structure that works for you and your research needs.  This structure can be a combination of a hierarchical system and a tag-based system.
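
To illustrate how the two approaches can be combined, here is a minimal Python sketch. The folder names, file names, and tags are hypothetical examples, not a recommended layout: each file lives in a folder hierarchy and also carries a set of tags, so it can be found either by location or by category.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical files: the key is the hierarchical location, the value is a set of tags.
FILES = {
    "project/data/raw/survey_2014.csv": {"survey", "raw"},
    "project/data/processed/survey_2014_clean.csv": {"survey", "cleaned"},
    "project/docs/interview_notes.txt": {"interview", "raw"},
}


def build_tag_index(files):
    """Tag-based lookup: map each tag to the sorted list of paths carrying it."""
    index = defaultdict(list)
    for path, tags in files.items():
        for tag in tags:
            index[tag].append(path)
    return {tag: sorted(paths) for tag, paths in index.items()}


def files_in_folder(files, folder):
    """Hierarchical lookup: return paths stored anywhere under the given folder."""
    folder_path = PurePosixPath(folder)
    return sorted(p for p in files if folder_path in PurePosixPath(p).parents)
```

Calling `build_tag_index(FILES)["raw"]` returns every file tagged "raw" regardless of folder, while `files_in_folder(FILES, "project/data")` returns everything stored beneath that folder.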

Having a good file structure is a good start, but you should also think about file format, file naming conventions, version control and storage.

The choice of file format is usually determined by existing standards, by the formats commonly used within a discipline, or by the needs of a particular type of project.  Even if files are saved in a commonly used format, access to all types of data is at risk as hardware and software become obsolete over time. Backward compatibility and older software versions can often open outdated formats, but these solutions are not fully reliable. Converting data into standard formats is therefore a safer way to protect long-term access, sharing, and future transformation. Be aware that storage media can also degrade quickly, unexpectedly, and inconsistently.

Formats more likely to be accessible in the future are:

  • adherent to an open or documented standard
  • in common usage by the research community
  • non-proprietary
  • unencrypted
  • uncompressed

Data should be saved in a non-proprietary file format whenever possible, but you may want to consider saving that data in both an open format and in the proprietary format to preserve formatting.

The Library of Congress has issued a Recommended Formats Statement that identifies the formats best suited to long-term preservation and access.  It is available at: http://www.loc.gov/preservation/resources/rfs.

Examples of recommended file formats include:

  • ODF, not Word
  • ASCII, not Excel
  • MPEG-4, not Quicktime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS
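
As a rough illustration of the list above, the following Python sketch flags files whose extensions suggest a proprietary or less preservation-friendly format and proposes an alternative. The extension mapping is an assumption made for this example (for instance, .odt standing in for ODF and .csv for plain ASCII text); it is not an official list, so adapt it to the formats used in your discipline.

```python
from pathlib import Path

# Assumed mapping from file extensions to the open alternatives named in the
# list above. This is illustrative only, not an authoritative registry.
SUGGESTED_ALTERNATIVES = {
    ".doc": [".odt"],            # Word -> ODF text document
    ".docx": [".odt"],
    ".xls": [".csv", ".txt"],    # Excel -> plain ASCII text
    ".xlsx": [".csv", ".txt"],
    ".mov": [".mp4"],            # QuickTime -> MPEG-4
    ".gif": [".tif", ".jp2"],    # GIF/JPG -> TIFF or JPEG2000
    ".jpg": [".tif", ".jp2"],
    ".jpeg": [".tif", ".jp2"],
}


def suggest_formats(filename):
    """Return suggested alternative extensions, or an empty list if none apply."""
    extension = Path(filename).suffix.lower()
    return SUGGESTED_ALTERNATIVES.get(extension, [])
```

For example, `suggest_formats("Results.XLSX")` returns `[".csv", ".txt"]`, while a file already saved as .odt returns an empty list.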

For names of files, use the following guidelines:

  • Create unique file names.  Duplicate file names will cause problems.  Include your full name (last, first, middle) and educational program/department along with a short word describing the work and the date.  For example:
    • Smith_Jane_Anne_History_Thesis_2014.pdf
    • Smith_John_Henry_Physical_Education_APP_2012.pdf
    • Smith_John_David_Honors_Seminar_Paper_2013.pdf
    • Smith_James_Richard_Art_History_Senior_Thesis_2014.pdf
    • Smith_Rebecca_Lynn_URS_Poster_2011.pdf
  • Avoid using special characters such as / \ : * ? " < > |.  Special characters are often reserved for use by the operating system.
  • Use underscores (_) or dashes (-) to represent spaces.  Spaces can be misread by some software and command-line tools.
  • Use leading zeros so that numbered files sort properly.  Pad each number to the width of the largest number in the series.  For example, if a collection contains 897 images, number them 001, 002, 003, etc.
    • Smith_LeAnn_Joyce_Research_Germany_Photograph_001.tif
    • Smith_LeAnn_Joyce_Research_Germany_Photograph_002.tif
    • Smith_LeAnn_Joyce_Research_Germany_Photograph_003.tif
    • Smith_LeAnn_Joyce_Research_Germany_Photograph_897.tif
  • Dates should follow the ISO 8601 standard of YYYY-MM-DD or YYYYMMDD.  Variations include YYYY, YYYY-MM, YYYY-YYYY.  This maintains chronological order.  For example:
    • Smith_Steven_Robert_Education_Dataset_2002-03-20.csv
    • Smith_Steven_Robert_Education_Dataset_2002-04-12.csv
    • Smith_Steven_Robert_Education_Dataset_2012-05-30.csv
  • Always include the file extension, typically three or four characters preceded by a period (Example: .pdf or .tif).
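
The naming guidelines above can be sketched as a small Python helper. The function names and parameters are hypothetical and introduced only for illustration; the sketch assumes plain ASCII names and simply drops any other character.

```python
import re
from datetime import date


def clean_part(text):
    """Replace runs of spaces with underscores and drop other special characters.

    Assumption: only ASCII letters, digits, underscores and dashes are kept.
    """
    text = re.sub(r"\s+", "_", text.strip())
    return re.sub(r"[^A-Za-z0-9_-]", "", text)


def build_file_name(name_parts, description, extension,
                    when=None, sequence=None, width=3):
    """Join name parts, a description, and an optional date and/or sequence number."""
    cleaned = [clean_part(part) for part in (*name_parts, description)]
    parts = [part for part in cleaned if part]
    if when is not None:
        parts.append(when.isoformat())          # ISO 8601 date: YYYY-MM-DD
    if sequence is not None:
        parts.append(f"{sequence:0{width}d}")   # leading zeros, e.g. 001
    return "_".join(parts) + "." + extension.lstrip(".").lower()
```

For example, `build_file_name(["Smith", "Steven", "Robert", "Education"], "Dataset", "csv", when=date(2002, 3, 20))` produces `Smith_Steven_Robert_Education_Dataset_2002-03-20.csv`, and passing `sequence=1` instead of a date produces a name ending in `_001`. Because the dates and sequence numbers have a fixed width, an ordinary alphabetical sort of the resulting names is also a chronological or numerical sort.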

Throughout the research process, multiple versions of documents or files can be created, so it is important to have version control in order to differentiate between the various versions.

Version control is the process of naming and distinguishing between a series of draft documents and a final version.  Version control should be used wherever more than one version of a document exists or is likely to exist in the future (e.g., when data sets are manipulated).  Versioning can also refer to saving new copies of your files when you make changes to the master file.  Whenever you make changes to a file, record what changes were made and give the new file a unique name.

  • First be sure to follow good naming conventions.
  • Then, consider including a version number such as v1, v2, v2.5, v3 or 0-1, 0-2, 0-3, etc.
  • Next, include a statement to indicate the status of the file, such as draft, final, etc.
  • Finally, you can include information about what changes were made to the file, such as normalized, cropped, etc.
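
As one possible way to apply these steps, the Python sketch below appends a version number, a status, and a short change note to an existing file name. The function name, the order of the elements, and the "v" prefix are assumptions chosen for illustration; follow your own or your institution's convention if it differs.

```python
from pathlib import PurePath


def versioned_name(filename, version, status=None, change=None):
    """Insert version, status, and change note before the file extension."""
    path = PurePath(filename)
    parts = [path.stem, f"v{version}"]   # e.g. v1, v2
    if status:
        parts.append(status)             # e.g. draft, final
    if change:
        parts.append(change)             # e.g. normalized, cropped
    return "_".join(parts) + path.suffix
```

For example, `versioned_name("Smith_Steven_Robert_Education_Dataset_2002-03-20.csv", 2, "draft", "normalized")` produces `Smith_Steven_Robert_Education_Dataset_2002-03-20_v2_draft_normalized.csv`.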

There are various ways to indicate versions, so make sure to be consistent and consider creating a guide to your version control.  Additionally, some organizations and institutions have recommendations on version control, so be sure to follow their guidelines. 

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.