File management or file format refers to the organization of digital information that is read and processed through computer software. The format and software selected to organize research data are typically determined by how researchers choose to collect and analyze their data or by standard norms practiced within a discipline.
Why is file structure important? After completing a long project, have you ever asked yourself the following:
Now where did I put that? Why can't I open that document?
File structure or organization is important because it helps you keep track of the many formats and versions that accumulate over time. Spending time now on a good file structure will help you avoid losing data, files, and work later, and will save you time. It will also make it easier to share data with others.
There is no single way to do this, so you need to establish and document a structure that works for you and, at the same time, have a clear system that others can understand. Keep in mind the 5 Cs: be Clear, Concise, Consistent, Correct and Conformant.
There are two basic ways of organizing digital materials: hierarchical and tag-based.
In a Hierarchical system, items are organized into folders and sub-folders. This is a very common form of arrangement of digital files.
In a Tag-Based system, items are organized by assigning tags and using the tags to form categories. MIT Libraries has an excellent guide to Tagging and Finding Your Files at: http://libguides.mit.edu/metadataTools.
Remember that you need to find a structure that works for you and your research needs. This structure can be a combination of a hierarchical system and a tag-based system.
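To illustrate how the two systems can be combined, here is a minimal Python sketch. The folder names, file names, and tags are hypothetical, chosen only for illustration; each file keeps its place in a folder hierarchy and also carries a set of tags, so it can be found either way.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical example files: the key is the hierarchical path,
# the value is the set of tags assigned to that file.
FILES = {
    "project/data/raw/survey_2021.csv": {"survey", "raw"},
    "project/data/clean/survey_2021.csv": {"survey", "clean"},
    "project/docs/methods.txt": {"documentation"},
}


def build_tag_index(files):
    """Map each tag to the sorted list of paths that carry it (tag-based lookup)."""
    index = defaultdict(list)
    for path, tags in files.items():
        for tag in tags:
            index[tag].append(path)
    return {tag: sorted(paths) for tag, paths in index.items()}


def files_under(files, folder):
    """Return the paths located anywhere inside a folder (hierarchical lookup)."""
    folder = PurePosixPath(folder)
    return sorted(p for p in files if folder in PurePosixPath(p).parents)


TAG_INDEX = build_tag_index(FILES)
```

Looking up TAG_INDEX["survey"] returns both survey files regardless of which folder they sit in, while files_under(FILES, "project/docs") returns only what is stored in that folder.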
Having a good file structure is a good start, but you should also think about file format, file naming conventions, version control and storage.
Which file format to select is usually determined by existing standards, by the formats commonly used within a discipline, or by the type of project. Even when files are saved in a commonly used format, access to all types of data is at risk as hardware and software become obsolete. Backward compatibility and older software versions can often open outdated formats, but these solutions are not infallible. Converting data into standard formats therefore gives better assurance of long-term access, sharing, and future transformation. Be aware that storage media can degrade quickly, unexpectedly, and inconsistently.
Formats more likely to be accessible in the future are:
Data should be saved in a non-proprietary file format whenever possible, but you may want to consider saving that data in both an open format and in the proprietary format to preserve formatting.
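As one possible illustration of keeping an open-format copy, the Python sketch below writes tabular records to CSV, a plain-text, non-proprietary format, using only the standard library. The function name and the sample column names are assumptions made for this example; the original proprietary file would be kept alongside the CSV copy.

```python
import csv
import io


def rows_to_csv_text(rows, fieldnames):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records standing in for data exported from another program.
sample_rows = [
    {"id": 1, "value": 3.5},
    {"id": 2, "value": 4.0},
]
csv_text = rows_to_csv_text(sample_rows, ["id", "value"])
```

The resulting text can then be written to disk as UTF-8, for example with pathlib.Path("data.csv").write_text(csv_text, encoding="utf-8"), where data.csv is a placeholder name.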
The Library of Congress has issued a Recommended Formats Statement that identifies the formats best suited to long-term preservation and access for each type of content. It is available at: http://www.loc.gov/preservation/resources/rfs.
Examples of recommended file formats include:
For names of files, use the following guidelines:
Throughout the research process, multiple versions of documents or files can be created, so it is important to have version control in order to differentiate between the various versions.
Version control is the process of naming and distinguishing between a series of draft documents and a final version. Version control should be used where more than one version of a document exists or is likely to exist in the future (e.g., when manipulating data sets). Versioning can also refer to saving new copies of your files when you make changes to the master file. Whenever you make changes to a file, record what changes were made and give the new file a unique name.
There are various ways to indicate versions, so make sure to be consistent and consider creating a guide to your version control. Additionally, some organizations and institutions have recommendations on version control, so be sure to follow their guidelines.
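One way to keep version labels consistent is to generate them rather than type them by hand. The Python sketch below assumes a hypothetical convention in which a file name ends with _v followed by a zero-padded number before the extension (for example, interviews_v01.csv); this convention is an illustration only, and any institutional guidelines take precedence.

```python
import re

# Assumed convention: <stem>_v<number><optional extension>, e.g. "interviews_v01.csv".
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)(?P<ext>\.[^.]+)?$")


def next_version_name(filename):
    """Return the file name to use for the next version of `filename`."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        # No version marker yet: label the file as version 01.
        stem, dot, ext = filename.rpartition(".")
        if not dot:
            return f"{filename}_v01"
        return f"{stem}_v01.{ext}"
    number = int(match["num"]) + 1
    ext = match["ext"] or ""
    return f"{match['stem']}_v{number:02d}{ext}"
```

For instance, next_version_name("interviews_v01.csv") gives "interviews_v02.csv", and an unversioned name such as "interviews.csv" becomes "interviews_v01.csv". A short note describing what changed in each version can be kept in a plain-text file stored next to the data.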