What's the purpose of archiving your electronic data?
Preservation. Simply put, archiving, in this sense, is preservation. Of memories (vacation photos, wedding videos, the Family Reunion Of Which You Do Not Speak, etc.) and data (like photographs, business documents, scripts, novels, music, etc.) that are static and won't change. Revisions and retouching aside, archiving stabilizes and stores a set of information that you can use for reference, proof and, perhaps most vital of all, connection.
- Need the very first "final version" of the company handbook before the current set of revisions got too bulky? Here you go.
- A lawsuit is threatening to come down on your company due to what the opposition claims is shoddy work? You pull out the pics to prove it just ain't so.
- A dispute arises over the contract? You got it in writing, and with the digital signatures all in place.
- You have a lifetime's worth of photographs. You want them to last beyond your lifetime, so your great-grands can get to know you even after you're gone.
- Want to show your kids that yes, Mama was a natural blonde before puberty hit and her hair got darker? Sure.
- Your in-laws want to see the ultrasound? You want your as-yet-unbaked bun-in-the-oven to get to see it? You got it.
- You have something you need to say, something you want the world to know. Share it across time.
Insurance (as back-up) is secondary (but a close second) in this case. If you've done your backups according to plan, you're covering those bases. Archiving is simply making sure that any data you have that's important to you is stored safely, uncorrupted, and available in a medium that will survive time and technological changes.
Think of the rate of change you've experienced just in your lifetime. There are people who have never watched videotapes back when they were tapes. (Betamax, anyone?) There are people you know who have never experienced life without the Internet, and the Internet itself as we know it is barely a quarter-century old. And see what changes that phenomenon spurred in communication alone. Tech changes come in waves, but the need to keep important things safe is an age-old imperative. You only have to visit museums to see that need expressed in material form.
So, back to being SMART (remember SMART?)...
Preserve what e-data?
Aside from the obvious static part (original versions, for example), basically anything and everything you value and want to keep with you (or at the very least pass on for posterity and historical-cultural value, if nothing else).
You've probably experienced one or more of the following scenarios:
- The frustration of trying to decipher cryptic file names and labeling systems that made sense at the time. Back then, you were in a real hurry, it was convenient... and now you're paying for your hastiness.
- Giving up after you've forgotten the password to a VIP file. Of master passwords.
- Having to surrender all hope of retrieving a file because the program used to create and open it is obsolete. Or the medium used as storage is obsolete, or was improperly stored and was damaged without you knowing.
- Losing childhood memories because the tapes deteriorated while in storage, to the point that playing them would destroy them.
There is no one perfect set-up. Unless you have the wherewithal to have a climate-controlled vault for all your treasures, you have to make do with what you've got and with the technology available, adjusting when the tech changes enough to warrant another transfer to a new medium. For now, anyway, the most affordable options for the man on the street are hard drives and optical media to capture and store data.
How do you set it up?
The ideal archive set-up would have order built in. Logical, concise, neat. Easily accessed and accessible -- to a point. Some archivists argue that the best archives, once verified and checked corruption-free, should be left entirely alone to preserve the pristine security of the data. "That's why you archive them. Seal them away, don't touch. Use your other master-copy."
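One way to do the "verified and checked corruption-free" part at home is to record a checksum for every file when the archive is sealed, then re-check the files against that record later. What follows is only a minimal sketch, assuming the archive is an ordinary folder on a drive; the function names `build_manifest` and `verify_manifest` are made up for illustration and do not come from any archiving tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map each file's path (relative to the archive folder) to its checksum."""
    root = Path(archive_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(archive_dir, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(archive_dir)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Build the manifest once, keep a copy of it with each copy of the archive, and run the verification whenever you migrate to a new medium. An empty result means every listed file is still present and unchanged.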
There's also the question of versioning -- keeping copies that changed over time -- as well as thinking of optimal physical separation and placement. An entire neighborhood can be affected by fire and flood. How far do you want to go to keep your archive safe? In another house, your neighbor's, perhaps? In a bank vault? Would the master copies be separated by state? By continent? Don't laugh, there are people making good money thinking of things like this.
Where would you store it?
Remember the 3-2-1 rule? 3 copies, 2 media, 1 copy off-site. Aside from extra hard drives, the simplest, easiest, cheapest storage media for archiving electronic data are CDs and DVDs.
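The 3-2-1 rule is simple enough to check mechanically against a list of the copies you actually have. The sketch below is only an illustration: the `Copy` record and the `check_3_2_1` function are hypothetical names, and the sample copies are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str     # where this copy lives, in your own words
    medium: str    # e.g. "hard drive", "DVD"
    offsite: bool  # True if it is stored away from the other copies


def check_3_2_1(copies):
    """Return the 3-2-1 requirements that this set of copies fails."""
    failures = []
    if len(copies) < 3:
        failures.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        failures.append("fewer than 2 kinds of media")
    if not any(c.offsite for c in copies):
        failures.append("no off-site copy")
    return failures


copies = [
    Copy("desktop drive", "hard drive", offsite=False),
    Copy("external drive", "hard drive", offsite=False),
    Copy("DVD set at a relative's house", "DVD", offsite=True),
]
print(check_3_2_1(copies))  # prints []
```

An empty list means the set of copies satisfies all three parts of the rule. Take away the DVD set and all three checks fail at once.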
Simple copy means burning to CD/DVD. Get one of the better quality brands. With spindles coming to $20.00-30.00 for, say, 100 pieces, you get data insurance for pennies.
Pennies. People whose job it is to know about things like this will tell you to invest in quality recordable optical media like Taiyo Yuden (touted as simply the best quality optical media for people who really want to make sure their data will remain safe and incorruptible for a very long time -- with the proper care, of course).
Who will you trust with your archive?
Where, and with whom, will you put your precious data? You have a lot of options: in the bank, with a friend, out of state. If you're really serious about it, in a climate-controlled vault with positive air-flow, to keep the contaminants out. Or, you can store your optical data like the people at the Smithsonian do:
(Images) are stored redundantly on DVD-R/DVD+R format optical media with a minimum of 2 offline copies: a preservation ‘master’ and a preservation ‘backup’. DVD-R/DVD+R are recognized "Write-Once-Read-Many" formats which ensures the integrity of the files placed on that disk.
An additional set may be created for reference/public access which may exist online, if permissions/rights issues and internal policies allow. The preservation master DVD disk is to be stored in appropriate offsite archival storage. Technical information regarding creation date of the disks and software used for disk creation is to accompany the disks.
Digitization projects with preservation master storage requirements over 250 gigabytes should consult with the SIA IT Archivist for preferred storage medium details.
Important: No labels are to be affixed directly to the DVD disk. No markers are to be used to label the DVD disk. Both of these activities have been proven in national studies to degrade the archival life of the DVD.
DVD disks are to be stored in ANSI certified inert sleeves separately from paper and in an upright position. Labels are to be affixed to the outside of the sleeve to avoid direct contact with the disk.
For how long can you store your archive?
For CDs and DVDs, no one really knows. The best proven case is 20-25 years, with evidence, both anecdotal and researched, that some of the better quality early-'80s CDs are still going strong. The best hoped for is 50 years, but the technology of optical media quite simply isn't that old yet.
Issues:
Obsolescence, if unprepared for, can result in unreadable media and unreadable formats due to changes and updates in technological interfaces and hardware.
Some physical threats can be addressed with proper storage and handling, as identified concisely by the Digital Preservation Management Workshops and Tutorial. They also have recommendations for CDs and DVDs as preferred storage media. For an amusing break, visit the Chamber of Horrors: Obsolete and Endangered Media to see various computing storage media that went the way of the Dodo and the Great Auk.
Caring for Optical Media
In the December 2008 edition of Communications of the ACM, the monthly magazine of the Association for Computing Machinery, Dr. Fran Berman, director of the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, provided a guide for surviving what has become known as the "data deluge."
Her top 10 tips?
1. Make a plan. Create an explicit strategy for stewardship and preservation for your data, from its inception to the end of its lifetime; explicitly consider what that lifetime may be.
2. Be aware of data costs and include them in your overall IT budget. Ensure that all costs are factored in, including hardware, software, expert support, and time. Determine whether it is more cost-effective to regenerate some of your information rather than preserve it over a long period.
3. Associate metadata with your data. Metadata is needed to be able to find and use your data immediately and for years to come. Identify relevant standards for data/metadata content and format, following them to ensure the data can be used by others.
4. Make multiple copies of valuable data. Store some of them off-site and in different systems.
5. Plan for the transition of digital data to new storage media ahead of time. Include budgetary planning for new storage and software technologies, file format migrations, and time. Migration must be an ongoing process. Migrate data to new technologies before your storage media becomes obsolete.
6. Plan for transitions in data stewardship. If the data will eventually be turned over to a formal repository, institution, or other custodial environment, ensure it meets the requirements of the new environment and that the new steward indeed agrees to take it on.
7. Determine the level of "trust" required when choosing how to archive data. Are the resources of the U.S. National Archives and Records Administration necessary or will Google do?
8. Tailor plans for preservation and access to the expected use. Gene-sequence data used daily by hundreds of thousands of researchers worldwide may need a different preservation and access infrastructure from, for example, digital photos viewed occasionally by family members.
9. Pay attention to security. Be aware of what you must do to maintain the integrity of your data.
10. Know the regulations. Know whether copyright, the Health Insurance Portability and Accountability Act of 1996, the Sarbanes-Oxley Act of 2002, the U.S. National Institutes of Health publishing expectations, or other policies and/or regulations are relevant to your data, ensuring your approach to stewardship and publication is compliant.
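Tip 3 is easy to act on even for a family photo archive. Here is a minimal sketch, assuming a plain JSON "sidecar" file saved next to each archived file. The function name `write_sidecar` and the field names are illustrative only and do not follow any particular metadata standard.

```python
import json
from datetime import date
from pathlib import Path


def write_sidecar(file_path, description, created, software, keywords=()):
    """Write <file name>.meta.json beside the file and return the sidecar path."""
    target = Path(file_path)
    record = {
        "file_name": target.name,
        "description": description,      # what this is, in plain words
        "created": created.isoformat(),  # creation date as YYYY-MM-DD
        "software": software,            # program used to create the file
        "keywords": list(keywords),
        "size_bytes": target.stat().st_size,
    }
    sidecar = target.with_name(target.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

A call such as `write_sidecar("reunion.jpg", "Family reunion, lakeside", date(2009, 7, 4), "ExampleScan 1.0", ["family"])` (all values invented) would leave a `reunion.jpg.meta.json` file beside the photo. Plain-text JSON is used here on the assumption that it will stay readable even if the program that made the photo does not.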