February 7, 2018

A note about the archive files

The current Bay Guardian Archive is a snapshot of an ongoing project, and reflects an effort to collect, digitize, cross-reference and present Bay Guardian stories from editorial sources that were never meant to cohabitate. Or at any rate, these sources (which comprise SQL backups, .doc files, images, spreadsheets, .html files, CMS files, PDF scans, and physical papers) have never before been collected or presented in the same format. What we’re hoping to accomplish is to present the entire run of the Bay Guardian’s 50-year publishing history in as complete an online format as possible.

This project started immediately after the Guardian’s sudden shutdown in October 2014, with server and editorial prepress files speedily backed up and preserved by Leighton Walter Kille, without whom most of what you’re reading now would probably be permanently lost. As of January 2015, the Bay Guardian’s offline document archives covered the paper’s output from 1987 to 2014: in all, 68,754 files in 1,400 folders, or 1.56GB of space.

Those 1987–2014 files are in several kinds of formats:

SQL .bak files from sfbg.com’s Drupal 6 database (2006–14)

At the time of its shutdown in 2014, sfbg.com ran on the Drupal CMS. Most of the stories from the old website were preserved in the SQL database backups pulled from server files. Those backed-up SQL tables held stories and blog posts from 2009–14. Phase 1 of the archive project has included examining and rebuilding as many of those stories as possible, moving them to this WordPress-based website.

Progress: We’ve recreated nearly 28,000 stories from sfbg.com, and have been able to preserve much of those stories’ metadata, with a couple of hiccups. In some cases we’ve found errors in matching up correct attribution tables to writer names. In other cases the original story used a nonstandard image gallery, and we’re still working on identifying and reconnecting those images to the stories.

Word .doc backup files from sfbg.com’s pre-CMS and pre-web era (1987–2006)

We have editorial process files in the form of Word .doc files used internally by staff that reflect stories in their penultimate, prepress stage. The archive of those files represents nearly all of the stories from 1987 to 2014, but we’ve been concerning ourselves most with the period of 1987–2006, for which we didn’t already have access to digital versions of the stories.

Leighton’s notes below give more detail about the various formats and states of these files. Here’s the tl;dr version: They’re a bit of a headache.

We’re presenting them here in a barely edited format with this warning: what you’re reading reflects the state of the backups for these stories, which in many cases won’t match up 100% to the version that is in the final printed edition, which you’ll be able to read as a flip-through PDF edition.

Progress: This phase of the project is, well, it’s a slog. Because these old files reflect a whole array of organizational concepts, format conventions, and editorial workflows, they have to be handled the old-fashioned way: one at a time by humans.

Existing PDF files and scans (1966–76, 2006–14)

In the later years, PDFs of the Bay Guardian were created as part of the prepress workflow, and many of those have been available at Issuu.com. We’ve copied these PDFs over to our collection at Archive.org, and they are now available in the archive as searchable flip-through editions. To this PDF collection, we are adding scans from extant printed copies of the paper from Tim Redmond’s collection.

Progress: We currently have all the issues from mid-2006 through 2014 available in PDF form, including many of the paper’s one-off special issues. We have also posted the scans of all the issues from volumes 1–10. Scanning of the other issues continues.

Physical copies of the Bay Guardian to be scanned (1976–2006)

Progress: We’re fortunate to have all the issues of the Bay Guardian in hard copy form. It will take some time, but we are working on scanning and OCR processing of all these issues, and will make them available here as they are ready.

For more detail on the .doc file archives, here are selections from Leighton’s notes:

Backup files were saved by copydesk immediately after documents were shipped to the production department. Initially these were saved on 5.25″ floppy discs (shifting from CP/M to MS-DOS), then 3.5″ DOS discs, Zip discs, and finally the organization’s various servers. The early files were in WordStar (up to V25-n05) and Word for DOS (V25-08 to V26-n50) formats. Then it was Word 2 and 6 (both Windows 3.1), Word 7 (Windows 95), and 2000, XP, and 2003. (Angry note to the future: Word 2007 and 2010, with their jackass “.docx” format, can go to hell.)

In production the files were laid out and then either printed out and pasted down on boards (until the mid-1990s) or transformed into PDFs and sent to the press (2000 and beyond). As of the late 1980s, the Guardian used QuarkXPress to lay out the paper, and in 1992 the Word files were fully stylesheeted in parallel with the Quark templates. The stylesheets, fonts, and text attributes were converted into XPress Tags using an in-house system of macros (written by yours truly), and the files were poured directly into Quark. In 2002 the Guardian switched to InDesign, and the same system was used, upgraded to handle Adobe’s tagging language and font standards. The initial set of macros was written in WordBasic (Word 2, 6, and 95) and upgraded substantially over the 10 years they were in use (1992–2002); they were then converted and upgraded to Visual Basic (Word 97 and beyond), and that version was strengthened during its life (2002–2014).

It’s crucial to understand that these files, even when produced during periods when the Guardian staff was at the top of its game and entering final corrections prior to shipping, are not absolutely definitive: Additional changes could still be made on the boards or online, including headlines, facts and figures; paragraphs could be cut or substituted. Moreover, stories shipped to production could still be delayed or even killed, so their presence in the Word archive does not necessarily mean that they were actually published on a particular date, or at all. It’s extremely likely, but not 100 percent certain.

Notes:

    1. Volumes 21 and 22 have no issue 52. In those years, the Guardian didn’t publish an issue between Christmas and New Year’s.
    2. Up through Volume 25, Number 5, files included extensive typesetting codes. Understand that the coding was rough to start with and during the typesetting and proofing considerable work was performed. So bask in all the missing heads, mangled formatting, and typos. But at least the text is readable; in its original form, it very much wasn’t.
    3. Typeset files were often sent over bundled: one file would contain several stories, intended to be separated and pasted up after printing.
    4. As of Volume 26, Number 50 (September 16, 1992), the copydesk switched to Word for Windows 2 and began consistently applying stylesheets to files. Over the next year the system became progressively more sophisticated, and the copydesk eventually began entering corrections and reshipping the files to production. Consequently, prior to V26-N50, the files do not reflect the final published version; after, they are much more likely to.
    5. Volumes 38 and 43 have 53 issues.
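
For anyone sorting through the .doc archive by volume and issue, the cutoffs in these notes can be turned into a quick check. What follows is only a rough sketch, not a tool the project actually uses: the function names are made up for illustration, the label spellings it accepts are just the ones that appear in the notes above ("V25-n05", "V25-08", "V26-N50"), and it assumes the cutoff issue V26-N50 itself counts as "after" the switch.

```python
import re
from typing import NamedTuple


class IssueId(NamedTuple):
    """A volume/issue pair; tuples compare volume first, then number."""
    volume: int
    number: int


# Accepts the spellings seen in the notes: "V25-n05", "V25-08", "V26-N50".
_LABEL = re.compile(r"^V(\d+)-N?(\d+)$", re.IGNORECASE)

# Cutoffs taken from the notes above.
LAST_TYPESETTING_CODES = IssueId(25, 5)   # "Up through Volume 25, Number 5"
FIRST_STYLESHEETED = IssueId(26, 50)      # "As of Volume 26, Number 50"


def parse_issue(label: str) -> IssueId:
    """Turn a label such as 'V25-n05' into IssueId(25, 5)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized issue label: {label!r}")
    return IssueId(int(match.group(1)), int(match.group(2)))


def describe(label: str) -> dict:
    """Summarize what the notes say to expect from a file of this issue."""
    issue = parse_issue(label)
    return {
        "issue": issue,
        # Early files still carry raw typesetting codes.
        "has_typesetting_codes": issue <= LAST_TYPESETTING_CODES,
        # Assumption: the cutoff issue itself is treated as "after".
        # Even then the text is only more likely, not certain, to be final.
        "more_likely_final": issue >= FIRST_STYLESHEETED,
    }
```

Calling `describe("V24-n10")` flags an early file that will still contain typesetting codes, while `describe("V27-n03")` flags one that is more likely (though never guaranteed) to match what ran in print.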