The devil in the metadata

Pub date November 16, 2006

The Rules Committee of the Board of Supervisors is considering whether the city should allow its departments to release electronic documents that include metadata. Although the Sunshine Ordinance Task Force has already hashed over the minutiae of this issue and ruled that metadata can and should be released, the mystery enshrouding what metadata is, along with the lack of any specific policy or known precedent in other cities or states with public records laws, has pushed the discussion upstream to where formal legislation has become a possibility.
Freedom of information purists are saying all the parts and pieces of a document are part of the public domain, while the City Attorney’s Office is claiming another layer of protection may be required.
Metadata entered the realm of public discussion in San Francisco after citizens started making requests for electronic documents with a specific plea for metadata. Activists Allen Grossman and Kimo Crossman wanted copies of, ironically enough, the city’s Sunshine Ordinance, in its original Microsoft Word format. Grossman and Crossman wanted to use the advantages of technology to follow the evolving amendments the Sunshine Ordinance Task Force members were considering for the city’s public records law. These “tracked changes” are a common function in Word, and are, technically, metadata.
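To make the idea of tracked changes as metadata concrete, here is a minimal Python sketch. It assumes the later XML-based .docx format, not the binary .doc files that were likely in use in 2006, and the function name tracked_changes is hypothetical. In a .docx file, which is a zip archive, tracked insertions and deletions are recorded in word/document.xml as w:ins and w:del elements that carry an author and a date. The sketch only lists them; it is not a complete revision reader.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_file):
    """Return (kind, author, date, text) for each tracked insertion or deletion.

    docx_file may be a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    changes = []
    # Inserted text lives in w:t elements, deleted text in w:delText elements.
    for kind, tag, text_tag in (("inserted", "ins", "t"),
                                ("deleted", "del", "delText")):
        for el in root.iter(f"{{{W}}}{tag}"):
            text = "".join(t.text or "" for t in el.iter(f"{{{W}}}{text_tag}"))
            changes.append((kind,
                            el.get(f"{{{W}}}author"),
                            el.get(f"{{{W}}}date"),
                            text))
    return changes
```

Run against a draft ordinance saved with change tracking on, a function like this would show who proposed each amendment and when, which is the kind of information a flattened copy does not carry.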
When Clerk of the Board Gloria Young received these specific requests for Word documents, not knowing what this “metadata” was or what to do about it, she turned to the office of City Attorney Dennis Herrera for advice.
Deputy City Attorney Paul Zarefsky initially gave oral advice to Young, and when pressed by the Sunshine Ordinance Task Force, issued a five-page memo in response, arguing that release of documents with metadata could pave a path for hackers into the city’s computer system, render documents dangerously vulnerable to cut-and-paste manipulation, and invite another unwelcome burden of reviewing and redacting for city officials. Young followed his advice and proffered the requested documents as PDFs.
A PDF, or “portable document format,” is essentially a photograph of the real thing, and contains none of the metadata that exists a couple clicks of the mouse away in a Word document. Evolving changes can’t be tracked, and PDFs don’t have the same searchability that Word docs have. So PDFs of the Sunshine Ordinance that Young provided didn’t have the functions that Crossman and Grossman were looking for, and were utterly useless for their purposes.
“It’s 92 pages,” Grossman said of the PDF Sunshine Ordinance. “I can’t search it electronically if I want to find something. This document I received is of no use to me.”

Meta-what?
Before delving too deep into the intricacies of current city politics, let’s pause for a moment to note that you don’t need to be a Luddite to have no idea what metadata is. It sounds like some diminutive or ethereal version of the real thing. In a sense, it is.
Simply put, metadata is data about data, and grows with weed-like tenacity in the electronic flora of the twenty-first century. Common examples include the track an email took from an outbox to an inbox, details about the owner of a computer program, or the laptop on which a Word document has been typed.
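As an illustration of the email example, the following Python sketch reads the "Received" headers that mail servers add to a message as it travels. The message, addresses, and server names below are invented for illustration, and delivery_path is a hypothetical helper name. The sketch uses only Python's standard email module.

```python
from email import message_from_string

# Hypothetical message, invented for illustration only.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby inbox.example.com; Thu, 16 Nov 2006 09:02:11 -0800
Received: from laptop.example.org (laptop.example.org [192.0.2.55])
\tby mail.example.org; Thu, 16 Nov 2006 09:02:05 -0800
From: sender@example.org
To: recipient@example.com
Subject: Draft ordinance
Date: Thu, 16 Nov 2006 09:02:03 -0800

The visible body of the message.
"""


def delivery_path(raw):
    """Return the Received headers in the order the message travelled."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received", [])
    # Each server prepends its own header, so the newest hop comes first;
    # reverse the list to get outbox-to-inbox order and tidy the whitespace.
    return [" ".join(hop.split()) for hop in reversed(hops)]


for hop in delivery_path(RAW_MESSAGE):
    print(hop)
```

None of these headers appears in the body a reader normally sees, yet they record which machines handled the message and when.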
Metadata becomes cause for concern when there is something to hide. Not readily visible, metadata requires a little sleuthing to reveal, but in the past it’s been used to uncover deeper truths about a situation. For example, attorney Jim Calloway relates on his Law Practice Tips blog a divorce case where custody of the child was called into question because of the content of emails sent from the mother to the father. The mother denied she’d sent the emails, though the father vehemently insisted she had. A court forensics investigation found metadata showing that, in reality, the father had written the emails and sent them to himself.
“Metadata speaks the truth,” Calloway writes. “My position has always been that a tool is a tool. Whether a tool is used for good or evil is the responsibility of the one who uses the tool.”
Lawyers have historically advised that metadata be fiercely protected. Jembaa Cole, in the Shidler Journal of Law, Commerce and Technology, wrote, “There have been several instances in which seemingly innocuous metadata has wreaked professional and political havoc.”
Cole goes on to cite a gaffe from Tony Blair’s administration – a document about weapons of mass destruction was available on the government’s web site, which claimed the information was original and current. Metadata showed that the information had not only been plagiarized from a student thesis but was also more than ten years old.
Cole urges lawyers to take an aggressive tack against revealing metadata, by educating offices about its existence, making a practice of “scrubbing” it from documents, and providing “clean” documents in PDF or paper form.
The city attorney’s office has taken a similar stance. Spokesperson Matt Dorsey told us metadata has been a part of the continuing education of the city attorney’s office. However, all past case law of which they are aware focuses on metadata in the context of discovery and “the conclusion of most state bars is that they have the obligation, under attorney-client privilege, to review metadata prior to discovery,” he said. “The issue of metadata is a relatively new one in legal circuits. It isn’t a brand new issue to us, but it is in the context of Sunshine,” said Dorsey, who maintains that metadata could still fall within the standard redaction policies of the public records act.
Terry Francke, who runs the open-government group Californians Aware, argues that “the city attorney needs to complete this sentence: ‘Allowing the public to see metadata in Word documents would be a detriment because…’ What?”
“From the beginning of this discussion the city attorney has never provided a plausible, practical, understandable explanation of what is the kind and degree of harm in allowing metadata to be examined that justifies stripping it out,” Francke said.

To the task force
When Grossman and Crossman were denied the documents as they’d requested them, they filed complaints with the Sunshine Ordinance Task Force. In their cases, first heard on Sep. 26, they argued there should be no concern that the text of Word documents could be manipulated – anybody with a gluestick and a pair of scissors could do that to any piece of paper. That had been a consideration when the Sunshine Ordinance was drafted, and it is why the city always retains the indisputable original.
Thomas Newton, of the California Newspaper Publishers Association, who was involved in drafting the state’s public records law, agreed with them. “If you follow his logic, you can’t release a copy of any public record because, oh my God, someone might change it,” Newton told us.
Crossman and Grossman also pointed out that converting documents from Word to PDF adds even more work to a task that should be as burden-free as possible. It’s a regular practice for the clerk of the board to maintain documents as PDFs because that preserves signatures and seals of ratified legislation, but making it a policy for all departments could invite a landslide of work, printing out documents and converting them to PDFs – not to mention undermining the notion of conserving paper.
Also, translation software and the “screen reader” feature that a blind person might employ to “read” an electronic document don’t work with PDFs.
First Amendment lawyers also offered written opinions on the issue. “Some of the city’s arguments have no support in the law whatsoever,” wrote Francke. “The fundamental problem for the city is that it has no authority to legislate a new general exception or exemption from the CPRA (California Public Records Act), and that’s what’s being advanced here.”
“The city’s scofflaw position represents the status quo ante, the old law that used to allow an agency to provide a copy of computer data ‘in a form determined by the agency.’ The city’s position has been directly and completely repudiated by the legislature. If the city disagrees with the law, it should come to Sacramento and get a bill,” wrote Thomas Newton, general counsel for the California Newspaper Publishers Association (CNPA).
As for the hacker scare, Zac Multrux, an independent technology consultant, was invited to the Sep. 26 hearing by task force member Bruce Wolfe to speak about the dangers of metadata. He suggested a number of technological tools, available for purchase or free online, that will “scrub” metadata from documents. He said that while it’s true that someone with ill intent could mess with metadata, “I think someone would need a whole lot more than the name of a computer” to hack into the city’s system. “Personally, I don’t see it as a significant security risk,” he said.
It was also pointed out at the hearing that a variety of city, state, and federal departments already make Word and Excel documents available. Wolfe did a quick online search and found more than 96,000 Word documents on the State of California web site. “They’re not afraid to make Word documents public online,” he said.
Over the course of two hearings the task force found no basis for Zarefsky’s claims in either the city’s law or the California Public Records Act – both of which explicitly state a document should be released in whatever format is requested, as long as the document is regularly stored in that format or does not require any additional work to provide.
The task force found Young in violation of the ordinance and she was told to make the documents available in Word format. No restrictions or rulings were made for future requests, but task force member Sue Cauthen said, “I think this whole case is a test case for how the city provides documents electronically.”

What’s next?
As requested, Young had the Sunshine Ordinance, in Word format, pulled from the city’s files and posted on a separate server outside of the city’s system to be viewed. Crossman, noting the added labor and resources for that provision, wondered if that would happen to all public records requested in Word format, so he cooked up another request to test his theory.
He asked for all the pending and accepted legislation for the month of September from the Board of Supervisors, in Word format.
While the Sunshine Ordinance Task Force had found that withholding documents because of metadata was against the law, redaction of privileged information is still legally necessary, and Young continued to follow the city attorney’s advice that a PDF with no metadata was still the safest, easiest way to comply. She told us, “I don’t take their advice lightly.”
Zarefsky’s opinion said departments “may” provide PDFs instead of Word documents and that “metadata may include a wide variety of information that the City has a right — and, in some cases a legal duty — to redact.” Although Young’s office does have pending legislation in Word format, she says it does not fall within the expertise of her staff to review and redact the metadata in those documents because they didn’t author them. “Since we don’t create the documents, how could we ever know whether the metadata should be released? We don’t know what it is,” she told us. “We couldn’t even hire expertise that would know.”
“I can’t imagine there’s so much toxic stuff in Board of Supervisors records they can’t let out,” Grossman told us. “This is a whole mystery to me.”
“It’s just data,” says Crossman. “City employees created it on our dime. Unless it falls under redaction discretion, entire documents should be provided.”
Young took the issue to the legislators who do draft the legislation, asking the November 2 meeting of the Rules Committee for further policy consideration. Miriam Morley spoke on behalf of the city attorney’s office, and said there was a sound legal basis for providing documents as PDFs, but that this was an evolving area of the law that the city attorney’s office wasn’t aware of until about nine months ago. They could find no other cities currently grappling with the issue, but she said, “Our conclusion is that a court would likely hold a right to withhold a document in Word.”
The committee decided to research the issue further before making a ruling. Committee chair Ross Mirkarimi said he had been integral to the drafting of the Sunshine Ordinance, and that rushing a decision could be detrimental.
“It seems to me in the spirit of the Sunshine law this is something we should really look at,” Tom Ammiano said. The matter is currently at the call of the chair of the Rules Committee, and no date has been set for the committee to hear it again.
A policy in San Francisco could set a real precedent for public records law, but according to many First Amendment lawyers, for the Board to do so would be a violation of state law. “I know of no other city, county, or subdivision of state government or state agency that’s disregarding the clear intention of the law as some elements of San Francisco city and county government are planning to do,” Newton told us.
“It’s a debate that can’t really occur outside of a proposal to change the state law,” he said. “The Board of Supervisors can’t pick and choose which law to comply with,” and he said the state’s constitution and public records act trump the city, which is reading the law too narrowly. “They’re required to give a broad interpretation of this access law. If they don’t like it they should come to Sacramento and get a bill,” he said.
“I think a lot of city departments, and policy and advisory bodies can save themselves a lot of headaches by declaring as policy that they will provide documents in their original formats,” task force member Richard Knee said. “With metadata.”