Statisticians, like artists, have the bad habit of falling in love with their models —  George Box

Frijtening fears of data security

Controversial economist Paul Frijters is always a lively and thought-provoking read. Recently at Club Troppo, he posted his top five economic reforms that make “good economics in the sense of being in the interest of the long-run welfare of Australia.” One of them involves the ABS….

I have always found ABS phone staff pretty helpful and there is plenty of free stuff on their site that is reasonably easy to search once you get the hang of it. But, as someone who just paid the ABS $450 for 5 years of data on deaths broken down by exact age and gender, I am not presently disposed to feel kindly towards them. The births are totally free but not the deaths! The good news is gratis, but you have to pay for the bad news!? Moreover, it seems to me that the external spill-overs from open information are great enough that all the data should be provided free. That is not Paul’s main gripe, however.

Paul wants to turn the ABS over to private providers. Here is what he says:

Reduce the budget of the Australian Bureau of Statistics by about 90%, reducing it to merely being in charge of running the Census, and instead commission private providers of statistics to generate surveys of Australian businesses and the population. This would involve a quick reduction of around 300 million a year in expenses and would immediately improve the data available for economic decision making.

Why does he think private providers would improve data availability for economic decision making?

The rationale for cutting off the ABS is that it is completely secretive about the data it gathers: only ABS officials are trusted with using the full data by the ABS, not other government departments or Australian researchers. We are thus in the fairly ridiculous situation that those who devise the Australian budget in the Treasury do not have access to all the data gathered on the finances of individual industries. The ABS hides behind laws promising confidentiality to prevent anyone else from using its data, but similar laws on secrecy exist in other countries that have not been interpreted as ‘only people in our statistics organisation can be trusted’. Quite simply, the ABS has turned into a secretive rent-seeking organisation that draws huge subsidies but does not feel obliged to share its products with its paymasters. Why then should the Australian public pay for data that is not used to improve our knowledge of Australia? It might as well not exist and if it didn’t exist, the community would be free to buy data from other sources that are more consumer-friendly.

I have an anecdote about my own experience with the ABS and an opinion on why I think privatization is not the answer. About three years ago, I was giving a talk at the ABS in Brisbane. It was actually a talk about statisticians publicly engaging more. Part of the talk involved linking to this modest blog.

I arrived 45 minutes before the talk and managed to get through security. I then passed my USB containing the PowerPoint slides to the fellow who had invited me. There was a look of horror followed by embarrassment. “You can’t use that in here,” he said gravely. “The IT guys won’t allow it.”

All the PCs in the building had software installed to prevent the use of USBs. Apparently, I was supposed to have sent my slides to the IT department, who would scan them and upload them to the presentation PC themselves. Their gravest angst was spyware entering their database. This single fear about data security seemed to have supplanted their main mission, which is data collection and distribution.

Getting the files to the IT guys was not at all easy either. It was not a matter of walking down to the dungeon. But I will spare you the details.

So the talk did go ahead after a stressful beginning. And you can imagine what happened when I tried to link to Fishing in the Bay during the talk. The PC nearly crashed!

So Paul is absolutely right about the paranoid, obsessive privacy culture of the ABS. It is virtually impossible to get unit record data. The issue though is where that culture comes from and whether privatization would be the solution.

It is not just the ABS that is paranoid about privacy. Try ringing Telstra or Yarra Valley Water to check your bill. If the account happens to be in your partner’s name, you are likely to hit a wall. Imagine what someone could do with the personal information of my water consumption! The really annoying thing about Telstra’s cynical sham of a privacy policy is that they sold all our records to India years ago, which is why marketers can ring you up at dinner time and know your first name. Telstra have not kept up with the hackers, so they make you think they are a data fortress by refusing to tell your details even to you!

Don’t just blame the companies though. Australian privacy legislation is pretty strict. And I think it has public support. Remember the crazy panic over the Australia Card 25 years ago? There is an unarticulated public fear of information security and privacy. And I think it has recently increased because of the quite real risk of identity theft and credit card fraud. Though people seem to forget that credit card fraud is mainly a problem for the banks, as the customer is not liable for the losses.

So, what about privatizing the ABS? There would be immediate howls of protest about privacy. The government could only outsource this function if they responded by imposing restrictions on data security that would possibly be even more extreme than the Brisbane branch of the ABS. We would end up in a worse situation than we are now, I fear.

In subsequent comments, it appears that Paul has something different in mind - that Government departments and research institutes take responsibility for dissemination, rather than the ABS. He contrasts this with his experience of obtaining data from the Longitudinal Study of Australian Children, which is not controlled by the ABS, though they collect the data. Virtually all data is made available to anyone from a reputable institution who is prepared to sign a confidentiality agreement.

Which seems to imply that

* there is nothing about government provision per se which leads to poor service and obsession with secrecy

* there is nothing in privacy legislation that prevents the ABS from giving out unit record data.

Perhaps he is right about an entrenched culture at the ABS being specific to the ABS. How about this for a simpler solution: the responsible minister (the Assistant Treasurer) rewrites the ABS privacy policy using the policy of LSAC as a template?

 



6 Responses to “Frijtening fears of data security”

  1. Rachel Barker Says:

    Disclaimer: I am employed by the ABS but I do not speak as a representative of the ABS.

    You’re quite right that there’s nothing in general privacy legislation that prohibits ABS giving out unit record data. There’s something in ABS-specific legislation instead! In particular, the Census and Statistics Act (1905).

    Section 13 (Release of information) (3)
    Information of a personal or domestic nature relating to a person shall not be disclosed in accordance with a determination in a manner that is likely to enable the identification of that person.

    The same point is revisited a few times in the legislation. Nothing and no one sees identifiable unit record data given to ABS unless they actually belong to the ABS themselves, on pain of massive fines and jail … the ATO have similar penalties for staff violating confidentiality, but no other government department is that strict. It’s not ABS policy. It’s the law. And it’s been the law since 1905 (although other parts of the act have been amended in the meantime).

    There’s a lot of ongoing effort to make useful data available that does not clash with the confidentiality requirement. We have a program of creating Confidentialized unit record files - where the categories are broad enough and the odd exceptional unit has been removed or modified, that there is no reasonable prospect of ever working out who supplied a given record or deducing something about someone you already know based on what’s in a given file or publication (i.e. we can’t admit that all plumbers in XTown earn $60-$80K because then you would realise that Joe who is a plumber in XTown must earn that amount.) Stiff constraints and we *know* it reduces the utility of the data. But confidentiality *is* sacred.

    Confidentiality is very deliberately emphasized in ABS culture, in the courses that new starters do, etc. We believe it’s key to ABS’s internationally enviable response rate - people will fill ABS forms in, and fill them in truthfully, and part of that is because they trust the ABS. So we get 97-98% response rates on most of our surveys. (Varies by survey - see the Quality Declaration.) Private sector surveys are lucky to get 20% response rates.

    Data security is part and parcel of the same thing. I know it’s a sod getting some data past the firewall - email is usually the easiest way for generalized stuff like presentations - but having a data-stealing virus loose in ABS would be about as bad as setting off a nuclear weapon in the office; who would trust us with their survey responses after that? The security efforts work: I’ve worked here since late 2003 and never lost a minute to viruses.

    Oh - and why you got charged for death data and not birth data? I think that’d be because the data we have already compiled is free, but the data that requires someone to do new work - to run a new analysis over the unit records, and check for publishability - is ‘cost-recovered’. I’m not 100% on our charging policies though.

  2. Chris Lloyd Says:

    Hi Rachel,

    Yes - I suspected that there might have been some specific legislation for the ABS. In this case, I am unclear how the LSAC data, which according to Paul is collected by ABS, is also distributed pretty freely. I imagine that this data would be more sensitive than your basic census data.

    Are not the high response rates driven by the census being compulsory? Also, how do you know that people fill it in truthfully?

    You might be interested in posting comments on the Club Troppo discussion.

  3. Rachel Barker Says:

    I don’t think the LSAC is collected under the auspices of the Census and Statistics Act. I think ABS’s involvement is much more an advisory role - we do know how to design a good sample, and allow for losses between survey waves, and check data to find out where people have told us unbelievable things, and stuff like that. A brief search on “Longitudinal Survey of Australian Children” turned up other government websites before ABS:
    http://www.aifs.gov.au/growingup/
    I’m not sure any of the numbers are coming through ABS at all!
    But I’m not working on the LSAC myself so I could be wrong about how much we’re doing.

    That the Census and Statistics Act also gives us a stick to threaten respondents with (fill in this survey or be fined $110 per day until you do) undoubtedly also has a lot to do with our response rates. But we only actually do that a few times a year, across millions of survey forms.

    The census response rate is about 99%, and the missing 1% tend to be people it’s hard to find to give the forms to in the first place! (We run a “Post Enumeration Survey” where a selection of smaller areas are gone over with a really really fine-toothed comb, to find out who we’ve missed.)

    As for truthfulness … we run checks for internally contradictory or otherwise unreasonable data (which does trigger on a few genuine edge cases; according to one common check, it is not possible for teenagers to become parents any younger than 15). We check business responses against information from previous months and other auxiliary data. We take a good hard (human) look at any units with unusual combinations like small employment and large turnover. There are also signs of faked or approximated data - if it ends in lots of zeroes, say, or if there are too many sevens (for a higher quality human-generated fake).
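The last two signs mentioned above (lots of trailing zeroes, too many sevens) are easy to illustrate. What follows is only a rough sketch and not ABS code: the function names and both thresholds are made up for the example, and a real editing system would compare digit patterns against what honest returns look like rather than against fixed cut-offs.

```python
from collections import Counter


def trailing_zero_share(values):
    """Share of non-zero values that end in at least two zeros (e.g. 45000)."""
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return 0.0
    return sum(1 for v in nonzero if v % 100 == 0) / len(nonzero)


def digit_shares(values):
    """Share of each digit character across all the reported values."""
    digits = "".join(str(abs(v)) for v in values)
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in "0123456789"} if digits else {}


def screen_return(values, zero_limit=0.5, digit_limit=0.3):
    """Return a list of warning flags for one respondent's integer figures.

    zero_limit and digit_limit are hypothetical thresholds chosen for the demo.
    """
    flags = []
    if trailing_zero_share(values) > zero_limit:
        flags.append("mostly round numbers")
    for digit, share in digit_shares(values).items():
        # Zeros are covered by the round-number check, so skip them here.
        if digit != "0" and share > digit_limit:
            flags.append(f"digit {digit} over-represented")
    return flags


print(screen_return([12000, 45000, 3000, 7800, 150000]))
print(screen_return([1737, 2747, 7173, 3777, 4757]))
print(screen_return([1234, 5689, 9021, 4467, 3358]))
```

A flag here would only mean "have a human look at this form", which matches the spirit of the comment: the checks catch lazy fakes, not plausible ones.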

    But past all that we can’t tell a liar. If someone’s supplied us with highly plausible data - it just isn’t going to distort our final numbers enough to notice; when the final figure has a Relative Standard Error of 5% (and that’s a good estimate), a few plausible liars can’t pull the number askew more than some tiny fraction of that 5%. So who cares? Our budget is stretched like anyone else’s …

  4. David Steel Says:

    ABS does make data files available for analysis on a similar basis as the LSAC, see http://www.abs.gov.au/websitedbs/d3310114.nsf/home/microdata+entry+page

    These are predominately surveys of people not businesses, although there is a data file for a Business Longitudinal Database for a panel (cohort) of small and medium businesses over time, and it includes both characteristics and financial data.

    The problem for most business data is that for it to be useful it has to include the large businesses. Such businesses will often be readily identifiable in a unit level data file. My experience when I was in the ABS working on surveys of businesses many years ago is that the big guys are very sensitive about who has access to data about them. Any suggestion that ABS would share this data at the micro level would affect their cooperation.

    I should declare that I am a member of the ABS’s Methodological Advisory Committee and ABS funds a research professor in the Centre for Statistical and Survey Methodology and I am the Director of this centre.

  5. Chris Lloyd Says:

    I am not sure either about the extent to which ABS collects the LSAC data. The claim was made by Paul Frijters in responding to my comment 27, arguing against his idea. I think the comments here have clarified the issue though. The Census Act seems to be draconian. Therefore, there would be advantages to collecting and disseminating data outside the ABS. Unlike Paul, I am not blaming the ABS.

    While I enjoy comments on this blog, I really think it would be helpful for knowledgeable people to post additional comments on Club Troppo, which has a regular readership of thousands.

    BTW: Your comment that “… having a data-stealing virus loose in ABS would be about as bad as setting off a nuclear weapon in the office” rather does prove Paul’s point about secrecy being given perhaps too much weight. Yes, it would mean some bad press, but really…

  6. Rachel Barker Says:

    I think it’s better for an official ABS representative to speak over at Club Troppo. There are guidelines we have to follow as members of the public service, so that one individual’s opinion is not mistaken for the official position of the organization as a whole … in this blog, with its small and clueful readership, I can expect people to notice my disclaimer and understand the difference. At Club Troppo I don’t have that faith. Besides, many of my points have been made already.

    Bad press over leaked data would be a permanent problem for the ABS. Big businesses wouldn’t want to give us their data any more. Little people wouldn’t want to give us their data any more. Our response rate would tank, our data quality would tank, and in two decades we would still be struggling with the consequences. (Still have response rates 10-20% lower than they would have been otherwise.) Destroying the office premises would mean we missed a few publications and lost a few days’ work that weren’t on the off-site backups. (Maybe not even that much.) And we’d have to spend money on new desks and computers. We’d be back to normal in about three months. Destroying the office *and* most of the staff would take more like five years to recover from - fortunately we do have off-site and former and part-time staff who could rebuild the knowledge and capabilities.

    Data theft vs nuclear bomb? Losing most of our staff probably would work out worse. But … if I’m exaggerating a little (and this is me talking, not ABS), the point remains: yes, we treat data security as extremely important. Would you want it any other way?

    Imagine if the Tax Office records were leaked into the public domain … the prospect for massive identity theft and fraud would be enormous. Well - some ABS data could also be turned to that kind of misuse. So we’ve set our confidentiality efforts to a nice high level with a good safety margin.

    Guys, post-code and birthdate (including year) will uniquely identify a sizeable fraction of the population. What looks anonymous at a casual glance can be broken by someone with malicious intent and other sources of data. We can’t release to the public anything that can’t stand up to a malicious and resourceful attack … and we have to think about the resources that will be available in ten years’ time.
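The post-code and birthdate point can be checked on any file by counting how many records have a combination of fields that nobody else shares. The sketch below is illustrative only: the function name, the field names and the five toy records are invented for the example and have nothing to do with real ABS data.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of the given fields is unique."""
    keys = [tuple(r[field] for field in quasi_identifiers) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical toy file: no names, just postcode and date of birth.
people = [
    {"postcode": "3000", "birthdate": "1970-01-01"},
    {"postcode": "3000", "birthdate": "1970-01-01"},
    {"postcode": "3000", "birthdate": "1982-06-15"},
    {"postcode": "4000", "birthdate": "1982-06-15"},
    {"postcode": "4000", "birthdate": "1990-03-09"},
]

print(unique_fraction(people, ["postcode"]))               # 0.0
print(unique_fraction(people, ["postcode", "birthdate"]))  # 0.6
```

Postcode alone singles out nobody in the toy file, but adding birthdate isolates three of the five records. As a back-of-envelope guide, under the purely illustrative assumption of 10,000 residents in a postcode spread evenly over about 29,200 possible birthdates (80 years), roughly exp(-10000/29200), or about 71%, would have a birthdate that no one else in the postcode shares.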

    We can make more detailed unit record data analysable through Remote Access methods - you send your query to an ABS computer, it runs the query over the unit record file, it checks the results against confidentiality criteria and sends them back. Unit records never leave our servers. We can make more useful stuff available that way partly because if someone figures out an attack against it we can close the loophole … which we can’t do for a “confidentialized” file on someone else’s computer.
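The remote access idea in the last comment (the query goes to the data, only checked results come back) can be sketched in a few lines. This is a toy under stated assumptions, not a description of the ABS system: the function name, the minimum cell size of 3 and the sample records are all hypothetical, and a simple small-cell rule like this one does not by itself stop someone combining several queries to recover a suppressed value.

```python
from collections import Counter

MIN_CELL_SIZE = 3  # hypothetical confidentiality threshold for the demo


def remote_count_query(unit_records, group_by, min_cell_size=MIN_CELL_SIZE):
    """Count records per cell, returning None for cells that are too small.

    The caller only ever sees the returned table; the unit records
    themselves stay inside this function (standing in for the server).
    """
    counts = Counter(tuple(r[field] for field in group_by) for r in unit_records)
    return {cell: (n if n >= min_cell_size else None) for cell, n in counts.items()}


# Hypothetical unit records held on the server side.
records = [
    {"town": "XTown", "occupation": "plumber"},
    {"town": "XTown", "occupation": "plumber"},
    {"town": "XTown", "occupation": "teacher"},
    {"town": "XTown", "occupation": "teacher"},
    {"town": "XTown", "occupation": "teacher"},
]

result = remote_count_query(records, ["town", "occupation"])
print(result)
```

With these records the two plumbers fall below the threshold and come back as None, while the three teachers are reported. Because the rule lives on the server, it can be tightened later without recalling any file, which is the advantage the comment describes.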
