Call me naïve, but I’m constantly surprised at how little most organisations seem to know about their own information. Like how much they’ve got, where it is, what it is…. But that’s good news for us, as we’re often asked to carry out surveys or inventories of electronic and/or paper documents and data, usually as part of a wider information management piece, and they always provide a fascinating insight (for us as well as the client!) into each organisation’s business and how it’s carried out.

There are many reasons why you might want to carry out an inventory, for example:
• to identify vital business information (especially records)
• to support information governance, eg DPA, FOI or RUPSI requirements
• to support the migration of existing documents, records or data into a new environment
• to investigate possibilities for system and process integration
• to support the easy production of an Information Asset Register
• to inform file plan or taxonomy development
• to inform the development of document templates
• to plan for future storage requirements.

I’ve just completed an inventory for an organisation of about 700 people, and almost all of the above apply, so the inventory is pretty extensive and includes over 80 questions for each ‘collection’. The main difficulty has been identifying just what a ‘collection’ is in this context, as there are business documents such as you get in every organisation, but also extensive data sets, databases and publications which may all have a relationship to each other.

Our approach is to identify the teams (that in itself can be a challenge!), and create a profile of the team in terms of their function, but also in terms of the kinds of documents and records we can see that they use by analysis of file shares. This gives us a very preliminary view of the potential scope of each team’s content. Interviews with the teams will then reveal ‘collections’, ie coherent bodies of content used for specific business purposes. Sometimes these will be distinguished just by the business purpose, but sometimes format or location will also be a factor.

When the inventory has a relatively straightforward and focussed purpose, it is very useful to send out the questions beforehand, but you have to be sure of your clients and their content before you do that. If I sent out my 80 questions for the complex inventory, I think people would run! So interviews are vital, but quite often people don’t know all the answers themselves, and you need to talk to other members of the team to get the whole picture.

We use InfoPath forms to record the information and create tailored reports so it’s easy to find out how many of this and what kind of that, otherwise you can spend hours analysing each questionnaire. The questionnaires and profiles are then all stored on a SharePoint site that is open to all to look at and query (but updating of the information needs to be carefully controlled, obviously).

Carrying out an inventory can take anything from 2 weeks to 6 months depending on exactly why you’re doing it, how complex the content is, and how many people you have working on it. Another contributory factor is how well people know their own information (and no-one is going to be able to answer the question ‘how many spreadsheets do you have?’), so the more you can do for yourselves by using tools to analyse file stores, for example, the better. We’ve developed tools to count numbers of documents and analyse formats, and that helps a lot. And when calculating how long it will take, don’t forget to add in the time it takes to set up the interviews, and to set them up again when people forget or cancel.
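
To give a flavour of what such a tool can look like, here is a minimal Python sketch (not our actual tooling) that walks a file store and tallies the number of files and total size for each file extension. The function names and the report layout are made up for illustration.

```python
import os
from collections import Counter
from pathlib import Path


def inventory_file_store(root):
    """Walk a folder tree and tally files and bytes per file extension."""
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Files without an extension are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            try:
                sizes[ext] += path.stat().st_size
            except OSError:
                # Unreadable or vanished file: count it, but skip its size.
                pass
    return counts, sizes


def format_report(counts, sizes):
    """Return one line per extension, most common first."""
    lines = []
    for ext, n in counts.most_common():
        megabytes = sizes[ext] / 1_000_000
        lines.append(f"{ext:10} {n:8d} files {megabytes:10.1f} MB")
    return "\n".join(lines)
```

Pointing `inventory_file_store` at a team's shared folder and printing `format_report(counts, sizes)` gives a quick first answer to the 'how many spreadsheets do you have?' question before the interviews start.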

As well as the documentary evidence (ie the profiles and questionnaires), we always provide a report which highlights the main findings of the inventory, and there’s always something in it which surprises the client, whether it’s the sheer number of documents per person, or the number of Excel spreadsheets across the organisation (an important consideration if you’re thinking of migrating them to SharePoint, as it really doesn’t like macros at all), or the number of different repositories, or the fact that so many key assets are stored by a third party (fine if they’re archived material, but less good if they’re current IP). Which goes to show that inventories are extremely useful things.

Well done to JISC for the first deliverable of their ‘Measuring the impact of records management’ project, a ‘selective’ literature review as well as results of a supporting survey. The report concludes that “there is both a need for, and a current lack of, reliable evidence to demonstrate the tangible benefits of investing in records management”.

The records management profession’s long over-reliance on a raison d’être of compliance over business benefit has, I think, long been a big factor in its marginalisation. The organisational focus on the bottom line has long been with us and is set to stay. We in the RM/IM profession must finally get to grips with this and be able to demonstrate tangible financial benefits in addition to pointing to a welter of regulation, an approach tantamount to telling the boss ‘do it because I told you to’.

I commend the report to you (www.jiscinfonet.ac.uk/records-management/measuring-impact/literature-review/cost-function) and look forward to the next deliverables of the project, a ‘Records and Information Management Impact Calculator’ and ‘Records Management Maturity Model’, both of which should be of considerable benefit to the profession.

The new features of SharePoint 2010 are really starting to come out now, not only from “official” sources but also from “unofficial” ones, as Microsoft starts to ramp up the marketing campaign.

Just some of the things that look interesting already…

  • Designer things: the Office 2007 ribbon will be used in SP 2010, Silverlight support, richer themes, easier web site editing, an update of SharePoint Designer.
  • Data things: improved Excel services (so I’ve been told), Visio Services, Access Services (aargh!), improved BDC (now called Business Connectivity Services).
  • Administrator things: a streamlined administration interface, best practice analyser, better reporting and logging, various database improvements.
  • Developer things: a new version of Visual Studio - finally aligned with SharePoint, plus various other useful stuff.
  • And last but by no means least - decent migration facilities from SP 2007 to SP 2010.

Of course there are some things which are conspicuous by their absence…

  • Records Management: no sign of it either being killed or made useful.
  • Metadata management: no mention of it other than a rumoured “Content Type” syndication.
  • I’ve also heard that the next version of Windows Workflow Foundation won’t be finished in time for SP 2010 – so the old workflow engine will be used…

In general it looks like SP 2010 will offer some new and useful things - we will be deploying it the moment we receive it via our partnership programme subscription. Watch this space.

Tips on using MS tools for working across multiple computers, specifically Mesh and OneNote - (Future of Information Work)

Possible outcomes of the Facebook buyout of FriendFeed (Steve Rubel)

Avatars in Work – report by Gartner - the 3D web comes a step closer in work

True long term data storage issues getting a decent airing, as reported on CNet

Trying out Google Caffeine – new tools to test out how your site ranks on Google.

What else are we trying out this month – Posterous, Tumblr, Brightkite and Foursquare.

And what are we reading in the real world? Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran and Jeff Hammerbacher.


I recently got round to reading a book that’s been on my to-read list for several years - How Buildings Learn by Stewart Brand (Stewart is a very interesting man, with many novel ideas about many things). The reason I wanted to read the book, apart from it being intrinsically interesting, was to see if and how much information architecture (IA) could learn from building architecture (BA). BA, unlike IA, has been around for millennia, so I imagined there would be many lessons that could be transferred. Surprisingly I was completely wrong!

The BA Stewart describes (which may not be wholly representative, but does seem to be right from my non-BA view) is one with no real regard for the three dimensions of any IA – users, content, context. Stewart describes in detail the mistakes building architects make when designing buildings. These are too numerous to mention in a blog article – you need to read the book! – but two themes do emerge from his response to all these mistakes. These are basically:

Assume change - design for maximum flexibility, scalability, adaptability and so on. The client may think their environment won’t change, but they are always wrong.

Don’t over design – keep the architecture as simple as is useful. The client may want the architecture to be perfect, but good enough is usually more than sufficient for their requirements.

Of course both these themes are ones which should sit at the heart of any information architecture.

Conclusion: Building architects need to be reading books on information architecture…

P.S. Stewart Brand is the person who coined the phrase “Information wants to be free”.

Metataxis has been doing quite a lot of file plan development work over the last few years, and currently all of it revolves around SharePoint. Marc has written elsewhere about the need for information architecture with SharePoint, and we’re finding that file plans can actually work quite well in that environment, but whatever application is used, there are always lessons to be learned. Here are some of the ones I’ve found, from the blindingly obvious to the more obscure.

Starting out
- Getting buy in from senior managers takes far more time than you think, but needs to be done.

- Understand all the implications of the electronic environment that the file plan will live in. For example, if it’s just a set of Windows Explorer folders, how will you add metadata at the folder level; if it’s a SharePoint environment, nested sites run the risk of breaking the URL limit of 255 characters and may skew the relevance ranking of search. And the metadata at site/document library level is still an issue.
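
The URL risk is easy to test for before you build anything. Here is a minimal Python sketch, assuming you have your draft file plan as lists of nested site, library and folder names. The 255 figure is simply the one quoted above, the host name is invented, and whether your version of SharePoint counts the encoded or the unencoded URL is something to check for yourself (this sketch measures the encoded form, where every space becomes `%20`).

```python
from urllib.parse import quote

URL_LIMIT = 255  # figure quoted in the text; verify it for your own version


def site_url(base_url, path_parts):
    """Join nested site, library and folder names into an encoded URL."""
    encoded = [quote(part) for part in path_parts]
    return base_url.rstrip("/") + "/" + "/".join(encoded)


def over_limit(base_url, paths, limit=URL_LIMIT):
    """Return (url_length, url) pairs for paths whose URL exceeds the limit."""
    flagged = []
    for parts in paths:
        url = site_url(base_url, parts)
        if len(url) > limit:
            flagged.append((len(url), url))
    # Longest offenders first.
    return sorted(flagged, reverse=True)
```

Running `over_limit("http://intranet.example", draft_paths)` over the whole draft (where `draft_paths` is your own list) shows which branches need shorter names or fewer levels, and remember to leave headroom for the document file names that will sit at the end of each URL.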

Development
- Make sure, as far as possible, that all staff have attended a briefing session (preferably at least introduced by a senior member of staff), and have been given clear information on the timetable.

- Have a good idea of your top terms before you start working with users (ideally by working with senior managers to agree them), but be prepared to alter them as necessary as the file plan develops.

- You can never consult too much with the users in the development of the file plan.

- NEVER be tempted to think that you can develop the whole file plan (or indeed very much of the file plan) according to organisational structure. It sounds like a logical starting point, but as soon as you get into it, it becomes very clear why that doesn’t work (eg folders repeated for every team relating to Parliamentary Questions, or invoices, or contracts).

- If possible, trial development with a ‘friendly’ team, so you can:
    - improve the way you intend to work in the workshops and your workshop materials;
    - have a better idea of potential issues.

- It saves a lot of time if you can review users’ current folder structures and put together a dummy file plan for discussion, rather than starting from scratch with each team.

- Make sure you’re using the language of the users, not of the file plan builders.

- In some ways (pace TNA and Records Managers) it almost doesn’t matter what the file plan looks like so long as everyone understands the rationale behind its structure, and uses it.

- Two-dimensional file plans (ie those being implemented just as a set of folders) need at least 6 levels.

- Your timetable for meeting with teams may look good when you draw it up, but be prepared to revise and revise again when teams cancel, or when only one person shows up.

- It’s tempting to put areas for each team’s local administrative documents under each management function. DON’T! For example, each team may have its own administrative files relating to internal procurement, office space and absence. There’s certainly a logic to putting each team’s documents under the top classes of, for example, Finance, Estate management and HR, but this means that under each top class the file plan needs to have classes and folders for each team (and yes, I have seen this done – the users hated it). This means that each team’s administrative files are split across the file plan: it is more comfortable to have a top term or at least high level class for Team Administration (or similar concept), where all of each team’s admin files are grouped together.

Implementation
- Training the users requires significant effort, must be done well, and will need refreshing.

- It helps greatly to have a designated person in each team (an Information Officer, or similar) whose role it is to act as liaison on file plan development, but particularly to become ‘expert user’ so they can advise team members.

- It also helps enormously to have a means to explain to the users what each folder (or equivalent) is for – so a description that they can see when they hover over the folder title is very helpful.

Maintenance
- The work doesn’t stop when the first draft of the file plan is complete: there will always be review work in order to improve the structure and to take into account new units, projects, legislation, events, etc.

- It is important to have a central governance function (ideally a File Plan Board made up of representatives from across the business) to control the top three classes so that the file plan doesn’t suddenly start sprouting unnecessary branches.

- Regardless of how you implement the file plan, it’s important to have a tool which gives you an easy overview of the whole file plan, with the ability to expand and collapse branches. Using Excel can achieve this, or you can incorporate it in your organisational taxonomy management tool, if you have one. It’s even better if you can use the tool to provide each team with the parts of the file plan that are relevant to them on a daily basis, especially if that includes the descriptions of each folder that I mentioned earlier.
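
If you have neither Excel skills to hand nor a taxonomy tool, even a few lines of script can give you that expandable overview. The Python sketch below is only an illustration: it assumes the file plan is available as a list of slash-separated paths, and the function names and the `+`/`-` markers are my own invention.

```python
def build_tree(paths, separator="/"):
    """Turn 'Finance/Invoices/2009' style paths into nested dictionaries."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split(separator):
            node = node.setdefault(part.strip(), {})
    return tree


def outline(tree, max_depth=None, depth=0):
    """Return indented outline lines, collapsing branches below max_depth."""
    lines = []
    for name in sorted(tree):
        children = tree[name]
        # A branch is collapsed when it has children we are not going to show.
        collapsed = (
            max_depth is not None and depth + 1 >= max_depth and bool(children)
        )
        marker = "+ " if collapsed else "- "
        lines.append("    " * depth + marker + name)
        if not collapsed:
            lines.extend(outline(children, max_depth, depth + 1))
    return lines
```

Calling `outline(tree, max_depth=1)` lists just the top classes, with `+` marking those that have more underneath, while `outline(tree)` expands everything. Passing in only the sub-tree for one class, for example `tree["Finance"]`, gives a team the part of the plan that matters to them.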

These are just a few of the things I’ve realised – additions from anyone else out there grappling with the same issues will be most welcome!

We’ve all been crazy busy recently, but we’ve managed to fit in a bit of reading! Here is a selection of the stuff we have found interesting of late:

We followed closely the launch of Lord Carter’s Digital Britain report. Nowhere covered it better than the Guardian with its minute by minute updates.

Lifehacker’s excellent tips on being a web commuter - struck a chord with Metataxis - there is not a week goes by when we don’t face some of these challenges!

A couple of Google announcements caught our attention - firstly, their desire for Chrome to not just browse the web, but help grow it, and secondly their new Labs experiment with data visualisation - definitely worth a look.

Reviews of Bing.com seem to have it doing much better than I had expected - apparently its second week saw its market share grow.

Finally - why Gen Y never complains about information overload. I find debates on this subject endlessly fascinating - my life is very information-rich, and yet I rarely (I would type never, but don’t want to tempt fate!) feel overloaded - in fact I often go in search of even more information based on that which I receive. Penelope Trunk has some interesting thoughts (as always).

I have also spent some time (meaning more than I should…) playing with VisuWords - an online graphical dictionary. At first I thought it was quite cool, but failed to really engage with it, then when I discovered that the GUI meant you could drag the terms around, rearrange the word relationships and generally play around with the more complex webs I was much more interested - try searching for ‘information’ to see what I mean…

Later today, Lord Carter (the communications minister) will publish his final report into ‘Digital Britain’, covering the Government’s action plan regarding broadband for all, tackling Internet piracy and radio switch over from FM/AM to digital.

Good debate around the role of Internet Service Providers in addressing piracy - lack of willingness on the side of the ISPs and a suggested approach from the music, film and other creative industries, is not making either side happy. However, I seriously doubt that this report will offer solutions with any bite!

An additional observation from us - how many of the recommendations/plans will be implemented since the announcement by Lord Carter that he is stepping down from Government? We sincerely hope this isn’t another report destined to reside in a dusty corner un-acted upon…

On Friday 12th June I attended a Google enterprise search event. This was basically a sales pitch for their new GSA (Google Search Appliance - see http://www.google.co.uk/enterprise/gsa/). The GSA is essentially Google in a box, for use with an organisation’s private content. It was interesting for a few reasons - only some of which were what Google intended I’m sure.

Firstly the event was badly organised - too much pointless queueing, a late start, speakers barely audible, my chair broke! Google gave the impression that whilst rich and smart, they did lack the professional finesse one normally gets at this kind of event. It made me wonder whether that is what they are like corporately…

More seriously, the GSA itself is impressive, in that typical brute force way Google can be (if you gave me that amount of disk and RAM I could get lots of things done fast).

The main thing I found interesting was the role metadata plays in their search. They weren’t exactly dismissive, but of course they think it’s much less important than their full text search. However when a practical example of using the GSA was presented by one of their clients, it was clear that the GSA was very good at recall (see http://en.wikipedia.org/wiki/Precision_and_recall) but this wasn’t enough for the organisation. They needed more precision in the search results which was achieved using metadata. This completely confirms my view of Google’s approach. Google is great if you are on the Internet searching for a new TV and don’t mind getting 58 million results. If you are at work, that just isn’t good enough. At work precision is more important.
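
For anyone who wants the two measures pinned down: precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. The small Python sketch below works them out for a made-up example (the document names and numbers are invented, not taken from the client's presentation), where a broad full text search and a metadata-filtered search both find every relevant document but differ hugely in how much else they return.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of search results."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 5 documents actually answer the user's question.
relevant = {f"doc{i}" for i in range(5)}

# Full text search returns 50 documents, including all 5 relevant ones.
full_text = [f"doc{i}" for i in range(50)]

# Metadata-filtered search returns 6 documents, including all 5 relevant ones.
with_metadata = ["doc0", "doc1", "doc2", "doc3", "doc4", "doc9"]

print(precision_recall(full_text, relevant))
print(precision_recall(with_metadata, relevant))
```

Both searches have perfect recall, but in the first only one result in ten is useful, whereas in the second five out of six are. That gap is exactly what the user feels at their desk.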

Google further undermined their own implicit philosophy that search is all, by showing one of the most well liked pieces of functionality in the GSA - their equivalent of the SharePoint “best bet”, i.e. if you use these search terms, always return this specific document/s above all other search results. These best bets replace the sponsored links found at www.google.com, as of course within an organisation these would not be relevant. The best bets are managed wiki style - anyone can add or remove them. Google claimed they were therefore self-policing, which sounds entirely plausible. The key comment made by Google was that these best bets often resulted in giving the user exactly what they wanted without the user having to look at the search results proper.
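
The idea behind a best bet is simple enough to sketch in a few lines of Python. This is purely an illustration of the concept, not how the GSA or SharePoint implement it: `best_bets` is a hypothetical lookup from search terms to pinned documents, and `full_text_search` stands in for whatever search engine you have.

```python
def search_with_best_bets(query, best_bets, full_text_search):
    """Put any pinned documents for the query ahead of the ordinary results."""
    # Normalise case and spacing so "Expenses  Form" matches "expenses form".
    key = " ".join(query.lower().split())
    pinned = best_bets.get(key, [])
    # Drop pinned documents from the ordinary results to avoid duplicates.
    organic = [doc for doc in full_text_search(query) if doc not in pinned]
    return pinned + organic


# Hypothetical best bets table, editable by users "wiki style".
best_bets = {"expenses form": ["finance/expenses-claim.doc"]}


def fake_search(query):
    """Stand-in for a real full text engine; returns a fixed result list."""
    return [
        "news/expenses-policy-update.html",
        "finance/expenses-claim.doc",
        "hr/travel.doc",
    ]


print(search_with_best_bets("Expenses Form", best_bets, fake_search))
```

The pinned document comes first however the engine ranked it, which is human selection quietly overriding the search algorithm.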

The summary of all this is that at Google the search is king, but if you are inside the firewall - metadata and user selection are the only practical ways to find what you are looking for.


Metataxis Associate Alan Gilchrist has a new book out, which we’re currently reading. It’s called Information Science in Transition, from Facet Publishing. 

Extracts from Facet’s website: “Are we at a turning point in digital information? The expansion of the internet was unprecedented; search engines dealt with it in the only way possible - scan as much as they could and throw it all into an inverted index. But now search engines are beginning to experiment with deep web searching and attention to taxonomies, and the Semantic Web is demonstrating how much more can be done with a computer if you give it knowledge. What does this mean for the skills and focus of the information science (or sciences) community?”

[Alan has edited] “a collection of essays written by some of the most pre-eminent contributors to the discipline. These peer reviewed perspectives capture insights into advances in, and facets of, information science, a profession in transition. This monograph previously appeared as a special issue of the Journal of Information Science, published by Sage. Reproduced … this important collection of perspectives on a skill in transition from a prestigious line-up of authors will now be available to information studies students worldwide and to all those working in the information science field.”