Poking away at the XLSX

I hope to write my own filter that can handle Excel files in different programming languages. I will update this post as I find more information (and have time to document it…)

As always, StackOverflow already has something interesting, including a link to some sort of reference site which seems to require a login. So … yeah, more open, but not quite.

Points I have already found out from a simple table of data:

  • the .xlsx file is really a ZIP-archived tree of XML documents.
  • a single XML file called [Content_Types].xml sits in the main folder.
  • next to this there are three folders: docProps, xl and _rels.
  • docProps contains two files, core.xml and app.xml.
  • _rels contains a single XML file called .rels.
  • xl is the most complex folder, with workbook.xml, styles.xml, sharedStrings.xml and 4 subfolders.
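
To check the layout above against any workbook, something like the following Python sketch should do. It only relies on the standard zipfile module. The function name list_xlsx_parts and the grouping by top-level folder are my own choices for illustration, not anything the format prescribes.

```python
import sys
import zipfile
from collections import defaultdict


def list_xlsx_parts(source):
    """Return the member names of an .xlsx archive, grouped by top-level folder.

    `source` may be a filesystem path or a binary file-like object.
    Files sitting in the archive root are grouped under the key "".
    (Hypothetical helper, written for illustration only.)
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries, if any
            folder, _, rest = name.partition("/")
            if rest:
                groups[folder].append(rest)  # file inside a folder
            else:
                groups[""].append(name)      # file in the archive root
    return {folder: sorted(names) for folder, names in groups.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one block per top-level folder of the given workbook.
    for folder, names in sorted(list_xlsx_parts(sys.argv[1]).items()):
        print(folder or "(root)")
        for name in names:
            print("   ", name)
```

Saved as, say, xlsxparts.py (the name is arbitrary), it can be run as "python xlsxparts.py some_workbook.xlsx". Each part can then be read with archive.read(name) and handed to whatever XML parser is convenient.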

…any more information will have to wait. 🙂


3 Responses to Poking away at the XLSX

  1. Singapore Memory Project says:

    Hi Kheng Hui,

    We tried to look for your email contact but it does not seem to be available on your blog. So we are contacting you via a comment.

    On behalf of the National Library Board (NLB), we would like to invite you to pledge your blog to the Singapore Memory Project (SMP).

    We find that your entries about your exciting experiences would be a great addition to the Singapore Memory Project.

    We think your blog would offer a different perspective. Whether your posts are an account of your daily life or an expression of your thoughts, our project hopes to find a home for these memories so that it can help build a ground-up understanding of Singapore.

    If you believe memories are worth preserving, simply pledge your blog here: http://singaporememory.simulation.com.sg/Public/Pledge.

    The SMP is a national initiative started in 2011 to collect, preserve and provide access to stories, moments and memories related to Singapore. For more information about this initiative, you may wish to contact Mr Patrick Cher at patrick_cher@nlb.gov.sg or read the FAQ.

    Yours sincerely,
    Krishna

    [Simulation Software & Technologies (S2T) Pte Ltd. is the officially appointed vendor for SMP for the period Nov 2012 to Dec 2013.]

  2. Nikita Vorontsov says:

    Hello,

    Do you have any particular project in mind (with this Excel file filter)? Or just a general-purpose library?

    I have been collecting file format specifications for some time already. Being fascinated by file formats (especially binary ones), I look for interesting applications like recovery of damaged files, file format conversion, etc.

    So, in the case of the file format you are interested in: is it just a “PK” (zip) archive? Or is there something specific about it?

    For which programming languages are you trying to provide this filter?

    Regards

    • icedwater says:

      Privet Nikita,

      Actually, I was only looking at Excel (XLSX) because one of my projects uses that sort of spreadsheet as a data source. I wanted to see what was in it so I could work with it in, for instance, C++ or Python.

      Of course, it is far more challenging (interesting) to attempt file recovery of binary formats like PDF, PSD and the old XLS, but I will attempt conversion at some point too. Zipped XML would be a nice way to ease myself into this.

      I figure a slim and fast C/C++ library can be plugged into various other languages later… and knowing the file formats will allow porting to be done in other languages like JavaScript.

      Regards,
      icedwater
