Geekery: Receipt Log Using AppleScript, Numbers, YAML, and JSON

The coming end of the year means tax season, which also means having to gather all kinds of documents like receipts, statements, etc., to work on the annual drudgery of filing tax returns. This is especially true for those who like to itemize deductions (or at least try to), like me. Normally, state income taxes paid are deductible from U.S. federal tax returns. However, I live in a state (Florida) that has no personal income tax and therefore nothing to deduct. In such cases the government changed the rules a few years back to allow residents of state-income-tax-free states to deduct their state sales taxes instead. Thus, my need to hang on to, and scan, every single receipt I get (I scan everything that comes in to keep things as paperless as I can, but receipts are especially important). This is in addition to tracking medical expenses and other deductible items every year. I tried several applications to keep scans organized on my Mac, including Paperless, Evernote, Neat, and a few others. They're all good products, but each had their quirks that I did not like. So, I ended up rolling my own "system" using file comments to hold metadata and a script to pull that data into a spreadsheet later. Initially, I did this task with a combination of Google Drive and Google Apps Script. The description associated with the file on Google Drive contained JSON-formatted metadata, and then I wrote a Google Apps Script to pull the data from each file in the directory and add it to a Google Sheet. I chose JSON because it's already supported in Google Apps Script (which is based on JavaScript) and because it's less error-prone than trying to type XML with opening and closing tags for everything. It was also before I learned about an even simpler format called YAML. While the system worked, I recently decided that I want to save on expenses by pulling everything back to local storage and pay for a backup service rather than store all my files online, both for security reasons and because it's a lot easier making one payment a year than three payments a month to Google, Microsoft, and Apple for file storage, and to have everything on my machine backed up. There was also Google's limitation of a maximum execution time for Apps Script, which with the number of receipts I had was often exceeded, resulting in an incomplete spreadsheet. I also wanted to redo the system with YAML rather than JSON. That said, my task was to take what I was doing with the combination of Drive and Apps Script and bring it into my Mac world using Finder and AppleScript.

The Metadata

The first task was metadata. For this purpose, I decided to use JavaScript Object Notation (JSON) because it's quicker to type than XML and because it's the most compatible with Google Apps Script. Not to mention I was already familiar with it. The associated JSON looks something like this:

Eventually this was converted to YAML that looks like this (much simpler):

I could have put the data all on one level, but because sometimes I have receipt documents that are confirmations of multiple transactions (for example, paying multiple bills through my bank's online Bill Pay), I made a second level for the actual data so I could create a list of such blocks should there be multiple transactions on the same document.

in YAML:

Ultimately, the final format will be JSON (converted from YAML with node's yaml2json command-line utility via a do shell script call) that AppleScript will then convert into records using the JSON Helper for AppleScript, since AppleScript can't deal directly with YAML or JSON and there's no similar tool (that I could find) to convert YAML directly. With the switch to YAML, I chose to rewrite the metadata by hand rather than convert existing JSON. Writing the YAML directly in the file comments is tricky because of line breaks, so the easier method for me was to create a .yaml file for each receipt document in the folder I was using, saved alongside the original .pdf, .jpg, .png, or .tif of the receipt document. Then, I wrote the following AppleScript to copy the contents of each .yaml file into its associated document and then trash the .yaml file.

If everything works as expected, you should see your metadata in the file's comments when you right-click on the "Get Info" link or press ⌘I with the file selected in Finder.

Working with the Metadata

Now we have our files back, with their metadata; the next step is to be able to produce a spreadsheet with this information for later use. I chose to use Apple's Numbers application since it's already there on my Mac and it's cheaper than paying for Excel, which, while a fine application that I use all the time at work, is more than I need for this purpose. The first thing I did was to prepare a Numbers template (link to file in my iCloud Drive with the columns and headings that I need, all properly formatted. Because there are header and footer rows, Numbers requires that I include at least one blank row-the AppleScript will remove it when it's done importing the files' metadata. (Note I didn't use all the data I've put into the YAML; that's okay as long as your YAML does contain keys for all of the columns in your spreadsheet, even if the value is blank or zero. Now the fun part-transferring the metadata to the columns in my spreadsheet. Since AppleScript cannot deal directly with YAML, and I had a tool to work with JSON in AppleScript (the aforementioned JSON Helper for AppleScript), rather than reinvent the wheel I used an existing JavaScript designed for use with node.js called yaml2json that is usable from the command line. I installed node.js from its home page, then ran the following command line to install yaml2json:

sudo npm install -g yaml2json

I don't need to learn node.js at this point because the yaml2json module includes a command-line script, /usr/local/bin/yaml2json. The first line of that script does need to be modified to replace node with its full path, /usr/local/bin/node, because full paths are required by AppleScript when making the shell call (it won't read or know about your shell's environment). The process for the conversion involves two do shell script calls:

  • echo the quoted form of the file comments to a temporary text file (simpler in my mind than using open for access);
  • call yaml2json to read the temporary file and return the converted JSON result into a variable that is then converted to an AppleScript record by using tell application "JSON Helper" to set JSON to (read JSON from theResult).

Once you have that metadata as an AppleScript record, the rest of the process is straightforward:

  • Create a repeat with x in y loop to iterate through the individual subrecords inside the Receipt record;
  • Read the values from each subrecord into their respective variables;
  • Create a new row in the Numbers spreadsheet table and fill it with the values you just captured.
  • When done, delete the blank row at the bottom.

What you're left with is an open Numbers spreadsheet with your information, to save wherever you wish.


Popular Posts