Parsing spreadsheets with Clojure
Reading from an Excel file is surprisingly easy in clojure. We'll see an example in groovy and then compare it with one in clojure.
Groovy
In groovy we can read an Excel sheet like this:
XSSFWorkbook spreadsheet(File spreadsheetFile) {
FileInputStream file = new FileInputStream(spreadsheetFile)
new XSSFWorkbook(file)
}
def cellValue(XSSFSheet sheet, XSSFCell cell) {
if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
cell.getNumericCellValue().intValue()
} else {
stringCellValue(sheet, cell)
}
}
String stringCellValue(XSSFSheet sheet, XSSFCell cell) {
return cell.getStringCellValue().trim()
}
private List<String> headersFor(String sheetName, File spreadsheetFile) {
XSSFWorkbook spreadsheet = spreadsheet(spreadsheetFile)
XSSFSheet sheet
switch(sheetName) {
case SHEET_NAME1:
sheet = spreadsheet.getSheet(SHEET_NAME1)
break
case SHEET_NAME2:
sheet = spreadsheet.getSheet(SHEET_NAME2)
break
default:
return
}
sheet.getRow(sheet.getFirstRowNum())
.findAll{XSSFCell cell ->
cell.getStringCellValue().trim().length() > 0
}
.collect{XSSFCell cell ->
cell.getStringCellValue().toLowerCase()
}
}
XSSFSheet sheet = spreadsheet(spreadsheetFile).getSheet(sheetName)
def headers = headersFor(sheetName, spreadsheetFile)
def values = []
sheet.eachWithIndex { XSSFRow row, int index ->
if(index > 0) {
def rowFieldValues = [:]
row.each { XSSFCell cell ->
def cellValue = cellValue(sheet, cell)
if(field) {
rowFieldValues.put(field, cellValue)
}
}
values << rowFieldValues
}
}
Whew, that is a lot of code. Event though we wrote it in a more terse language than java, it is still quite long and error prone. Granted, we are using poi directly, compared to using a third party library like the one we'll use with clojure.
Clojure
In clojure we can use a third party library that is a wrapper for poi: Docjure. We can include this a dependency in our project.clj file:
[dk.ative/docjure "1.7.0"]
To extract the data from the spreadsheet, we basically need to do three things: 1. Load the file. 2. Select the sheet we want to read. 3. Specify the columns we are interested in.
Pretty straightforward:
(ns spreadsheet-example.core
(:require [dk.ative.docjure.spreadsheet :as spreadsheet]))
(defn read-rows []
(->>
(spreadsheet/load-workbook "resources/sample.xlsx")
(spreadsheet/select-sheet "Sheet Name")
(spreadsheet/select-columns {:A :name-for-column-A :B :name-for-column-B})))
Calling read-rows will return a list of maps whose keys correspond to the values of the columns. Eg:
[{:name-for-columnA "A" :name-for-columnB 1} {:name-for-columnA "B" :name-for-columnB "2} {:name-for-columnA nil :name-for-columnB nil]
The last two are empty rows. We can leave those out pretty easily:
(defn valid-row? [col]
(not (nil? (:name-for-columnA x))))
(defn all-valid-rows []
(->>
(read-rows)
(rest)
(filter valid-row?)))
This will filter out all rows that do not have a value in column A.
Happy hacking!

Roberto Guerra
Roberto Guerra is a developer living in the Caribbean.
In his spare time, he likes to read fantasy books and goof around with his dogs. He also started learning to play the classical guitar late in life. His current tech interests are Go, serverless and MacOS development with Swift.