When serialising data to a Comma Separated Values (CSV) file, it’s important that field values that contain reserved characters are escaped correctly. For example, if a value contains a unescaped comma, then this value will be incorrectly interpreted as two fields instead of one.
RFC4180 offers a specification and standard for CSV files, additionally I wanted to ensure that Microsoft Excel would be happy with any output generated by my converter.
The following method and unit tests demonstrate a way to do this conversion in Java. I’ve tried to make the code as efficient during execution as possible, so you may find that static analysis tools such as CheckStyle flag cyclomatic complexity (11) and Boolean expression complexity (5), but I consider the code simple and readable in its current form.
Method code as follows:
/** * Serialises and escapes any value so that it can be added to a CSV file. If the value contains a double-quote, CR, LF, comma or semi-colon, then * the entire value is wrapped in double-quotes. Any instances of double quotes (") are replaced with two double-quotes. * * @param value Any value that can be converted to a String using a {@link Object#toString()}. * @return A string suitable to be embedded in to a CSV file that will be read by RFC4180 compliant application or Microsoft Excel. If null is * passed, null is returned. */ public static String escapeForCsv(Object value) { // Handle either null inputs or toString() methods that return null. String inputString = value == null ? null : value.toString(); if (inputString == null) { return null; } // Do we need to wrap the entire value in quotes? boolean quotesRequired = false; int inputLength = inputString.length(); // Allocate a few extra bytes so we don't need to dynamically extend in the event of a quotes being required. final int extraCharsForQuoting = 3; StringBuffer retValue = new StringBuffer(inputLength + extraCharsForQuoting); for (int i = 0; i < inputLength; i++) { char ch = inputString.charAt(i); retValue.append(ch); if ((ch == 'n') || (ch == 'r') || (ch == ',') || (ch == ';') || (ch == '"')) { quotesRequired = true; if (ch == '"') { retValue.append(ch); } } } // Wrap the whole value in double quotes if we've found character that needed to be escaped. if (quotesRequired) { retValue.insert(0, """); retValue.append('"'); } return retValue.toString(); }
This code is used in xml2csv, contained within StringUtil.java. Therefore the unit tests refer to the static method within this class:
@Test public void testEscape() { assertNull(StringUtil.escapeForCsv(null)); assertNull(StringUtil.escapeForCsv(new Object() { @Override public String toString() { return null; } })); assertEquals("", StringUtil.escapeForCsv("")); assertEquals("a", StringUtil.escapeForCsv("a")); assertEquals("abc", StringUtil.escapeForCsv("abc")); assertEquals("""""", StringUtil.escapeForCsv(""")); assertEquals(""""a"", StringUtil.escapeForCsv(""a")); assertEquals(""a""b"", StringUtil.escapeForCsv("a"b")); assertEquals("","", StringUtil.escapeForCsv(",")); assertEquals("",;,;"", StringUtil.escapeForCsv(",;,;")); assertEquals("";"", StringUtil.escapeForCsv(";")); assertEquals(""n"", StringUtil.escapeForCsv("n")); assertEquals(""nr"", StringUtil.escapeForCsv("nr")); assertEquals(""n""n"", StringUtil.escapeForCsv("n"n")); }
Any comments or feedback welcome!