Subversion and Unicode

By default, Subversion treats UTF-16 files as binary: when such a file is added, the zero bytes that UTF-16 uses for ASCII characters cause Subversion to assign it the MIME type application/octet-stream. As a result, merging a change from a branched copy of the file always produces a conflict that must be resolved by hand.

However, there is a solution. Once the UTF-16 files are given a correct MIME type, SVN can merge them just as it would a plain text file.

The required MIME type is one of

  • text/plain;encoding=UTF-16LE
  • text/plain;encoding=UTF-16BE

depending on whether the file is little-endian or big-endian, respectively.
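Because the correct value depends on the byte order, it can help to check each file before setting the property. The following is a minimal POSIX shell sketch, assuming every file begins with a byte-order mark (BOM): FF FE for little-endian, FE FF for big-endian. The function names utf16_mime_type and set_utf16_mime_types are hypothetical helpers, not part of Subversion.

```shell
# Hypothetical helper: print the svn:mime-type value for a UTF-16 file
# by inspecting its byte-order mark (BOM). Assumes the file starts with a BOM;
# any other file is reported as unknown via exit status 1.
utf16_mime_type() {
    # Read the first two bytes as hex, e.g. "fffe"
    bom=$(od -An -tx1 -N2 "$1" 2>/dev/null | tr -d ' \n')
    case "$bom" in
        fffe) echo "text/plain;encoding=UTF-16LE" ;;
        feff) echo "text/plain;encoding=UTF-16BE" ;;
        *)    return 1 ;;
    esac
}

# Hypothetical wrapper: set svn:mime-type on each argument that has a UTF-16 BOM,
# and warn (without failing) about files it cannot classify.
set_utf16_mime_types() {
    for f in "$@"; do
        if mime=$(utf16_mime_type "$f"); then
            svn propset svn:mime-type "$mime" "$f"
        else
            echo "skipping $f: no UTF-16 BOM found" >&2
        fi
    done
}
```

For example, `set_utf16_mime_types *.utf-16.txt` would set the property on every matching file that has a BOM and warn about the rest. Files without a BOM cannot be classified this way, and a UTF-32LE file (which starts FF FE 00 00) would be misreported as UTF-16LE.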

To set the property, use a command such as:

svn propset "svn:mime-type" "text/plain;encoding=UTF-16LE" *.utf-16.txt

I have found that this works with the command-line SVN clients, version 1.6 and later, on both Linux and Windows.


One Response to “Subversion and Unicode”

  1. Nick Says:

    I found that a slightly different value for the svn:mime-type property is more helpful.

    AFAICT, instead of using “text/plain;encoding=UTF-16LE”, the value of “text/plain;charset=UTF-16LE” provides all the capabilities of the former, but also allows the file to be viewed in a web browser (for example, viewing the file via WebSVN).

