We need to rethink how Paratext handles NBSP (non-breaking spaces) in light of some recently discovered problems.
We currently handle nbsps as follows:
Existing non-breaking spaces are preserved and new ones can be inserted
"~" (tilde) can also be used to represent a non-breaking space, and is shown as an NBSP in Preview mode, as well as in Print Draft and publishing
Unfortunately, this approach has several problems:
In Paratext, it is very difficult to tell whether a non-breaking space is present or not; a non-breaking space and an ordinary space are visually identical
The underlying editor technology (MSHTML), over which we have little control, sometimes inserts nbsps into the text without being asked to do so. This is fairly standard in HTML editors, apparently, and is done to preserve spaces at the beginnings of lines. I can see no way to stop it from doing so.
It is very difficult for users to be consistent in their use of non-breaking spaces if there is no way to see them
The USFM standard already defines "~" as the official way to represent a non-breaking space
As a result of the first two problems, non-breaking spaces may be silently inserted, and we have no easy way of seeing them.
There are a number of possible solutions.
A) Use tilde on disk
Convert every non-breaking space to a tilde as the text is read in. USFM on disk would contain "~" instead of U+00A0
Keep Preview mode the same to allow a preview with the tildes converted to non-breaking spaces
Replace all non-breaking spaces with spaces when converting back from the displayed text (this would mean that the editor would not have to worry about whether a space or a non-breaking space was inserted)
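The three conversions in Option A can be sketched as follows. This is only an illustration in Python, not Paratext code: the function names `on_read`, `preview`, and `from_editor` are hypothetical, and the sketch assumes the text is available as a plain Unicode string at each step.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE
TILDE = "~"


def on_read(disk_text: str) -> str:
    """Reading in: every U+00A0 becomes a tilde."""
    return disk_text.replace(NBSP, TILDE)


def preview(text: str) -> str:
    """Preview mode: tildes are shown as non-breaking spaces."""
    return text.replace(TILDE, NBSP)


def from_editor(displayed_text: str) -> str:
    """Converting back from the displayed text: any U+00A0 the editor
    produced becomes an ordinary space, so only explicit tildes survive."""
    return displayed_text.replace(NBSP, " ")
```

Under these assumptions, a non-breaking space inserted by the editor on its own initiative never reaches the disk, while a tilde typed by the user passes through `from_editor` unchanged.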
B) Use NBSP on disk
Immediately before display, convert any NBSP to tilde.
Preview mode works as it does now: both non-breaking spaces and tildes are displayed as nbsp
When saving to disk, all tildes are converted back to NBSP.
Option B is problematic:
Any projects that use "~" for any other purpose will have them converted to non-breaking spaces.
Many different parts of Paratext will need to be modified, as the displayed text does not match the actual text on disk
Projects that validly use "~" for non-breaking space will have them removed and replaced with U+00A0
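The effect of Option B on existing tildes can be sketched as a round trip. This is a minimal Python illustration, not Paratext code: the function names `to_display` and `to_disk` and the sample string are hypothetical.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def to_display(disk_text: str) -> str:
    """Option B, immediately before display: NBSP is shown as a tilde."""
    return disk_text.replace(NBSP, "~")


def to_disk(displayed_text: str) -> str:
    """Option B, when saving: every tilde is converted back to NBSP."""
    return displayed_text.replace("~", NBSP)


# Hypothetical text on disk containing one tilde and one U+00A0.
original = "word~one and word" + NBSP + "two"
saved = to_disk(to_display(original))

# After one display/save cycle no tilde is left on disk, whatever the
# project originally meant by it.
tilde_survived = "~" in saved
```

In this sketch the round trip is lossy: once the text has been displayed, a tilde that was already on disk can no longer be told apart from a converted U+00A0, so both are written back as U+00A0.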
Option A is less painful, but:
Users will have to learn to use ~ instead of U+00A0 when inserting
Note that with Option A:
Since the change is made when reading in the text, the conversion of non-breaking spaces to tildes will not trigger a save notification and will not interfere with merging
Projects with existing tildes will be unaffected, except in publishing and preview mode (which is broken anyway for projects that are not using Unicode)