  Paratext 7 NBSP
Added by Jeff Klassen, last edited by Clayton Grassick on Dec 07, 2009

We need to rethink how Paratext handles NBSP (non-breaking spaces) in light of some recently discovered problems.

We currently handle nbsps as follows:

  • Existing non-breaking spaces are preserved and new ones can be inserted
  • "~" (tilde) can also be used to represent a non-breaking space, and is shown as an NBSP in Preview mode, as well as in Print Draft and publishing

Unfortunately, this approach has several problems:

  1. In Paratext, it is very difficult to tell whether a non-breaking space is present or not; it is visually identical to an ordinary space
  2. The underlying editor technology (MSHTML), over which we have little control, sometimes inserts nbsps into the text without being asked to do so. This is fairly standard in HTML editors, apparently, and is done to preserve spaces at the beginnings of lines. I can see no way to stop it from doing so.
  3. It is very difficult for users to be consistent in their use of non-breaking spaces if there is no way to see them
  4. The USFM standard already defines "~" as the official way to handle non-breaking spaces

As a result of #1 and #2, non-breaking spaces may be silently inserted and we have no way of easily seeing them.
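
To illustrate the visibility problem, here is a minimal Python sketch of a check that reports where U+00A0 occurs in a piece of text. The function name `find_nbsp` and the sample verse text are hypothetical and are not part of Paratext.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def find_nbsp(text):
    """Return 1-based (line, column) pairs for every U+00A0 in text."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((line_no, col))
    return hits

# Hypothetical sample: one NBSP hidden between "the" and "beginning".
sample = "\\v 1 In the\u00a0beginning\n\\v 2 And the earth"
print(find_nbsp(sample))
```

A check like this only reports positions; it does not make the characters visible in the editor itself.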

There are a number of possible solutions.

A) Use tilde on disk

  • Change all non-breaking spaces that are read in to tilde. USFM on disk would contain "~" instead of U+00A0
  • Keep Preview mode the same to allow a preview with the tildes converted to non-breaking spaces
  • Replace all non-breaking spaces with spaces when converting back from the displayed text (this would mean that the editor would not have to worry about whether a space or a non-breaking space was inserted)
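
The three steps of Option A can be sketched in Python as simple string replacements. This is only an illustration under the assumption that each step is a plain character substitution; the function names `on_read`, `preview` and `on_save` are hypothetical and do not correspond to actual Paratext code.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def on_read(disk_text):
    """Reading in: every NBSP becomes a visible tilde."""
    return disk_text.replace(NBSP, "~")

def preview(text):
    """Preview mode: tildes are shown as non-breaking spaces."""
    return text.replace("~", NBSP)

def on_save(editor_text):
    """Converting back from the displayed text: any NBSP still present is
    assumed to have been inserted by the editor, so it becomes a plain space."""
    return editor_text.replace(NBSP, " ")

loaded = on_read("Mr.\u00a0Smith")       # "Mr.~Smith"
edited = "\u00a0" + loaded               # editor silently adds a leading NBSP
print(repr(on_save(edited)))             # leading NBSP saved as an ordinary space
```

Under this scheme the tilde is the only representation that survives a save, so an NBSP silently inserted by the editor can never reach the file.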

B) Use NBSP on disk

  • Immediately before display, convert any NBSP to tilde.
  • Preview mode works as it does now: both non-breaking spaces and tildes are displayed as nbsp
  • When saving to disk, all tildes are converted back to NBSP.

Option B is problematic:

  • Any projects that use "~" for any other purpose will have them converted to non-breaking spaces.
  • Many different parts of Paratext will need to be modified, as the displayed text does not match the actual text on disk
  • Projects that validly use "~" for non-breaking space will have them removed and replaced with U+00A0
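
The first and third bullets can be seen in a short round trip. The Python sketch below assumes Option B is implemented as plain character substitution in both directions; `to_display`, `to_disk` and the sample text are hypothetical.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def to_display(disk_text):
    """Option B: immediately before display, show each NBSP as a tilde."""
    return disk_text.replace(NBSP, "~")

def to_disk(display_text):
    """Option B: when saving, turn every tilde back into an NBSP."""
    return display_text.replace("~", NBSP)

# Hypothetical text with one literal tilde and one real NBSP.
original = "about ~5 km, 10\u00a0m"
round_tripped = to_disk(to_display(original))

print(repr(original))
print(repr(round_tripped))        # the literal tilde is now U+00A0
print(round_tripped == original)  # False: the round trip is lossy
```

Once displayed, a tilde that came from an NBSP and a tilde that was already in the file cannot be told apart, which is why the save step rewrites both.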

Option A is less painful, but:

  • Users will have to learn to use ~ instead of U+00A0 when inserting

Note that with Option A:

  • Since the change is made when reading in the text, the conversion of non-breaking spaces to tildes will not trigger a save notification and will not interfere with merging
  • Projects with existing tildes will be unaffected, except in publishing and preview mode (which is broken anyway for projects that are not using Unicode)

Attachments:

  • NBSPEmails.pdf (PDF, 686 kb), added by Jeff Klassen, Dec 04, 2009
  • NBSP-2.png (25 kb), added by Brian Renes, Dec 10, 2009
  • NBSP-1.png (19 kb), added by Brian Renes, Dec 10, 2009

Precision is what this work is about and some texts want/need the precision of holding strings together. Typically it is more with resources that we need to really worry about non-breaking spaces. It is a non-issue for most projects.

Invisibility does not help for precision.

Therefore, I would absolutely vote for option A.

In the past we made NBSP visible. I think we need to return to that practice. When it is important for it to be converted, at that point convert it, but make it a ~ in the file.

I do not object to converting NBSP to a visible character.

I do object to losing the NBSPs (ZWJs, etc.) that are entered by the teams to get the text to display correctly.

If NBSPs are converted to tilde in the USFM file, I can handle that. It would not have to be a tilde, if we could agree on another suitable character.

So, either choice appears fine to me for non-roman typesetting.

In looking at the encodings for NBSP, the best "visible" option (which follows an accepted standard, as well as the USFM standard) seems to be the tilde. So I would suggest that we use the tilde. (see attachment NBSP-1 for the listing of NBSP encodings)

Joan brings up a good point in considering other "invisible" encodings. I do not work with any languages that need ZWJ, ZWNJ – but do we need to do the same thing with other "invisible" items? Does it help to make them visible on HDD as HTML does?

In looking at some charts, there are various invisible characters that could be considered; like right-to-left, left-to-right markers. (see attachment NBSP-2)

Many of the other "invisible" characters must remain invisible. They are more of a type of control character, resulting in a direction change in the text presentation, or a forced joining or separating of two shapes in a cursive script (ZWJ, ZWNJ). You never want to see anything, just the result of their presence (they can make the characters around them look different).
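
One way to inspect such characters without making them visible in the text itself is a separate report. The Python sketch below lists NBSP and Unicode format-control characters (general category Cf, which includes ZWJ, ZWNJ and the directional marks) together with their offsets. The function name `report_invisibles` is hypothetical, and this is only a diagnostic idea, not a Paratext feature.

```python
import unicodedata

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def report_invisibles(text):
    """List NBSP and format-control (category Cf) characters with 0-based offsets."""
    found = []
    for index, ch in enumerate(text):
        if ch == NBSP or unicodedata.category(ch) == "Cf":
            code = f"U+{ord(ch):04X}"
            found.append((index, code, unicodedata.name(ch, "UNKNOWN")))
    return found

# Hypothetical sample containing a ZWJ and an NBSP.
sample = "a\u200db\u00a0c"
for entry in report_invisibles(sample):
    print(entry)
```

The text is left untouched, so the joining and directional behaviour of the characters is preserved while their presence can still be checked.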
