Monday, May 29, 2017

Reformatting text values:


This article describes how to use a simple I-type to format an existing D-type field with any number of multi-values containing text, into regularly sized values of text. While replicating data from UniVerse to Microsoft SQL Server I ran across some data that varied widely in its format with regards to the length of multi-valued text. The problem was evident when inserting the individual values as separate rows in the SQL Server database. The schema had been set to some arbitrary column width (for example VARCHAR(255)) which worked well for most of the records but for many the text length exceeded this limit and was truncated to fit the schema. One option would be to insert the contents of the field, including value-marks as a Binary Large OBject (BLOB) in its own column. This would make future queries difficult so a better option is to make the replicated data fit the schema.

Here is what the data looks like (I display some calculated values on-the-fly to help illustrate the problem):

Note: All programming displayed on this site is for illustrative purposes only. Use at your own risk.


>LIST EVENTS EVAL "DCOUNT(DESC,@VM)" COL.HDG "#vals" FMT "4R" EVAL "LEN(REUSE(DESC))" FMT "10R" COL.HDG "Len Desc" EVAL "CONVERT(@VM,'}',DESC)" COL.SPCS BY.DSND EVAL "LEN(REUSE(DESC))" COUNT.SUP

EVENTS #vals Len Desc CONVERT(@VM,'}',DESC)...................
130196     1     9998 While emailing from the tablet app the 
                      full text was not completely visible at
                      all times. Instead I had to utilize the
                      vertical and horizontal scroll bars and
...
160283    18     1273 Choosing to print displays the print 
                      dialog but the "Print" and "Cancel" 
                      buttons aren't immediately visible. By
                      scrolling the app "up" the buttons are
                      visible.}Sometimes I can zoom in and
...
140119    17      792 Running the latest version the option to
                      automatically activate the map with my
                      current location isn't apparent.}After
                      changing views to map-view then by
                      clicking the use-current-location icon
                      will the map appear with my location.}
...
021092     2       66 There are periodic flashes that occur at
                      random times.}See attached.
...
181002     1       46 The font size of the field labels is too
                      small.



In the above example the list is sorted in descending order by the DESC field length. The largest field is 9,998 characters in length and the smallest field contains just 46 characters.

The problem that becomes apparent is that the replicated data, being split off into an associated table with foreign keys that link it to the base table row, is not formatted consistently. Obviously had the issue of varying lengths of values been anticipated then the entry window that initially captures the information could have been designed to always result in regularly sized values. With that said the most expedient solution is to replicate a new dictionary item in place of the "DESC" data field, one that "chunks" the output into regularly sized values that are replicated into individual rows. This will ensure all data conforms to the schema constraints and subsequent queries will return all matching rows.

The I-type used for this purpose is as follows:




>LIST DICT EVENTS "DESC" "DESCRIPTION"

               Type &
Field......... Field. Field........................................................ Column......... Output Depth &
Name.......... Number Definition................................................... Heading........ Format Assoc..
DESC           D   12                                                               Description     40T    M
DESCRIPTION    I      TRIM(CONVERT(@TM,@VM,FMT(TRIM(CONVERT(@VM,' ',DESC)),'80T'))) Description     80T    M


2 records listed.



The original field is "DESC" which is a D-type item. The new field is "DESCRIPTION" which is an I-type item. The new item inserts value marks every 80th character. What follows are the same records shown above, but with the DESCRIPTION field alongside that, for the purposes of displaying what's happening, also converts the @VMs (value marks) to "}" just for the sake of visualizing them in the sample listing. The actual fields as rendered by RetrieVe appear as line breaks when listed.



>LIST EVENTS EVAL "CONVERT(@VM,'}',DESC)" EVAL "CONVERT(@VM,'}',DESCRIPTION)" COL.SPCS BY.DSND EVAL "LEN(REUSE(DESC))" COUNT.SUP

EVENTS CONVERT(@VM,'}',DESC)................... CONVERT(@VM,'}',DESC)...................
130196 While emailing from the tablet app the   While emailing from the tablet app the  
       full text was not completely visible at  full text was not completely visible at 
       all times. Instead I had to utilize the  }all times. Instead I had to utilize the
       vertical and horizontal scroll bars and  vertical and horizontal scroll bars and}
...
160283 Choosing to print displays the print     Choosing to print displays the print     
       dialog but the "Print" and "Cancel"      dialog but the "Print" and "Cancel" 
       buttons aren't immediately visible. By   }buttons aren't immediately visible. By
       scrolling the app "up" the buttons are   scrolling the app "up" the buttons are
       visible.}Sometimes I can zoom in and     }visible. Sometimes I can zoom in and
...
140119 Running the latest version the option to Running the latest version the option to
       automatically activate the map with my   automatically activate the map with my 
       current location isn't apparent.}After   }current location isn't apparent. After
       changing views to map-view then by       changing views to map-view then by
       clicking the use-current-location icon   }clicking the use-current-location icon
       will the map appear with my location.}   will the map appear with my location.
...
021092 There are periodic flashes that occur    There are periodic flashes that occur
       at random times.}See attached.           at random times. See attached.
...
181002 The font size of the field labels is too The font size of the field labels is too
       small.                                   small.



The result is that very large fields with none, one or a few value marks are broken up into discrete 80 character (or fewer) chunks. These segments will fit within a SQL Server table column with type VARCHAR(80). Any DESC fields that are not very large will result in a DESCRIPTION field that aggregates the values into one or a few value "chunks".

While purposefully entered line breaks (in the original field) will not always be preserved in the new aggregated field, the output does conform to a SQL Server schema without truncating anything.

No comments:

Post a Comment