Synchronised Scrambling

Synchronised scrambling enables data to be replaced throughout a database. For example, you may want to scramble the Account codes in the database. However, not only does every matching Account code in every file need to be scrambled to the same value to maintain the data integrity, but also potentially the data is too sensitive and therefore the test data must contain Account codes that do not exist in the live database. This feature essentially allows you to change the data in a field from one value to another consistently across the database, replacing real values with new ones that optionally do not already exist. It can be used in conjunction with any of the other warping or scrambling options.

In order to use this feature you need to define the Synchronised Scrambling Fields (SSFs). Synchronised scrambling is going to generate and amend with new values so the definition needs to be the same across all tables. Therefore, you define the Account Code field and its characteristics. You then go into every table where you want to ‘Scramble’ the data (or use the auto define function) and the definition of the field on the file and the ‘Scrambling Field’ will be checked to ensure they are the same. If they are not, you will receive an error message. If the field definitions are not the same on every file, AFDs can be used to create new matching field definitions and the AFD fields can then be scrambled.

Synchronised Scrambling Fields (SSFs)
Option 6 on the ‘Work with Warp Cases’ screen enables the Synchronised Scrambling Fields to be set up. This screen is also accessed via F7 from the Object Details screen when defining a Warp Object and from the Auto Definition display.

Options:

2 – Change Edit the properties of the SSF, see following section for more information.

4 – Delete Remove the SSF, a confirmation screen is displayed before the action takes place. If the SSF is being used in the Warp Case then it cannot be deleted and an error message is displayed. Option ‘7’ can be used to determine where the SSF has been defined and these definitions must be removed before the field can be deleted.

7 – Where Used Display a window which lists against which files and fields in the Warp Case this SSF has been defined.

Function Keys:

F6 – Add Create a new SSF, see following section for more information.

Add/Change a Synchronised Scrambling Field
Option ‘2’ or F6 from the Synchronised Scrambling Fields display causes the following screen to appear. If you change the definition of a field with regards to Type, Length or Decimal Positions and that field is already being used by the Warp Case, you will receive an error message stating that the definitions no longer match the fields on the file.

Field The name of the SSF which must be unique within the Warp Case. A field name which matches any combination of warp formats is not allowed, for example DMY or DATE.

Description A compulsory field which can be used to explain the purpose of the SSF.

Type Packed, Signed, Binary, Alpha and Graphic are supported. Variable length fields are also supported.

Length The maximum length for alpha fields is 1024 and for numeric fields it is 30.

Decimal Positions This is only valid for numeric fields, if left blank 0 is assumed.

Generate By There are three options which determine how this field will be scrambled.
If you choose option ‘1’ then a new value will be generated for you.
If you choose option ‘2’ then you must generate the value via a User Exit. See the following section for more information.
If you choose Option 3=Scramble  , use this option to create a vertical synchronised scramble field.  Using this option will vertically scramble the data across all files that use this field definition.     

Start Value The Start Value field can be used for numeric values only. If you would like Random values to start from a specific value other than 1, that value can be keyed in here. If keyed the number of digits must match the field length exactly. For example, if you have a 7 long numeric your start value must be in the format 0000345, 345 will not be accepted.       

Program/Library This is the name of the User Exit program that will be used to generate the new values if the Generate By option is set to ‘2’. See the following section for more information.

Controlling Parms This field is optional and can only be used in conjunction with a User Exit program. It will pass whatever data is specified into the User Exit.

Value Existence By default this option is set to ‘2’ which means that the new value that is generated by either the Random or User Exit method can already exist in the file being processed. Option ‘1’ however forces a new value to be generated. See following section for more information.

Master File/Library/Field Use in conjunction with the above Cannot Exist option to reference a master list of field values against which the new value can be validated. These fields can be prompted with F4. Either the Master File and Field must be *SAME (same as the file and field being processed) or all values need to be specified. See the following Value Existence section for more information.

Random Value Generation
If the Generate By option is set to ‘1’ then the following rules will be used to generate new values for the fields against which the SSF has been specified.

Numeric fields
A new value will be generated using a simple count.

Example: There are 3 account numbers in my database, all are 7 long numeric:- 2837629, 2736463, 3984343

These values will be scrambled to 0000001, 0000002, 0000003 and used throughout the database. Therefore if account number 2387629 has been changed to 0000001, this change will be made for every instance of this account code in every file where this SSF has been specified.

A start value can optionally be used if the random values should begin from a specific number.

Alpha fields
A new value will be generated using the uppercase alphabet.

Example: There are 3 surnames in my database, all are 10 long:-BISHOP, PONSFORD, ROGERS

These values will be scrambled to AAAAAAAAAA, AAAAAAAAAB, AAAAAAAAAC.

Once the value AAAAAAAAAZ has been used, the next value will be AAAAAAAABA and so on.

In version 8.1, generation of new alpha values has been improved.  Rather than the AAAA, AAAB, AAAC, etc., there is the new start value template such as “Test Company ####” which produces
“Test Company 0001”, “Test Company 0002” etc. during the scramble.  At least 4 hashes must be defined but this can be increased.  The number of hashes needs to be long enough to cater for the number of records in the file.  


User Exit
If the Generate By option is set to ‘2’ then the program which has been specified in the Program and Library fields will be called and it is the job of this program to generate and return the new value.

A template User Exit program called TBI005R is shipped with the product and the source for this can be found in QRPGLESRC in the library TB_DEMO. There is also a sample User Exit for creating true Random numbers if you need a random number for any field between 1 – 30 digits. This is called TBI006R and can be found in the same location.

The Controlling Parms field can be used to pass data into the User Exit if there is any additional information that is required by the program.

New user exit programs have been written for version 8.1.0 such as generate credit card number, American SSN and new random value using old value as template.

The following exit program objects are provided in TB_7PO along with source code that can be found in
TB_DEMO/QRPGLESRC. They are designed to obfuscate sensitive data for testing purposes. The source is
provided so that the customer may modify the logic if needed.

TBI011R – Random Bank Card Number Generator
This program will take a bank card number of different lengths and which may also be formatted with spaces or other characters. The IIN (first 6 digits) are not modified. The following digits which form the account number are then replaced with a random number. The last digit which is a check digit is calculated using the Luhn algorithm.

TBI012R – Social Security Number Generator
This will take an existing SSN and generate a new random value. A SSN is formatted as aaa-bb-cccc.
The generator uses the following rules.

  1. No part of the code (aaa, bb or cccc) can be all zeros.
  2. If the aaa value is in the range of 001 to 899 it is replaced with a random number within this range. The value 666 will never be generated.
  3. If the aaa value is 900 or above then this can be a valid code but is not an SSN. It will be replaced with a random number between 900 and 999.
  4. The values at bb and cccc are replaced with random numbers.

TBI013R – Generate Random Value
This exit program will generate a new random value based upon the format of the existing value. The passed string is used as a template for the value. Existing letters are replaced with a new random letter, case sensitivity is maintained. Existing numbers are replaced with random numbers. Other characters are not replaced, for example a postcode of RG24 8AG could be changed to JW86 9AS.

Value Existence
If this option is set to ‘2=Can Exist’, the new value that is generated by either the Random or User Exit method can already exist in the file being processed. This will not cause a duplicate key problem if you are scrambling a key field because all the values are going to change from one value to another. For example:

Old Values – New Values:

01 – 05
02 – 03
03 – 02
04 – 06
05 – 01
06 – 04

All of the values have been changed, but the values on the file are the same. ‘Can Exist’ does not mean that they necessarily will exist however. In the above example we could have specified a ‘Start Value’ of 10 and then all the values would have been new.

If you want to guarantee that the new values generated will not already exist in the database then you need to use ‘1=Cannot Exist’. This option will mean that warping will check and see if the new value created already exists in the database and if it does it will generate an alternative value.


NOTE: Please take care when using this option. SQL is used to check for every new value whether it already exists or not in the database. If you have 100,000 records in the file, 100,000 SQL Selects will occur. It will be slowed down to almost unworkable speed if the file and field being processed does not have a sensible key or access path available as the SQL engine will choose to process the file sequentially. Therefore, in the above scenario it is possible that 100,000 records are read 100,000 times. This needs to be avoided.

The Master File/Library/Field can help with this. If you are processing a file that has an account code in the file, but it is not part of the key, then you can set up these fields to point to the Account Code file so that any new values can be checked against this rather than the file being processed. As this file will have a unique key, the SQL Selects will be very quick.

The Master File/Library/Field would also be needed in the above scenario because you cannot control the sequence in which the files are processed. If the above file was processed before the Account code file and it was checking back against itself to ensure all values were new, not all Account codes would necessarily exist in this file and therefore the ‘Cannot exist’ option could not be maintained.