All we need is an easy explanation of the problem, so here it is.
In Db2 v11.5.7 on Linux I have simple table:
create table admin.patient_data ( patientid int not null primary key, patient_name varchar(10), illness varchar(15), doctor_name varchar(10) ); insert into admin.patient_data values (1, 'Alice', 'illness A', 'DOCTOR1'); insert into admin.patient_data values (2, 'Bob', 'illness B', 'DOCTOR2'); select * from admin.patient_data;
PATIENTID PATIENT_NAME ILLNESS DOCTOR_NAME ----------- ------------ --------------- ---------- 1 Alice illness A DOCTOR1 2 Bob illness B DOCTOR2
Column ILLNESS is sensitive data. I would like to allow this column to be displayed only if column DOCTOR_NAME matches Db2 logged-in user.
create mask admin.patient_data on admin.patient_data for column illness return case when doctor_name = USER then illness else 'Masked data' end enable; alter table admin.patient_data activate column access control;
Now user DOCTOR1 connects to database and checks the data with:
select * from admin.patient_data
it is returned:
PATIENTID PATIENT_NAME ILLNESS DOCTOR_NAME ----------- ------------ --------------- ----------- 1 Alice illness A DOCTOR1 2 Bob Masked data DOCTOR2
In row PATIENTID=2 column ILLNESS is expected to be masked.
But doctor is educated, so he/she knows all the illnesses and now target "illness B"
db2 "select * from admin.patient_data where illness = 'illness B'"
and it gets:
PATIENTID PATIENT_NAME ILLNESS DOCTOR_NAME ----------- ------------ --------------- ----------- 2 Bob Masked data DOCTOR2
Field ILLNESS is still marked as expected, but now because of where conditional DOCTOR1 knows that patient Bob has "illness B".
I expected that end-users could filter by values they are returned with where condition. That is: "illness A" and "Marked data".
Is there some simple solution to prevent this? I expect to have no record returned for last select statement.
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
This is Working As Designed™:
The application of enabled column masks does not interfere with the operations of other clauses within the statement such as the
SELECT DISTINCT, and
ORDER BY. The rows returned in the final result table remain the same, except that the values in the resulting rows might be masked by the column masks.
If you want to prevent access to specific rows, you’ll need to activate row access control and define the required row permissions.
Masking is done pretty late in the process, before returning data to end-user, way after WHERE condition is executed.
If the requirement is to strictly do the masking on database level (and not on application level) like all doctors have to have access to all of the patient names, then I see no other option then redesign ER model. This kind of kills the main "marketing" idea masking to be applied on top of existing tables.
First option is to use views instead of data masking and applications access views instead of tables.
Second option is to separate sensitive and non-sensitive data in separate tables. Like having "patientid", "patient_name", "doctor_name" in "patient" table and separate patient-illness relationship in separate table with "patientid" and "illness" fields. Or having "patientid", "patient_name", "doctor_name" and new field "illnessid" in "patient" table and illnessid and illness in new illness table. Then alongside of column masking also use row access control in this two separated tables.
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂