Feature Wiki

Information about planned and released features

Tabs

(Project) Establish Referential Integrity

This is a project page that bundles several feature wiki pages which belong to a larger development activity for the ILIAS component Database.

1 Aim of Project

Currently (ILIAS 8) ILIAS doesn't use advanced database-built-in functionalities that ensure the integrity of stored data.

The benefits ILIAS demands from the Database Management System (DBMS) regarding data value correctness on field-level currently are:

  • uniqueness, by primary or unique indexes
  • 'not null' (without a defined default) for essential required fields
  • low-level datatype warranty on field-level, reasonable for numeric- and date/time-types, sometimes poor for unspecific varchar fields (examples: email and client_ip in [usr_data])
To improve data quality and to support code maintaining modern DBMSs offer several options ILIAS COULD use:
  1. Referential Integrity (Foreign Keys)
  2. Stored Procedures & Functions
  3. Trigger
Options 2 and 3 COULD be used (in combination) to check, format, generate and archive data on both field- and row-level, in sophisticated manner. These options, however, are not addressed by this feature request. In this FR, Option 1 only is to be discussed.

The lack of RI/Foreign Keys currently results in data inconsistency. This problem is very obvious, because at the time being we need additional cleanup features to handle orphaned and corrupted data, e.g.:
  • System Check: Missing Entries in Repository Tree
  • System Check: Missing Object References
Furthermore every administrator/developer/user certainly noticed errors like `Whoops\Exception\ErrorException thrown with message "<b>Error: There is no dataset with id 92771!</b><br />class: ilObjUser<b />Script: /srv/www/ilias/htdocs/Services/User/classes/class.ilObjUser.php<br />Line: 292"` in their ILIAS installations. This is also an evidence for missing RI.

To improve the quality of the ILIAS data stored in the DMBS, the next goal SHOULD be to establish foreign keys in and between tables where ever it is semantic consequential.

Once a semantic relation between data fields is established by a foreign key, this level of data integrity will be guaranteed by the DBMS for every attempt of data manipulation!
This means the integrity is no longer reliant solely on:

  • correctness of data values provided by manipulating PHPcode
  • correctness of data values provided by direct (interactive) manipulations (CLI, phpmyadmin, ...)
  • completeness of complex actions without an appropriate sequence of steps, (which may be interrupted by external technical incidents)
which can be failing unintentionally or by purpose.

As another benefit, if data manipulating code has yet unrecognized semantic bugs violating a relation, those bugs will emerge due to DBMS denying the execution, leading to attention so they can be fixed.

On the other hand, once a relation is established in the DBMS, every PHP class that manipulates (insert, update, delete) an involved data field must be checked whether the now existing relation requires a different behavior of its actions.
Potential activities which may be affected by a relation are:
  • deletes of data in the referenced table
  • updates of data in the referencing table
  • insertions in both tables, the referencing and the referenced table, as ONE unit of work
The kind of necessary or desirable code modifications may be:
  • changing the sequence of actions (insertion:  first in referenced table, secondary in referencing table)
  • removing of now unnecessary actions (deletion:  using ON DELETE CASCADE of relation)
  • changing the strategy when updating or inserting referencing data,  (example: If the function formerly could not rely on the presumption whether the new value already exists in the (now referenced) superior table, a check for its existance had to be scheduled for every update/insert.  With RI and the presumption that the value exists in very most cases, the former check can be optimistically removed and a (new) handling for violating the constraint must be implemented. The benefit by RI lies in smaller resource consumption for regular cases.)

The process of establishing RI in ILIAS implies actions in many contexts:

  • introduce the intentions of using Referential Integrity to the ILIAS community
  • modification of ILIAS' database abstraction layer
  • guidelines/documentation for maintainers to handle RI
  • migration of individual installations to use an appropriate database engine
  • changes to database structures
  • introduction of foreign keys in portions of at least one module or one service by its maintainer
  • management of the communication process when a modules maintainer can't act self-sufficient
  • reflecting an ongoing RI coverage for a general data deletion strategy (?)
  • presumably as a final phase: process superior services like 'User'
  • ...
Some of this actions are one-shots, some are ongoing.

As a pure technical view, following actions MUST take place in sequence:

  1. Migration of individual installations to use an appropriate database engine
  2. Modification of ILIAS' database abstraction layer
  3. Introduction of foreign keys in portions of at least one module or one service by its maintainer
  4. presumably as a final phase: process superior services like 'User'

Step 1. is addressed in feature request Abandon support for MyISAM engine and MySQL fulltext search
Step 2. is addressed in feature request Generalized Management Functions for Referential Integrity

A major challenge for Step 3. will be to design a RI-establishing process which handles existing violating data. This shouldn't happen theoretically if everything was fine in whole past - but this is unrealistic in the wild!
Therefore data checking scripts will be REQUIRED to ensure acceptable data in all affected client databases before an installation updates its ILIAS codebase to a new version including a new "portion of RI".
At least, this checking scripts (or at best: with data correcting ability) MUST cover those data-structures, which are going to be used for RI by this certain update portion.
The task to design and provide a data checking/preparation process is addressed by feature request Generalized Management Functions for Referential Integrity as well.

2 Involved Maintainers and Stakeholders

3 Timeline

ILIAS 9

  • Implement ilDBInterface changes
  • Provide Diagnostics when using the Setup Status Command
  • Provide an Abstraction for special RI Migrations
  • Provide a documentation/guideline in ./docs
  • Maybe a first component where RI gets established: Mail

ILIAS 10

RI established in more components

4 Related Feature Requests and Status

Feature Request

Suggested by

Funding

Planned Release

Status

Abandon support for MyISAM engine and MySQL fulltext search

Uwe Tesche / Michal Jansen

funded

ILIAS 8

Published in trunk

Generalized Management Functions for Referential Integrity

Uwe Tesche / Michal Jansen

funding required

ILIAS 9

Database: Detecting Violations for Referential Integrity

Uwe Tesche / Michal Jansen

funding required

ILIAS 9

Database: Management Functions for Referential Integrity in ilDBInterface

Uwe Tesche / Michael Jansen

HSU (interested)

ILIAS 9

accepted by JF

5 Further Results

6 Additional Information

7 General Discussion

Please discuss specific questions of feature requests on the related feature wiki pages. This discussion section is only for a general discussion of the project and its realisation.

JourFixe, ILIAS [jourfixe], 21 FEB 2022: Jansen, Michael [mjansen] and Tesche, Uwe [utesche] presented the project at the Big Projects Jour Fixe and answered questions.

  • Relevance for strategic ILIAS development: Data integrity important for further development and security of ILIAS.
  • Plan for ILIAS 9: more workshops; development of tools for migration, utility functions, ilDB...; creation of guideline; reference implementation in mail component.
  • Chance of realisation: partly financed - good (but still needs to be estimated); personnel: depends on distribution of tasks
  • Dependencies: No dependencies on other developments 
  • Impact on other development activities: not known, if at most positive

Statement Technical Board, 2022 Apr 6: Thank you very much for relentlessly pushing this effort for the last few years. Database integrity is definitely a valid target for improvement. The outlined plan to provide tools to maintainers to actually move their component to a better integrity sounds plausible. If these tools are applied, they will definitely make the system more reliable. Still, we worry that this effort will bind resources urgently needed for other efforts with similar goals that are already ongoing. Especially the IRSS seems to be problematic, since Fabian Schmid (as a maintainer of database and IRSS) will be heavily involved in both efforts. In general, we cannot push too many improvements at the same time, as this would overwhelm maintainers. We thus suggest keeping this effort low profile for ILIAS 9 and prepare the according code and documentation. If prepared properly, we can see this becoming a large topic with ILIAS 10. Please coordinate closely with Fabian to enable him to distribute his resources judiciously. In addition, please see, if maybe other people can tackle tasks for him. We will be happy to support this effort with advice for a proper preparation and project management. We are looking forward to making this a larger topic with ILIAS 10.

Last edited: 18. Oct 2024, 16:06, Kunkel, Matthias [mkunkel]