Auditing in Dataverse/Dynamics 365 CE is one of its most useful features, since it provides a complete history of the changes made to the records of the entities for which it is enabled. However, this information can grow considerably over time, and keeping it in Dataverse in the long term can be costly in terms of storage. Therefore, it may be convenient to export the audit history to an external cloud storage solution (blob storage, FTP, SQL database…) in order to free up storage space in the Dataverse database.
This article proposes a solution for exporting the Dataverse audit history to a Data Lake Gen2 storage account using SQL Server Integration Services (SSIS) and the KingswaySoft components for Dynamics 365, and for consuming the audit data hosted in the Data Lake from Power BI.
Requirements
- Visual Studio 2019/2022 (Professional or Enterprise edition).
- SQL Server Data Tools (SSDT) for Visual Studio.
- KingswaySoft SSIS Productivity Pack.
- KingswaySoft SSIS Integration Toolkit for Microsoft Dynamics 365.
- Data Lake Gen2 Azure storage account.
Architecture
As a starting point, this section details the proposed architecture for the solution that periodically exports audit data from Dataverse to the Data Lake.
Audit records in Dataverse are created continuously as a result of user activity and scheduled processes, so the proposal is to retrieve the audit information incrementally and on a recurring basis (e.g., once a day). This requires keeping the date of the last data extraction, so that each execution of the export ETL process retrieves only the records created since the previous run.
For this purpose, the date of the last execution for each audited entity is stored in a CSV file hosted in the Data Lake; during each run the process reads this value and applies it as the cut-off date in its filters, and updates it at the end of the process for future executions.
Regarding the directory structure of the Data Lake storage account, the proposal is to create a dedicated root container to hold the audit information, with a folder for each audited Dataverse entity and subfolders at the lower levels following the structure ‘YEAR / MONTH / DAY’. However, this is only an organizational suggestion within the Data Lake; any other structure is equally valid depending on the requirements, development needs and recurrence of the process (for example, if the audit extraction runs monthly or quarterly, a folder could be created per month or per quarter, respectively).
In the use case covered in this article, a process is implemented for the daily extraction of the audit of the Contact entity in Dataverse, so the following directory structure is proposed in the Data Lake:
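As an illustration, the layout could look like the sketch below, where the year, month, day and file names are only examples and the incremental date control file (described in the next section) is given the made-up name ‘lastexecution.csv’:

auditcrm/
    contact/
        lastexecution.csv          (incremental date control file; illustrative name)
        2023/
            11/
                15/
                    contact_083000.csv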
Once the architecture is defined, the implementation of the ETL process for exporting the audit data from Dataverse to the Data Lake storage account can proceed.
Implementation of the SSIS integration package
As mentioned above, the proposed use case is a process for the daily export of the audit of the contact entity in Dataverse, implemented as a SQL Server Integration Services (SSIS) package that uses KingswaySoft components to extract the data from Dataverse.
The steps necessary for the implementation of the ETL process are detailed below.
1. The first step is to create a new project of type ‘Integration Services Project’ in Visual Studio.
2. After creating the project, the first thing to do is define where the date from which the audit data should be retrieved will be kept. To do this, create a new delimited CSV file (any other extension can be used) with the following structure; it will be stored in a specific location of the Data Lake storage account so that the value can be read and updated during each execution of the process.
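As a minimal sketch, the content of this control file could look like the following, with a single column (matching the column mapping used later in the article) and an example date value:

lastExecutionDate
2023-11-14 08:30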
It is recommended to use the date format ‘yyyy-MM-dd HH:mm’ to avoid unnecessary format transformations in the ETL process.
According to the file structure defined in the previous section, this file for incremental date control is stored in the ‘auditcrm/contact/’ directory of Data Lake.
3. The main control flow is made up of 4 stages, detailed below.
Stage | Description
--- | ---
GetLastExecutionDate | Retrieves the date of the last extraction from the incremental date control file stored in Data Lake and stores the value in a global process variable.
SetFilePath | Generates the destination path in Data Lake (based on the date on which the process runs) where the CSV file with the audit data will be created.
GetAuditData | Retrieves from Dataverse the audit data generated since the last extraction date, generates the CSV file and deposits it in the Data Lake directory built in the previous stage.
UpdateExecutionDate | Updates the incremental date control file with the current date at the time of the process execution.
STAGE 1: GetLastExecutionDate
For the implementation of the first stage, a global variable (LastExecutionDate) must be created in the integration package, of type ‘String’ and with no initial value assigned, which will be used to store the date of the last audit extraction read from Data Lake.
Then, a data flow is created to retrieve this value. For this, it is recommended to use the ‘Premium Flat File Source’ component to read the incremental date control file, and the ‘Premium Derived Column’ component to update the ‘LastExecutionDate’ variable with the retrieved date, both available in the KingswaySoft Productivity Pack.
First, create a connection to the Data Lake storage account through KingswaySoft’s ‘Azure Data Lake Storage Connection Manager’, specifying the storage account name and shared key for the connection setup.
Once the connection has been created and validated, to read the execution date from the file, the connection, the path in Data Lake, the file format, and the field and data delimiters must be specified in the ‘Premium Flat File Source’ component.
From the ‘Columns’ section, the mapping of the ‘lastExecutionDate’ column of the file is added.
To update the ‘LastExecutionDate’ variable of the integration package, the following function of the KingswaySoft ‘Premium Derived Column’ component is used:
WriteValueToVariable(@[User::LastExecutionDate],[lastExecutionDate])
Where @[User::LastExecutionDate] is the global variable and [lastExecutionDate] is the output column of the previous component with the value of the execution date retrieved from the CSV file.
STAGE 2: SetFilePath
Once the last audit extraction date has been obtained, the next stage generates the Data Lake directory where the CSV file for the current execution of the process will be created. To store the generated path, an additional variable (Path) must be created in the integration package, of type String and with an empty string as its initial value.
In this case, as previously described in the article, the directory structure “/auditcrm/contact/YEAR/MONTH/DAY/contact_HHmmss.csv” is maintained, so the following expression can be used to generate the path and update the ‘Path’ variable through KingswaySoft’s ‘Premium Expression Task’ component, placed directly on the main control flow:
WriteValueToVariable(@[User::Path], "/auditcrm/contact/" + YEAR(GETDATE()) + "/" + MONTH(GETDATE()) + "/" + DAY(GETDATE()) + "/contact" + "_" + DATEPART("Hh", GETDATE()) + DATEPART("Mi", GETDATE()) + DATEPART("Ss", GETDATE()) + ".csv")
STAGE 3: GetAuditData
The third stage corresponds to the main data flow of the ETL process, in which the audit data generated since the last execution date is extracted from Dataverse and the CSV file stored in Data Lake is generated.
For the data extraction from Dataverse, the KingswaySoft ‘CDS/CRM Source’ component is used (a connection to the Dataverse instance from which the audit is to be extracted must first be created through the connection manager).
Once the connection is created, the ‘AuditLogs’ option of the component must be checked, and a query in FetchXML format for the audited entity (in this case, contact) must be entered. Additional filters can be applied to the query if desired, but to retrieve only the audit records generated since the last execution, it is necessary to add at least a date condition like the one sketched below, where @[User::LastExecutionDate] corresponds to the global variable of the integration package in which the date of the last extraction was stored in the initial stage of the process.
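As a minimal sketch, the FetchXML could look like the following; the attribute and operator used for the cut-off (‘modifiedon’, ‘ge’) are assumptions, and the key point is that the SSIS variable is referenced as the condition value:

<fetch>
  <entity name="contact">
    <all-attributes />
    <filter type="and">
      <!-- Assumed cut-off condition: the attribute and operator may differ in your implementation -->
      <condition attribute="modifiedon" operator="ge" value="@[User::LastExecutionDate]" />
    </filter>
  </entity>
</fetch>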
When the ‘AuditLogs’ option is selected, different outputs become available in the component; the following link to the KingswaySoft documentation provides more information on all the options available for working with the audit records.
Extracting Audit Logs for Multiple CRM Entities (kingswaysoft.com)
To retrieve the data corresponding to the history of modifications made on the records in Dataverse, the following outputs are used:
- Primary Output: It contains the main data about each audit history entry, such as the record to which it corresponds, the user who made the changes, the date and the type of action performed on the record.
- Audit Details (Attribute Changes): This output includes a reference to the related primary audit record and the old and new values of the modified fields.
Each primary audit record contains a unique identifier (auditid), which is also included in all secondary outputs of the component to relate them.
Therefore, it is necessary to join the primary output with the attribute changes so that each output line contains the complete audit detail for a modified field. To do this, both outputs are sorted by the ‘auditid’ field in ascending order and then combined with an ‘INNER JOIN’ (using ‘auditid’ as the join key) through the SSIS ‘Merge Join’ component, as shown below:
For each of the related outputs, the following attributes are selected:
Output | Attribute | Description
--- | --- | ---
Primary Output | actionname | Name of the action performed on the audited record.
Primary Output | createdon | Creation date of the audit history entry.
Primary Output | objectid | GUID of the modified record.
Primary Output | objectidname | Primary field (usually the name) of the modified record.
Primary Output | userid | GUID of the user who made the modification.
Primary Output | useridname | Name of the user who made the modification.
Attribute Changes | fieldname | Name of the modified field.
Attribute Changes | oldvalue | Previous value.
Attribute Changes | oldvalue_label | Previous value (formatted).
Attribute Changes | newvalue | New value.
Attribute Changes | newvalue_label | New value (formatted).
In this way, each audit detail line produced by the join will consist of the columns indicated in the table above.
Before writing the CSV file with the retrieved audit data to the Data Lake, it is recommended to apply an additional filter comparing the creation date of the audit entries with the last extraction date, in order to avoid duplicates in the information stored in Data Lake. To apply the filter, the SSIS ‘Conditional Split’ component is used, creating a new output for the records to be kept by applying the following condition (where @[User::LastExecutionDate] corresponds to the variable in which the last extraction date is stored):
createdon > (DT_DBTIMESTAMP)@[User::LastExecutionDate]
Finally, the CSV file with the retrieved audit data is generated and stored in the corresponding (previously created) directory of the Data Lake storage account. This operation can be performed through the KingswaySoft ‘Premium Flat File Destination’ component, specifying the connection to Data Lake, the destination directory (previously stored in the @[User::Path] variable of the integration package), the file format, and the field and data delimiters.
To set the destination directory, an expression can be used on the ‘DestinationFilePath’ property, specifying the value stored in the ‘Path’ variable during the second stage of the process.
On the other hand, it is necessary to specify the field mapping corresponding to the CSV file to be generated from the ‘Columns’ section of the component.
Following all these steps, the main logic of the ETL process for retrieving audit data from Dataverse and exporting it to Data Lake is implemented.
STAGE 4: UpdateExecutionDate
In the last stage of the process, the incremental date control file is updated with the current date at the time of execution, in order to guarantee incremental data extraction in each run and to avoid data duplication in Data Lake. For this purpose, an additional data flow integrating the following components is created.
An auxiliary input row is generated using the KingswaySoft ‘Data Spawner’ component.
Then, the current date in the format ‘yyyy-MM-dd HH:mm’ is obtained using the following expression in the KingswaySoft ‘Premium Derived Column’ component:
(DT_WSTR, 4)YEAR(GETDATE()) + "-" + RIGHT("0" + (DT_WSTR, 2)MONTH(GETDATE()), 2) + "-" + RIGHT("0" + (DT_WSTR, 2)DAY(GETDATE()), 2) + " " + RIGHT("0" + (DT_WSTR, 2)DATEPART("HOUR", GETDATE()), 2) + ":" + RIGHT("0" + (DT_WSTR, 2)DATEPART("MINUTE", GETDATE()), 2)
Finally, the incremental date control CSV file stored in Data Lake is updated with the current date obtained through the KingswaySoft ‘Premium Flat File Destination’ component, specifying the destination directory where the control file is stored, the format, field delimiter and data delimiter.
From the ‘Columns’ section of the component, you must specify the mapping of the input column from the previous component (which contains the current date obtained) to the single column of the CSV file for incremental date control.
4. After implementing the 4 stages detailed above, the complete process can be executed. On each run it should retrieve the Dataverse audit history data generated since the last extraction and store it in Data Lake according to the file and directory structure that has been defined.
Once we have verified that the process runs correctly and the audit data is exported as expected, we can proceed to delete the old audit records in Dataverse to free up space in the database.
NOTE: If you wish to deploy and automate the execution of the integration package from a SQL Server, you must have a valid KingswaySoft license installed on the server.
Audit report in PowerBI
A common way to exploit and analyse the data stored in the Data Lake is through a report in PowerBI. In this last section, we are going to see how to connect to the Data Lake storage account and build a simple report to query the audit information of a Dataverse environment externally.
- From the PowerBI Desktop tool, select the ‘Azure Data Lake Storage Gen2’ option from the ‘Get Data’ menu.
- Enter the URL (DFS) of the audit container in Data Lake, which has the following format (where ‘XXXXX’ is the name of the storage account and ‘YYYYY’ is the name of the audit container):
https://XXXXX.dfs.core.windows.net/YYYYY
Azure credentials or the shared key of the storage account will be requested to perform authentication.
- In the next screen, click on ‘Transform Data’, as some settings are necessary prior to loading the data into the report.
- From the Power Query editor, first of all, you must exclude the files stored in the storage account that do not contain audit data (for example, the incremental date control files); to do this, apply a ‘Begins With…’ filter on the ‘Name’ column, using the name of the audited entity as the value (see the sketch after this list).
- Then click on the ‘Combine files’ button, located on the right side of the ‘Content’ column header.
- In the ‘Combine Files’ dialog, click OK and proceed to load the data into the report by clicking the ‘Close & Apply’ button at the top of the ribbon.
- Once the data loading is completed, you could add, for example, a table type visualization in the work area and select the necessary columns, apply filters, add segmentations, graphs, etc. according to the needs of the report.
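As a reference for the previous steps, the query that Power Query generates could look roughly like the following sketch, where the account and container names follow the URL format shown above, ‘contact’ is the audited entity, and the helper queries created by ‘Combine files’ are omitted:

let
    // Connect to the audit container in Data Lake (placeholder account and container names)
    Source = AzureStorage.DataLake("https://XXXXX.dfs.core.windows.net/YYYYY"),
    // 'Begins With...' filter on the Name column: keep only the audit files of the contact entity
    FilteredFiles = Table.SelectRows(Source, each Text.StartsWith([Name], "contact"))
in
    FilteredFiles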
NOTE: If you wish to publish the report in the PowerBI service, you must have a PowerBI license and the necessary privileges assigned.
In conclusion, throughout the article we have seen, step by step, how to implement a solution to export the audit history of the audit-enabled entities of a Dataverse environment to a Data Lake storage account using SQL Server Integration Services (SSIS) and KingswaySoft components, and how to consume this information from a Power BI report, which allows us to free up space in the Dataverse database and reduce storage costs. I hope you found the solution useful.