
When you change the UNDO_RETENTION parameter, the LOB segment’s retention value is not modified


Below, I will try to explain a particular case of the general error: ORA-01555 snapshot too old.

Normally, when we hit this error, we try to adapt the retention parameters or to tune our queries.

SQL> show parameter undo;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
temp_undo_enabled                    boolean     FALSE
undo_management                      string      AUTO
undo_retention                       integer     3600 --extended from 900,
undo_tablespace                      string      UNDOTBS1

But there are some scenarios where the above approach does not work.

From the alert log file of the database, we got the sql_id which caused the issue: pmrbk5fdfd665

But when you search for it in V$SQL/V$SQLAREA, it is not there:

SQL> select sql_fulltext from v$sql where sql_id like '%pmrbk5fdfd665%';

no rows selected

Why?

It turns out that the sql_id is present in V$OPEN_CURSOR, with an entry in the SQL_TEXT column.
The issue comes from the fact that the statement is accessing a LOB column, which causes Oracle to generate a new sql_id.
The execution part related to the LOBs does not appear in V$SQL/V$SQLAREA and is not captured in AWR reports.

SQL>  select distinct * from v$open_cursor
  2     where rownum < 25
  3     and sql_id like '%pmrbk5fdfd665%';

SADDR                   SID USER_NAME                      ADDRESS          HASH_VALUE SQL_ID        SQL_TEXT                                                     LAST_SQL SQL_EXEC_ID CURSOR_TYPE
---------------- ---------- ------------------------------ ---------------- ---------- ------------- ------------------------------------------------------------ -------- ----------- ---------------
0000000670A19780         74 my_user                   00000002EB91F1F0 3831220380 pmrbk5fdfd665 table_104_11_XYZT_0_0_0
00000006747F0478        131 my_user                   00000002EB91F1F0 3831220380 pmrbk5fdfd665 table_104_11_XYZT_0_0_0

Apparently, the string in the SQL_TEXT column is a hexadecimal representation of the object_id that is being accessed.
In our case it is: XYZT

SQL>    select owner, object_name, object_type
  2    from dba_objects
  3    where object_id = (select to_number('&hex_value','XXXXXX') from dual);
Enter value for hex_value: XYZT
old   3:   where object_id = (select to_number('&hex_value','XXXXXX') from dual)
new   3:   where object_id = (select to_number('XYZT','XXXXXX') from dual)

                                                                                                                    
OWNER                  OBJECT_TYPE                                               OBJECT_NAME
---------------------- --------------------------------------------------------------------------
my_user                TABLE                                                     my_table


SQL> desc my_user.my_table;
 Name                  Type
 -------------------   ----------------
 EXPERIMENT_ID          VARCHAR2(20)
 DOCUMENT               BLOB
............….

If we look at the retention of the "DOCUMENT" column, we will see:

SQL> select table_name, pctversion, retention,segment_name from dba_lobs where table_name in ('my_table');

TABLE_NAME   PCTVERSION   RETENTION   SEGMENT_NAME
----------   ----------   ---------   --------------------------
my_table                  900         SYS_LOB0000027039C00002$$

In order to fix it, run the following command to adapt the retention of the BLOB column to the new value of the UNDO_RETENTION parameter:

ALTER TABLE my_table MODIFY LOB (DOCUMENT) (RETENTION);
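
To verify the change, we can run the same DBA_LOBS query again; the RETENTION column should now follow the new UNDO_RETENTION value of 3600 (assuming the ALTER TABLE above completed successfully):

SQL> select table_name, pctversion, retention, segment_name from dba_lobs where table_name in ('my_table');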



Migrating an Oracle database from Windows to ODA


I have recently been working on an interesting customer project where I had to migrate Windows Oracle Standard Edition databases to ODAs. The ODAs are X7-2M models, running version 18.5. This version comes with Red Hat Enterprise Linux 6.10 (Santiago). Both the Windows databases and the target ODA databases are running PSU 11.2.0.4.190115, but this would definitely also work for Oracle 12c and Oracle 18c databases. The databases are licensed with Standard Edition, so migrating through Data Guard was not possible. Through this blog I would like to share the experience I gained on this topic as well as the method and steps I used to successfully migrate those databases.

Limitations

With the Windows and Linux platforms being on the same endian format, I initially thought that it would not be more complicated than simply duplicating the Windows database to an ODA instance using the last backup. ODA databases are OMF databases, so it could not be easier, as no convert parameter is needed.
After having created a single instance database on the ODA, exported the current database pfile and adapted it for the ODA, and created the needed TNS connections, I ran a single RMAN duplicate command :

RMAN> run {
2> set newname for database to new;
3> duplicate target database to 'ODA_DBNAME' backup location '/u99/app/oracle/backup';
4> }

Note : If the database is huge, for example more than a terabyte, and your SGA is small, you might want to increase it. Having a bigger SGA will lower the restore time; a minimum of 50 GB would be a good compromise. Also, if your ODA is from the ODA X7 family you will benefit from the NVMe technology. In my experience, the duplication of a 1.5 TB database, with the backup stored locally, did not take more than 40 minutes.
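
To illustrate, a minimal way to do this (the 50 GB value simply follows the note above and is only an example) is to bump the SGA in the pfile adapted for the ODA and restart the auxiliary instance in nomount before launching the duplicate:

# excerpt of initODA_DBNAME.ora adapted for the ODA, for the duration of the duplicate only
*.sga_max_size=50G
*.sga_target=50G

SQL> startup force nomount pfile='/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/initODA_DBNAME.ora'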

I was more than happy to see the first duplication step being successfully achieved :

Finished restore at 17-JUL-2019 16:45:10

And I was expecting the same for the next recovery part.

Unfortunately, this didn’t end as expected and I quickly got the following restore errors :

Errors in memory script
RMAN-03015: error occurred in stored script Memory Script
RMAN-06136: ORACLE error from auxiliary database: ORA-01507: database not mounted
ORA-06512: at "SYS.X$DBMS_RCVMAN", line 13661
ORA-06512: at line 1
RMAN-03015: error occurred in stored script Memory Script
RMAN-20000: abnormal termination of job step
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile '/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_07_17/o1_mf_1_25514_glyf3yd3_.arc'
RMAN-11001: Oracle Error:
ORA-10562: Error occurred while applying redo to data block (file# 91, block# 189)
ORA-10564: tablespace DBVISIT
ORA-01110: data file 91: '/u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/o1_mf_dbvisit_glyczqcj_.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 501874
ORA-00600: internal error code, arguments: [4502], [0], [], [], [], [], [], [], [], [], [], []
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 07/17/2019 16:45:32
RMAN-05501: aborting duplication of target database

Troubleshooting the problem, I could understand that migrating a database from Windows to Linux might not be so simple. The following Oracle Doc IDs describe the problem :
Restore From Windows To Linux using RMAN Fails (Doc ID 2003327.1)
Cross-Platform Database Migration (across same endian) using RMAN Transportable Database (Doc ID 1401921.1)
RMAN DUPLICATE/RESTORE/RECOVER Mixed Platform Support (Doc ID 1079563.1)

The problem comes from the fact that applying redo between the Windows and Linux platforms is not supported if the database is not a standby one. For a regular (non-standby) database, the only possibility would be to go through a cold backup which, in my case, was impossible given the database size, the time needed to execute a backup and the short maintenance window.

Looking for another solution and doing further tests, I could find one, which I’m going to describe in the next steps.

Restoring the database from the last backup

In order to restore the database, I ran the following steps.

  1. Start the ODA instance in nomount mode :

  2. SQL> startup nomount

  3. Restore the last available control file from backup with rman :

  4. RMAN> connect target /
     
    RMAN> restore controlfile from '/mnt/backupNFS/oracle/ODA_DBNAME/20190813_233004_CTL_ODA_DBNAME_1179126808_S2864_P1.BCK';

  5. Mount the database :

  6. SQL> alter database mount;

  7. Catalog the backup path :

  8. RMAN> connect target /
     
    RMAN> catalog start with '/mnt/backupNFS/oracle/ODA_DBNAME';

  9. And finally restore the database :

  10. RMAN> connect target /
     
    RMAN> run {
    2> set newname for database to new;
    3> restore database;
    4> switch datafile all;
    5> }

Convert the primary database to a physical standby database

In order to be able to recover the database we will convert the primary database to a physical standby one.

  1. We can check the current status and see that our database is a primary one in mounted state :

  2. SQL> select status,instance_name,database_role,open_mode from v$database,v$Instance;
     
    STATUS INSTANCE_NAME DATABASE_ROLE OPEN_MODE
    ------------ ---------------- ---------------- --------------------
    MOUNTED ODA_DBNAME PRIMARY MOUNTED

  3. We will convert the database to a physical standby

  4. SQL> alter database convert to physical standby;
     
    Database altered.

  5. We need to restart the database.

  6. SQL> shutdown immediate
     
    SQL> startup mount

  7. We can check the new database status

  8. SQL> select status,instance_name,database_role,open_mode from v$database,v$Instance;
     
    STATUS INSTANCE_NAME DATABASE_ROLE OPEN_MODE
    ------------ ---------------- ---------------- --------------------
    MOUNTED ODA_DBNAME PHYSICAL STANDBY MOUNTED

Get the current SCN from the Windows database

We are now ready to recover the database and the application can be stopped. The next steps will be executed during the maintenance window. The Windows database listener can be stopped to make sure there are no new connections.
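
Stopping the listener on the Windows source is as simple as the following (assuming the default listener name; adapt it if a dedicated listener is used):

C:\> lsnrctl stop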

  1. We will make sure there are no existing application sessions on the database :

  2. SQL> set linesize 300
    SQL> set pagesize 500
    SQL> col machine format a20
    SQL> col service_name format a20
     
    SQL> select SID, serial#, username, machine, process, program, status, service_name, logon_time from v$session where username not in ('SYS', 'PUBLIC') and username is not null order by status, username;

  3. We will create a restore point :

  4. SQL> create restore point for_migration_14082019;
     
    Restore point created.

  5. We will get the last online log transactions archived :

  6. SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
     
    System altered.

  7. We will retrieve the SCN corresponding to the restore point :

  8. SQL> col scn format 999999999999999
     
    SQL> select scn from v$restore_point where lower(name)='for_migration_14082019';
     
    SCN
    ----------------
    13069540631

  9. We will back up the last archive logs. This will be executed on the Windows database using our dbi services internal DMK tool (https://www.dbi-services.com/offering/products/dmk-management-kit/) :

  10. servicedbi@win_srv:E:apporaclelocaldmk_custombin [ODA_DBNAME] ./rman_backup_ODA_DBNAME_arc.bat
     
    E:apporaclelocaldmk_custombin>powershell.exe -command "E:apporaclelocaldmk_habincheck_primary.ps1 ODA_DBNAME 'dmk_rman.ps1 -s ODA_DBNAME -t bck_arc.rcv -c E:apporacleadminODA_DBNAMEetcrman.cfg
     
    [OK]::KSBL::RMAN::dmk_dbbackup::ODA_DBNAME::bck_arc.rcv
     
    Logfile is : E:apporacleadminODA_DBNAMElogODA_DBNAME_bck_arc_20190814_141754.log
     
    RMAN return Code: 0
    2019-08-14_02:19:01::check_primary.ps1::MainProgram ::INFO ==> Program completed

Recover the database

The database can now be recovered up to our SCN 13069540631.

  1. We will first need to catalog new archive log backups :

  2. RMAN> connect target /
     
    RMAN> catalog start with '/mnt/backupNFS/oracle/ODA_DBNAME';

  3. And recover the database till SCN 13069540632 :

  4. RMAN> connect target /
     
    RMAN> run {
    2> set until scn 13069540632;
    3> recover database;
    4> }
     
    archived log file name=/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30098_go80084r_.arc RECID=30124 STAMP=1016289320
    archived log file name=/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30099_go80084x_.arc thread=1 sequence=30099
    channel default: deleting archived log(s)
    archived log file name=/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30099_go80084x_.arc RECID=30119 STAMP=1016289320
    archived log file name=/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30100_go8008bg_.arc thread=1 sequence=30100
    channel default: deleting archived log(s)
    archived log file name=/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30100_go8008bg_.arc RECID=30121 STAMP=1016289320
    media recovery complete, elapsed time: 00:00:02
    Finished recover at 14-AUG-2019 14:35:23

  5. We can check the alert log and see that recovering has been performed until SCN 13069540632 :

  6. oracle@ODA02:/u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/ [ODA_DBNAME] taa
    ORA-279 signalled during: alter database recover logfile '/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30098_go80084r_.arc'...
    alter database recover logfile '/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30099_go80084x_.arc'
    Media Recovery Log /u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30099_go80084x_.arc
    ORA-279 signalled during: alter database recover logfile '/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30099_go80084x_.arc'...
    alter database recover logfile '/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30100_go8008bg_.arc'
    Media Recovery Log /u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30100_go8008bg_.arc
    Wed Aug 14 14:35:23 2019
    Incomplete Recovery applied until change 13069540632 time 08/14/2019 14:13:46
    Media Recovery Complete (ODA_DBNAME)
    Completed: alter database recover logfile '/u03/app/oracle/fast_recovery_area/ODA_DBNAME_RZA/archivelog/2019_08_14/o1_mf_1_30100_go8008bg_.arc'

  7. We can check the new ODA database current SCN :

  8. SQL> col current_scn format 999999999999999
     
    SQL> select current_scn from v$database;
     
    CURRENT_SCN
    ----------------
    13069540631

Convert database to primary again

Database can now be converted back to primary.

SQL> alter database activate standby database;
 
Database altered.


SQL> select status,instance_name,database_role,open_mode from v$database,v$Instance;
 
STATUS INSTANCE_NAME DATABASE_ROLE OPEN_MODE
------------ ---------------- ---------------- --------------------
MOUNTED ODA_DBNAME PRIMARY MOUNTED

At this step, if the Windows source database were running version 11.2.0.3, we could successfully upgrade the new ODA database to 11.2.0.4 following the common Oracle database upgrade process.

And finally we can open our database: it has now been migrated from Windows to Linux.


SQL> alter database open;
 
Database altered.


SQL> select status,instance_name,database_role,open_mode from v$database,v$Instance;
 
STATUS INSTANCE_NAME DATABASE_ROLE OPEN_MODE
------------ ---------------- ---------------- --------------------
OPEN ODA_DBNAME PRIMARY READ WRITE


oracle@ODA02:/u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/ [ODA_DBNAME] ODA_DBNAME
********* dbi services Ltd. *********
STATUS : OPEN
DB_UNIQUE_NAME : ODA_DBNAME_RZA
OPEN_MODE : READ WRITE
LOG_MODE : ARCHIVELOG
DATABASE_ROLE : PRIMARY
FLASHBACK_ON : NO
FORCE_LOGGING : YES
VERSION : 11.2.0.4.0
*************************************

Post migration steps

There are a few post-migration steps to be executed.

Recreate the redo logs

The redo logs are still stamped with the Windows path and have therefore been created in the $ORACLE_HOME/dbs folder. In this step we will create new OMF ones.

  1. Checking current online log members :

  2. SQL> set linesize 300
    SQL> set pagesize 500
    SQL> col member format a100
     
    SQL> select a.GROUP#, b.member, a.status, a.bytes/1024/1024 MB from v$log a, v$logfile b where a.GROUP#=b.GROUP#;
     
    GROUP# MEMBER STATUS MB
    ---------- ---------------------------------------------------------------------------------------------------- ---------------- ----------
    6 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_6_1.LOG UNUSED 500
    6 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_6_2.LOG UNUSED 500
    5 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_5_2.LOG UNUSED 500
    5 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_5_1.LOG UNUSED 500
    4 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_4_2.LOG UNUSED 500
    4 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_4_1.LOG UNUSED 500
    3 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_3_2.LOG UNUSED 500
    3 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_3_1.LOG UNUSED 500
    2 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_2_2.LOG UNUSED 500
    2 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_2_1.LOG UNUSED 500
    1 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_1_2.LOG CURRENT 500
    1 /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_1_1.LOG CURRENT 500

  3. Drop the unused redo log groups, keeping only one besides the current group :

  4. SQL> alter database drop logfile group 6;
     
    Database altered.
     
    SQL> alter database drop logfile group 5;
     
    Database altered.
     
    SQL> alter database drop logfile group 4;
     
    Database altered.
     
    SQL> alter database drop logfile group 3;
     
    Database altered.
     

  5. Create the recently dropped groups again :

  6. SQL> alter database add logfile group 3 size 500M;
     
    Database altered.
     
    SQL> alter database add logfile group 4 size 500M;
     
    Database altered.
     
    SQL> alter database add logfile group 5 size 500M;
     
    Database altered.
     
    SQL> alter database add logfile group 6 size 500M;
     
    Database altered.

  7. Drop the last unused redo log group and create it again :

  8. SQL> alter database drop logfile group 2;
     
    Database altered.
     
    SQL> alter database add logfile group 2 size 500M;
     
    Database altered.

  9. Execute a log switch and a checkpoint so that the current redo group becomes inactive and can be dropped :

  10. SQL> alter system switch logfile;
     
    System altered.
     
    SQL> alter system checkpoint;
     
    System altered.

  11. Drop it and create it again :

  12. SQL> alter database drop logfile group 1;
     
    Database altered.
     
    SQL> alter database add logfile group 1 size 500M;
     
    Database altered.

  13. Check redo group members :

  14. SQL> select a.GROUP#, b.member, a.status, a.bytes/1024/1024 MB from v$log a, v$logfile b where a.GROUP#=b.GROUP#;
     
    GROUP# MEMBER STATUS MB
    ---------- ---------------------------------------------------------------------------------------------------- ---------------- ----------
    3 /u03/app/oracle/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_3_go81rj4t_.log INACTIVE 500
    3 /u02/app/oracle/oradata/ODA_DBNAME_RZA/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_3_go81rjqn_.log INACTIVE 500
    4 /u03/app/oracle/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_4_go81ron1_.log UNUSED 500
    4 /u02/app/oracle/oradata/ODA_DBNAME_RZA/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_4_go81rp6o_.log UNUSED 500
    5 /u03/app/oracle/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_5_go81rwhs_.log UNUSED 500
    5 /u02/app/oracle/oradata/ODA_DBNAME_RZA/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_5_go81rx1g_.log UNUSED 500
    6 /u03/app/oracle/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_6_go81s1rk_.log UNUSED 500
    6 /u02/app/oracle/oradata/ODA_DBNAME_RZA/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_6_go81s2bx_.log UNUSED 500
    2 /u03/app/oracle/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_2_go81sgdf_.log CURRENT 500
    2 /u02/app/oracle/oradata/ODA_DBNAME_RZA/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_2_go81sgxd_.log CURRENT 500
    1 /u03/app/oracle/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_1_go81vpls_.log UNUSED 500
    1 /u02/app/oracle/oradata/ODA_DBNAME_RZA/redo/ODA_DBNAME_RZA/onlinelog/o1_mf_1_go81vq4v_.log UNUSED 500

  15. Delete the old redo log member files that were wrongly created under $ORACLE_HOME/dbs :

  16. oracle@ODA02:/u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/ [ODA_DBNAME] cdh
     
    oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/ [ODA_DBNAME] cd dbs
     
    oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] ls -ltrh *REDO*.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_6_2.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_6_1.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_5_2.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_5_1.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_4_2.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_4_1.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_3_2.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_3_1.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_2_2.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 14:59 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_2_1.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 15:05 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_1_2.LOG
    -rw-r----- 1 oracle asmadmin 501M Aug 14 15:05 I:FAST_RECOVERY_AREAODA_DBNAME_SITE1ONLINELOGREDO_1_1.LOG
     
    oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] rm *REDO*.LOG

Recreate the temp file

  1. Checking the current temp file, we can see that the path is still the Windows one :

  2. SQL> set linesize 300
    SQL> col name format a100
     
    SQL> select b.name, b.status, b.bytes/1024/1024 MB, a.name from v$tablespace a, v$tempfile b where a.TS#=b.TS#;
     
    NAME STATUS MB NAME
    ---------------------------------------------------------------------------------------------------- ------- ---------- -------------------------------------------
    F:ORADATAODA_DBNAMETEMPORARY_DATA_1.DBF ONLINE 8192 TEMPORARY_DATA

  3. We can check that the default temporary tablespace is TEMPORARY_DATA

  4. SQL> col property_value format a50
     
    SQL> select property_name, property_value from database_properties where property_name like '%DEFAULT%TABLESPACE%';
     
    PROPERTY_NAME PROPERTY_VALUE
    ------------------------------ --------------------------------------------------
    DEFAULT_TEMP_TABLESPACE TEMPORARY_DATA
    DEFAULT_PERMANENT_TABLESPACE USER_DATA

  5. Let’s create a new temp tablespace and make it the default one

  6. SQL> create temporary tablespace TEMP tempfile size 8G;
     
    Tablespace created.
     
    SQL> alter database default temporary tablespace TEMP;
     
    Database altered.
     
    SQL> select property_name, property_value from database_properties where property_name like '%DEFAULT%TABLESPACE%';
     
    PROPERTY_NAME PROPERTY_VALUE
    ------------------------------ --------------------------------------------------
    DEFAULT_TEMP_TABLESPACE TEMP
    DEFAULT_PERMANENT_TABLESPACE USER_DATA

  7. Drop previous TEMPORARY_DATA tablespace

  8. SQL> drop tablespace TEMPORARY_DATA including contents and datafiles;
     
    Tablespace dropped.
     
    SQL> select b.file#, b.name, b.status, b.bytes/1024/1024 MB, a.name from v$tablespace a, v$tempfile b where a.TS#=b.TS#;
     
    FILE# NAME STATUS MB NAME
    ---------- ---------------------------------------------------------------------------------------------------- ------- ----------
    3 /u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/o1_mf_temp_go83m1tp_.tmp ONLINE 8192 TEMP

  9. Create TEMPORARY_DATA tablespace again and make it the default one :

  10. SQL> create temporary tablespace TEMPORARY_DATA tempfile size 8G;
     
    Tablespace created.
     
    SQL> select b.file#, b.name, b.status, b.bytes/1024/1024 MB, a.name from v$tablespace a, v$tempfile b where a.TS#=b.TS#;
     
    FILE# NAME STATUS MB NAME
    ---------- ---------------------------------------------------------------------------------------------------- ------- ----------
    1 /u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/o1_mf_temporar_go83wfd7_.tmp ONLINE 8192 TEMPORARY_DATA
    3 /u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/o1_mf_temp_go83m1tp_.tmp ONLINE 8192 TEMP
     
    SQL> alter database default temporary tablespace TEMPORARY_DATA;
     
    Database altered.
     
    SQL> select property_name, property_value from database_properties where property_name like '%DEFAULT%TABLESPACE%';
     
    PROPERTY_NAME PROPERTY_VALUE
    ------------------------------ --------------------------------------------------
    DEFAULT_TEMP_TABLESPACE TEMPORARY_DATA
    DEFAULT_PERMANENT_TABLESPACE USER_DATA

  11. And finally drop the intermediate temp tablespace :

  12. SQL> drop tablespace TEMP including contents and datafiles;
     
    Tablespace dropped.
     
    SQL> select b.file#, b.name, b.status, b.bytes/1024/1024 MB, a.name from v$tablespace a, v$tempfile b where a.TS#=b.TS#;
     
    FILE# NAME STATUS MB NAME
    ---------- ---------------------------------------------------------------------------------------------------- ------- ----------
    1 /u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/o1_mf_temporar_go83wfd7_.tmp ONLINE 8192 TEMPORARY_DATA

  13. An appropriate max size can be given to the newly created temp tablespace

  14. SQL> alter database tempfile '/u02/app/oracle/oradata/ODA_DBNAME_RZA/ODA_DBNAME_RZA/datafile/o1_mf_temporar_go83wfd7_.tmp' autoextend on maxsize 31G;
     
    Database altered.

  15. Remove the wrongly created temp file stored in $ORACLE_HOME/dbs

  16. oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] ls -ltr
    -rw-r--r-- 1 oracle oinstall 2851 May 15 2009 init.ora
    -rw-r--r-- 1 oracle oinstall 64 Jul 25 08:10 initODA_DBNAME.ora.old
    -rw-r----- 1 oracle oinstall 2048 Jul 25 08:10 orapwODA_DBNAME
    -rw-r--r-- 1 oracle oinstall 67 Jul 25 08:31 initODA_DBNAME.ora
    -rw-r----- 1 oracle asmadmin 8589942784 Aug 14 08:14 F:ORADATAODA_DBNAMETEMPORARY_DATA_1.DBF
    -rw-rw---- 1 oracle asmadmin 1544 Aug 14 14:59 hc_ODA_DBNAME.dat
    -rw-r----- 1 oracle asmadmin 43466752 Aug 14 15:48 snapcf_ODA_DBNAME.f
     
    oracle@RZA-ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] rm F:ORADATAODA_DBNAMETEMPORARY_DATA_1.DBF

Apply specific ODA parameters

The following ODA-specific parameters can be applied to the newly created instance.


SQL> alter system set "_datafile_write_errors_crash_instance"=false scope=spfile;
 
System altered.
 
SQL> alter system set "_db_writer_coalesce_area_size"=16777216 scope=spfile;
 
System altered.
 
SQL> alter system set "_disable_interface_checking"=TRUE scope=spfile;
 
System altered.
 
SQL> alter system set "_ENABLE_NUMA_SUPPORT"=FALSE scope=spfile;
 
System altered.
 
SQL> alter system set "_FILE_SIZE_INCREASE_INCREMENT"=2143289344 scope=spfile;
 
System altered.
 
SQL> alter system set "_gc_policy_time"=0 scope=spfile;
 
System altered.
 
SQL> alter system set "_gc_undo_affinity"=FALSE scope=spfile;
 
System altered.
 
SQL> alter system set db_block_checking='FULL' scope=spfile;
 
System altered.
 
SQL> alter system set db_block_checksum='FULL' scope=spfile;
 
System altered.
 
SQL> alter system set db_lost_write_protect='TYPICAL' scope=spfile;
 
System altered.
 
SQL> alter system set sql92_security=TRUE scope=spfile;
 
System altered.
 
SQL> alter system set use_large_pages='only' scope=spfile;
 
System altered.

The "_fix_control" parameter is specific to Oracle 12c and is not compatible with Oracle 11g. See Doc ID 2145105.1.

Register database in grid

After applying the ODA-specific instance parameters, we can register the database in the grid and start it with the grid.


oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl add database -d ODA_DBNAME_RZA -o /u01/app/oracle/product/11.2.0.4/dbhome_1 -c SINGLE -i ODA_DBNAME -x RZA-ODA02 -m ksbl.local -p /u02/app/oracle/oradata/ODA_DBNAME_RZA/dbs/spfileODA_DBNAME.ora -r PRIMARY -s OPEN -t IMMEDIATE -n ODA_DBNAME -j "/u02/app/oracle/oradata/ODA_DBNAME_RZA,/u03/app/oracle"
 
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl start database -d ODA_DBNAME_RZA
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl status database -d ODA_DBNAME_RZA
Instance ODA_DBNAME is running on node rza-oda02
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] ODA_DBNAME
********* dbi services Ltd. *********
STATUS : OPEN
DB_UNIQUE_NAME : ODA_DBNAME_RZA
OPEN_MODE : READ WRITE
LOG_MODE : ARCHIVELOG
DATABASE_ROLE : PRIMARY
FLASHBACK_ON : NO
FORCE_LOGGING : YES
VERSION : 11.2.0.4.0
*************************************

We can check that everything is working properly :

oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl stop database -d ODA_DBNAME_RZA
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl status database -d ODA_DBNAME_RZA
Instance ODA_DBNAME is not running on node rza-oda02
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] ODA_DBNAME
********* dbi services Ltd. *********
STATUS : STOPPED
*************************************
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl start database -d ODA_DBNAME_RZA
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] srvctl status database -d ODA_DBNAME_RZA
Instance ODA_DBNAME is running on node rza-oda02
 
oracle@ODA02:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [ODA_DBNAME] ODA_DBNAME
********* dbi services Ltd. *********
STATUS : OPEN
DB_UNIQUE_NAME : ODA_DBNAME_RZA
OPEN_MODE : READ WRITE
LOG_MODE : ARCHIVELOG
DATABASE_ROLE : PRIMARY
FLASHBACK_ON : NO
FORCE_LOGGING : YES
VERSION : 11.2.0.4.0
*************************************

Conclusion

By going through a physical standby database, I was able to successfully migrate the Windows databases to ODA Linux ones. I migrated source 11.2.0.4 databases, but also an 11.2.0.3 database by adding an upgrade step to the process.


Oracle 12c – pre-built join index


By Franck Pachot

This post is part of a series of small examples of recent features. I’m running this in the Oracle 20c preview in the Oracle Cloud. I have created a few tables in the previous post with a mini-snowflake scheme: a fact table CASES with the covid-19 cases per country and day. And a dimension hierarchy for the country with COUNTRIES and CONTINENTS tables.
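
For reference, here is a minimal sketch of that schema, reconstructed from the queries used in this post (the column names appear below; the datatypes and constraints are my assumptions):

create table continents (
  continent_id   number primary key,
  continent_name varchar2(30)
);
create table countries (
  country_id     number primary key,
  country_code   varchar2(10),
  country_name   varchar2(60),
  continent_id   number references continents
);
create table cases (
  daterep        varchar2(10),
  country_id     number references countries,
  cases          number,
  primary key (daterep, country_id)
);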

This title may look strange for people used to Oracle. I am showing the REFRESH FAST ON STATEMENT Materialized View clause here, also known as “Synchronous Refresh for Materialized Views”. This name makes sense only when you already know materialized views, complete and fast refreshes, on commit and on-demand refreshes… But that’s not what people will look for. Indexes are also refreshed by the statements, synchronously. Imagine that they were called “Synchronous Refresh for B*Trees”, do you think they would have been so popular?

A materialized view, like an index, is a redundant structure where data is stored in a different physical layout in order to be optimal for alternative queries. For example, you ingest data per date (which is the case in my covid-19 table – each day a new row with the covid-19 cases per country). But if I want to query all points for a specific country, those are scattered through the physical segment that is behind the table (or the partition). With an index on the country_code, I can easily identify one country, because the index is sorted on the country. I may need to go to the table to get the rows, and that is expensive, but I can avoid it by adding all the attributes to the index. With Oracle, as with many databases, we can build covering indexes, for real index-only access, even if Oracle doesn’t use those names.
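
For illustration, in a flat single-table design this would be a covering index like the following one (I’m using the COVID staging table from the previous post, whose daterep, geoid and cases columns appear in the insert at the end of this post; the index itself is just an example, not part of the demo):

create index covid_by_country on covid (geoid, daterep, cases);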

But with my snowflake schema, I’ll not have the country_code in the fact table and I have to join to a dimension. This is more expensive because the index on the country_name will get the country_id and then I have to go to an index on the fact table to get the rows for this country_id. When it comes to joins, I cannot index the result of the join (I’m skipping bitmap join indexes here because I’m talking about covering indexes). What I would like is an index with values from multiple tables.

A materialized view can achieve much more than an index. We can build the result of the join in one table. And no need for event sourcing or streaming here to keep it up to date. No need to denormalize and risk inconsistency. When NoSQL pioneers tell you that storage is cheap and redundancy is the way to scale, just keep your relational database for integrity and build materialized views on top. When they tell you that joins are expensive, just materialize them upfront. Before 12c, keeping those materialized views consistent with the source required either:

  1. materialized view logs which is similar to event sourcing except that ON COMMIT refresh is strongly consistent
  2. partition change tracking which is ok for bulk changes, when scaling big data
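
For comparison, the classical approach with materialized view logs would look something like this (a sketch only, not run here; fast refresh of a join materialized view needs ROWID logs on every base table and the rowids in the select list, as we will also see below):

create materialized view log on cases with rowid;
create materialized view log on countries with rowid;
create materialized view log on continents with rowid;

create materialized view flatview_logbased refresh fast on commit as
 select cases.rowid case_rowid, countries.rowid country_rowid, continents.rowid continent_rowid,
        daterep, continent_name, country_name, cases
 from cases, countries, continents
 where cases.country_id=countries.country_id
   and countries.continent_id=continents.continent_id and cases>0;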

This is different from indexes which are maintained immediately: when you update the row, the index is synchronized because your session has the values and the rowid and can go directly to update the index entry.

refresh fast on statement

In 12c you have the benefit of both: index-like fast maintenance with rowid access, and the MView possibility of querying pre-built joins. Here is an example on the tables created in the previous post.


SQL> 
 create materialized view flatview refresh fast on statement as
  select daterep,continent_name,country_name,cases from cases join countries using(country_id) join continents 
  using(continent_id) where cases>0;

select daterep,continent_name,country_name,cases from cases join countries using(country_id) join continents using(continent_id) where cases>0
                                                                                                                                             *
ERROR at line 2:
ORA-12015: cannot create a fast refresh materialized view from a complex query

There are some limitations when we want fast refresh and we have a utility to help us understand what we have to change or add in our select clause.

explain_mview

I need to create the table where the messages will be written to by this utility:


@ ?/rdbms/admin/utlxmv

SQL> 
set sqlformat ansiconsole
SQL> 
set pagesize 10000

This has created mv_capabilities_table and I can run dbms_mview.explain_mview() now.

Here is the call, with the select part of the materialized view:


SQL> 
exec dbms_mview.explain_mview('-
  select daterep,continent_name,country_name,cases from cases join countries using(country_id) join continents using(continent_id) where cases>0-
  ');

PL/SQL procedure successfully completed.

SQL> 
select possible "?",capability_name,related_text,msgtxt from mv_capabilities_table where capability_name like 'REFRESH_FAST%' order by seq;

   ?                  CAPABILITY_NAME    RELATED_TEXT                                                                 MSGTXT
____ ________________________________ _______________ ______________________________________________________________________
N    REFRESH_FAST
N    REFRESH_FAST_AFTER_INSERT                        inline view or subquery in FROM list not supported for this type MV
N    REFRESH_FAST_AFTER_INSERT                        inline view or subquery in FROM list not supported for this type MV
N    REFRESH_FAST_AFTER_INSERT                        view or subquery in from list
N    REFRESH_FAST_AFTER_ONETAB_DML                    see the reason why REFRESH_FAST_AFTER_INSERT is disabled
N    REFRESH_FAST_AFTER_ANY_DML                       see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
N    REFRESH_FAST_PCT                                 PCT FAST REFRESH is not possible if query contains an inline view

SQL> 
rollback;

Rollback complete.

“inline view or subquery in FROM list not supported for this type MV” is actually very misleading: I use ANSI joins, which are internally transformed into query blocks (inline views), and this is not supported.

No ANSI joins

I rewrite it with the old join syntax:


SQL> 
exec dbms_mview.explain_mview('-
  select daterep,continent_name,country_name,cases from cases , countries , continents where cases.country_id=countries.country_id and countries.continent_id=continents.continent_id and cases>0-
  ');

PL/SQL procedure successfully completed.

SQL> 
select possible "?",capability_name,related_text,msgtxt from mv_capabilities_table where capability_name like 'REFRESH_FAST%' order by seq;

   ?                  CAPABILITY_NAME       RELATED_TEXT                                                                      MSGTXT
____ ________________________________ __________________ ___________________________________________________________________________
N    REFRESH_FAST
N    REFRESH_FAST_AFTER_INSERT        CONTINENTS         the SELECT list does not have the rowids of all the detail tables
N    REFRESH_FAST_AFTER_INSERT        DEMO.CASES         the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_INSERT        DEMO.COUNTRIES     the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_INSERT        DEMO.CONTINENTS    the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_ONETAB_DML                       see the reason why REFRESH_FAST_AFTER_INSERT is disabled
N    REFRESH_FAST_AFTER_ANY_DML                          see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
N    REFRESH_FAST_PCT                                    PCT is not possible on any of the detail tables in the materialized view

SQL> 
rollback;

Rollback complete.

Now I need to add the ROWID of the table CONTINENTS in the materialized view.

ROWID for all tables

Yes, as I mentioned, the gap between indexes and materialized views is shorter. The REFRESH FAST ON STATEMENT requires access by rowid to update the materialized view, like when a statement updates an index.


SQL> 
exec dbms_mview.explain_mview('-
  select continents.rowid continent_rowid,daterep,continent_name,country_name,cases from cases , countries , continents where cases.country_id=countries.country_id and countries.continent_id=continents.continent_id and cases>0-
  ');

PL/SQL procedure successfully completed.

SQL> 
select possible "?",capability_name,related_text,msgtxt from mv_capabilities_table where capability_name like 'REFRESH_FAST%' order by seq;

   ?                  CAPABILITY_NAME       RELATED_TEXT                                                                      MSGTXT
____ ________________________________ __________________ ___________________________________________________________________________
N    REFRESH_FAST
N    REFRESH_FAST_AFTER_INSERT        COUNTRIES          the SELECT list does not have the rowids of all the detail tables
N    REFRESH_FAST_AFTER_INSERT        DEMO.CASES         the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_INSERT        DEMO.COUNTRIES     the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_INSERT        DEMO.CONTINENTS    the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_ONETAB_DML                       see the reason why REFRESH_FAST_AFTER_INSERT is disabled
N    REFRESH_FAST_AFTER_ANY_DML                          see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
N    REFRESH_FAST_PCT                                    PCT is not possible on any of the detail tables in the materialized view
SQL> 
rollback;

Rollback complete.

Now I need to do the same for the ROWID of COUNTRIES.

I continue in the same way, and here is the final attempt with the ROWID of all tables involved:



SQL> 
exec dbms_mview.explain_mview('-
  select cases.rowid case_rowid,countries.rowid country_rowid,continents.rowid continent_rowid,daterep,continent_name,country_name,cases from cases , countries , continents where cases.country_id=countries.country_id and countries.continent_id=continents.continent_id and cases>0-
  ');

PL/SQL procedure successfully completed.

SQL> select possible "?",capability_name,related_text,msgtxt from mv_capabilities_table where capability_name like 'REFRESH_FAST%' order by seq;
   ?                  CAPABILITY_NAME       RELATED_TEXT                                                                      MSGTXT
____ ________________________________ __________________ ___________________________________________________________________________
N    REFRESH_FAST
N    REFRESH_FAST_AFTER_INSERT        DEMO.CASES         the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_INSERT        DEMO.COUNTRIES     the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_INSERT        DEMO.CONTINENTS    the detail table does not have a materialized view log
N    REFRESH_FAST_AFTER_ONETAB_DML                       see the reason why REFRESH_FAST_AFTER_INSERT is disabled
N    REFRESH_FAST_AFTER_ANY_DML                          see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
N    REFRESH_FAST_PCT                                    PCT is not possible on any of the detail tables in the materialized view

SQL> 
rollback;

Rollback complete.

Ok, now another message: “the detail table does not have a materialized view log”. But that’s exactly the purpose of statement-level refresh: being able to fast refresh without creating and maintaining materialized view logs, and without full-refreshing a table or a partition.

That’s the limit of DBMS_MVIEW.EXPLAIN_MVIEW. Let’s try to create the materialized view now:


SQL> 
create materialized view flatview refresh fast on statement as
  select cases.rowid case_rowid,countries.rowid country_rowid,continents.rowid continent_rowid,daterep,continent_name,country_name,cases from cases , countries , continents where cases.country_id=countries.country_id and countries.continent_id=continents.continent_id and cases>0;

select cases.rowid case_rowid,countries.rowid country_rowid,continents.rowid continent_rowid,daterep,continent_name,country_name,cases from cases , countries , continents where cases.country_id=countries.country_id and countries.continent_id=continents.continent_id and cases>0
                                                                                                                                                                                                                                                                                    *
ERROR at line 2:
ORA-32428: on-statement materialized join view error: Shape of MV is not
supported(composite PK)

SQL>

That’s clear: I had created the fact table’s primary key on the compound foreign keys.

Surrogate key on fact table

This is not allowed by statement-level refresh, so let’s change that:


SQL> 
alter table cases add (case_id number);

Table altered.

SQL> 
update cases set case_id=rownum;

21274 rows updated.

SQL> 
alter table cases drop primary key;

Table altered.

SQL> 
alter table cases add primary key(case_id);

Table altered.

SQL> 
alter table cases add unique(daterep,country_id);
Table altered.

I have added a surrogate key and defined a unique key for the composite one.

Now the creation is successful:


SQL> 
create materialized view flatview refresh fast on statement as
  select cases.rowid case_rowid,countries.rowid country_rowid,continents.rowid continent_rowid,daterep,continent_name,country_name,cases from cases , countries , continents where cases.country_id=countries.country_id and countries.continent_id=continents.continent_id and cases>0;

Materialized view created.

Note that I tested later and I was able to create it with the ROWID from the fact table CASES only. But that’s not a good idea: in order to propagate any change to the underlying tables, the materialized view must have their ROWIDs, like an index. I consider the fact that this is possible to be a bug.

Here are the columns stored in my materialized view:


SQL> 
desc flatview

              Name    Null?            Type
__________________ ________ _______________
CASE_ROWID                  ROWID
COUNTRY_ROWID               ROWID
CONTINENT_ROWID             ROWID
DATEREP                     VARCHAR2(10)
CONTINENT_NAME              VARCHAR2(30)
COUNTRY_NAME                VARCHAR2(60)
CASES                       NUMBER

Storing the ROWID is not something we should recommend, as some maintenance operations may change the physical location of rows. You will need to do a complete refresh of the materialized view after an online move, for example.
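
For example, a hypothetical sequence (not run in this demo) where such a complete refresh would be needed:

SQL> alter table countries move online;
SQL> exec dbms_mview.refresh('DEMO.FLATVIEW','C');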

No-join query

I’ll show query rewrite in another blog post. For the moment, I’ll query this materialized view directly.

Here is a query similar to the one in the previous post:


SQL> 
select continent_name,country_name,top_date,top_cases from (
 select continent_name,country_name,daterep,cases
  ,first_value(daterep)over(partition by continent_name order by cases desc) top_date
  ,first_value(cases)over(partition by continent_name order by cases desc)top_cases
  ,row_number()over(partition by continent_name order by cases desc) r
  from flatview
 )
 where r=1 order by top_cases
;

   CONTINENT_NAME                COUNTRY_NAME      TOP_DATE    TOP_CASES
_________________ ___________________________ _____________ ____________
Oceania           Australia                   23/03/2020             611
Africa            South_Africa                05/06/2020            3267
Europe            Russia                      12/05/2020           11656
Asia              China                       13/02/2020           15141
America           United_States_of_America    26/04/2020           48529

I have replaced the country_id and continent_id by their names as I didn’t put the IDs in my materialized view. And I repeated the window specification everywhere so that you can run the same in versions lower than 20c.
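
For reference, here is the variant I have in mind with a named window, which avoids the repetition (my assumption is that the standard WINDOW clause is available in the 20c preview; I did not re-run this version here):

select continent_name,country_name,top_date,top_cases from (
 select continent_name,country_name,daterep,cases
  ,first_value(daterep) over w top_date
  ,first_value(cases) over w top_cases
  ,row_number() over w r
  from flatview
  window w as (partition by continent_name order by cases desc)
 )
 where r=1 order by top_cases;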

This materialized view is a table. I can partition it by hash to scatter the data. I can cluster on another column. I can add indexes. I have the full power of a SQL database on it, without the need to join if you think that joins are slow. If you come from NoSQL you can see it like a DynamoDB global index. You can query it without joining, fetching all attributes with one call, and filtering on another key than the primary key. But here we always have strong consistency: the changes are replicated immediately, fully ACID. They will be committed or rolled back by the same transaction that did the change. And they will be replicated synchronously or asynchronously with read-only replicas.
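
For example, an index on the materialized view for a per-country access pattern (an illustrative addition on my side, not part of the original demo):

SQL> create index flatview_country on flatview(country_name, daterep, cases);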

DML on base tables

Let’s do some changes here, lowering the covid-19 cases of CHN to 42%:


SQL> 
alter session set sql_trace=true;

Session altered.

SQL> 
update cases set cases=cases*0.42 where country_id=(select country_id from countries where country_code='CHN');

157 rows updated.

SQL> 
alter session set sql_trace=false;

Session altered.

I have set sql_trace because I want to have a look at the magic behind it.

Now running my query on the materialized view:



SQL> 
select continent_name,country_name,top_date,top_cases from (
 select continent_name,country_name,daterep,cases
  ,first_value(daterep)over(partition by continent_name order by cases desc) top_date
  ,first_value(cases)over(partition by continent_name order by cases desc)top_cases
  ,row_number()over(partition by continent_name order by cases desc) r
  from flatview
 )
 where r=1 order by top_cases
;

   CONTINENT_NAME                COUNTRY_NAME      TOP_DATE    TOP_CASES
_________________ ___________________________ _____________ ____________
Oceania           Australia                   23/03/2020             611
Africa            South_Africa                05/06/2020            3267
Asia              India                       05/06/2020            9851
Europe            Russia                      12/05/2020           11656
America           United_States_of_America    26/04/2020           48529

CHN is not the top one in Asia anymore with the 42% correction.

The changes were immediately propagated to the materialized view like when indexes are updated, and we can see that in the trace:


SQL> 
column value new_value tracefile
SQL> 
select value from v$diag_info where name='Default Trace File';
                                                                     VALUE
__________________________________________________________________________
/u01/app/oracle/diag/rdbms/cdb1a_iad154/CDB1A/trace/CDB1A_ora_49139.trc


SQL> 
column value clear
SQL> 
host tkprof &tracefile trace.txt

TKPROF: Release 20.0.0.0.0 - Development on Thu Jun 4 15:43:13 2020

Copyright (c) 1982, 2020, Oracle and/or its affiliates.  All rights reserved.

SQL> 
host awk '/"FLATVIEW/,/^[*]/' trace.txt

sql_trace instruments all executions with time and number of rows. tkprof aggregates those for analysis.

The trace shows two statements on my materialized view: DELETE and INSERT.

The first one is about removing the modified rows.


DELETE FROM "DEMO"."FLATVIEW"
WHERE
 "CASE_ROWID" = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      158      0.00       0.00          0          0          0           0
Execute    158      0.02       0.10         38        316        438         142
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      316      0.02       0.10         38        316        438         142

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 634     (recursive depth: 1)
Number of plan statistics captured: 3

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  FLATVIEW (cr=2 pr=0 pw=0 time=2080 us starts=1)
         1          0          1   INDEX UNIQUE SCAN I_OS$_FLATVIEW (cr=2 pr=0 pw=0 time=2055 us starts=1 cost=1 size=10 card=1)(object id 78728)

This has been done row-by-row but is optimized with an index on ROWID that has been created autonomously with my materialized view.
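
Out of curiosity, this index can also be seen in the dictionary (I_OS$_FLATVIEW is the name appearing in the row source operations above):

SQL> select index_name, index_type, uniqueness from user_indexes where table_name='FLATVIEW';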

The second one is inserting the modified rows:


INSERT INTO  "DEMO"."FLATVIEW" SELECT "CASES".ROWID "CASE_ROWID",
  "COUNTRIES".ROWID "COUNTRY_ROWID","CONTINENTS".ROWID "CONTINENT_ROWID",
  "CASES"."DATEREP" "DATEREP","CONTINENTS"."CONTINENT_NAME" "CONTINENT_NAME",
  "COUNTRIES"."COUNTRY_NAME" "COUNTRY_NAME","CASES"."CASES" "CASES" FROM
  "CONTINENTS" "CONTINENTS","COUNTRIES" "COUNTRIES", (SELECT "CASES".ROWID
  "ROWID","CASES"."DATEREP" "DATEREP","CASES"."CASES" "CASES",
  "CASES"."COUNTRY_ID" "COUNTRY_ID" FROM "DEMO"."CASES" "CASES" WHERE
  "CASES".ROWID=(:Z)) "CASES" WHERE "CASES"."COUNTRY_ID"=
  "COUNTRIES"."COUNTRY_ID" AND "COUNTRIES"."CONTINENT_ID"=
  "CONTINENTS"."CONTINENT_ID" AND "CASES"."CASES">0

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      158      0.00       0.00          0          0          0           0
Execute    158      0.01       0.11          0        734        616         142
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      316      0.02       0.12          0        734        616         142


Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 634     (recursive depth: 1)
Number of plan statistics captured: 3

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  FLATVIEW (cr=5 pr=0 pw=0 time=235 us starts=1)
         1          0          1   NESTED LOOPS  (cr=2 pr=0 pw=0 time=25 us starts=1 cost=3 size=52 card=1)
         1          0          1    NESTED LOOPS  (cr=2 pr=0 pw=0 time=20 us starts=1 cost=2 size=42 card=1)
         1          0          1     TABLE ACCESS BY USER ROWID CASES (cr=1 pr=0 pw=0 time=14 us starts=1 cost=1 size=22 card=1)
         1          0          1     TABLE ACCESS BY INDEX ROWID COUNTRIES (cr=1 pr=0 pw=0 time=5 us starts=1 cost=1 size=20 card=1)
         1          0          1      INDEX UNIQUE SCAN SYS_C009414 (cr=0 pr=0 pw=0 time=3 us starts=1 cost=0 size=0 card=1)(object id 78716)
         1          0          1    TABLE ACCESS BY INDEX ROWID CONTINENTS (cr=1 pr=0 pw=0 time=2 us starts=1 cost=1 size=10 card=1)
         1          0          1     INDEX UNIQUE SCAN SYS_C009412 (cr=0 pr=0 pw=0 time=1 us starts=1 cost=0 size=0 card=1)(object id 78715)

Again, this is apparently a row-by-row insert, as the “execute count” is nearly the same as the “rows count”. 157 is the number of rows I have updated.

You may think that this is a huge overhead, but those operations have been optimized for a long time. The materialized view is refreshed and ready for optimal queries: no need to queue, stream, reorg, vacuum,… And I can imagine that if this feature gets widely used, it will be optimized further with bulk operations, which would also allow compression.

Truncate

This looks all good. But… what happens if I truncate the table?


SQL> 
truncate table cases;

Table truncated.

SQL> 
select continent_name,country_name,top_date,top_cases from (
 select continent_name,country_name,daterep,cases
  ,first_value(daterep)over(partition by continent_name order by cases desc) top_date
  ,first_value(cases)over(partition by continent_name order by cases desc)top_cases
  ,row_number()over(partition by continent_name order by cases desc) r
  from flatview
 )
 where r=1 order by top_cases
;

   CONTINENT_NAME                COUNTRY_NAME      TOP_DATE    TOP_CASES
_________________ ___________________________ _____________ ____________
Oceania           Australia                   23/03/2020             611
Africa            South_Africa                05/06/2020            3267
Asia              India                       05/06/2020            9851
Europe            Russia                      12/05/2020           11656
America           United_States_of_America    26/04/2020           48529

Nothing changed. This is dangerous. You need to refresh it yourself. This may be a bug. What will happen if you insert data back? Note that, like with triggers, direct-path inserts will be transparently run as conventional inserts.


SQL> 
exec dbms_mview.refresh('DEMO.FLATVIEW');
SQL> 
insert into cases select rownum cases_id,daterep, geoid country_id,cases from covid where continentexp!='Other';

21483 rows created.

SQL> 
commit;

   CONTINENT_NAME                COUNTRY_NAME      TOP_DATE    TOP_CASES
_________________ ___________________________ _____________ ____________
Oceania           Australia                   23/03/2020             611
Africa            South_Africa                05/06/2020            3267
Europe            Russia                      12/05/2020           11656
Asia              China                       13/02/2020           15141
America           United_States_of_America    26/04/2020           48529

Joins are not expensive

This feature is really good to pre-build the joins in a composition of tables, as a hierarchical key-value, or a snowflake-dimension fact table. You can partition, compress, order, filter, index,… as with any relational table. There is no risk here with the denormalization as it is transparently maintained when you update the underlying tables.

If you develop on a NoSQL database because you have heard that normalization was invented to reduce storage, which is not expensive anymore, that’s a myth (you can read this long thread to understand the origin of this myth). Normalization is about database integrity and the separation of logical and physical layers. And that’s what Oracle Database implements with this feature: you update the logical view, tables are normalized for integrity, and the physical layer transparently maintains additional structures like indexes and materialized views to keep queries under single-digit milliseconds. Today you still need to think about which indexes and materialized views to build. Some advisors may help. All those are the bricks for the future: an autonomous database where you define only the logical layer for your application and all those optimisations will be done in the background.

The article Oracle 12c – pre-built join index appeared first on dbi Blog.

Oracle 12c – peak detection with MATCH_RECOGNIZE


By Franck Pachot

.
This post is part of a series of small examples of recent features. I’m running this in the Oracle 20c preview in the Oracle Cloud. I’ll show a very basic example of “Row Pattern Recognition” (the MATCH_RECOGNIZE clause in a SELECT, which Oracle documents as the “row pattern matching in native SQL” feature). You may be afraid of those names. Of course, because SQL is a declarative language, there is a small learning curve to get beyond this abstraction. Understanding procedurally how it works may help. But when you understand the declarative nature it is really powerful. This post starts simple, on a table with a time series where I just want to detect peaks (the points where the value goes up and then down).

Historically, a SELECT statement was operating on single rows (JOIN, WHERE, SELECT) within a set, or on an aggregation of rows (GROUP BY, HAVING) to provide a summary. Analytic functions can operate on windows of rows (PARTITION BY, ORDER BY, ROWS BETWEEN,…) where you keep the detailed level of rows and compare it to the aggregated values of the group. A row can then look at its neighbours and, when you need to go further, the SQL MODEL clause can build the equivalent of spreadsheet cells to reference other rows and columns. As in a spreadsheet, you can also PIVOT to move row detail to columns or vice versa. All that can be done in SQL, which means that you don’t code how to do it but just define the result you want. However, there’s something that is easy to do in a spreadsheet application like Excel but not easy to code with analytic functions: looking at a chart, as a line graph, to detect some behaviour. That’s something we can code in SQL with MATCH_RECOGNIZE.

For example, from the “COVID” table I have imported in the previous post I want to see each peak of covid-19 cases in Switzerland:

I did this manually in Excel: showing all labels but keeping only those that are at a peak, whether it is a small peak or a high one. There’s one value per day in this time series but I am not interested in the intermediate values. Only peaks. The data comes from the .csv at http://opendata.ecdc.europa.eu/covid19/casedistribution/csv/, read through an external table, but I imported it into an Oracle table for the previous post (Oracle 18c – select from a flat file).

Ok, let’s show the result directly. Here is a small SQL statement that shows me exactly those peaks, each match being numbered:


SQL> select countriesandterritories "Country","Peak date","Peak cases","match#"
  2  from covid
  3  match_recognize (
  4   partition by continentexp, countriesandterritories order by daterep
  5   measures
  6    match_number() as "match#",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   one row per match
 10   pattern (GoingUp+ GoingDown+)
 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
 14  )
 15  where countriesandterritories='Switzerland';

       Country    Peak date    Peak cases    match#
______________ ____________ _____________ _________
Switzerland    26-FEB-20                1         1
Switzerland    28-FEB-20                7         2
Switzerland    07-MAR-20              122         3
Switzerland    09-MAR-20               68         4
Switzerland    14-MAR-20              267         5
Switzerland    16-MAR-20              841         6
Switzerland    18-MAR-20              450         7
Switzerland    22-MAR-20             1237         8
Switzerland    24-MAR-20             1044         9
Switzerland    28-MAR-20             1390        10
Switzerland    31-MAR-20             1138        11
Switzerland    03-APR-20             1124        12
Switzerland    08-APR-20              590        13
Switzerland    10-APR-20              785        14
Switzerland    16-APR-20              583        15
Switzerland    18-APR-20              346        16
Switzerland    20-APR-20              336        17
Switzerland    24-APR-20              228        18
Switzerland    26-APR-20              216        19
Switzerland    01-MAY-20              179        20
Switzerland    09-MAY-20               81        21
Switzerland    11-MAY-20               54        22
Switzerland    17-MAY-20               58        23
Switzerland    21-MAY-20               40        24
Switzerland    24-MAY-20               18        25
Switzerland    27-MAY-20               15        26
Switzerland    29-MAY-20               35        27
Switzerland    06-JUN-20               23        28


28 rows selected.

Doing that with analytic functions or the MODEL clause is possible, but not easy.
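
To show the contrast, here is a hedged sketch (not from the original post) of the same idea with analytic functions only. It handles just the simplest case, strict local maxima, and already needs a subquery and two window functions; anything richer (plateaus, minimum run lengths, several patterns) quickly becomes hard to express this way:

-- Hedged sketch: peaks as strict local maxima with LAG/LEAD instead of
-- MATCH_RECOGNIZE. Table and columns (covid, daterep, cases,
-- countriesandterritories) are the ones used above.
select daterep "Peak date", cases "Peak cases"
from (
  select daterep, cases,
         lag(cases)  over (order by daterep) prev_cases,
         lead(cases) over (order by daterep) next_cases
  from covid
  where countriesandterritories='Switzerland'
)
where cases > prev_cases and cases > next_cases
order by daterep;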

So let’s explain the clauses in this simple example.

Define

I’ll need to define what a peak is. For that, I need to define two primary patterns. The value I’m looking for, which is the one you see on the graph, is the column “CASES”, which is the number of covid-19 cases for the day and country. How do you detect peaks visually? Like when hiking in mountains: it goes up and, when you continue, it goes down. Here are those two primary patterns:


 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))

“GoingUp” matches a row where “cases” value is higher than the preceding row and “GoingDown” matches a row where “cases” is lower than the preceding one. The sense of “preceding one”, of course, depends on an order, like with analytic functions. We will see it below.

Pattern

A peak is when a row matches GoingDown just after matching GoingUp. That’s simple, but you can imagine crazy things that a data scientist would want to recognize. And then MATCH_RECOGNIZE defines patterns in a similar way to regular expressions: mentioning the primary patterns in a sequence, with some modifiers. Mine is very simple:


 10   pattern (GoingUp+ GoingDown+)

This means: one or more GoingUp followed by one or more GoingDown. This is exactly what I did in the graph above: ignore intermediate points. So, each primary pattern compares a row with the preceding one only, and the engine walks through consecutive rows to match them against the pattern.
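
To make the quantifiers concrete, here is a hedged variation (not in the post): requiring at least three consecutive rising days before the fall, so that only the more sustained peaks are reported. Only the PATTERN line changes; the rest is the same query as above:

select countriesandterritories "Country","Peak date","Peak cases","match#"
from covid
match_recognize (
 partition by continentexp, countriesandterritories order by daterep
 measures
  match_number() as "match#",
  last(GoingUp.dateRep) as "Peak date",
  last(GoingUp.cases) as "Peak cases"
 one row per match
 pattern (GoingUp{3,} GoingDown+)  -- at least 3 rising rows, then 1 or more falling rows
 define
  GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
  GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
)
where countriesandterritories='Switzerland';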

Partition by

As mentioned, I follow the rows in order. For a timeseries, this is simple: the key is the country here, I partition by continent and country, and the order (x-axis) is the date. I’m looking at the peaks per country when the value (“cases”) is ordered by date (“daterep”):


  2  from covid
...
  4   partition by continentexp, countriesandterritories order by daterep
...
 15* where countriesandterritories='Switzerland';

I selected only my country here with a standard where clause, to show simple things.

Measures

Each time a pattern is recognized, I want to display only one row (“ONE ROW PER MATCH”) with some measures for it. Of course, I must access the point I’m interested in: the x-axis date and y-axis value for it. I can reference points within the matching window and I use the pattern variables to reference them. The peak is the last row in the “GoingUp” primary pattern, so last(GoingUp.dateRep) and last(GoingUp.cases) are my points:


  5   measures
  6    match_number() as "match#",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   one row per match

Those measures are accessible in the SELECT clause of my SQL statement. I added the match_number() to identify the points.

Here is the final query, with the partition, measures, pattern and define clauses within the MATCH_RECOGNIZE():


select countriesandterritories "Country","Peak date","Peak cases","match#"
from covid
match_recognize (
 partition by continentexp, countriesandterritories order by daterep
 measures
  match_number() as "match#",
  last(GoingUp.dateRep) as "Peak date",
  last(GoingUp.cases) as "Peak cases"
 one row per match
 pattern (GoingUp+ GoingDown+)
 define
  GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
  GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
)
where countriesandterritories='Switzerland';

The full syntax offers much more and, of course, it is all documented: https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8982

Debug mode

In order to understand how it works (and to debug), we can display “all rows” (ALL ROWS PER MATCH instead of ONE ROW PER MATCH in line 9) and add the row columns (DATEREP and CASES in line 1). In addition to match_number(), I have added the classifier() measure:


  1  select countriesandterritories "Country","Peak date","Peak cases","match#",daterep,cases,"classifier"
  2  from covid
  3  match_recognize (
  4   partition by continentexp, countriesandterritories order by daterep
  5   measures
  6    match_number() as "match#", classifier() as "classifier",
  7    last(GoingUp.dateRep) as "Peak date",
  8    last(GoingUp.cases) as "Peak cases"
  9   all rows per match
 10   pattern (GoingUp+ GoingDown+)
 11   define
 12    GoingUp as ( GoingUp.cases > prev(GoingUp.cases) ),
 13    GoingDown as ( GoingDown.cases < prev(GoingDown.cases))
 14  )
 15* where countriesandterritories='Switzerland';

“all rows per match” shows all rows where pattern matching is tested, classifier() shows which primary pattern is matched.

Here are the rows around the 10th match. You must keep in mind that rows are processed in order and, for each row, the engine looks ahead to recognize a pattern.


       Country    Peak date    Peak cases    match#      DATEREP    CASES    classifier
______________ ____________ _____________ _________ ____________ ________ _____________
...
Switzerland    24-MAR-20             1044         9 24-MAR-20        1044 GOINGUP
Switzerland    24-MAR-20             1044         9 25-MAR-20         774 GOINGDOWN
Switzerland    26-MAR-20              925        10 26-MAR-20         925 GOINGUP
Switzerland    27-MAR-20             1000        10 27-MAR-20        1000 GOINGUP
Switzerland    28-MAR-20             1390        10 28-MAR-20        1390 GOINGUP
Switzerland    28-MAR-20             1390        10 29-MAR-20        1048 GOINGDOWN
Switzerland    30-MAR-20             1122        11 30-MAR-20        1122 GOINGUP
Switzerland    31-MAR-20             1138        11 31-MAR-20        1138 GOINGUP              
Switzerland    31-MAR-20             1138        11 01-APR-20         696 GOINGDOWN  
Switzerland    02-APR-20              962        12 02-APR-20         962 GOINGUP
Switzerland    03-APR-20             1124        12 03-APR-20        1124 GOINGUP
Switzerland    03-APR-20             1124        12 04-APR-20        1033 GOINGDOWN

You see here how we came to output the 10th match (28-MAR-20, 1390 cases). After the peak of 24-MAR-20 we were going down the next day, 25-MAR-20 (look at the graph). This was included in the 9th match because of the regular expression “GoingDown+”. Then it goes up from 26-MAR-20 to 28-MAR-20, which matches “GoingUp+”, followed by a “GoingDown” on 29-MAR-20, which means that a 10th match has been recognized. It would continue for all “GoingDown+”, but there’s only one here as the next value is higher: 1122 > 1048, so the 10th match is closed on 29-MAR-20. This is where the ONE ROW PER MATCH row is returned, when processing the row from 29-MAR-20, with the values from the last row classified as GOINGUP, as defined in the measures: 28-MAR-20 and 1390. And then the pattern matching continues from this row, and a new GoingUp is detected…

If you want to go further, there are good examples from Lucas Jellema: https://technology.amis.nl/?s=match_recognize
And about its implementation in SQL engines, read Markus Winand: https://modern-sql.com/feature/match_recognize

And I’ll probably have more blog posts here in this series about recent features interesting for BI and DWH…

The article Oracle 12c – peak detection with MATCH_RECOGNIZE appeared first on dbi Blog.

Oracle 12c – reorg and split table with clustering


By Franck Pachot

.
In this series of small examples on recent features, I imported, in a previous post, the statistics of covid-19 per day and per country. This is typical of data that comes as a time series ordered by date, because this is how it is generated day after day, but that you probably want to query from another dimension, like per country.

If you want to ingest data faster, you keep it in the order of arrival, and insert it in heap table blocks. If you want to optimize for the future queries on the other dimension, you may load it in a table with a specialized organization where each row has its place: an Index Organized Table, a Hash Cluster, a partitioned table, or a combination of those. With Oracle we are used to storing data without the need to reorganize it. It is a multi-purpose database. But in 12c we have many features that make this reorganization easier, like partitioning, online move and online split. We can then think about a two-phase lifecycle for some operational tables that are used later for analytics:

  • Fast ingest and queries on a short time window: we insert data on the flow, with conventional inserts, into a conventional heap table. Queries on recent data are fast as the rows are colocated in the order they arrived.
  • Optimal queries on history: regularly, we physically reorganize the latest ingested rows to be clustered on another dimension, because we will query large time ranges on this other dimension.

Partitioning is the way to do those operations. We can have a weekly partition for the current week. When the week is over new rows will go to a new partition (11g PARTITION BY RANGE … INTERVAL) and we can optionally merge the old partition with the one containing old data, per month or year for example, to get larger time ranges for the past data. This merge is easy (18c MERGE PARTITIONS … ONLINE). And while doing that we can reorganize rows to be clustered together. This is what I’m doing in this post.
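
As a hedged sketch of the ingest side only (the table name and columns here are illustrative, not the ones used later in this post), a weekly interval-partitioned heap table would look like this, with a new partition created automatically when rows for a new week arrive:

-- Hedged sketch: weekly interval partitioning for the fast-ingest phase.
create table cases_ingest (
  daterep    date,
  country_id varchar2(10),
  cases      number
)
partition by range (daterep) interval (numtodsinterval(7,'day'))
( partition p_start values less than (date '2020-01-01') );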

Partitioning

From the table I have created in the previous post, I create an index on GEOID (as the goal is to query by countries) and I partition it by range on DATEREP:


SQL> create index covid_geoid on covid(geoid);

Index created.

SQL> alter table covid modify partition by range(daterep) interval (numToYMinterval(1,'year')) ( partition old values less than (date '2020-01-01') , partition new values less than (date '2021-01-01') ) online;

Table altered.

This is an online operation in 12cR2. So I have two partitions, one for “old” data and one for “new” data.

I query all dates for one specific country:


SQL> select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1
  2  /
   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

This reads rows scattered through the whole table because they were inserted day after day.

This is visible in the execution plan: the optimizer does not use the index but a full table scan:


SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                       PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________
SQL_ID  2nyu7m59d7spv, child number 0
-------------------------------------
select trunc(daterep,'mon'), max(cases) from covid where geoid='US'
group by trunc(daterep,'mon') order by 1

Plan hash value: 4091160977

-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |    55 (100)|      7 |00:00:00.01 |     180 |
|   1 |  SORT ORDER BY       |       |      1 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|   2 |   PARTITION RANGE ALL|       |      1 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|   3 |    HASH GROUP BY     |       |      2 |     77 |    55   (4)|      7 |00:00:00.01 |     180 |
|*  4 |     TABLE ACCESS FULL| COVID |      2 |    105 |    53   (0)|    160 |00:00:00.01 |     180 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("GEOID"='US')

This has read 180 blocks, with multiblock reads.

I force the access by index in order to compare the cost:


SQL> select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1
  2  /

   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                                                     PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________________________
SQL_ID  2whykac7cnjks, child number 0
-------------------------------------
select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid
where geoid='US' group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |    95 (100)|      7 |00:00:00.01 |     125 |
|   1 |  SORT ORDER BY                               |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |     125 |
|   2 |   HASH GROUP BY                              |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |     125 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    105 |    93   (0)|    160 |00:00:00.01 |     125 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    105 |     1   (0)|    160 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

Even if the number of blocks is a bit smaller, 125 blocks, they are single-block reads and then the cost is higher: 95 for the index access, where the full table scan was 55. Using hints and comparing the costs is how I often try to understand the optimizer choice, and here the reason is clear: because rows are scattered, the clustering factor of the index access is really bad.

I said that I want to merge the partitions. And maybe reorg with an online table move. But now, for this second phase of the lifecycle, I want to cluster rows on the country dimension rather than on arrival date.

Attribute clustering

This preference can be declared on the table with 12c Attribute Clustering:


SQL> alter table covid add clustering by linear order (continentexp, countriesandterritories);

Table altered.

You see that I can mention multiple columns, and I don’t need to use the GEOID column that I will use to query. This is not an index. This is just a preference to cluster rows and, if they are clustered on the country name, they will also be clustered on continent, country code, geoid,… I have chosen those columns for clarity when reading the DDL:


SQL> exec dbms_metadata.set_transform_param(DBMS_METADATA.SESSION_TRANSFORM,'SEGMENT_ATTRIBUTES',false);

PL/SQL procedure successfully completed.

SQL> ddl covid

  CREATE TABLE "COVID"
   (    "DATEREP" DATE,
        "N_DAY" NUMBER,
        "N_MONTH" NUMBER,
        "N_YEAR" NUMBER,
        "CASES" NUMBER,
        "DEATHS" NUMBER,
        "COUNTRIESANDTERRITORIES" VARCHAR2(50),
        "GEOID" VARCHAR2(10),
        "COUNTRYTERRITORYCODE" VARCHAR2(3),
        "POPDATA2018" NUMBER,
        "CONTINENTEXP" VARCHAR2(10)
   )
 CLUSTERING
 BY LINEAR ORDER ("COVID"."CONTINENTEXP",
  "COVID"."COUNTRIESANDTERRITORIES")
   YES ON LOAD  YES ON DATA MOVEMENT
 WITHOUT MATERIALIZED ZONEMAP
  PARTITION BY RANGE ("DATEREP") INTERVAL (NUMTOYMINTERVAL(1,'YEAR'))
 (PARTITION "OLD"  VALUES LESS THAN (TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
 PARTITION "NEW"  VALUES LESS THAN (TO_DATE(' 2021-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ) ;

  CREATE INDEX "COVID_GEOID" ON "COVID" ("GEOID")
  ;

As you can see the default is YES for ON LOAD which means that direct-path inserts will cluster rows, and ON DATA MOVEMENT is also YES which is why merging partitions will also cluster rows.

I’ve done that afterward here but this is something you can do at table creation. You mention on which attributes you want to cluster. You mention when: direct-path inserts (YES ON LOAD) and/or table reorganization (YES ON DATA MOVEMENT). This is defined at table level. Beyond those defaults, the table reorganizations (ALTER TABLE … MOVE, ALTER TABLE … MERGE PARTITIONS) can explicitly DISALLOW CLUSTERING or ALLOW CLUSTERING.
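
As a hedged sketch of what the declaration could look like directly at creation time (an illustrative table reusing the column names from above, not a statement run in this post):

-- Hedged sketch: declaring the attribute clustering preference at CREATE TABLE
-- time rather than adding it afterwards with ALTER TABLE.
create table covid_clustered (
  daterep                 date,
  cases                   number,
  countriesandterritories varchar2(50),
  continentexp            varchar2(10),
  geoid                   varchar2(10)
)
clustering by linear order (continentexp, countriesandterritories)
yes on load yes on data movement;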

Move Partition

When I have ingested some data and think that it would be better to cluster them, maybe at the time this partition is completed and new inserts go to a higher interval, I can reorganize it with a simple ALTER TABLE … MOVE:


SQL> alter table covid move partition new online allow clustering;

Table altered.

This will cluster rows together on the clustering attributes. I mentioned ALLOW CLUSTERING to show the syntax but it is the default (YES ON DATA MOVEMENT) anyway here.

At that point, you may also want to compress the old partitions with basic compression (the compression that does not require an additional option but is possible only with bulk load or data movement). However, be careful: the combination of an online operation and basic compression requires the Advanced Compression Option. More info in a previous post on the “Segment Maintenance Online Compress” feature usage.
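
As a hedged sketch of that idea (not run in this post, and with the clause order assumed from the documentation), an offline move with basic compression, which does not need the option, could look like:

-- Hedged sketch: offline move of the old partition with basic compression.
-- Adding ONLINE here would require the Advanced Compression Option.
alter table covid move partition old row store compress basic allow clustering;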

Merge Partition

As my goal is to cluster data on a different dimension than the time one, I may want to have larger partitions for the past ones. Something like the current partition holding a week of data at maximum, but the past partitions being on quarter or yearly ranges. That can be done with partition merging, which is an online operation in 18c (and note that I have a global index here and an online operation does not invalidate indexes):


SQL> alter table covid merge partitions old,new into partition oldmerged online allow clustering;

Table altered.

This is a row movement and clustering on data movement is enabled. Again I mentioned ALLOW CLUSTERING just to show the syntax.

Let’s see the number of buffers read now with index access. The statistics of the index (clustering factor) have not been updated, so the optimizer may not choose the index access yet (until dbms_stats runs on stale tables). I’m forcing it with a hint:


SQL> select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

   TRUNC(DATEREP,'MON')    MAX(CASES)
_______________________ _____________
01-DEC-19                           0
01-JAN-20                           3
01-FEB-20                          19
01-MAR-20                       21595
01-APR-20                       48529
01-MAY-20                       33955
01-JUN-20                       25178

SQL> select * from dbms_xplan.display_cursor(format=>'+cost iostats last')
  2  /
                                                                                                                     PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________________________
SQL_ID  2whykac7cnjks, child number 0
-------------------------------------
select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid
where geoid='US' group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |    95 (100)|      7 |00:00:00.01 |       8 |
|   1 |  SORT ORDER BY                               |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |       8 |
|   2 |   HASH GROUP BY                              |             |      1 |     77 |    95   (3)|      7 |00:00:00.01 |       8 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    105 |    93   (0)|    160 |00:00:00.01 |       8 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    105 |     1   (0)|    160 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')
       filter(TBL$OR$IDX$PART$NUM(,0,8,0,"COVID".ROWID)=1)

The cost has not changed (because of the statistics) but the number of buffers read is minimal: only the 8 buffers where all my rows for this country are clustered. Remember that I clustered on the country name but use the GEOID here in my predicate. That doesn’t matter as long as the rows are together.

Asynchronous global index maintenance

Note the strange predicate on TBL$OR$IDX$PART$NUM(,0,8,0,”COVID”.ROWID)=1 that results from another 12c feature where global indexes are maintained usable during the partition maintenance (which is required for an online operation) but optimized to be cleaned-out asynchronously later. This is visible from DBA_INDEXES:


SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:33:34                        19206 YES

Orphaned entries mean that some entries in the global index may reference the dropped segment after my MOVE or MERGE and the query has to ignore them.

Those ranges of rowid are determined from the segment concerned, stored in the dictionary:


SQL> select * from sys.index_orphaned_entry$;
   INDEXOBJ#    TABPARTDOBJ#    HIDDEN
____________ _______________ _________
       79972           79970 O
       79972           79971 O
       79972           79980 O
       79972           79973 O

HIDDEN=’O’ means Orphaned, and the ROWIDs addressing these partitions are filtered out from the dirty index entries by the predicate filter(TBL$OR$IDX$PART$NUM(,0,8,0,”COVID”.ROWID)=1) above.

This maintenance of the dirty index will be done during the maintenance window but I can do it immediately to finish my reorganization correctly:


SQL> alter index COVID_GEOID coalesce cleanup;

Index altered.

SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:33:34                        19206 NO

No orphaned index entries anymore. Note that I could also have called the DBMS_PART.CLEANUP_GIDX procedure to do the same.
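
A hedged sketch of that alternative, assuming the two-argument signature taking the schema and table name:

-- Hedged sketch: let DBMS_PART clean up the orphaned global index entries
-- for the whole table instead of coalescing each index explicitly.
exec dbms_part.cleanup_gidx(user, 'COVID');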

This is fine for the query but, as the statistics were not updated, the optimizer doesn’t know yet how clustered my table is. In order to complete my reorganization and have queries benefiting from it immediately, I gather the statistics:


SQL> exec dbms_stats.gather_table_stats(user,'COVID',options=>'gather auto');

PL/SQL procedure successfully completed.

SQL> select index_name,to_char(last_analyzed,'hh24:mi:ss') last_analyzed,clustering_factor,orphaned_entries from user_indexes where table_name='COVID';

    INDEX_NAME    LAST_ANALYZED    CLUSTERING_FACTOR    ORPHANED_ENTRIES
______________ ________________ ____________________ ___________________
COVID_GEOID    08:38:40                          369 NO

GATHER AUTO gathers only the stale ones and, as soon as I did my MOVE or MERGE, the index was marked as stale (note that the ALTER INDEX COALESCE does not mark them as stale by itself).

And now my query will use this optimal index without the need for any hint:


SQL_ID  2nyu7m59d7spv, child number 0
-------------------------------------
select trunc(daterep,'mon'), max(cases) from covid where geoid='US'
group by trunc(daterep,'mon') order by 1

Plan hash value: 2816502185

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |      1 |        |     7 (100)|      7 |00:00:00.01 |       5 |
|   1 |  SORT ORDER BY                               |             |      1 |    101 |     7  (29)|      7 |00:00:00.01 |       5 |
|   2 |   HASH GROUP BY                              |             |      1 |    101 |     7  (29)|      7 |00:00:00.01 |       5 |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |      1 |    160 |     5   (0)|    160 |00:00:00.01 |       5 |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |      1 |    160 |     2   (0)|    160 |00:00:00.01 |       2 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

and, thanks to the coalesce cleanup, there’s no predicate on orphan ROWIDs anymore.

With this pattern, you may realize that my global index on countries is useful only for past data, not for the recent data that has not been clustered yet. Then, we can even avoid maintaining the index for this partition. We will see that in the next post: it is called partial indexing.

With this pattern, we can even question the need to maintain an index for the old partitions. As all my rows for GEOID=’US’ were packed in a few contiguous blocks, why not just store the range of ROWIDs rather than the full list? This is called Zone Maps. But this is only available on Exadata and I like to think about Oracle as a multiplatform database.

Those many features came in the recent releases thanks to the development of the Autonomous Database. When the DBA is a cloud provider, whether it is automated or not, all maintenance must be done online without stopping the application. Those features are the bricks to build automatic lifecycle management and performance optimization.

The article Oracle 12c – reorg and split table with clustering appeared first on dbi Blog.

Oracle 12c – global partial index


By Franck Pachot

.
We have an incredible number of possibilities with Oracle. Yes, an index can be global (indexing many partitions without having to be partitioned itself on the same key) and partial (skipping some of the table partitions where we don’t need indexing). In the previous post of this series of small examples on recent features, I partitioned a table, with covid-19 cases per day and per country, by range of date with interval partitioning. The index on the country code (GEOID) was not very efficient for data ingested per day, because countries are scattered throughout the table. And then I reorganized the old partitions to cluster them on countries.

My global index on country code is defined as:


SQL> create index covid_geoid on covid(geoid);

Index created.

This is efficient, thanks to clustering, except for the new rows coming again in time order. Those go to a new partition, which is small (the idea in that post was to have a short time range for the current partition, and larger ones for the old data, using ALTER TABLE … MERGE ONLINE to merge the newly aged partition into the others). For this current partition only, it is preferable to full scan it. And we can even avoid maintaining the index entries for this partition, as this will accelerate data ingestion.

I think that partial indexing is well known for local indexes, as this is like marking some index partitions as unusable. But here I’m showing it on a global index.

Splitting partitions

In order to continue from the previous post where I merged all partitions, I’ll split them again, and this can be an online operation in 12cR2:


SQL> alter table covid split partition oldmerged at (date '2020-04-01') into (partition old, partition new) online;

Table altered.

SQL> alter index COVID_GEOID coalesce cleanup;

Index altered.

I have two partitions, “old” and “new”, and a global index. I also cleaned up the orphaned index entries to get clean execution plans. And it has to be done anyway.

Here is my query, using the index:


SQL> explain plan for select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                              PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________________________
Plan hash value: 2816502185

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                               |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                              |             |   101 |  1515 |     6  (34)| 00:00:01 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |   160 |  2400 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  4 |     INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("GEOID"='US')

This goes to all partitions, as the ROWID in a global index carries the partition information through the data object id. We see that with Pstart/Pstop=ROWID.

Partial indexing

Now I want to set my global index on countries to be a partial index:


SQL> alter index covid_geoid indexing partial;

Index altered.

This doesn’t change anything for the moment. The indexing of the partitions will depend on the partition attribute, which is INDEXING ON by default.

I set the “new” partition to not maintain indexes (INDEXING OFF), for this partition only.


SQL> alter table covid modify partition new indexing off;

Table altered.

This means that partial indexes will not reference the “new” partition, whether they are local (which then means no index partition) or global (which then means no index entries for this partition).

And that’s all. Now there will be no overhead in maintaining this index when ingesting new data in this partition.
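
A hedged way to check it in the dictionary (column names as I understand them from the documentation: USER_INDEXES.INDEXING shows FULL or PARTIAL, and USER_TAB_PARTITIONS.INDEXING shows the per-partition ON/OFF attribute):

-- Hedged sketch: verifying the partial indexing declarations.
select index_name, indexing from user_indexes where table_name='COVID';
select partition_name, indexing from user_tab_partitions where table_name='COVID';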

Table Expansion

And then, the optimizer has a transformation to split the execution plan into two branches: one with the index access and one without. This transformation was introduced in 11g for unusable local partitions and is now used even with global indexes:


SQL> explain plan for select /*+ index(covid) */ trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________________
Plan hash value: 1031592504

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                                 |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                                |             |   321 |  7062 |    37   (6)| 00:00:01 |       |       |
|   3 |    VIEW                                        | VW_TE_2     |   321 |  7062 |    35   (0)| 00:00:01 |       |       |
|   4 |     UNION-ALL                                  |             |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |    93 |  1395 |     4   (0)| 00:00:01 |     1 |     1 |
|*  6 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE SINGLE                    |             |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|*  8 |       TABLE ACCESS FULL                        | COVID       |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|   9 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |   160 |  4320 |     4   (0)| 00:00:01 | ROWID | ROWID |
|* 10 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("COVID"."DATEREP"=TO_DATE(' 2020-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "COVID"."DATEREP"<TO_DATE(' 2021-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  10 - access("GEOID"='US')
        filter(TBL$OR$IDX$PART$NUM("COVID",0,8,0,ROWID)=1 AND TBL$OR$IDX$PART$NUM("COVID",0,0,65535,ROWID)<>1 AND
              TBL$OR$IDX$PART$NUM("COVID",0,0,65535,ROWID)<>2)

The TABLE ACCESS BY GLOBAL INDEX ROWID is for partition 1 as mentioned by Pstart/Pstop, which is the “old” one with INDEXING ON. The TABLE ACCESS FULL is for partition 2, the “new” one, that has INDEXING OFF. The optimizer uses predicates on the partition key to select the branch safely.

But this plan also has an additional branch, with this TBL$OR$IDX$PART$NUM again, because I have interval partitioning. With interval partitioning, there is no known Pstop, so it has to handle the case where a new partition has been created (with indexing on). Then, the third branch can access by index ROWID the partitions that are not hardcoded in this plan.

Let’s remove interval partitioning just to get the plan easier to read:


SQL> alter table covid set interval();

Table altered.


SQL> explain plan for select trunc(daterep,'mon'), max(cases) from covid where geoid='US' group by trunc(daterep,'mon') order by 1;

Explained.

SQL> select * from dbms_xplan.display();
                                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________________
Plan hash value: 3529087922

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                                 |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   2 |   HASH GROUP BY                                |             |   161 |  3542 |    35   (6)| 00:00:01 |       |       |
|   3 |    VIEW                                        | VW_TE_2     |   161 |  3542 |    33   (0)| 00:00:01 |       |       |
|   4 |     UNION-ALL                                  |             |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| COVID       |    93 |  1395 |     6   (0)| 00:00:01 |     1 |     1 |
|*  6 |       INDEX RANGE SCAN                         | COVID_GEOID |   160 |       |     1   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE SINGLE                    |             |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
|*  8 |       TABLE ACCESS FULL                        | COVID       |    68 |  1020 |    27   (0)| 00:00:01 |     2 |     2 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("COVID"."DATEREP"<TO_DATE(' 2020-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   6 - access("GEOID"='US')
   8 - filter("GEOID"='US')

Here it is clear: access by index for partition 1 and full table scan for partition 2. This is exactly what I wanted, because I know the clustering factor on the new partition is not very good until I reorganize it (move or merge, as I did in the previous post).

All these features help to manage the lifecycle of data. That’s a completely different approach from purpose-built databases where you have one database service for fast ingest with simple queries on recent data (NoSQL folks may think about DynamoDB for that), then streaming data to a relational database for more OLTP queries (RDS to continue with the AWS analogy), and move old data into a database dedicated to analytics (that could be Redshift then). With Oracle, which has always been a multi-purpose database, the goal is to avoid duplication and replication and manage data in-place for all usage. Through the 40 years of this database engine, many approaches have been implemented to cluster data: CLUSTER and IOT can sort (or hash) data as soon as it is inserted, in order to put them at their optimal place for future queries. But the agility of heap tables finally wins. Now, with the ease of in-database data movement (partitioning and online operations) and improvement of full scan (multiblock reads, direct-path reads, storage indexes) we can get the best of both: heap tables with few indexes for fast ingest of current data, reorganize regularly to be clustered, with additional indexes.

I mentioned NoSQL and I mentioned fast ingest. Actually, there’s a feature called Fast Ingest for IoT (lowercase ‘o’ there) that goes further with this idea. Instead of inserting into a persistent segment and reorganize later, rows are buffered in a ‘memoptimized rowstore’ before going to the heap segment in bulk. But that’s an Exadata feature and I like to think about Oracle as a multiplatform database.

The article Oracle 12c – global partial index appeared first on dbi Blog.

No{Join,GroupBy}SQL – Analytic Views for BI


By Franck Pachot

.
Advocates of NoSQL can query their structures without having to read a data model first. And without writing long table join clauses. They store and query a hierarchical structure without the need to follow relationships, and without the need to join tables on a foreign key, in order to get a caption or description from a lookup table. The structure, like an XML or JSON document, provides metadata to understand it and map it to business objects. The API is a simple ‘put’ and ‘get’ where you can retrieve a whole hierarchy, with aggregates at all levels, ready to drill down from summary to details. Without the need to write sum() functions and GROUP BY clauses. For analytics, SQL has improved a lot with window functions and grouping sets but, despite being powerful, this makes the API more complex. And, at a time where the acceptable learning curve should reach its highest point after 42 seconds (like watching the first bits of a video or getting to the top-voted stackoverflow answer), this complexity cannot be adopted easily.

Is SQL too complex? If it is, then something is wrong. SQL was invented for end-users: to query data like in plain English, without the need to know the internal implementation and the procedural algorithms that can make sense out of it. If developers are moving to NoSQL because of the complexity of SQL, then SQL has missed something from its initial goal. If they go to NoSQL because “joins are expensive”, it just means that joins should not be exposed to them, because optimizing access paths and expensive operations is the job of the database optimizer, with the help of the database designer, not of the front-end developer. However, this complexity is unfortunately there. Today, without a good understanding of the data model (entities, relationships, cardinalities), writing SQL queries is difficult. Joining over many-to-many relationships, or missing a GROUP BY clause, can give wrong results. When I see a SELECT with a DISTINCT keyword, I immediately think that there’s an error in the query and that the developer, not being certain of the aggregation level he is working on, has masked it with a DISTINCT because understanding the data model was too time-consuming.

In data warehouses, where the database is queried by the end-user, we try to avoid this risk by building simple star schemas with only one fact table and many-to-one relationships to dimensions. And on top of that, we provide a reporting tool that will generate the queries correctly so that the end-user does not need to define the joins and aggregations. This requires a layer of metadata on top of the database to describe the possible joins, aggregation levels, functions to aggregate measures,… When I was a junior on databases I was fascinated by those tools. On my first data warehouse, I built a BusinessObjects (v3) universe. It was so simple: define the “business objects”, which are the attributes mapped to the dimension columns. Define the fact measures, with the aggregation functions that can apply. And for the joins, it was like the aliases in the FROM clause, a dimension having multiple roles: think about an airport that can be the destination or the origin of a flight. We then defined multiple objects: all the airport attributes in the destination role, and all the airport attributes in the origin role, were different objects for the end-user. Like “origin airport latitude”, rather than “airport latitude”, which makes sense only after a join on “origin airport ID”. That simplifies the end-user view of our data a lot: tables are still stored as relational tables, to be joined at query time in order to avoid redundancy, but the view on top of that shows the multiple hierarchies, like in a NoSQL structure, for the ease of simple queries.

But, as I mentioned, this was the main reason for SQL, and this should be done with SQL. All these descriptions I did in the BusinessObjects universe should belong to the database dictionary. And that’s finally possible with Analytic Views. Here is an example on the tables I created in a previous post. I am running on the 20c cloud preview, but this can run on 18c or 19c. After importing the .csv of covid-19 cases per day and country, I built one fact table and a snowflake of dimension tables:


create table continents as select rownum continent_id, continentexp continent_name from (select distinct continentexp from covid where continentexp!='Other');
create table countries as select country_id,country_code,country_name,continent_id,popdata2018 from (select distinct geoid country_id,countryterritorycode country_code,countriesandterritories country_name,continentexp continent_name,popdata2018 from covid where continentexp!='Other') left join continents using(continent_name);
create table cases as select daterep, geoid country_id,cases from covid where continentexp!='Other';
alter table continents add primary key (continent_id);
alter table countries add foreign key (continent_id) references continents;
alter table countries add primary key (country_id);
alter table cases add foreign key (country_id) references countries;
alter table cases add primary key (country_id,daterep);

The dimension hierarchy is on country/continent. I should have created one for time (day/month/quarter/year) but the goal is to keep it simple to show the concept.

When looking at the syntax, it may seem complex. But please understand that the goal is to put more in the static definition so that runtime usage is easier.

Attribute Dimension

I’ll describe the Country/Continent dimension. It can be in one table (star schema) or multiple tables (snowflake schema). I opted for a snowflake to show how it is supported since 18c. In 12c we would have to create a view on it, as the USING clause can only be a table or view identifier.
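
A hedged sketch of that 12c workaround (the view name is illustrative): pre-join the snowflake into a single view and reference that view in the USING clause:

-- Hedged sketch: flattening the snowflake into one view, since in 12c the
-- USING clause accepts only a single table or view.
create or replace view countries_flat_v as
select c.country_id, c.country_code, c.country_name,
       c.continent_id, t.continent_name
from countries c join continents t on c.continent_id = t.continent_id;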


create or replace attribute dimension COUNTRIES_DIM_ATT
using COUNTRIES a ,CONTINENTS b join path country_continent on a.CONTINENT_ID=b.CONTINENT_ID
attributes ( a.COUNTRY_ID "Country ID", a.COUNTRY_CODE "Country", a.COUNTRY_NAME "Country name", a.CONTINENT_ID "Continent ID", b.CONTINENT_NAME "Continent")
level "Continent"
  key "Continent ID"
  member name         '#'||to_char("Continent ID")
  member caption      upper(substr("Continent",1,3))
  member description  "Continent"
  determines ("Continent")
level "Country"
  key "Country ID"
  member name         "Country ID"
  member caption      "Country"
  member description  "Country name"
  determines ("Country ID","Country", "Country name", "Continent ID", "Continent")
 all member name 'WORLD'
/

Let’s take it simply: I have an internal name for my dimension, COUNTRIES_DIM_ATT, and a USING clause which declares the dimension table, with an optional JOIN PATH for snowflake schemas. Then I’ve declared the attributes, which are the projection of those columns. For this example, I decided to use quoted identifiers for the ones that I add in this layer, to distinguish them from the table columns. But do as you want.

The most important part here is about levels and dependencies. In a star schema, we denormalize the dimension tables for simplification (and because it is not a problem: there are no updates, and their size is not as large as the fact tables). The metadata we declare here describes the relationships. I have two levels: country and continent. And a many-to-one relationship from country to continent. This is what I declare with the LEVEL and DETERMINES keywords: from all the attributes declared, which ones are functional dependencies of others.

The second important description here is standard naming. In the analytic view, I can query the attributes as columns from the USING clause. But for ease of querying by simple tools, they will also have standard column names. Each attribute has a MEMBER NAME (I used the 2-letter country code here, which is the COUNTRY_ID primary key in my COUNTRIES dimension table). They have a MEMBER CAPTION as a short name and a MEMBER DESCRIPTION for a longer one. Those are standardized names for each object. The idea is to provide a view that can be used without reading the data model: for each level, the end-user can query the name, the caption or the description.

The idea is that those hierarchy levels will be selected in the WHERE clause by a LEVEL_NAME instead of mentioning all columns in a GROUP BY clause or in the PARTITION BY windowing clause of an analytic function. Note that there’s also an ALL level for the top-most aggregation, and we can keep the ‘ALL’ name or use a specific one like the ‘WORLD’ I’ve defined here for all countries.

The most important metadata is defined by the attribute dimension, but we don’t query dimensions directly. We can only look at their definitions in the dictionary:


SQL> select * FROM user_attribute_dimensions;

      DIMENSION_NAME    DIMENSION_TYPE    CACHE_STAR    MAT_TABLE_OWNER    MAT_TABLE_NAME    ALL_MEMBER_NAME    ALL_MEMBER_CAPTION    ALL_MEMBER_DESCRIPTION    COMPILE_STATE    ORIGIN_CON_ID
____________________ _________________ _____________ __________________ _________________ __________________ _____________________ _________________________ ________________ ________________
COUNTRIES_DIM_ATT    STANDARD          NONE                                               'WORLD'                                                            VALID                           3
CALENDAR_DIM_ATT     STANDARD          NONE                                               'ALL'                                                              VALID                           3
DAYS_DIM_ATT         TIME              NONE                                               'ALL'                                                              VALID                           3

SQL> select * FROM user_attribute_dim_attrs;

      DIMENSION_NAME    ATTRIBUTE_NAME    TABLE_ALIAS       COLUMN_NAME    ORDER_NUM    ORIGIN_CON_ID
____________________ _________________ ______________ _________________ ____________ ________________
DAYS_DIM_ATT         Date              CASES          DATEREP                      0                3
COUNTRIES_DIM_ATT    Country ID        A              COUNTRY_ID                   0                3
COUNTRIES_DIM_ATT    Country           A              COUNTRY_CODE                 1                3
COUNTRIES_DIM_ATT    Country name      A              COUNTRY_NAME                 2                3
COUNTRIES_DIM_ATT    Continent ID      A              CONTINENT_ID                 3                3
COUNTRIES_DIM_ATT    Continent         B              CONTINENT_NAME               4                3
CALENDAR_DIM_ATT     Date              CASES          DATEREP                      0                3

SQL> select * FROM user_attribute_dim_levels;

      DIMENSION_NAME    LEVEL_NAME    SKIP_WHEN_NULL    LEVEL_TYPE                MEMBER_NAME_EXPR               MEMBER_CAPTION_EXPR    MEMBER_DESCRIPTION_EXPR    ORDER_NUM    ORIGIN_CON_ID
____________________ _____________ _________________ _____________ _______________________________ _________________________________ __________________________ ____________ ________________
COUNTRIES_DIM_ATT    Continent     N                 STANDARD      '#'||to_char("Continent ID")    upper(substr("Continent",1,3))    "Continent"                           0                3
DAYS_DIM_ATT         Day           N                 DAYS          TO_CHAR("Date")                                                                                         0                3
COUNTRIES_DIM_ATT    Country       N                 STANDARD      "Country ID"                    "Country"                         "Country name"                        1                3
CALENDAR_DIM_ATT     Day           N                 STANDARD      TO_CHAR("Date")                                                                                         0                3

There is more that we can define here. In the same way we want to simplify the PARTITION BY clause of analytic functions thanks to levels, we can also avoid the ORDER BY clause by declaring an ordering in each level, as sketched below. Otherwise, I keep it simple here.
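
As a sketch (my reading of the CREATE ATTRIBUTE DIMENSION level clause — double-check the exact clause order in the documentation), the "Country" level could carry its default ordering like this:

level "Country"
  key "Country ID"
  member name         "Country ID"
  member caption      "Country"
  member description  "Country name"
  order by "Country name"
  determines ("Country ID","Country", "Country name", "Continent ID", "Continent")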

For drill-down analytics, we query on hierarchies.

Hierarchy

This is a simple declaration of parent-child relationship between levels:


SQL> 
create or replace hierarchy "Countries"
    using COUNTRIES_DIM_ATT
    ( "Country" child of "Continent")
 /

Hierarchy created.

This is actually a view that we can query, and the best way to understand it is to look at it.

The definition from the dictionary just reflects what we have created:


SQL> select * FROM user_hierarchies;

   HIER_NAME    DIMENSION_OWNER       DIMENSION_NAME    PARENT_ATTR    COMPILE_STATE    ORIGIN_CON_ID
____________ __________________ ____________________ ______________ ________________ ________________
Countries    DEMO               COUNTRIES_DIM_ATT                   VALID                           3

SQL> select * FROM user_hier_levels;

   HIER_NAME    LEVEL_NAME    ORDER_NUM    ORIGIN_CON_ID
____________ _____________ ____________ ________________
Countries    Continent                0                3
Countries    Country                  1                3

We can also query USER_HIER_COLUMNS to see what is exposed as a view, but a simple DESC will show them:


SQL> desc "Countries"

                 Name    Role            Type
_____________________ _______ _______________
Country ID            KEY     VARCHAR2(10)
Country               PROP    VARCHAR2(3)
Country name          PROP    VARCHAR2(50)
Continent ID          KEY     NUMBER
Continent             PROP    VARCHAR2(10)
MEMBER_NAME           HIER    VARCHAR2(41)
MEMBER_UNIQUE_NAME    HIER    VARCHAR2(95)
MEMBER_CAPTION        HIER    VARCHAR2(12)
MEMBER_DESCRIPTION    HIER    VARCHAR2(50)
LEVEL_NAME            HIER    VARCHAR2(9)
HIER_ORDER            HIER    NUMBER
DEPTH                 HIER    NUMBER(10)
IS_LEAF               HIER    NUMBER
PARENT_LEVEL_NAME     HIER    VARCHAR2(9)
PARENT_UNIQUE_NAME    HIER    VARCHAR2(95)

This is like a join on the COUNTRIES and CONTINENTS (defined in the using clause of the attribute dimension) with the attributes exposed. But there are also additional columns that are there with standard names in all hierarchies: member name/caption/description and level information. Because all levels are here, as if we did some UNION ALL over GROUP BY queries.

Additional columns and additional rows for each level. Let’s query it:


SQL> select * from "Countries";

   Country ID    Country                         Country name    Continent ID    Continent    MEMBER_NAME    MEMBER_UNIQUE_NAME    MEMBER_CAPTION                   MEMBER_DESCRIPTION    LEVEL_NAME    HIER_ORDER    DEPTH    IS_LEAF    PARENT_LEVEL_NAME    PARENT_UNIQUE_NAME
_____________ __________ ____________________________________ _______________ ____________ ______________ _____________________ _________________ ____________________________________ _____________ _____________ ________ __________ ____________________ _____________________
                                                                                           WORLD          [ALL].[WORLD]                                                                ALL                       0        0          0
                                                                            1 Asia         #1             [Continent].&[1]      ASI               Asia                                 Continent                 1        1          0 ALL                  [ALL].[WORLD]
AE            ARE        United_Arab_Emirates                               1 Asia         AE             [Country].&[AE]       ARE               United_Arab_Emirates                 Country                   2        2          1 Continent            [Continent].&[1]
AF            AFG        Afghanistan                                        1 Asia         AF             [Country].&[AF]       AFG               Afghanistan                          Country                   3        2          1 Continent            [Continent].&[1]
BD            BGD        Bangladesh                                         1 Asia         BD             [Country].&[BD]       BGD               Bangladesh                           Country                   4        2          1 Continent            [Continent].&[1]
...
VN            VNM        Vietnam                                            1 Asia         VN             [Country].&[VN]       VNM               Vietnam                              Country                  43        2          1 Continent            [Continent].&[1]
YE            YEM        Yemen                                              1 Asia         YE             [Country].&[YE]       YEM               Yemen                                Country                  44        2          1 Continent            [Continent].&[1]
                                                                            2 Africa       #2             [Continent].&[2]      AFR               Africa                               Continent                45        1          0 ALL                  [ALL].[WORLD]
AO            AGO        Angola                                             2 Africa       AO             [Country].&[AO]       AGO               Angola                               Country                  46        2          1 Continent            [Continent].&[2]
BF            BFA        Burkina_Faso                                       2 Africa       BF             [Country].&[BF]       BFA               Burkina_Faso                         Country                  47        2          1 Continent            [Continent].&[2]
...

I’ve removed many rows for clarity, but there is one row for all countries, the deepest level, plus one row for each continent, plus one row for the top summary (‘WORLD’). This is how we avoid GROUP BY in the end-user query: we just mention the level: LEVEL_NAME=’ALL’, LEVEL_NAME=’Continent’, LEVEL_NAME=’Country’. Or query the DEPTH: 0 for the global summary, 1 for continents, 2 for countries. The countries, being the most detailed level can also be queried by IS_LEAF=1. The attributes may be NULL for non-leaf levels, like “Country name” when at ‘Continent’ level, or “Continent” when at ‘ALL’ level.
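
For example, getting only the continent-level rows needs no GROUP BY at all, just a filter on the level (a quick sketch on the hierarchy above):

select MEMBER_DESCRIPTION, MEMBER_UNIQUE_NAME
 from "Countries"
 where LEVEL_NAME='Continent'
 order by HIER_ORDER;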

In addition to the attributes, we have the standardized names, so that the user GUI can show the same column names for all dimensions. I don’t show all countries, and I don’t query MEMBER_CAPTION and MEMBER_DESCRIPTION, to keep it short here:


SQL>
select MEMBER_NAME,MEMBER_UNIQUE_NAME,LEVEL_NAME,PARENT_LEVEL_NAME,PARENT_UNIQUE_NAME,HIER_ORDER,DEPTH,IS_LEAF
 from "Countries" order by DEPTH,HIER_ORDER fetch first 10 rows only;

   MEMBER_NAME    MEMBER_UNIQUE_NAME    LEVEL_NAME    PARENT_LEVEL_NAME    PARENT_UNIQUE_NAME    HIER_ORDER    DEPTH    IS_LEAF
______________ _____________________ _____________ ____________________ _____________________ _____________ ________ __________
WORLD          [ALL].[WORLD]         ALL                                                                  0        0          0
#1             [Continent].&[1]      Continent     ALL                  [ALL].[WORLD]                     1        1          0
#2             [Continent].&[2]      Continent     ALL                  [ALL].[WORLD]                    45        1          0
#3             [Continent].&[3]      Continent     ALL                  [ALL].[WORLD]                   101        1          0
#4             [Continent].&[4]      Continent     ALL                  [ALL].[WORLD]                   156        1          0
#5             [Continent].&[5]      Continent     ALL                  [ALL].[WORLD]                   165        1          0
AE             [Country].&[AE]       Country       Continent            [Continent].&[1]                  2        2          1
AF             [Country].&[AF]       Country       Continent            [Continent].&[1]                  3        2          1
BD             [Country].&[BD]       Country       Continent            [Continent].&[1]                  4        2          1
BH             [Country].&[BH]       Country       Continent            [Continent].&[1]                  5        2          1

A row can be identified by its level (LEVEL_NAME or DEPTH) and its name, but a unique name is also generated with the full path (in MDX style). This is MEMBER_UNIQUE_NAME, and we also have PARENT_UNIQUE_NAME if we want to follow the hierarchy.
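
As the hierarchy is just a view, these standardized columns are enough to navigate it; for example, a self-join from a leaf member up to its parent (a sketch using the ‘AE’ member shown above):

-- note: in SQL*Plus you may need SET DEFINE OFF because of the & in the unique name
select c.MEMBER_NAME, c.LEVEL_NAME, p.MEMBER_DESCRIPTION parent_description
 from "Countries" c
 join "Countries" p on p.MEMBER_UNIQUE_NAME = c.PARENT_UNIQUE_NAME
 where c.MEMBER_UNIQUE_NAME = '[Country].&[AE]';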

Analytic View

Now that I have a view on the hierarchy, I want to join it to the fact table, in order to display the measures at different levels of aggregation. Again, I don’t want the user to think about joins and aggregation functions, and this must be encapsulated in a view, an ANALYTIC VIEW:


create or replace analytic view "COVID cases"
using CASES
dimension by (
  COUNTRIES_DIM_ATT key COUNTRY_ID references "Country ID"
  hierarchies ( "Countries")
 )
measures (
  "Cases"          fact CASES aggregate by sum,
  "Highest cases"  fact CASES aggregate by max
)
/

The USING clause just mentions the fact table. The DIMENSION BY clause lists all the dimensions (I have only one here for the simplicity of the example, but you would list all of them) and how the fact table joins to each dimension (the foreign key REFERENCES the lowest-level key of the dimension). The MEASURES clause defines the fact columns and the aggregation function to apply to them. This can get complex, as you must be sure it always makes sense: what is stored in one fact column can be exposed as several business measures, depending on the aggregation.

There are many functions for calculated measures. For example, in the screenshot you will see at the end, I added the following to show the country covid cases as a ratio of their continent’s total:


 "cases/continent" as 
  ( share_of("Cases" hierarchy COUNTRIES_DIM_ATT."Countries"  level "Continent") )
  caption 'Cases Share of Continent' description 'Cases Share of Continent'

But for the moment I keep it simple with only “Cases” and “Highest cases”.

Here is the description:


SQL> desc "COVID cases"

            Dim Name    Hier Name                  Name    Role            Type
____________________ ____________ _____________________ _______ _______________
COUNTRIES_DIM_ATT    Countries    Country ID            KEY     VARCHAR2(10)
COUNTRIES_DIM_ATT    Countries    Country               PROP    VARCHAR2(3)
COUNTRIES_DIM_ATT    Countries    Country name          PROP    VARCHAR2(50)
COUNTRIES_DIM_ATT    Countries    Continent ID          KEY     NUMBER
COUNTRIES_DIM_ATT    Countries    Continent             PROP    VARCHAR2(10)
COUNTRIES_DIM_ATT    Countries    MEMBER_NAME           HIER    VARCHAR2(41)
COUNTRIES_DIM_ATT    Countries    MEMBER_UNIQUE_NAME    HIER    VARCHAR2(95)
COUNTRIES_DIM_ATT    Countries    MEMBER_CAPTION        HIER    VARCHAR2(12)
COUNTRIES_DIM_ATT    Countries    MEMBER_DESCRIPTION    HIER    VARCHAR2(50)
COUNTRIES_DIM_ATT    Countries    LEVEL_NAME            HIER    VARCHAR2(9)
COUNTRIES_DIM_ATT    Countries    HIER_ORDER            HIER    NUMBER
COUNTRIES_DIM_ATT    Countries    DEPTH                 HIER    NUMBER(10)
COUNTRIES_DIM_ATT    Countries    IS_LEAF               HIER    NUMBER
COUNTRIES_DIM_ATT    Countries    PARENT_LEVEL_NAME     HIER    VARCHAR2(9)
COUNTRIES_DIM_ATT    Countries    PARENT_UNIQUE_NAME    HIER    VARCHAR2(95)
                     MEASURES     Cases                 BASE    NUMBER
                     MEASURES     Highest cases         BASE    NUMBER

I have columns from all hierarchies, with KEY and PROPERTY attributes, standardized names from the HIERARCHY, and the measures. You must remember that it is a virtual view: you will never query all columns and all rows. You SELECT the columns, filter (WHERE) the rows and levels, and you get the result you want without GROUP BY and JOIN. If you look at the execution plan, you will see the UNION ALL, JOIN and GROUP BY on the star or snowflake tables, but this is not the end-user’s concern. As a DBA, you can create some materialized views to pre-build some summaries, and query rewrite will use them.
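
For example, a plain aggregate materialized view on the base tables can pre-compute the continent level (a sketch; it assumes the QUERY REWRITE privilege and that rewrite is enabled for your session):

create materialized view CASES_BY_CONTINENT
  enable query rewrite
as
select CONTINENT_ID, sum(CASES) CASES
  from CASES join COUNTRIES using(COUNTRY_ID)
  group by CONTINENT_ID;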

We are fully within the initial SQL philosophy: a logical view provides an API that is independent of the physical design and easy to query, on a simple row/column table easy to visualize.

Analytic query

A query on the analytic view is then very simple. In the FROM clause, instead of tables with joins, I mention the analytic view, and instead of mentioning table aliases, I mention the hierarchy. I reference only the standard column names. Only the hierarchy names and the measures are specific. In the where clause, I can also reference the LEVEL_NAME:


SQL> 
select MEMBER_DESCRIPTION, "Cases"
 from "COVID cases" hierarchies ("Countries")
 where ( "Countries".level_name='Country' and "Countries".MEMBER_CAPTION in ('USA','CHN') )
    or ( "Countries".level_name in ('Continent','ALL') )
 order by "Cases";

         MEMBER_DESCRIPTION      Cases
___________________________ __________
Oceania                           8738
China                            84198
Africa                          203142
Asia                           1408945
United_States_of_America       1979850
Europe                         2100711
America                        3488230
                               7209766

Here I wanted to see the total covid-19 cases for all countries (‘ALL’), for each continent, and for only two countries at the country level: USA and China. And this was a simple SELECT … FROM … WHERE … ORDER BY, without joins and group by. Like a query on an OLAP cube.

If I had no analytic views, here is how I would have queried the tables:


SQL>
select coalesce(CONTINENT_NAME, COUNTRY_NAME,'ALL'), CASES from (
select CONTINENT_NAME, COUNTRY_NAME, sum(CASES) cases, COUNTRY_CODE, grouping(COUNTRY_CODE) g_country
from CASES join COUNTRIES using(COUNTRY_ID) join CONTINENTS using(CONTINENT_ID)
group by grouping sets ( () , (CONTINENT_NAME) , (COUNTRY_CODE,COUNTRY_NAME) )
)
where COUNTRY_CODE in ('USA','CHN') or g_country >0
order by cases
/

   COALESCE(CONTINENT_NAME,COUNTRY_NAME,'ALL')      CASES
______________________________________________ __________
Oceania                                              8738
China                                               84198
Africa                                             203142
Asia                                              1408945
United_States_of_America                          1979850
Europe                                            2100711
America                                           3488230
ALL                                               7209766

This was with GROUPING SETS to add multiple levels and GROUPING() function to detect the level. Without GROUPING SETS I may have done it with many UNION ALL between GROUP BY subqueries.
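
For illustration, the UNION ALL variant on the same tables would look like this, which shows how verbose it gets compared to the analytic view query:

select 'ALL' name, sum(CASES) cases from CASES
union all
select CONTINENT_NAME, sum(CASES)
  from CASES join COUNTRIES using(COUNTRY_ID) join CONTINENTS using(CONTINENT_ID)
  group by CONTINENT_NAME
union all
select COUNTRY_NAME, sum(CASES)
  from CASES join COUNTRIES using(COUNTRY_ID)
  where COUNTRY_CODE in ('USA','CHN')
  group by COUNTRY_NAME
order by 2;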

Back to roots of SQL

You may think that you don’t need Analytic Views because the same can be done by some BI reporting tools. But this should belong to the database. SQL was invented to provide a simple API to users. If you need an additional layer with a large repository of metadata and complex transformations between the user-defined query and the SQL to execute, then something from the initial goal has been missed. One consequence is people going to NoSQL hierarchical databases with the idea that they are easier to visualize: a simple API (a key-value get) and embedded metadata (as JSON, for example). While SQL became more and more powerful at processing data in the database, the complexity went too far and developers preferred to come back to their procedural code rather than learning something new. And the first step of many current developments is to move the data out of the database, to NoSQL, or to an OLAP cube in our case.

Analytic views bring back the power of SQL: the view exposes a Data Mart as one simple table with columns and rows, containing all dimensions and levels of aggregation. The metadata that describes the data model is back where it belongs: the data dictionary. My example here is a very simple one but it can go further, with classification to add more metadata for self-documentation, with more hierarchies (and a special one for the time dimension), and many calculated measures.
SQL on it is simplified, and there are also some GUI over analytic views, like APEX, or SQL Developer:

And if SQL is still too complex, it seems that we can query Analytic Views with MDX (MultiDimensional eXpressions). The MEMBER_UNIQUE_NAME follows the MDX syntax, and we can find this in the ?/mesg/oraus.msg list of error messages:


/============================================================================
/
/    18200 - 18699 Reserved for Analytic View Sql (HCS) error messages
/
/============================================================================
/
/// 18200 - 18219 reserved for MDX Parser
/

HCS is the initial name of this feature (Hierarchical Cubes). I’ve not seen other mentions of MDX in the Oracle Database documentation, so I’ve no idea if it is already implemented.

The article No{Join,GroupBy}SQL – Analytic Views for BI appeared first on the dbi Blog.

Oracle non-linguistic varchar2 columns to order by without sorting


By Franck Pachot

Sorting data is an expensive operation and many queries declare an ORDER BY. To avoid the sort operation you can build an index as it maintains a sorted structure. This helps with Top-N queries as you don’t have to read all rows but only those from a range of index entries. However, indexes are sorted by binary values. For NUMBER or DATE datatypes, the internal storage ensures that the order is preserved in the binary format. For character strings, the binary format is ASCII, which follows the English alphabet. That’s fine when your session language, NLS_LANGUAGE, defines an NLS_SORT that follows this BINARY order. But as soon as you set a language that has some specific alphabetical order, having an index on a VARCHAR2 or CHAR column does not help to avoid a SORT operation. However, in Oracle 12.2 we can define the sort order at column level with the SQL Standard COLLATE. One use case is for alpha-numeric columns that have nothing to do with any language. Like some natural keys combining letters and numbers. The user expects them to be listed in alphabetical order but, storing only 7-bits ASCII characters, you don’t care about linguistic collation.

I am running this on the Oracle 20c preview in the Oracle Cloud.

VARCHAR2

It can happen that a primary key is not a NUMBER but a CHAR or VARCHAR2, like this:


SQL> create table demo (ID constraint demp_pk primary key) as
  2  select cast(dbms_random.string('U',1)||to_char(rownum,'FM0999') as varchar2(5)) ID
  3  from xmltable('1 to 10');

Table created.

SQL> select * from demo order by ID;

      ID
________
K0003
K0009
L0007
L0010
M0008
O0002
S0001
W0005
Y0006
Z0004

10 rows selected.

I query with ORDER BY because sorting can make sense on a natural key.

Index

I have an index on this column, which is a sorted structure, so the execution plan is optimized:


SQL> select * from dbms_xplan.display_cursor(format=>'basic');

                      PLAN_TABLE_OUTPUT
_______________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 1955576728

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  INDEX FULL SCAN | DEMP_PK |
------------------------------------

13 rows selected.

There’s no SORT operation because the INDEX FULL SCAN follows the index entries in order.

NLS_LANGUAGE

However, there are many countries where we don’t speak English:


SQL> alter session set nls_language='French';

Session altered.

In French, as in many languages, we have accented characters and other specificities, so that the language’s alphabetical order does not always follow the ASCII order.

I’m running exactly the same query:


SQL> select * from demo order by ID;

      ID
________
K0003
K0009
L0007
L0010
M0008
O0002
S0001
W0005
Y0006
Z0004

10 rows selected.

SQL> select * from dbms_xplan.display_cursor(format=>'basic');

                      PLAN_TABLE_OUTPUT
_______________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

14 rows selected.

This time, there’s a SORT operation, even though I’m still reading with INDEX FULL SCAN.

NLS_SORT

The reason is that, by setting the ‘French’ language, I’ve also set the French sort collating sequence.


SQL> select * from nls_session_parameters;
                 PARAMETER                           VALUE
__________________________ _______________________________
NLS_LANGUAGE               FRENCH
NLS_SORT                   FRENCH

And this is different from the BINARY one that I had when my language was ‘American’.
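
NLSSORT, the function that builds the linguistic sort key, makes the difference easy to see; here is a quick sketch comparing both collations on a few accented words (output not shown):

select column_value word,
       nlssort(column_value,'NLS_SORT=BINARY') binary_key,
       nlssort(column_value,'NLS_SORT=FRENCH') french_key
  from table(sys.odcivarchar2list('cote','coté','côte','côté'))
  order by nlssort(column_value,'NLS_SORT=FRENCH');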

Actually, only a few languages follow the BINARY order of the ASCII table:


SQL>
  declare
   val varchar2(64);
  begin
    for i in (select VALUE from V$NLS_VALID_VALUES where PARAMETER='LANGUAGE') loop
    execute immediate 'alter session set nls_language='''||i.value||'''';
    select value into val from NLS_SESSION_PARAMETERS where PARAMETER='NLS_SORT';
    if val='BINARY' then dbms_output.put(i.value||' '); end if;
    end loop;
    dbms_output.put_line('');
  end;
/

AMERICAN JAPANESE KOREAN SIMPLIFIED CHINESE TRADITIONAL CHINESE ENGLISH HINDI TAMIL KANNADA TELUGU ORIYA MALAYALAM ASSAMESE GUJARATI MARATHI PUNJABI BANGLA MACEDONIAN LATIN SERBIAN IRISH

PL/SQL procedure successfully completed.

This is fine for real text, but not for my primary key where the ASCII order is exactly what I want. I could set NLS_SORT=BINARY for my session, but that’s too wide, as my problem is only with one column.
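
To be explicit, the session-wide workaround would simply be:

alter session set nls_sort=binary;

But every session would have to set it, and it affects the ordering of all character columns, not just this one.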

Or I can create an index for the French collation. Actually, this is what is used internally:


SQL> explain plan for select * from demo order by ID;
Explained.

SQL> select * from dbms_xplan.display(format=>'basic +projection');
                                                      PLAN_TABLE_OUTPUT
_______________________________________________________________________
Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) NLSSORT("DEMO"."ID",'nls_sort=''GENERIC_M''')[50],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

GENERIC_M is the sort collation for many European languages.
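
For reference, an explicit linguistic index for one specific collation is just a function-based index on NLSSORT — a sketch:

create index DEMO_ID_GENERIC_M on DEMO ( nlssort(ID, 'NLS_SORT=GENERIC_M') );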

But that, again, does not fit the scope of my problem, as I don’t want to create one index per possible NLS_SORT setting.

COLLATE

The good solution is to define the collation for my table column: this ID is a character string, but it is an ASCII character string which has nothing to do with my language. In 18c I can do that:


SQL> alter table demo modify ID collate binary;

Table altered.

COLLATE is a SQL Standard syntax that exists in other databases, and it came to Oracle in 12cR2.

And that’s all:


SQL> explain plan for select * from demo order by ID;

Explained.

SQL> select * from dbms_xplan.display(format=>'basic +projection');

                                             PLAN_TABLE_OUTPUT
______________________________________________________________
Plan hash value: 1955576728

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  INDEX FULL SCAN | DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "DEMO"."ID"[VARCHAR2,5]

No SORT operation needed, whatever the language I set for my session.

Here is the DDL for my table:


SQL> ddl demo

  CREATE TABLE "SYS"."DEMO"
   (    "ID" VARCHAR2(5) COLLATE "BINARY",
         CONSTRAINT "DEMP_PK" PRIMARY KEY ("ID")
  USING INDEX  ENABLE
   )  DEFAULT COLLATION "USING_NLS_COMP" ;

My column explicitly follows the BINARY collation.

Extended Data Types

Now, all seems easy, but there’s a prerequisite:


SQL> show parameter max_string_size

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string EXTENDED

I have set my PDB to EXTENDED string size.

If I try the same in a PDB with the ‘old’ limit of 4000 bytes:


SQL> alter session set container=PDB1;

Session altered.

SQL> show parameter max_string_size

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string STANDARD

SQL> drop table demo;

Table dropped.

SQL> create table demo (ID varchar2(5) collate binary constraint demp_pk primary key);

create table demo (ID varchar2(5) collate binary constraint demp_pk primary key)
 *
ERROR at line 1:
ORA-43929: Collation cannot be specified if parameter MAX_STRING_SIZE=STANDARD is set.

This new feature is allowed only with the Extended Data Types introduced in 12c release 2.
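
For reference, switching a PDB to extended data types is an irreversible, documented procedure that looks roughly like this (a sketch — check the MAX_STRING_SIZE notes in the Database Reference before doing this on a real system):

alter session set container=PDB1;
alter pluggable database close immediate;
alter pluggable database open upgrade;
alter system set max_string_size=EXTENDED;
@?/rdbms/admin/utl32k.sql
alter pluggable database close immediate;
alter pluggable database open;
@?/rdbms/admin/utlrp.sql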

ORDER BY COLLATE

Ok, let’s create the table with the default collation:


SQL> create table demo (ID constraint demp_pk primary key) as
  2  select cast(dbms_random.string('U',1)||to_char(rownum,'FM0999') as varchar2(5)) ID
  3  from xmltable('1 to 10');

Table created.

SQL> select * from dbms_xplan.display_cursor(format=>'basic +projection');

                                                   PLAN_TABLE_OUTPUT
____________________________________________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) NLSSORT("DEMO"."ID",'nls_sort=''FRENCH''')[50],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

As my NLS_SORT is ‘French’ there is a SORT operation.

But I can explicitly request a BINARY sort for this:


SQL> select * from demo order by ID collate binary;

      ID
________
D0003
H0002
L0009
N0008
P0010
Q0005
R0004
W0007
Y0001
Z0006

10 rows selected.

SQL> select * from dbms_xplan.display_cursor(format=>'basic +projection');

                                             PLAN_TABLE_OUTPUT
______________________________________________________________
EXPLAINED SQL STATEMENT:
------------------------
select * from demo order by ID collate binary

Plan hash value: 2698718808

------------------------------------
| Id  | Operation        | Name    |
------------------------------------
|   0 | SELECT STATEMENT |         |
|   1 |  SORT ORDER BY   |         |
|   2 |   INDEX FULL SCAN| DEMP_PK |
------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) "DEMO"."ID" COLLATE "BINARY"[5],
       "DEMO"."ID"[VARCHAR2,5]
   2 - "DEMO"."ID"[VARCHAR2,5]

I have no idea why there is still a SORT operation here. The INDEX FULL SCAN already returns the rows in binary order, so the ORDER BY … COLLATE BINARY should not require an additional sort.

The article Oracle non-linguistic varchar2 columns to order by without sorting appeared first on the dbi Blog.


Duplicate Database on ODA X4


The ODA X4 is still in use at some customers. Recently, I was asked to validate the backups. Let me explain the context: the backups are done locally via RMAN to an NFS share, and these backups are then backed up to tape by Netback.
The goal was just to validate that the backups written to tape by Netback can be used for a restore if needed.

So the backup team restored the backups of one database into a directory, and then we duplicated this database using these backups.

The source database is SRCDB
The target database will be named TESTQ
The backups from tape are copied into /shareback/backup/test_restauration
Below are the server characteristics:

[root@ ~]# oakcli show server

        Power State              : On
        Open Problems            : 0
        Model                    : ODA X4-2
        Type                     : Rack Mount
        Part Number              : 32974004+1+1
        Serial Number            : 1435NMP00A
        Primary OS               : Not Available
        ILOM Address             : 10.120.128.111
        ILOM MAC Address         : 00:10:E0:5F:4D:2E
        Description              : Oracle Database Appliance X4-2 1435NMP00A
        Locator Light            : Off
        Actual Power Consumption : 234 watts
        Ambient Temperature      : 23.000 degree C
        Open Problems Report     : System is healthy

[root@ ~]#

On the source, the datafiles are stored here:

/u02/app/oracle/oradata/datastore/.ACFS/snaps/SRCDB/SRCDB/

As for every duplicate, we have to prepare the directories for the target database. But there is a problem on the ODA, as I cannot create any directory under the snaps directory:

oracle@:/u02/app/oracle/oradata/datastore/.ACFS/snaps/ [TESTQ] mkdir TESTQ
mkdir: cannot create directory `TT': Permission denied
oracle@:/u02/app/oracle/oradata/datastore/.ACFS/snaps/ [TESTQ]

As I cannot manually create any directory, I have two solutions:
1- Create an empty database named TESTQ with oakcli create database and then remove the datafiles afterwards
2- Create the storage for the future database TESTQ using oakcli create dbstorage

[root@srvodap01n1test_restauration]# oakcli create dbstorage -h
Usage:
      oakcli create dbstorage -db  [-cdb]


      where:
         db_name      -  Setup the required ACFS storage structure for the database 
         cdb          -  This needs to be passed in case of cdb database

         This storage structure can be used for migrating databases from ASM to ACFS e.t.c

[root@srvodap01n1test_restauration]#

As we can see, create dbstorage will set up all the required directories for the new database, so we use the second method. We were using an X4 ODA, and the oakcli create dbstorage command has to be launched from the first node:

[root@srvodap01n0 snaps]# oakcli create dbstorage -db TESTQ
INFO: 2020-09-17 13:49:47: Please check the logfile  '/opt/oracle/oak/log/srvodap01n0/tools/12.1.2.12.0/createdbstorage_TESTQ_1793.log' for more details

Please enter the 'SYSASM'  password : (During deployment we set the SYSASM password to 'welcome1'):
Please re-enter the 'SYSASM' password:
Please select one of the following for Database Class  [1 .. 3] :
1    => odb-01s  (   1 cores ,     4 GB memory)
2    =>  odb-01  (   1 cores ,     8 GB memory)
3    =>  odb-02  (   2 cores ,    16 GB memory)
1
The selected value is : odb-01s  (   1 cores ,     4 GB memory)
...SUCCESS: Ran /usr/bin/rsync -tarqvz /opt/oracle/oak/onecmd/ root@192.168.16.28:/opt/oracle/oak/onecmd --exclude=*zip --exclude=*gz --exclude=*log --exclude=*trc --exclude=*rpm and it returned: RC=0

.........
SUCCESS: All nodes in /opt/oracle/oak/onecmd/tmp/db_nodes are pingable and alive.
INFO: 2020-09-17 13:53:44: Successfully setup the storage structure for the database 'TESTQ'
INFO: 2020-09-17 13:53:45: Set the following directory structure for the Database TESTQ
INFO: 2020-09-17 13:53:45: DATA: /u02/app/oracle/oradata/datastore/.ACFS/snaps/TESTQ
INFO: 2020-09-17 13:53:45: REDO: /u01/app/oracle/oradata/datastore/TESTQ
INFO: 2020-09-17 13:53:45: RECO: /u01/app/oracle/fast_recovery_area/datastore/TESTQ
SUCCESS: 2020-09-17 13:53:45: Successfully setup the Storage for the Database : TESTQ
[root@srvodap01n0 snaps]#

Once the storage is created, we start the new instance TESTQ in NOMOUNT state with a minimal set of configuration parameters.
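
The initTESTQ.ora itself is not shown here; reconstructed from the show parameter output that follows (the SGA sizing is an assumption), it contained roughly:

db_name=TESTQ
db_unique_name=TESTQ
control_files='/u01/app/oracle/product/12.1.0.2/dbhome_2/dbs/cntrlTESTQ.dbf'
db_create_file_dest='/u02/app/oracle/oradata/datastore/.ACFS/snaps/TESTQ'
db_create_online_log_dest_1='/u01/app/oracle/oradata/datastore/TESTQ'
db_create_online_log_dest_2='/u01/app/oracle/oradata/datastore/TESTQ'
sga_target=4G

With that in place, the instance can be started in NOMOUNT: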

oracle@srvodap01n1:/u01/app/oracle/local/dmk/etc/ [TESTQ] sqh

SQL*Plus: Release 12.1.0.2.0 Production on Thu Sep 17 14:11:43 2020

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/u02/app/oracle/oradata/datastore/.ACFS/snaps/TESTQ/TESTQ/initTESTQ.ora'
ORACLE instance started.

Total System Global Area 4294967296 bytes
Fixed Size                  2932632 bytes
Variable Size             889192552 bytes
Database Buffers         3372220416 bytes
Redo Buffers               30621696 bytes
SQL> show parameter db_uni

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      TESTQ
SQL> show parameter db_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_name                              string      TESTQ
SQL> show parameter control_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      /u01/app/oracle/product/12.1.0
                                                 .2/dbhome_2/dbs/cntrlTESTQ.dbf
SQL>

SQL> show parameter db_cre

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest                  string      /u02/app/oracle/oradata/datast
                                                 ore/.ACFS/snaps/TESTQ
db_create_online_log_dest_1          string      /u01/app/oracle/oradata/datast
                                                 ore/TESTQ
db_create_online_log_dest_2          string      /u01/app/oracle/oradata/datast
                                                 ore/TESTQ
db_create_online_log_dest_3          string
db_create_online_log_dest_4          string
db_create_online_log_dest_5          string
SQL>

Once the instance is started, we can launch the duplicate command. Just note that the output below is truncated.

Recovery Manager: Release 12.1.0.2.0 - Production on Thu Sep 17 14:18:17 2020

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect auxiliary /

connected to auxiliary database: TESTQ (not mounted)

run
{
ALLOCATE AUXILIARY CHANNEL c1 DEVICE TYPE DISK;
ALLOCATE AUXILIARY CHANNEL c2 DEVICE TYPE DISK;
DUPLICATE DATABASE TO TESTQ BACKUP LOCATION '/shareback/backup/test_restauration';
release channel c1;
release channel c2;
8> }



allocated channel: c1
channel c1: SID=17 device type=DISK

allocated channel: c2
channel c2: SID=177 device type=DISK

Starting Duplicate Db at 17-SEP-2020 14:18:46

contents of Memory Script:
{
   sql clone "create spfile from memory";
}
executing Memory Script

sql statement: create spfile from memory

contents of Memory Script:
{
   shutdown clone immediate;
   startup clone nomount;
}
executing Memory Script

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    4294967296 bytes

Fixed Size                     2932632 bytes
Variable Size                889192552 bytes
Database Buffers            3372220416 bytes
Redo Buffers                  30621696 bytes
allocated channel: c1
channel c1: SID=16 device type=DISK
allocated channel: c2
channel c2: SID=177 device type=DISK

contents of Memory Script:
{
   sql clone "alter system set  control_files =
  ''/u01/app/oracle/oradata/datastore/TESTQ/TESTQ/controlfile/o1_mf_hp6o2sb2_.ctl'', ''/u01/app/oracle/oradata/datastore/TESTQ/TESTQ/controlfile/o1_mf_hp6o2sbm_.ctl'' comment=
 ''Set by RMAN'' scope=spfile";
   sql clone "alter system set  db_name =
 ''SRCDB'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   sql clone "alter system set  db_unique_name =
 ''TESTQ'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   shutdown clone immediate;
   startup clone force nomount
   restore clone primary controlfile from  '/shareback/test_restauration/20200916_214502_c-2736611334-20200916-04';
   alter clone database mount;
}
executing Memory Script

sql statement: alter system set  control_files =   ''/u01/app/oracle/oradata/datastore/TESTQ/TESTQ/controlfile/o1_mf_hp6o2sb2_.ctl'', ''/u01/app/oracle/oradata/datastore/TESTQ/TESTQ/controlfile/o1_mf_hp6o2sbm_.ctl'' comment= ''Set by RMAN'' scope=spfile

sql statement: alter system set  db_name =  ''SRCDB'' comment= ''Modified by RMAN duplicate'' scope=spfile

sql statement: alter system set  db_unique_name =  ''TESTQ'' comment= ''Modified by RMAN duplicate'' scope=spfile

Oracle instance shut down


Oracle instance started

Total System Global Area    4294967296 bytes

Fixed Size                     2932632 bytes
Variable Size                889192552 bytes
Database Buffers            3372220416 bytes
Redo Buffers                  30621696 bytes
allocated channel: c1
channel c1: SID=16 device type=DISK
allocated channel: c2
channel c2: SID=177 device type=DISK

Starting restore at 17-SEP-2020 14:20:21

channel c2: skipped, AUTOBACKUP already found
channel c1: restoring control file
channel c1: restore complete, elapsed time: 00:00:11
output file name=/u01/app/oracle/oradata/datastore/TESTQ/TESTQ/controlfile/o1_mf_hp6o2sb2_.ctl
output file name=/u01/app/oracle/oradata/datastore/TESTQ/TESTQ/controlfile/o1_mf_hp6o2sbm_.ctl
Finished restore at 17-SEP-2020 14:20:32


...
....
Executing: alter database force logging

contents of Memory Script:
{
   Alter clone database open resetlogs;
}
executing Memory Script

database opened
Executing: alter database flashback on
Cannot remove created server parameter file
Finished Duplicate Db at 17-SEP-2020 14:38:31

The duplicate was successful:

oracle@srvodap01n1:/u01/app/oracle/local/dmk/etc/ [TESTQ] TESTQ
********* dbi services Ltd. *********
STATUS                 : OPEN
DB_UNIQUE_NAME         : TESTQ
OPEN_MODE              : READ WRITE
LOG_MODE               : ARCHIVELOG
DATABASE_ROLE          : PRIMARY
FLASHBACK_ON           : YES
FORCE_LOGGING          : YES
VERSION                : 12.1.0.2.0
CDB Enabled            : NO
*************************************
oracle@srvodap01n1:/u01/app/oracle/local/dmk/etc/ [TESTQ]

Hope this helps.

The article Duplicate Database on ODA X4 appeared first on the dbi Blog.

Upgrade to Oracle 19c – performance issue


In this blog post I want to introduce a workaround for a performance issue which randomly appeared during the upgrade to 19c of several Oracle 12c databases that I performed for a financial services provider. More than 40 upgrades had worked just fine, most of them finishing in less than one hour, before we ran into one which would have taken days to complete.

Issue

After starting the database upgrade from Oracle 12.2.0.1.0 to 19.8.0.0.0, the upgrade got stuck during the recompilation of invalid objects:

@utlrp

 

Reason

One select statement on the unified audit trail was running for hours with no result, blocking the upgrade progress and consuming nearly all database resources. The audit trail itself was only about 35MB, so not a size you would expect such a bottleneck from:

SQL> SELECT count(*) from gv$unified_audit_trail;

 

Solution

After some research and testing (see notes below) I found the following workaround (after killing the upgrade process):

SQL> begin
DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_UNIFIED,
use_last_arch_timestamp => FALSE);
end;
/
SQL> set timing on;
SELECT count(*) from gv$unified_audit_trail;
exec DBMS_AUDIT_MGMT.FLUSH_UNIFIED_AUDIT_TRAIL;

 

Note

As a first attempt I used the procedure below, described in Note 2212196.1.

But FLUSH_UNIFIED_AUDIT_TRAIL took too long, so I killed the process after it had run for one hour. The flush procedure worked fine again after running CLEAN_AUDIT_TRAIL as described above:

SQL> begin
DBMS_AUDIT_MGMT.FLUSH_UNIFIED_AUDIT_TRAIL;
for i in 1..10 loop
DBMS_AUDIT_MGMT.TRANSFER_UNIFIED_AUDIT_RECORDS;
end loop;
end;
/

 

 

A few days later we encountered the same issue on an Oracle 12.1.0.2 database which requires Patch 25985768 for executing dbms_audit_mgmt.transfer_unified_audit_records.

This procedure is available out of the box in the Oracle 12.2 database and in the Oracle 12.1.0.2 databases which have been patched with Patch 25985768.

To avoid getting caught in this trap, my advice is to gather all relevant statistics before any upgrade from Oracle 12c to 19c and to query gv$unified_audit_trail in advance. This query usually finishes within a few seconds.
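
A minimal pre-upgrade check could look like this (a sketch — I read “relevant statistics” as dictionary and fixed-objects statistics, which are the ones behind the unified audit trail views):

set timing on
select count(*) from gv$unified_audit_trail;
exec dbms_stats.gather_dictionary_stats;
exec dbms_stats.gather_fixed_objects_stats;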

 

Related documents

Doc ID 2212196.1

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=257639407234852&id=2212196.1&_afrWindowMode=0&_adf.ctrl-state=rd4zvw12p_4

Master Note For Database Unified Auditing (Doc ID 2351084.1)

Bug 18920838 : 12C POOR QUERY PERFORMANCE ON DICTIONARY TABLE SYS.X$UNIFIED_AUDIT_TRAIL

Bug 21119008 : POOR QUERY PERFORMANCE ON UNIFIED_AUDIT_TRAIL

Performance Issues While Monitoring the Unified Audit Trail of an Oracle12c Database (Doc ID 2063340.1)

The article Upgrade to Oracle 19c – performance issue appeared first on the dbi Blog.
