Yann Neuhaus

Subscribe to Yann Neuhaus feed
dbi services technical blog
Updated: 12 hours 14 min ago

A few scripting languages for Documentum

13 hours 54 min ago

Beside the obsolete dmbasic, the autistic dmawk, the verbose java with the DfCs, the limited iapi (for API) and idql (for DQL) command-line tools, Documentum does not offer any scripting language for the administrator and the out-of-the-box experience is quite frustrating in this respect. It has been so even before the java trend so it is not a maneuver to force the use of the DfCs or to rely on it for administrative tasks. It looks more like an oversight or like this was considered as a low-priority need.
Of course, people didn’t stop at this situation and developed their own bindings with their preferred scripting language. I found db:Documentum for perl, yours truly’s DctmAPI.py for python (refer to the article here), dmgawk for gawk (see here), and of course all the JVM-based languages that leverage the DfCs such as groovy, beanshell, jython, jruby, etc… Such JVM-based scripting languages actually only need to import the DfCs library and off they go for the next killer script. In this article, I’ll show how to set up the binding for a few of those languages under the linux O/S.

db::Documentum

This is a perl v5 module that permits access to the Documentum api from the perl interpreter. It was developed by M. Scott Roth, see here, originally only for the Windows O/S and EDMS v3.1.5. Thanks to other contributors, it is now compilable under several flavors of Unices, including Linux. It is downloadable from here.
You’ll need the GNU C compiler to generate the module. Here is a detailed, step by step description of the installation procedure.

# download the archive Db-Documentum-1.64.tar.gz from here http://www.perl.com/CPAN/modules/by-module/Db/
# decompress it in, say, db::Documentum
tar -zxvf Db-Documentum-1.64.tar.gz
 
# move to the newly created directory Db-Documentum-1.64;
cd Db-Documentum-1.64
 
# prepare the following needed paths;
# DM_HOME
# path to the Documentum installation, e.g. /home/dmadmin/documentum
# DM_LIB
# path to the Documentum libdmcl.so library, e.g. ${DM_HOME}/product/7.3/bin
# note: there is also the obsolescent libdmcl40.so but I've encountered problems with it, mostly "Segmentation fault (core dumped)" crashes, so use the JNI-based libdmcl.so instead; it starts more slowly as it uses java but it is more reliable and is still supported;
# DM_INCLUDE
# path to the include file dmapp.h, e.g. ${DM_HOME}/share/sdk/include
 
# edit the linux section in Makefile.PL and provide the above paths;
# also, move up the $DM_LIB initialization before the dmcl.so test and comment the line beginning with $DM_CLIENT_LIBS =
# here is how that section looks like after editing it:

elsif ( $OS =~ /linux/i ) {
 
# The path to your Documentum client installation.
$DM_HOME = '/home/dmadmin/documentum';
 
# This is kinda a gottcha, the Linux stuff is in unix/linux
# You may have to tweak these.
 
# Path to documentum client libraries.
#$DM_LIB = "$DM_HOME/unix/linux";
$DM_LIB = "$DM_HOME/product/7.3/bin";
 
# dmcl.so file
if (! -e "$DM_LIB/libdmcl.so") {
warn "\n*** WARNING *** Could not find $DM_LIB/libdmcl.so.\nThe module will not make without " .
"libdmcl.so.\n";
}
 
# Path to directory where dmapp.h lives.
#$DM_INCLUDE = "-I/documentum/share/sdk/include/";
$DM_INCLUDE = "-I$DM_HOME/share/sdk/include/";
 
#$DM_CLIENT_LIBS = "-L$DM_LIB -ldmapi -ldmupper -ldmlower -ldmcommon -ldmupper -lcompat";
$DM_RPC_LIBS = "-L$DM_LIB -lnwrpc -lnwstcp";
$OS_LIBS = "-lsocket -lnsl -lintl";
$CPP_LIBS = "-lC";
$LD_LIBS = "-ldl";
$CPP_INC = "";
$CCFLAGS = "";
}

 
# execute the Makefile.PL;
perl Makefile.PL
 
# if the error below occurs, you must install the perl-devel module using the native package deployment tool for your distribution,
# e.g. sudo yum install perl-devel for centos:
# Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at Makefile.PL line 1.
#BEGIN failed--compilation aborted at Makefile.PL line 1.
 
# a Makefile file has been generated; correct the 2 lines below as showed;
EXTRALIBS = -L/home/dmadmin/documentum/product/7.3/bin -ldmcl
LDLOADLIBS = -L/home/dmadmin/documentum/product/7.3/bin -ldmcl -lc
 
# use the newly produced Makefile;
make
 
# run some tests to check the new module;
make test
 
# the test completes successfully but, sometimes, it is followed by SIGSEGV in the JVM;
# as it occurs at program termination, it can be ignored;
 
# install the new perl module system-wide;
sudo make install

Now that we have the module, let’s use it in a simple test case: dump of all the dm_sysobject linked in cabinet /dmadmin (its id is 0c00c35080000105) in the out-of-the box dmtest repository.

vi cat my-test.pl 
#!/usr/bin/perl

use Db::Documentum qw(:all);
use Db::Documentum::Tools qw(:all);

# print version
Db::Documentum::version;

$docbase = "dmtest";
$username = "dmadmin";
$password = "dmadmin";

# connect;
$result = dm_Connect($docbase, $username, $password) || die("could not connect in " . $docbase . " as " . $username . " with password " . $password);

# run the query;
$status = dmAPIExec("execquery,c,,select r_object_id, r_object_type, object_name from dm_sysobject where folder(ID('0c00c35080000105'))");
if (1 != $status) {
   $err_mess = dmAPIGet("getmessage,c");
   print $err_mess;
   die();
}
$query_id = dmAPIGet("getlastcoll,c");
printf "%-16s  %-20s  %s\n", "r_object_id", "r_object_type", "object_name";
while (dmAPIExec("next,c," . $query_id)) {
   $r_object_id = dmAPIGet("get,c," . $query_id . ",r_object_id");
   $r_object_type = dmAPIGet("get,c," . $query_id . ",r_object_type");
   $object_name = dmAPIGet("get,c," . $query_id . ",object_name");
   printf "%16s  %-20s  %s\n", $r_object_id, $r_object_type, $object_name;
}
dmAPIExec("close,c," . $query_id);

# disconnect;
dmAPIExec("disconnect,c");
exit;

The script is very trivial and needs little explanation. Note the new functions dm_Connect, dmAPIExec and dmAPIGet. dmAPISet, dmAPIInit and dmAPIDeInit are also available but the last two don’t need to be invoked explicitly for they are automatically at module load-time.
Example of execution:

perl my-test.pl
 
Perl version: 5.016003
Db::Documentum version: 1.64
DMCL version: 7.3.0000.0205
 
r_object_id r_object_type object_name
0800c3508000019b dm_job dm_PropagateClientRights
0800c3508000019c dm_job dm_PostUpgradeAction
0800c35080000408 dmc_wfsd_type_info integer
0800c35080000409 dmc_wfsd_type_info boolean
0800c3508000040a dmc_wfsd_type_info double
0800c3508000040b dmc_wfsd_type_info string
0800c3508000040c dmc_wfsd_type_info date
0800c3508000040d dmc_wfsd_type_info repeating integer
0800c3508000040e dmc_wfsd_type_info repeating boolean
0800c3508000040f dmc_wfsd_type_info repeating double
0800c35080000410 dmc_wfsd_type_info repeating string
0800c35080000411 dmc_wfsd_type_info repeating date
0800c35080000426 dm_sysobject dm_indexAgentAcquireLock
0800c35080000587 dm_client_rights dfc_localhost_c0XP4a
0800c35080001065 dm_jms_config JMS dmtest:9080 for dmtest.dmtest
0800c35080001066 dm_jms_config JMS localhost.localdomain:9080 for dmtest.dmtest
0b00c35080000233 dm_folder Workspace Customizations
1700c3508000015d dm_outputdevice Default LP Printer
3a00c3508000013f dm_location storage_01
3a00c35080000140 dm_location common
3a00c35080000141 dm_location events
3a00c35080000142 dm_location log
3a00c35080000143 dm_location config
3a00c35080000144 dm_location dm_dba
3a00c35080000145 dm_location auth_plugin
3a00c35080000146 dm_location ldapcertdb_loc
3a00c35080000147 dm_location temp
3a00c35080000148 dm_location dm_ca_store_fetch_location
3a00c35080000153 dm_location convert
3a00c35080000154 dm_location dsearch
3a00c35080000155 dm_location nls_chartrans
3a00c35080000156 dm_location check_signature
3a00c35080000157 dm_location validate_user
3a00c35080000158 dm_location assume_user
3a00c35080000159 dm_location secure_common_area_writer
3a00c3508000015a dm_location change_password_local
3a00c3508000015b dm_location thumbnail_storage_01
3a00c3508000015c dm_location streaming_storage_01
3a00c35080000226 dm_location replicate_location
3a00c35080000227 dm_location replica_storage_01
3e00c35080000149 dm_mount_point share
6700c35080000100 dm_plugin CSEC Plugin
6700c35080000101 dm_plugin Snaplock Connector

Now, the power of perl and its more than 25’000 modules are at our fingertips to help us tackle those hairy administrative tasks !

groovy

Being a JVM-based language, groovy runs on top of a JVM, and therefore benefits of all its advantages such as automatic garbage collection (although this is not an exclusivity of java) and portability (ditto), and can tap into the uncountable existing JAVA libraries (ditto).
groovy is used in Documentum’s xPlore.
groovy is a powerful, yet easy to learn, programing language still actively maintained by the Apache foundation, cf. here. Similar to java but without its verbosity, it should instantly appeal to java developers doing Documentum administrative tasks.
groovy comes with groovysh, a comfortable and powerful interactive shell for trying out statements and experimenting with the language.
By importing the DfCs, we can use groovy to access Documentum repositories. Knowledge of the DfCs are required of course.
To install groovy, use the distribution’s package manager; e.g. on my Ubuntu, I’ve used:

sudo apt-get install groovy

while on Centos, the following command will do it:

sudo yum install groovy

To test it, let’s use the same program as for perl, but rewritten a la groovy:

#! /usr/bin/groovy

import System.*;
import java.io.*;

import com.documentum.fc.client.*;
import com.documentum.fc.common.*;

   static void main(String[] args) {
      docbroker = "dmtest";
      docbase = " dmtest";
      username = "dmadmin";
      password = "dmadmin";
   
      println("attempting to connect to " + docbase + " as " + username + "/" + password + " via docbroker " + docbroker);
   
      try {
         client = DfClient.getLocalClient();
      
         config = client.getClientConfig();
         config.setString ("primary_host", docbroker);
      
         logInfo = new DfLoginInfo();
         logInfo.setUser(username);
         logInfo.setPassword(password);
         docbase_session = client.newSession(docbase, logInfo);
      
         if (docbase_session != null) {
            println("Got a session");
      
            // do something in the session;
            folderId = new DfId("0c00c35080000105");
            folder = docbase_session.getObject(folderId);
            attrList = "r_object_id,r_object_type,object_name";
            coll = folder.getContents(attrList);
      
            while (coll.next())
               System.out.printf("ObjectId=%-16s r_object_type=%-20s ObjectName=%s\n",
                                 coll.getString("r_object_id"),
                                 coll.getString("r_object_type"),
                                 coll.getString("object_name"));
            println("Finished");
            docbase_session.disconnect();
         }
         else
            println("Didn't get a session");
      }
      catch (e) {
         println("Exception was: " + e);
      }
   }

Lines 6 & 7 import the DfC so don’t forget to add them to the CLASSPATH; normally they are because the execution environment is a Documentum client, e.g.:

export JAVA_HOME=/home/dmadmin/documentum/shared/java64/1.8.0_77
export CLASSPATH=/home/dmadmin/documentum/shared/dfc/dfc.jar
export PATH=$JAVA_HOME/bin:$PATH

Line 15 & 38 show that besides its own built-in functions, groovy can also use equivalent functions from the java libraries.
To invoke the script:

groovy tgroovy.gry
# or make it executable and call it:
chmod +x tgroovy.gry
./tgroovy.gry

Here is its output:

attempting to connect to dmtest as dmadmin/dmadmin via docbroker dmtest
Got a session
ObjectId=0800c3508000019b r_object_type=dm_job ObjectName=dm_PropagateClientRights
ObjectId=0800c3508000019c r_object_type=dm_job ObjectName=dm_PostUpgradeAction
ObjectId=0800c35080000408 r_object_type=dmc_wfsd_type_info ObjectName=integer
ObjectId=0800c35080000409 r_object_type=dmc_wfsd_type_info ObjectName=boolean
ObjectId=0800c3508000040a r_object_type=dmc_wfsd_type_info ObjectName=double
...
ObjectId=3a00c35080000227 r_object_type=dm_location ObjectName=replica_storage_01
ObjectId=3e00c35080000149 r_object_type=dm_mount_point ObjectName=share
ObjectId=6700c35080000100 r_object_type=dm_plugin ObjectName=CSEC Plugin
ObjectId=6700c35080000101 r_object_type=dm_plugin ObjectName=Snaplock Connector
Finished

jython

jython is a python implementation written in java, see here.
A such, it offers most of the features of the powerful python language although it stays behind the latest python version (v2.5.3 vs. 3.7).
Like java, groovy, jruby, scala, etc …, jython runs inside a JVM and can use all the java libraries such as the DfCs and become a Documentum client with no changes except adding the DfCs to the $CLASSPATH.
jython appeals especially to people who already know python; like for groovy, a basic level knowledge of the DfCs is required.
To install jython, use your distribution’s package manager, e.g.

# for ubuntu:
sudo apt-get install jython

Make sure the DfCs are present in $CLASSPATH, otherwise add them:

export CLASSPATH=/home/dmadmin/documentum/shared/dfc/dfc.jar:

When runing the test script below, the DfCs complain about a bad date format:

20:42:05,482 ERROR [File Watcher] com.documentum.fc.common.DfPreferences - [DFC_PREFERENCE_BAD_VALUE] Bad value for preference "dfc.date_format", value="M/d/yyyy, h:mm:ss a"
com.documentum.fc.common.DfException: Illegal syntax found in the date format 'M/d/yyyy, h:mm:ss a'. The default localized short date format will be used.
at com.documentum.fc.common.DfException.newIllegalDateFormatException(DfException.java:109)

Evidently, they are unhappy with the default date format. The work-around is to force one in the dfc.properties file by adding the line below (choose a date format that best suits you):

dfc.date_format=dd.MM.yyyy HH:mm:ss

Since the error did not occur with groovy (nor in the provided JNI-enabled command-line tools such as iapi, idql and dmawk), it is not the DfCs per se that have this problem but the combination of java + jython + DfCs.
Here comes the test script:

#!/usr/bin/env jython

# install jython via your O/S package manager;
# export CLASSPATH=/home/dmadmin/documentum/shared/dfc/dfc.jar:$CLASSPATH
# edit the documentum/shared/config/dfc.properties and add a dfc.date property, e.g.:
# dfc.date_format=dd.MM.yyyy HH:mm:ss
# execute:
#   jython test.jy
# or:
#   chmod +x test.jy; ./test.jy
# can also be execute interactively as follows:
# start jython:
#    jython
# call the test script;
#    execfile("/home/dmadmin/test.jy")

import traceback
import com.documentum.fc.client as DFCClient
import com.documentum.fc.common as DFCCommon

docbroker = "dmtest"
docbase = " dmtest"
username = "dmadmin"
password = "dmadmin"
print("attempting to connect to " + docbase + " as " + username + "/" + password + " via docbroker " + docbroker)
try:
  client = DFCClient.DfClient.getLocalClient()

  config = client.getClientConfig()
  config.setString ("primary_host", docbroker)

  logInfo = DFCCommon.DfLoginInfo()
  logInfo.setUser(username)
  logInfo.setPassword(password)
  docbase_session = client.newSession(docbase, logInfo)

  if docbase_session is not None:
    print("Got a session")
    # do something in the session;
    folderId = DFCCommon.DfId("0c00c35080000105");
    folder = docbase_session.getObject(folderId);
    attrList = "r_object_id,r_object_type,object_name";
    coll = folder.getContents(attrList);
    while(coll.next()):
       print("ObjectId=" + "%-16s" % coll.getString("r_object_id") + 
             " r_object_type=" + "%-20s" % coll.getString("r_object_type") +
             " ObjectName=" + coll.getString("object_name"));
    print("Finished");
    docbase_session.disconnect()
  else:
    print("Didn't get a session")
except Exception:
    print(Exception)

Execution:

jython test.jy
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.python.core.PySystemState (file:/usr/share/java/jython-2.5.3.jar) to method java.io.Console.encoding()
WARNING: Please consider reporting this to the maintainers of org.python.core.PySystemState
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
attempting to connect to dmtest as dmadmin/dmadmin via docbroker dmtest
Got a session
ObjectId=0800c3508000019b r_object_type=dm_job ObjectName=dm_PropagateClientRights
ObjectId=0800c3508000019c r_object_type=dm_job ObjectName=dm_PostUpgradeAction
ObjectId=0800c35080000408 r_object_type=dmc_wfsd_type_info ObjectName=integer
ObjectId=0800c35080000409 r_object_type=dmc_wfsd_type_info ObjectName=boolean
ObjectId=0800c3508000040a r_object_type=dmc_wfsd_type_info ObjectName=double
...
ObjectId=3a00c35080000227 r_object_type=dm_location ObjectName=replica_storage_01
ObjectId=3e00c35080000149 r_object_type=dm_mount_point ObjectName=share
ObjectId=6700c35080000100 r_object_type=dm_plugin ObjectName=CSEC Plugin
ObjectId=6700c35080000101 r_object_type=dm_plugin ObjectName=Snaplock Connector
Finished

Ironically, the jython’s launcher is a perl script; it basically initializes java and python environment variables such as classpath, java options and jython home and path. If the initial WARNINGs are disruptive, edit that script and correct the problem or just redirect the stderr to null, e.g.:

jython test.jy 2> /dev/null

So, which one to choose ?

To summarize, the decision tree below may help choosing one scripting language among the preceding ones.

DfCs knowledge ?:
java proficiency ?:
choose groovy with the DfCs
else python proficiency ?:
choose jython with the DfCs
else select one of the following ones:
get acquainted with one of the above languages
| choose another JVM-based language
| give up the DfCs and use DQL/API with perl, python of gawk instead (see below)
else perl proficiency ?:
choose db::Documentum
else python proficiency ?:
choose python and DctmAPI.py
else nawk/gawk proficiency ?
choose gawk and dmgawk binding
else select one of the following:
learn one of the above scripting languages
| develop a Documentum binding for your preferred scripting language not in the list
| hire dbi-services for your administrative tasks or projects ;-)

DfCs are clearly indicated to java programmers. They are still supported and new features are always accessible from them. There are tasks which cannot be done through the API or DQL and only through the DfCs, but generally those are out of the scope of an administrator. Note that even the non java and DfCs languages still finish up invoking the DfCs in the background because they are linked with the libdmcl.so library and that one makes JNI behind-the-scene calls to the DfCs for them, thus hiding their complexity. Hopefully, this shared library will stay with us still for some time otherwise our scripting language choice will be seriously restricted to JVM-based languages and the DfCs.

Cet article A few scripting languages for Documentum est apparu en premier sur Blog dbi services.

Docker EE – UCP 3.1.2 installation failed in Azure

Thu, 2019-02-14 09:06

Docker EE 2.1 is a Containers-as-a-Service (CaaS) platform on-premise or in the cloud. To manage your cluster of containers, Docker EE allows you to install UCP (Universal Control Panel).

When you plan to install the UCP 3.1.2 in Azure, the installation fails during the deployment of the UCP service. You will get the following error message.

Screenshot 2019-02-14 at 15.44.13

[docker@docker-manager000 ~]$ docker logs ucp-reconcile

Screenshot 2019-02-12 at 08.28.37

 

If you google the error message you will get a lot of answers but almost all of them are not suitable to solve the issue. According to the Docker community, the current issue is related to UCP 3.1.0, 3.1.1, 3.1.2 in certain cases.

The best solution provided by the community is to install a previous version of UCP (3.0.7) and upgrade it from within the UCP dashboard.

Before installing another version of UCP, remove the current installation:

[docker@docker-manager000 ~]$ sudo docker swarm leave --force
[docker@docker-manager000 ~]$ sudo docker stop $(sudo docker ps -aq)
[docker@docker-manager000 ~]$ sudo docker rm $(sudo docker ps -aq) --force
[docker@docker-manager000 ~]$ sudo docker rmi $(sudo docker images -aq) --force
[docker@docker-manager000 ~]$ sudo docker network prune
[docker@docker-manager000 ~]$ sudo docker system prune --force

Re-try the installation with UCP version 3.0.7

[docker@docker-manager000 ~]$ docker container run --rm -it \
>   --name ucp \
>   -v /var/run/docker.sock:/var/run/docker.sock \
>   docker/ucp:3.0.7 install \
>   --host-address 52.170.5.187 \
>   --pod-cidr 10.0.0.0/24 \
>   --cloud-provider Azure \
>   --interactive

Output:

Unable to find image 'docker/ucp:3.0.7' locally

3.0.7: Pulling from docker/ucp

ff3a5c916c92: Pull complete

a52011fa0ead: Pull complete

dbfd538ef7b5: Pull complete

Digest: sha256:21219684bfafd78c49669865b769c775f132c5b0915a5ec5fd8b41d9a6f52597

Status: Downloaded newer image for docker/ucp:3.0.7

INFO[0000] Your engine version 18.09.2, build 1ac774d (3.10.0-862.11.6.el7.x86_64) is compatible with UCP 3.0.7 (a422a90)

Admin Username: test

Admin Password:

Confirm Admin Password:

INFO[0016] Pulling required images... (this may take a while)

INFO[0016] Pulling docker/ucp-cfssl:3.0.7

INFO[0017] Pulling docker/ucp-dsinfo:3.0.7

INFO[0036] Pulling docker/ucp-metrics:3.0.7

INFO[0039] Pulling docker/ucp-hyperkube:3.0.7

INFO[0046] Pulling docker/ucp-kube-compose:3.0.7

INFO[0074] Pulling docker/ucp-kube-dns-dnsmasq-nanny:3.0.7

INFO[0078] Pulling docker/ucp-controller:3.0.7

INFO[0084] Pulling docker/ucp-etcd:3.0.7

INFO[0085] Pulling docker/ucp-auth:3.0.7

INFO[0087] Pulling docker/ucp-calico-cni:3.0.7

INFO[0090] Pulling docker/ucp-kube-dns-sidecar:3.0.7

INFO[0092] Pulling docker/ucp-interlock-extension:3.0.7

INFO[0093] Pulling docker/ucp-auth-store:3.0.7

INFO[0095] Pulling docker/ucp-agent:3.0.7

INFO[0096] Pulling docker/ucp-compose:3.0.7

INFO[0097] Pulling docker/ucp-swarm:3.0.7

INFO[0099] Pulling docker/ucp-pause:3.0.7

INFO[0100] Pulling docker/ucp-calico-kube-controllers:3.0.7

INFO[0101] Pulling docker/ucp-calico-node:3.0.7

INFO[0110] Pulling docker/ucp-kube-dns:3.0.7

INFO[0135] Pulling docker/ucp-interlock:3.0.7

INFO[0135] Pulling docker/ucp-interlock-proxy:3.0.7

WARN[0139] None of the hostnames we'll be using in the UCP certificates [docker-manager000 127.0.0.1 172.17.0.1 52.170.5.187] contain a domain component.  Your generated certs may fail TLS validation unless you only use one of these shortnames or IPs to connect.  You can use the --san flag to add more aliases

You may enter additional aliases (SANs) now or press enter to proceed with the above list.

Additional aliases:

INFO[0000] Initializing a new swarm at 52.170.5.187

INFO[0020] Installing UCP with host address 52.170.5.187 - If this is incorrect, please specify an alternative address with the '--host-address' flag

INFO[0020] Deploying UCP Service...

INFO[0120] Installation completed on docker-manager000 (node 5nhn0oqi9h8tjo99t15v3ckr5)

INFO[0120] UCP Instance ID: ccraf0c9vbrcnkxix11r7bdue

INFO[0120] UCP Server SSL: SHA-256 Fingerprint=9D:E3:7D:4C:12:03:B9:03:21:C6:5A:0C:87:C3:AA:A2:48:B0:C1:91:46:D4:12:A1:6F:05:64:C3:72:13:4B:6F

INFO[0120] Login to UCP at https://52.170.5.187:443

INFO[0120] Username: test

INFO[0120] Password: (your admin password)

StackOverflow post : https://stackoverflow.com/questions/54189511/docker-unable-to-reconcile-state-of-kubernetes-addons-component-error-while-dep/54646555#54646555

Cet article Docker EE – UCP 3.1.2 installation failed in Azure est apparu en premier sur Blog dbi services.

How To Create A New Container Group Using Azure CLI

Thu, 2019-02-14 07:18
Steps To Create A New Container Group Using Azure CLI

Before starting, a MSDN account is needed to use the Azure platform, please refer to your manager for further information.

In this tutorial, the Azure CLI command-line tool will be install on Windows and we will deploy a website running the open source GHOST platform.

Choose the right operating system to install the command-line tool. The information can be found here.

When the installation is done, launch the command prompt and type az login to authenticate on the Azure platform (see below):
az login
Wait until a browser is launched so that you can login (see below):

az_authentication1
You have logged into Microsoft Azure!

To create a new container, the following command will be used:

  • az group create
  • az container create
  • az container show
  • az container logs
  • az group delete

Open the command prompt and type powershell to start.

Define the variable resource group that will hold the container group:

  • $resourceGroup = “TutoGhost” (hit enter)

Define the variable location:

  • $location = “westeurope” (hit enter)

Create the resource group and the location:

  • az group create -n $resourceGroup -l $location (hit enter)

The resource group and the location are now created.

Define the variable container group (only lowercase letters, numbers and dashes):

  • $containerGroupName = “site1-ghost” (hit enter)

Create the container group:

  • az container create -g $resourceGroup -n $containerGroupName `
  • –image ghost `
  • –ports 2368 `
  • –ip-address public `
  • –dns-name-label testtutoghost

The dns-name-label should be unique.

Hit enter to execute the commands.

During the execution process, use the following command-line to check the status of the container group:

  • az container show `
  • -g $resourceGroup -n $containerGroupName

The provisioningState variable display the three state of creation of the container group which is pending, creating and succeeded.

A public random ip address is assigned and a fully qualified domain name which is testtutoghost.westeurope.azurecontainer.io

To check if the container is up and running, open a browser and type the FQDN followed by colon 2368.

To check the container group logs, type the following command:

  • az container logs `
  • -n $containerGroupName -g $resourceGroup

As a test environment and to avoid paying extra cost, make sure to delete the resource group (command-line below) created if not used.

  • az group delete -n $resourceGroup -y

Cet article How To Create A New Container Group Using Azure CLI est apparu en premier sur Blog dbi services.

OBUG 2019 – First event from Oracle Users Group in Belgium!

Sat, 2019-02-09 09:00
Introduction

It’s the first edition of the Techdays, the Oracle community event in Belgium. This event happened in Antwerp these past 2 days, and a lot of speakers came from around the world to talk about their experience on focused subjects. Really amazing to heard such high-level conferences.
And it was a great pleasure for me because I’ve been working in Belgium for several years before.

I’ll will try to give you a glimpse of what I found the most interesting among the sessions I chose.

Cloud: you cannot ignore it anymore!

Until now, I did not have much interest in the cloud because my job is actually helping customers to build on-premise (that means not in the cloud) environments. If you can live without the cloud, you cannot ignore it anymore now because it deals with budget, infrastructure optimization, strategy, flexibility, scalability.
Cloud is bringing a modern pay-what-you-need-now model compared to monolitic and costly infrastructures you’ll have to pay off in years. Cloud is bringing a service for a problem.

Cloud providers have understood that now or later, customers will move at least parts of their infrastructure into the cloud.

Going into the cloud is not something you answer yes or no. It’s a real project that you’ll have to study as it needs to rethink nearly everything. Migrating your current infrastructure to the cloud without any changes would probably be a mistake. Don’t consider the cloud as just putting your servers elsewhere.

I learned that Oracle datacenters are actually not dedicated datacenters: most of the time, their cloud machines are located in existing datacenters from different providers, making your connection to the cloud sometimes only meters away from your actual servers!

And for those who still don’t want to move their critical data somewhere outside, Oracle brings another solution named Cloud-at-customer. It’s basically the same as pure cloud in terms of management, but Oracle delivers the servers in your datacenter, keeping your data secured. At least for those who are thinking that way :-)

EXADATA

You probably know that EXADATA is the best database machine you can buy, and it’s true. But EXADATA is not the machine every customer needs (actually ODA is much more affordable and popular), only very demanding databases can benefit from this platform.

Gurmeet Goindi, the EXADATA product manager at Oracle, told us that EXADATA will still increase the gap from classic technologies.

For example, I heard that EXADATA’s maximum numbers of PDB will increase to 4’000 in one CDB, even you’ll probably never reach this limit, it’s quite an amazing number.

If I didn’t heard about new hardware coming shortly, major Exadata enhancements will come with 19c software stack release.

19c is coming

Maria Colgan from Oracle introduced the 19c new features and enhancements.

We were quite used to previous Oracle releases, R1 and R2, R1 being the very first release with a low adoption from the customer, and R2 being the mature release with longuer term support. But after the big gap between 12cR1 and 12cR2, Oracle changed the rules for a more Microsoft-like versioning: version number is now the year of product delivery. Is there still longer term release like 11gR2 was? For sure, and 19c will be the one you were waiting for. You may know that 18c was some kind of 12.2.0.2. 19c will be the lastest version of the 12.2 kernel, 12.2.0.3. If you plan to migrate your databases from previous releases this year, you should know that 19c will be available shortly, and that this could worth the wait.

19c will bring bunch of new features, like automatic indexing, which could be part of a new option. PDB could now have separate encryptions keys, and not only one for the CDB.

InMemory option will receive enhancements and now supports the storage of objects in both column and row. InMemory content can now be different between the primary and the active standby, making distribution of read only statements more efficient. If your memory is not big enough to store all your InMemory data, it will soon be possible (on Exadata only) to put the columnar data into flash to keep the benefit of columnar scans outside memory. It makes sense.

Brief overview of new “runaway queries” feature, there will be a kind of quarantine for statements that reach resource plan limits. Goal is to avoid the need for the DBA to connect and kill the session to free up system resources.

Autonomous database will also be there in 19c, but first limited to Cloud-at-customer Exadatas. It will take some years for all databases to become automomous :-)

Zero Downtime Migration tool

What you’ve been dreaming of for years is now nearly there. A new automatic migration tool without downtime. But don’t dream too much because it seems to be limited to migration to the cloud and source and target should be in the same version (11g, 12c or 18c). Don’t know actually if it will support migrations from on-premise to on-premise.

Ricardo Gonzalez told us that with this new tool, you will be able to migrate to the cloud very easily. It’s a one button approach, with a lot of intelligence inside the tool to maximize the security of the operation. And as described it looks great, and everything is considered, pre-checks, preparation, migration, post-migration, post-checks and so on. You’ll still have to do the final switchover yourself, and yes it’s based on Data Guard, so you can trust the tool as it relies on something reliable. If something goes wrong, you can still move back to on-premise database.

Autoupgrade tool

Another great tool presented by Mike Dietrich is coming with 19c. It’s a java based tool able to plan and manage database upgrades with a single input file describing the environment. It seems very useful if you have a lot of databases to upgrade. Refer to MOS Note: 2485457.1 if you’re interested.

Conclusion

So many interesting things to learn in these 2 days! Special thanks to Philippe Fierens and Pieter Van Puymbroeck for the organization.

Cet article OBUG 2019 – First event from Oracle Users Group in Belgium! est apparu en premier sur Blog dbi services.

What are custom and generic plans in PostgreSQL?

Tue, 2019-02-05 08:28

I have already written a post about prepared statements in PostgreSQL some time ago. What I did not mention in that post is the concept of generic and custom plans. So lets have a look at those.

As always, we start with creating a demo table and populate that table with some sample data:

pgbench=# create table demo ( a int, b text );
CREATE TABLE
pgbench=# insert into demo select i, 'aaa' from generate_series (1,100) i;
INSERT 0 100
pgbench=# insert into demo select i, 'bbb' from generate_series (101,200) i;
INSERT 0 100
pgbench=# insert into demo select i, 'ccc' from generate_series (201,300) i;
INSERT 0 100
pgbench=# analyze demo;
ANALYZE

Now that we have some data we can prepare a statement we would like to execute with various values:

pgbench=# prepare my_stmt as select * from demo where b = $1;
PREPARE

Btw: You can check for the currently available prepared statements in your session by querying the pg_prepared_statements catalog view:

pgbench=# select * from pg_prepared_statements;
  name   |                      statement                      |         prepare_time          | parameter_types | from_sql 
---------+-----------------------------------------------------+-------------------------------+-----------------+----------
 my_stmt | prepare my_stmt as select * from demo where b = $1; | 2019-02-05 13:15:39.232521+01 | {text}          | t

When we explain(analyze) that statement what do we see?

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.111..0.230 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.344 ms
 Execution time: 0.285 ms
(5 rows)

In the “Filter” line of the execution plan we can see the real value (‘aaa’) we passed to our prepared statement. When you see that, it is a so called custom plan. When PostgreSQL goes for a custom plan that means the statement will be re-planned for the provided set of parameters. When you execute that a few times more:

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.045..0.158 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.243 ms
 Execution time: 0.225 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.035..0.123 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.416 ms
 Execution time: 0.173 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.036..0.124 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.195 ms
 Execution time: 0.178 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.035..0.126 rows=100 loops=1)
   Filter: (b = 'aaa'::text)
   Rows Removed by Filter: 200
 Planning time: 0.192 ms
 Execution time: 0.224 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'aaa' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.038..0.130 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.191 ms
 Execution time: 0.183 ms
(5 rows)

… you will see that the “Filter” line changes from displaying the actual parameter to a place holder. Now we have a generic plan. This generic plan will not change anymore for the lifetime of the prepared statement no matter which value you pass into the prepared statement:

pgbench=# explain (analyze) execute my_stmt ( 'bbb' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.096..0.219 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.275 ms
 Execution time: 0.352 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'ccc' );
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.090..0.132 rows=100 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 200
 Planning time: 0.084 ms
 Execution time: 0.204 ms
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( null );
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.75 rows=100 width=8) (actual time=0.033..0.033 rows=0 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning time: 0.018 ms
 Execution time: 0.051 ms
(5 rows)

When you take a look at the source code (src/backend/utils/cache/plancache.c) you will see why it changes after 5 executions:

/*
 * choose_custom_plan: choose whether to use custom or generic plan
 *
 * This defines the policy followed by GetCachedPlan.
 */
static bool
choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
        double          avg_custom_cost;

        /* One-shot plans will always be considered custom */
        if (plansource->is_oneshot)
                return true;

        /* Otherwise, never any point in a custom plan if there's no parameters */
        if (boundParams == NULL)
                return false;
        /* ... nor for transaction control statements */
        if (IsTransactionStmtPlan(plansource))
                return false;

        /* See if caller wants to force the decision */
        if (plansource->cursor_options & CURSOR_OPT_GENERIC_PLAN)
                return false;
        if (plansource->cursor_options & CURSOR_OPT_CUSTOM_PLAN)
                return true;

        /* Generate custom plans until we have done at least 5 (arbitrary) */
        if (plansource->num_custom_plans < 5)
                return true;

Even if we change the data and analyze the table again we will still get a generic plan once PostgreSQL went for it:

pgbench=# insert into demo select i, 'ddd' from generate_series (201,210) i;
INSERT 0 10
pgbench=# insert into demo select i, 'ee' from generate_series (211,211) i;
INSERT 0 1
pgbench=# analyze demo;
ANALYZE
pgbench=# select b,count(*) from demo group by b order by b;
  b  | count 
-----+-------
 aaa |   100
 bbb |   100
 ccc |   100
 ddd |    10
 ee  |     1
(5 rows)

pgbench=# explain (analyze) execute my_stmt ( 'ddd' );
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Seq Scan on demo  (cost=0.00..5.88 rows=78 width=8) (actual time=0.147..0.151 rows=10 loops=1)
   Filter: (b = $1)
   Rows Removed by Filter: 300
 Planning time: 0.293 ms
 Execution time: 0.190 ms
(5 rows)

The situation changes when we have much more data, data is not uniformly distributed and we have an index on the column “b”:

pgbench=# truncate demo;
TRUNCATE TABLE
pgbench=# insert into demo select i, 'aaa' from generate_series (1,1000000) i;
INSERT 0 1000000
pgbench=# insert into demo select i, 'bbb' from generate_series (1000001,2000000) i;
INSERT 0 1000000
pgbench=# insert into demo select i, 'ccc' from generate_series (2000001,3000000) i;
INSERT 0 1000000
pgbench=# insert into demo select i, 'eee' from generate_series (3000001,3000010) i;
INSERT 0 10
pgbench=# create index i1 on demo (b);
CREATE INDEX
pgbench=# select b,count(*) from demo group by b order by b;
  b  |  count  
-----+---------
 aaa | 1000000
 bbb | 1000000
 ccc | 1000000
 eee |      10
(4 rows)

pgbench=# prepare my_stmt as select * from demo where b = $1;
PREPARE

No matter how often we execute the following statement (which asks for ‘eee’), we never get a generic plan:

pgbench=# explain (analyze) execute my_stmt ('eee');
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on demo  (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning time: 0.249 ms
 Execution time: 0.106 ms
(4 rows)

-----> REPEAT THAT HOW OFTEN YOU WANT BUT AT LEAST 10 TIMES

pgbench=# explain (analyze) execute my_stmt ('eee');
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on demo  (cost=0.43..4.45 rows=1 width=8) (actual time=0.054..0.061 rows=10 loops=1)
   Index Cond: (b = 'eee'::text)
 Planning time: 0.249 ms
 Execution time: 0.106 ms

This is because the custom plan (which includes the costs for re-planning) is always cheaper than the generic plan (which does not include the costs for re-planning) when we have a data distribution like that. The documentation is very clear about that: “Using EXECUTE values which are rare in columns with many duplicates can generate custom plans that are so much cheaper than the generic plan, even after adding planning overhead, that the generic plan might never be used”.

Hope that helps.

Cet article What are custom and generic plans in PostgreSQL? est apparu en premier sur Blog dbi services.

Documentum – MigrationUtil – 2 – Change Docbase Name

Tue, 2019-02-05 02:01

You are attending the second episode of the MigrationUtil series, today we will change the Docbase Name. If you missed the first one, you can find it here. I did this change on Documentum CS 16.4 with Oracle database, on the same docbase I already used to change the docbase ID.
My goal is to do both changes on the same docbase because that’s what I will need in the future.

So, we will be interested in the docbase RepoTemplate to change his name to repository1.

1. Migration preparation

I will not give the overview of the MigrationUtil, as I already did in the previous blog.
1.a Update the config.xml file
Below is the updated version of config.xml file to change the Docbase Name:

[dmadmin@vmtestdctm01 ~]$ cat $DOCUMENTUM/product/16.4/install/external_apps/MigrationUtil/config.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>Database connection details</comment>
<entry key="dbms">oracle</entry> <!-- This would be either sqlserver, oracle, db2 or postgres -->
<entry key="tgt_database_server">vmtestdctm01</entry> <!-- Database Server host or IP -->
<entry key="port_number">1521</entry> <!-- Database port number -->
<entry key="InstallOwnerPassword">install164</entry>
<entry key="isRCS">no</entry>    <!-- set it to yes, when running the utility on secondary CS -->

<!-- <comment>List of docbases in the machine</comment> -->
<entry key="DocbaseName.1">RepoTemplate</entry>

<!-- <comment>docbase owner password</comment> -->
<entry key="DocbasePassword.1">install164</entry>
...
<entry key="ChangeDocbaseName">yes</entry>
<entry key="NewDocbaseName.1">repository1</entry>
...

Put all other entry to no.
The tool will use above information, and load more from the server.ini file.

2. Before the migration (optional)

– Get docbase map from the docbroker:

[dmadmin@vmtestdctm01 ~]$ dmqdocbroker -t vmtestdctm01 -c getdocbasemap
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0000.0185
Targeting port 1489
**************************************************
**     D O C B R O K E R    I N F O             **
**************************************************
Docbroker host            : vmtestdctm01
Docbroker port            : 1490
Docbroker network address : INET_ADDR: 02 5d2 c0a87a01 vmtestdctm01 192.168.122.1
Docbroker version         : 16.4.0000.0248  Linux64
**************************************************
**     D O C B A S E   I N F O                  **
**************************************************
--------------------------------------------
Docbase name        : RepoTemplate
Docbase id          : 1000600
Docbase description : Template Repository
Govern docbase      : 
Federation name     : 
Server version      : 16.4.0000.0248  Linux64.Oracle
Docbase Roles       : Global Registry
...

– Create a document in the docbase:
Create an empty file

touch /home/dmadmin/DCTMChangeDocbaseExample.docx

Create document in the repository using idql

create dm_document object
SET title = 'DCTM Change Docbase Document Example',
SET subject = 'DCTM Change Docbase Document Example',
set object_name = 'DCTMChangeDocbaseExample.docx',
SETFILE '/home/dmadmin/DCTMChangeDocbaseExample.docx' with CONTENT_FORMAT= 'msw12';

Result:

object_created  
----------------
090f449880001125
(1 row affected)

note the r_object_id.

3. Execute the migration

3.a Stop the Docbase and the Docbroker

$DOCUMENTUM/dba/dm_shutdown_RepoTemplate
$DOCUMENTUM/dba/dm_stop_DocBroker

3.b Update the database name in the server.ini file
It is a workaround to avoid below error:

Database Details:
Database Vendor:oracle
Database Name:DCTMDB
Databse User:RepoTemplate
Database URL:jdbc:oracle:thin:@vmtestdctm01:1521/DCTMDB
ERROR...Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

In fact, the tool deal with the database name as a database service name, and put “/” in the url instead of “:”. The best workaround I found is to update database_conn value in the server.ini file, and put the service name instead of the database name.
Check the tnsnames.ora and note the service name, in my case is dctmdb.local.

[dmadmin@vmtestdctm01 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora 
DCTMDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vmtestdctm01)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dctmdb.local)
    )
  )

Make the change in the server.ini file:

[dmadmin@vmtestdctm01 ~]$ vi $DOCUMENTUM/dba/config/RepoTemplate/server.ini
...
[SERVER_STARTUP]
docbase_id = 1000600
docbase_name = RepoTemplate
server_config_name = RepoTemplate
database_conn = dctmdb.local
database_owner = RepoTemplate
database_password_file = /app/dctm/product/16.4/dba/config/RepoTemplate/dbpasswd.txt
service = RepoTemplate
root_secure_validator = /app/dctm/product/16.4/dba/dm_check_password
install_owner = dmadmin
...

Don’t worry, we will roll back this change before docbase start ;)

3.c Execute the MigrationUtil script

[dmadmin@vmtestdctm01 ~]$ $DM_HOME/install/external_apps/MigrationUtil/MigrationUtil.sh

Welcome... Migration Utility invoked.
 
Skipping Docbase ID Changes...
Skipping Host Name Change...
Skipping Install Owner Change...
Skipping Server Name Change...

Changing Docbase Name...
Created new log File: /app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/DocbaseNameChange.log
Finished changing Docbase Name...

Skipping Docker Seamless Upgrade scenario...
Migration Utility completed.

No Error encountred here but it doesn’t mean that everything is ok… Please check the log file:

[dmadmin@vmtestdctm01 ~]$ cat /app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/DocbaseNameChange.log
Start: 2019-02-01 19:32:10.631
Changing Docbase Name
=====================

DocbaseName: RepoTemplate
New DocbaseName: repository1
Retrieving server.ini path for docbase: RepoTemplate
Found path: /app/dctm/product/16.4/dba/config/RepoTemplate/server.ini

Database Details:
Database Vendor:oracle
Database Name:dctmdb.local
Databse User:RepoTemplate
Database URL:jdbc:oracle:thin:@vmtestdctm01:1521/dctmdb.local
Successfully connected to database....

Processing Database Changes...
Created database backup File '/app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/DocbaseNameChange_DatabaseRestore.sql'
select r_object_id,object_name from dm_sysobject_s where r_object_type = 'dm_docbase_config' and object_name = 'RepoTemplate'
update dm_sysobject_s set object_name = 'repository1' where r_object_id = '3c0f449880000103'
select r_object_id,docbase_name from dm_docbaseid_map_s where docbase_name = 'RepoTemplate'
update dm_docbaseid_map_s set docbase_name = 'repository1' where r_object_id = '440f449880000100'
select r_object_id,file_system_path from dm_location_s where file_system_path like '%RepoTemplate%'
update dm_location_s set file_system_path = '/app/dctm/product/16.4/data/repository1/content_storage_01' where r_object_id = '3a0f44988000013f'
...
update dm_job_s set target_server = 'repository1.RepoTemplate@vmtestdctm01' where r_object_id = '080f4498800003e0'
...
select i_stamp from dmi_vstamp_s where i_application = 'dmi_dd_attr_info'
...
Successfully updated database values...
...
Backed up '/app/dctm/product/16.4/dba/dm_start_RepoTemplate' to '/app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/dm_start_RepoTemplate_docbase_RepoTemplate.backup'
Updated dm_startup script.
Renamed '/app/dctm/product/16.4/dba/dm_start_RepoTemplate' to '/app/dctm/product/16.4/dba/dm_start_repository1'
Backed up '/app/dctm/product/16.4/dba/dm_shutdown_RepoTemplate' to '/app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/dm_shutdown_RepoTemplate_docbase_RepoTemplate.backup'
Updated dm_shutdown script.
Renamed '/app/dctm/product/16.4/dba/dm_shutdown_RepoTemplate' to '/app/dctm/product/16.4/dba/dm_shutdown_repository1'
WARNING...File /app/dctm/product/16.4/dba/config/RepoTemplate/rkm_config.ini doesn't exist. RKM is not configured
Finished processing File changes...

Processing Directory Changes...
Renamed '/app/dctm/product/16.4/data/RepoTemplate' to '/app/dctm/product/16.4/data/repository1'
Renamed '/app/dctm/product/16.4/dba/config/RepoTemplate' to '/app/dctm/product/16.4/dba/config/repository1'
Renamed '/app/dctm/product/16.4/dba/auth/RepoTemplate' to '/app/dctm/product/16.4/dba/auth/repository1'
Renamed '/app/dctm/product/16.4/share/temp/replicate/RepoTemplate' to '/app/dctm/product/16.4/share/temp/replicate/repository1'
Renamed '/app/dctm/product/16.4/share/temp/ldif/RepoTemplate' to '/app/dctm/product/16.4/share/temp/ldif/repository1'
Renamed '/app/dctm/product/16.4/server_uninstall/delete_db/RepoTemplate' to '/app/dctm/product/16.4/server_uninstall/delete_db/repository1'
Finished processing Directory Changes...
...
Processing Services File Changes...
Backed up '/etc/services' to '/app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/services_docbase_RepoTemplate.backup'
ERROR...Couldn't update file: /etc/services (Permission denied)
ERROR...Please update services file '/etc/services' manually with root account
Finished changing docbase name 'RepoTemplate'

Finished changing docbase name....
End: 2019-02-01 19:32:23.791

Here it is a justified error… Let’s change the service name manually.

3.d Change the service
As root, change the service name:

[root@vmtestdctm01 ~]$ vi /etc/services
...
repository1				49402/tcp               # DCTM repository native connection
repository1_s       	49403/tcp               # DCTM repository secure connection

3.e Change back the Database name in the server.ini file

[dmadmin@vmtestdctm01 ~]$ vi $DOCUMENTUM/dba/config/repository1/server.ini
...
[SERVER_STARTUP]
docbase_id = 1000600
docbase_name = repository1
server_config_name = RepoTemplate
database_conn = DCTMDB
...

3.f Start the Docbroker and the Docbase

$DOCUMENTUM/dba/dm_launch_DocBroker
$DOCUMENTUM/dba/dm_start_repository1

3.g Check the docbase log

[dmadmin@vmtestdctm01 ~]$ tail -5 $DOCUMENTUM/dba/log/RepoTemplate.log
...
2019-02-01T19:43:15.677455	16563[16563]	0000000000000000	[DM_WORKFLOW_I_AGENT_START]info:  "Workflow agent master (pid : 16594, session 010f449880000007) is started sucessfully."
IsProcessAlive: Process ID 0 is not > 0
2019-02-01T19:43:15.677967	16563[16563]	0000000000000000	[DM_WORKFLOW_I_AGENT_START]info:  "Workflow agent worker (pid : 16595, session 010f44988000000a) is started sucessfully."
IsProcessAlive: Process ID 0 is not > 0
2019-02-01T19:43:16.680391	16563[16563]	0000000000000000	[DM_WORKFLOW_I_AGENT_START]info:  "Workflow agent worker (pid : 16606, session 010f44988000000b) is started sucessfully." 

You are saying the log name is still RepoTemplate.log ;) Yes! because in my case the docbase name and the server name were the same before I change the docbase name:

[dmadmin@vmtestdctm01 ~]$ vi $DOCUMENTUM/dba/config/repository1/server.ini
[SERVER_STARTUP]
docbase_id = 1000600
docbase_name = repository1
server_config_name = RepoTemplate
database_conn = DCTMDB
database_owner = RepoTemplate
database_password_file = /app/dctm/product/16.4/dba/config/repository1/dbpasswd.txt
service = repository1
root_secure_validator = /app/dctm/product/16.4/dba/dm_check_password
install_owner = dmadmin

Be patient, in the next episode we will see how we can change the server name :)

4. After the migration (optional)

Get docbase map from the docbroker:

[dmadmin@vmtestdctm01 ~]$ dmqdocbroker -t vmtestdctm01 -c getdocbasemap
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0000.0185
Targeting port 1489
**************************************************
**     D O C B R O K E R    I N F O             **
**************************************************
Docbroker host            : vmtestdctm01
Docbroker port            : 1490
Docbroker network address : INET_ADDR: 02 5d2 c0a87a01 vmtestdctm01 192.168.122.1
Docbroker version         : 16.4.0000.0248  Linux64
**************************************************
**     D O C B A S E   I N F O                  **
**************************************************
--------------------------------------------
Docbase name        : repository1
Docbase id          : 1000600
Docbase description : Template Repository
Govern docbase      : 
Federation name     : 
Server version      : 16.4.0000.0248  Linux64.Oracle
Docbase Roles       : Global Registry
...

it’s not very nice to keep the old description of the docbase… Use below idql request to change it:

Update dm_docbase_config object set title='Renamed Repository' where object_name='repository1';

Check after change:

[dmadmin@vmtestdctm01 ~]$ dmqdocbroker -t vmtestdctm01 -c getdocbasemap
...
Docbase name        : repository1
Docbase id          : 1000600
Docbase description : Renamed Repository
...

Check the document created before the migration:
docbase id : 090f449880001125

API> dump,c,090f449880001125
...
USER ATTRIBUTES

  object_name                     : DCTMChangeDocbaseExample.docx
  title                           : DCTM Change Docbase Document Example
  subject                         : DCTM Change Docbase Document Example
...
5. Conclusion

Well, the tool works, but as you saw we need a workaround to make the change. Which is not great, hope that it will be fixed in the future versions.
In the next episode I will change the server config name, see you there ;)

Cet article Documentum – MigrationUtil – 2 – Change Docbase Name est apparu en premier sur Blog dbi services.

Documentum – Process Builder Installation Fails

Mon, 2019-02-04 01:25

A couple of weeks ago, at a customer I received an incident from the application team regarding an error occurred when installing Process Builder. The error message was:
The Process Engine license has not been enabled or is invalid in the ‘RADEV’ repository.
The Process Engine license must be enabled to use the Process Builder.
Please see your system administrator
.”

The error appears when selecting the repository:

Before I investigate on this incident I had to learn more about the Process Builder as it is usually managed by the application team.
In fact, The Documentum Process Builder is a software for creating a business process templates, used to formalize the steps required to complete a business process such as an approval process, so the goal is to extend the basic functionality of Documentum Workflow Manager.
It is a client application that can be installed on any computer, but before installing Process Builder you need to prepare your content server and repository by installing the Process Engine, because the CS handle the check in, check out, versioning, archiving, and all processes created are saved in the repository… Hummm, so maybe the issue is that my content server or repository is not well configured?

To be clean from the client side, I asked the application team to confirm the docbroker and port configured in C:\Documentum\Config\dfc.properties.

From the Content Server side, we used Process Engine installer, which install the Process Engine on all repositories that are served by the Content Server, deploy the bpm.ear file on Java Method Server and install the DAR files on each repository.

So let’s check the installation:

1. The BPM url http://Server:9080/bpm/modules.jsp is reachable:

2. No error in the bpm log file $JBOSS_HOME/server/DctmServer_MethodServer/logs/bpm-runtime.log.

3. BPM and XCP DARs are correctly installed in the repository:

select r_object_id, object_name, r_creation_date from dmc_dar where object_name in ('BPM', 'xcp');
080f42a480026d98 BPM 8/29/2018 10:43:35
080f42a48002697d xcp 8/29/2018 10:42:11

4. The Process Engine module is missed in the docbase configuration:

	API> retrieve,c,dm_docbase_config
	...
	3c0f42a480000103
	API> dump,c,l
	...
	USER ATTRIBUTES

		object_name                : RADEV
		title                      : RADEV Repository
	...
	SYSTEM ATTRIBUTES

		r_object_id                : 3c0f42a480000103
		r_object_type              : dm_docbase_config
		...
		r_module_name           [0]: Snaplock
								[1]: Archive Service
								[2]: CASCADING_AUTO_DELEGATE
								[3]: MAX_AUTO_DELEGATE
								[4]: Collaboration
		r_module_mode           [0]: 0
								[1]: 0
								[2]: 0
								[3]: 1
								[4]: 3

We know the root cause of this incident now :D
To resolve the issue, add the Process Engine module to the docbase config:

API>fetch,c,docbaseconfig
API>append,c,l,r_module_name
Process Engine
API>append,c,l,r_module_mode
3
API>save,c,l

Check after update:

	API> retrieve,c,dm_docbase_config
	...
	3c0f42a480000103
	API> dump,c,l
	...
	USER ATTRIBUTES

		object_name                : RADEV
		title                      : RADEV Repository
	...
	SYSTEM ATTRIBUTES

		r_object_id                : 3c0f42a480000103
		r_object_type              : dm_docbase_config
		...
		r_module_name           [0]: Snaplock
								[1]: Archive Service
								[2]: CASCADING_AUTO_DELEGATE
								[3]: MAX_AUTO_DELEGATE
								[4]: Collaboration
								[5]: Process Engine
		r_module_mode           [0]: 0
								[1]: 0
								[2]: 0
								[3]: 1
								[4]: 3
								[5]: 3
		...

Then I asked the application team to retry the installation, the issue has been resolved.

No manual docbase configuration required in the Process Engine Installation Guide. I guess the Process Engine Installer should do it automatically.
I will install a new environment in the next few days/weeks, and keep you informed if any news ;)

Cet article Documentum – Process Builder Installation Fails est apparu en premier sur Blog dbi services.

Italian Oracle User Group Tech Days 2019

Wed, 2019-01-30 15:28

The Italian Oracle User Group (ITOUG) is an independent group of Oracle enthusiasts and experts which work together as volunteers to promote technical knowledge sharing in Italy.

Here the ITOUG Board members:
ITOUG Board

This year ITOUG Tech Days take place in Milan on 30th January and in Rome on 1st February. Two different streams for each event:
– Database
– Analytics and Big Data
Today I participated to the event in Milan.
But before talking about that, ITOUG Tech Days started with the speakers’ dinner on Tuesday evening in Milan: aperitif, good Italian food and very nice people.
ITOUG Speakers Dinner

On Wednesday morning, we all met at Oracle Italia in Cinisello Balsamo (MI):
ITOUG Milan

After the welcome message by some ITOUG Board members:
ITOUG Welcome  Msg
sessions finally started. I attended the following ones of the Database stream:

- “Instrumentation 2.0: Performance is a feature” by Lasse Jenssen from Norway
Lasse
We have to understand what’s going on into a system, performance is a feature and we need instrumentation. Oracle End-to-End metrics, new tags in 12c, v$sql_monitor, dbms_monitor… And work in progress for instrumentation 3.0 with ElasticSearch, LogStash and Kibana.

- “Hash Join Memory Optimization” by one of the ITOUG Board member, Donatello Settembrino
Donatello
How Hash Join works and how to improve PGA consumption performances. Examples of partitioning (to exclude useless data), (full) Partitioning Wise Join (to use less resources) and parallelism. Differences between Right-Deep Join Trees and Left-Deep Join Trees, and concept of Bushy Join Trees in 12R2.

- “Design your databases using Oracle SQL Developer Data Modeler” by Heli Helskyaho from Finland
Heli
Oracle SQL Developer Data Modeler with SQL Developer or in a standalone mode to design your database. It uses Subversion integrated in the tool for the version control and management. It also has support for other databases, MySQL for example. And it’s free.

- “Bringing your Oracle Database alive with APEX” by Dimitri Gielis from Belgium
Dimitri
Two things to learn from this session:
1) Use Oracle Application Express to design and develop a web application.
2) And Quick SQL to create database objects and build a data model
And all that in a very fast way.

- “Blockchain beyond the Hype” by one of the ITOUG Board member, Paolo Gaggia
Paolo
The evolution of blockchain from bitcoin to new Enterprise-Oriented implementation and some interesting use cases.

Every session was very interesting: thanks to the great and amazing speakers (experts working on Oracle technologies, Oracle ACE, Oracle ACE Director…) for their sharing.

Follow the Italian Oracle User Group on Twitter (IT_OUG) and see you at the next ITOUG event!

Cet article Italian Oracle User Group Tech Days 2019 est apparu en premier sur Blog dbi services.

Recover a corrupted datafile in your DataGuard environment 11G/12C.

Wed, 2019-01-30 07:30

On a DG environment, a datafile needs to be recovered on the STANDBY site, in two situations : when is deleted or corrupted.
Below, I will explain  how to recover a corrupted datafile, in order to be able to repair the Standby database, without to be necessary to restore entire database.

Initial situation :

DGMGRL> connect /
Connected to "PROD_SITE2"
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - CONFIG1

  Protection Mode: MaxPerformance
  Members:
  PROD_SITE2 - Primary database
    PROD_SITE1 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 15 seconds ago)

On this  environment, we have a table called EMP with 100 rows, owned by the user TEST (default tablespace TEST).

SQL> set linesize 220;
SQL> select username,default_tablespace from dba_users where username='TEST';

USERNAME     DEFAULT_TABLESPACE
-------------------------------
TEST         TEST

SQL> select count(*) from test.emp;

  COUNT(*)
----------
       100

By mistake, the datafile on Standby site, get corrupted.

SQL> alter database open read only;
alter database open read only
*
ORA-01578: ORACLE data block corrupted (file # 5, block # 3)
ORA-01110: data file 5: '/u02/oradata/PROD/test.dbf'

As is corrupted, the apply of the redo log is stopped until will be repaired. So the new inserts into the EMP table will not be applied:

SQL> begin
  2  for i in 101..150 loop
  3  insert into test.emp values (i);
  4  end loop;
  5  END;
  6  /

PL/SQL procedure successfully completed.

SQL> COMMIT;

Commit complete.

SQL> select count(*) from test.emp;

  COUNT(*)
----------
       150

SQL> select name,db_unique_name,database_role from v$database;

NAME      DB_UNIQUE_NAME                 DATABASE_ROLE
--------- ------------------------------ ----------------
PROD      PROD_SITE2                     PRIMARY

To repair it, we will use PRIMARY site to backup controlfile and the related datafile.

oracle@dbisrv03:/home/oracle/ [PROD] rman target /

connected to target database: PROD (DBID=410572245)

RMAN> backup current controlfile for standby format '/u02/backupctrl.ctl';


RMAN> backup datafile 5 format '/u02/testbkp.dbf';

Starting backup at 29-JAN-2019 10:59:37
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=276 device type=DISK

We will transfer the backuppieces on the STANDBY server, using scp:

 scp backupctrl.ctl oracle@dbisrv04:/u02/
 scp testbkp.dbf oracle@dbisrv04:/u02/

Now, will start the restore/recover on the STANDBY server :

SQL> startup nomount
ORACLE instance started.

Total System Global Area 1895825408 bytes
Fixed Size                  8622048 bytes
Variable Size             570425376 bytes
Database Buffers         1308622848 bytes
Redo Buffers                8155136 bytes
SQL> exit
oracle@dbisrv04:/u02/oradata/PROD/ [PROD] rman target /


Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (not mounted)

RMAN> restore controlfile from '/u02/backupctrl.ctl'; 
.........
RMAN> alter database mount;


RMAN> catalog start with '/u02/testbkp.dbf';

searching for all files that match the pattern /u02/testbkp.dbf

List of Files Unknown to the Database
=====================================
File Name: /u02/testbkp.dbf

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u02/testbkp.dbf




RMAN> restore datafile 5;

Starting restore at 29-JAN-2019 11:06:31
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00005 to /u02/oradata/PROD/test.dbf
channel ORA_DISK_1: reading from backup piece /u02/testbkp.dbf
channel ORA_DISK_1: piece handle=/u02/testbkp.dbf tag=TAG20190129T105938
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 29-JAN-2019 11:06:33

RMAN> exit

Now, we will start to apply the logs again and try to resync the STANDBY database.
!!! Here you need to stop recovery process if you do not have a dataguard active license.

SQL> recover managed standby database using current logfile disconnect from session;
Media recovery complete.
SQL> recover managed standby database cancel;
SQL> alter database open read only;

Database altered.

SQL> select count(*) from test.emp;

  COUNT(*)
----------
       150

Now, we can see the last insert activity on the PRIMARY site that is available on the STANDBY site.

On 12c environment, with an existing container PDB1, the things are easier, with the feature RESTORE/RECOVER from service :

connect on the standby site
rman target /
restore tablespace PDB1:USERS from service PROD_PRIMARY;
recover tablespace PDB1:USERS;

Cet article Recover a corrupted datafile in your DataGuard environment 11G/12C. est apparu en premier sur Blog dbi services.

PostgreSQL 12: Detach postmaster process from pg_ctl’s session at server startup

Wed, 2019-01-30 00:37

Recently a commit landed in the PostgreSQL development tree that made me aware of something I did not know so far: When you start PostgreSQL with a script using pg_ctl and that script was cancelled the postmaster was killed as well before PostgreSQL 12. Sounds weird? Lets do a little demo.

The little demo script is quite simple: Print the version of pg_ctl, startup PostgreSQL and then sleep for 10 seconds:

postgres@pgbox:/home/postgres/ [PG10] cat start.sh 
#!/bin/bash
/u01/app/postgres/product/10/db_3/bin/pg_ctl --version
/u01/app/postgres/product/10/db_3/bin/pg_ctl -D /u02/pgdata/10/PG103 start
sleep 10

When you execute that against PostgreSQL before version 12 and then CRTL-C the script while it is sleeping the postmaster will be killed as well:

postgres@pgbox:/home/postgres/ [PG10] ./start.sh 
pg_ctl (PostgreSQL) 10.5
waiting for server to start....2019-01-25 13:00:07.258 CET - 1 - 6853 -  - @ LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-01-25 13:00:07.259 CET - 2 - 6853 -  - @ LOG:  listening on IPv6 address "::", port 5432
2019-01-25 13:00:07.263 CET - 3 - 6853 -  - @ LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-01-25 13:00:07.282 CET - 4 - 6853 -  - @ LOG:  redirecting log output to logging collector process
2019-01-25 13:00:07.282 CET - 5 - 6853 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
^C
postgres@pgbox:/home/postgres/ [PG10] ps -ef | grep postgres

Starting with PostgreSQL 12 this will not happen anymore:

postgres@pgbox:/home/postgres/ [PG10] cat start.sh 
#!/bin/bash
/u01/app/postgres/product/DEV/db_1/bin/pg_ctl --version
/u01/app/postgres/product/DEV/db_1/bin/pg_ctl -D /u02/pgdata/DEV start
sleep 10
postgres@pgbox:/home/postgres/ [PGDEV] ./start.sh 
pg_ctl (PostgreSQL) 12devel
waiting for server to start....2019-01-25 13:02:51.690 CET - 1 - 9408 -  - @ LOG:  listening on IPv6 address "::1", port 5433
2019-01-25 13:02:51.690 CET - 2 - 9408 -  - @ LOG:  listening on IPv4 address "127.0.0.1", port 5433
2019-01-25 13:02:51.700 CET - 3 - 9408 -  - @ LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2019-01-25 13:02:51.734 CET - 4 - 9408 -  - @ LOG:  redirecting log output to logging collector process
2019-01-25 13:02:51.734 CET - 5 - 9408 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
^C
postgres@pgbox:/home/postgres/ [PGDEV] ps -ef | grep postgres
postgres  9408     1  0 13:02 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /u02/pgdata/DEV
postgres  9409  9408  0 13:02 ?        00:00:00 postgres: logger   
postgres  9411  9408  0 13:02 ?        00:00:00 postgres: checkpointer   
postgres  9412  9408  0 13:02 ?        00:00:00 postgres: background writer   
postgres  9413  9408  0 13:02 ?        00:00:00 postgres: walwriter   
postgres  9414  9408  0 13:02 ?        00:00:00 postgres: autovacuum launcher   
postgres  9415  9408  0 13:02 ?        00:00:00 postgres: stats collector   
postgres  9416  9408  0 13:02 ?        00:00:00 postgres: logical replication launcher   
postgres  9422 29009  0 13:03 pts/2    00:00:00 ps -ef
postgres  9423 29009  0 13:03 pts/2    00:00:00 grep --color=auto postgres
root     29005  2910  0 12:53 ?        00:00:00 sshd: postgres [priv]
postgres 29008 29005  0 12:53 ?        00:00:01 sshd: postgres@pts/2
postgres 29009 29008  0 12:53 pts/2    00:00:00 -bash
root     29463  2910  0 12:54 ?        00:00:00 sshd: postgres [priv]
postgres 29466 29463  0 12:54 ?        00:00:00 sshd: postgres@pts/1
postgres 29467 29466  0 12:54 pts/1    00:00:00 -bash

Learned something new again.

Cet article PostgreSQL 12: Detach postmaster process from pg_ctl’s session at server startup est apparu en premier sur Blog dbi services.

How To Deploy Office Web Apps Server 2013

Tue, 2019-01-29 10:30
The 4 Steps Of Office Web Apps Server 2013 Installation

Office Web Apps provides browser-based versions of Excel, One Note, Word and PowerPoint. It also helps users who access files through SharePoint 2013.

The objective of this topic is to define the steps to install office web apps 2013, create the farm and the binding so that it can be used within SharePoint 2013 test environment.

For this example, we have the following systems in place:

  • Windows Server 2012 r2
  • SharePoint Server 2013
1) Install Server roles, features & Role services

Server roles:

  • Web server

Features:

  • Ink and Handwriting services

Role services:

  • Dynamic Content Compression
  • Windows Authentication
  • .Net Extensibility 4.5
  • ASP.Net 4.5
  • Server Side Includes

Restart the server.

Note that if your installation is done on Windows Server 2016, the feature “Ink and Handwriting services” is now a default part of the server and no longer requires a separate package.

2) Install Office Web Apps

Launch the setup from the DVD file and wait until the installation is finished.

3) Create Office Web Apps Farm

1) Specify the internal URL for the server name
2) Use administrative privileges
3) run the Power Shell command “New-OfficeWebAppsFarm -InternalURL http://servername -AllowHttp -EditingEnabled”

This command allows HTTP as it is internal and the function enable editing to allow users to edit documents.

To verify that the farm is successfully created, type in the browser the URL “http://servername/hosting/delivery”.

4) Bind Office Web Apps and SharePoint

The communication between both sides still need to be done through HTTP protocol.

1) Use administrative privileges
2) Switch over SharePoint management shell
3) Run the command “New-SPWOPIBinding -ServerName servername -AllowHTTP”

The command should return that HTTP protocol is used internally and a list of bindings.

Check SharePoint default internal zone:

Get-SPWOPIZone

If it is HTTPS, change it into HTTP:

Set-SPWOPIZone -Zone internal-http

Set the authentication OAuth over HTTP to true:

  • $config = (Get-SPSecurityTokenServiceConfig)
  • $config.AllowOAuthOverHttp = $true
  • $config.update()

SharePoint can now use Office Web Apps.

To avoid errors, few points need to be verify before testing Office Web apps within SharePoint:

a) Check SharePoint authentication mode (claims-based and not classic) using PowerShell:

  • $WebApp=”http://webapp/”
  • (Get-SPWebApplication $WebAppURL).UseClaimsAuthentication

b) Check that the login account is not a system account but a testing account.

c) Enabling editing Office Web Apps, if it is false, set it to true using the PowerShell command:

  • Set-OfficeWebAppsFarm -EditingEnabled:$true

d) Check that Office Web Apps has enough memory

Need help, more details can be found on here.

Cet article How To Deploy Office Web Apps Server 2013 est apparu en premier sur Blog dbi services.

Documentum – MigrationUtil – 1 – Change Docbase ID

Mon, 2019-01-28 02:04

This blog is the first one of a series that I will publish in the next few days/weeks regarding how to change a Docbase ID, Docbase name, aso in Documentum CS.
So, let’s dig in with the first one: Docbase ID. I did it on Documentum CS 16.4 with Oracle database on a freshly installed docbase.

We will be interested by the docbase repo1, to change the docbase ID from 101066 (18aca) to 101077 (18ad5).

1. Migration tool overview and preparation

The tool we will use here is MigrationUtil, and the concerned folder is:

[dmadmin@vmtestdctm01 ~]$ ls -rtl $DM_HOME/install/external_apps/MigrationUtil
total 108
-rwxr-xr-x 1 dmadmin dmadmin 99513 Oct 28 23:55 MigrationUtil.jar
-rwxr-xr-x 1 dmadmin dmadmin   156 Jan 19 11:09 MigrationUtil.sh
-rwxr-xr-x 1 dmadmin dmadmin  2033 Jan 19 11:15 config.xml

The default content of MigrationUtil.sh:

[dmadmin@vmtestdctm01 ~]$ cat $DM_HOME/install/external_apps/MigrationUtil/MigrationUtil.sh
#!/bin/sh
CLASSPATH=${CLASSPATH}:MigrationUtil.jar
export CLASSPATH
java -cp "${CLASSPATH}" MigrationUtil

Update it if you need to overload the CLASSPATH only during migration. It was my case, I had to add the oracle driver path to the $CLASSPATH, because I received the below error:

...
ERROR...oracle.jdbc.driver.OracleDriver
ERROR...Database connection failed.
Skipping changes for docbase: repo1

To make the blog more readable, I will not show you all the contents of config.xml, below is the updated version to change the Docbase ID:

...
<properties>
<comment>Database connection details</comment>
<entry key="dbms">oracle</entry> <!-- This would be either sqlserver, oracle, db2 or postgres -->
<entry key="tgt_database_server">vmtestdctm01</entry> <!-- Database Server host or IP -->
<entry key="port_number">1521</entry> <!-- Database port number -->
<entry key="InstallOwnerPassword">install164</entry>
<entry key="isRCS">no</entry>    <!-- set it to yes, when running the utility on secondary CS -->

<!-- <comment>List of docbases in the machine</comment> -->
<entry key="DocbaseName.1">repo1</entry>

<!-- <comment>docbase owner password</comment> -->
<entry key="DocbasePassword.1">install164</entry>

<entry key="ChangeDocbaseID">yes</entry> <!-- To change docbase ID or not -->
<entry key="Docbase_name">repo1</entry> <!-- has to match with DocbaseName.1 -->
<entry key="NewDocbaseID">101077</entry> <!-- New docbase ID -->
...

Put all other entry to no.
The tool will use above information, and load more from the server.ini file.

Before you start the migration script, you have to adapt the maximum open cursors in the database. In my case, with a freshly installed docbase, I had to set open_cursors value to 1000 (instead of 300):

alter system set open_cursors = 1000

See with your DB Administrator before any change.

Otherwise, I got below error:

...
Changing Docbase ID...
Database owner password is read from config.xml
java.sql.SQLException: ORA-01000: maximum open cursors exceeded
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
	at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1059)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:522)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:225)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:53)
	at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:943)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1150)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4798)
	at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:4875)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1361)
	at SQLUtilHelper.setSQL(SQLUtilHelper.java:129)
	at SQLUtilHelper.processColumns(SQLUtilHelper.java:543)
	at SQLUtilHelper.processTables(SQLUtilHelper.java:478)
	at SQLUtilHelper.updateDocbaseId(SQLUtilHelper.java:333)
	at DocbaseIDUtil.(DocbaseIDUtil.java:61)
	at MigrationUtil.main(MigrationUtil.java:25)
...
2. Before the migration (optional)

Get docbase map from the docbroker:

[dmadmin@vmtestdctm01 ~]$ dmqdocbroker -t vmtestdctm01 -c getdocbasemap
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0000.0185
Targeting port 1489
**************************************************
**     D O C B R O K E R    I N F O             **
**************************************************
Docbroker host            : vmtestdctm01
Docbroker port            : 1490
Docbroker network address : INET_ADDR: 02 5d2 c0a87a01 vmtestdctm01 192.168.122.1
Docbroker version         : 16.4.0000.0248  Linux64
**************************************************
**     D O C B A S E   I N F O                  **
**************************************************
--------------------------------------------
Docbase name        : repo1
Docbase id          : 101066
Docbase description : repo1 repository
...

Create a document in the docbase
Create an empty file

touch /home/dmadmin/DCTMChangeDocbaseExample.txt

Create document in the repository using idql

create dm_document object
SET title = 'DCTM Change Docbase Document Example',
SET subject = 'DCTM Change Docbase Document Example',
set object_name = 'DCTMChangeDocbaseExample.txt',
SETFILE '/home/dmadmin/DCTMChangeDocbaseExample.txt' with CONTENT_FORMAT= 'msww';

Result:

object_created  
----------------
09018aca8000111b
(1 row affected)

note the r_object_id

3. Execute the migration

Before you execute the migration you have to stop the docbase and the docbroker.

$DOCUMENTUM/dba/dm_shutdown_repo1
$DOCUMENTUM/dba/dm_stop_DocBroker

Now, you can execute the migration script:

[dmadmin@vmtestdctm01 ~]$ $DM_HOME/install/external_apps/MigrationUtil/MigrationUtil.sh

Welcome... Migration Utility invoked.
 
Created log File: /app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/DocbaseIdChange.log
Changing Docbase ID...
Database owner password is read from config.xml
Finished changing Docbase ID...

Skipping Host Name Change...
Skipping Install Owner Change...
Skipping Server Name Change...
Skipping Docbase Name Change...
Skipping Docker Seamless Upgrade scenario...

Migration Utility completed.

No Error, sounds good ;) All changes have been recorded in the log file:

[dmadmin@vmtestdctm01 ~]$ cat /app/dctm/product/16.4/product/16.4/install/external_apps/MigrationUtil/MigrationUtilLogs/DocbaseIdChange.log
Reading config.xml from path: config.xmlReading server.ini parameters

Retrieving server.ini path for docbase: repo1
Found path: /app/dctm/product/16.4/dba/config/repo1/server.ini
Set the following properties:

Docbase Name:repo1
Docbase ID:101066
New Docbase ID:101077
DBMS: oracle
DatabaseName: DCTMDB
SchemaOwner: repo1
ServerName: vmtestdctm01
PortNumber: 1521
DatabaseOwner: repo1
-------- Oracle JDBC Connection Testing ------
jdbc:oracle:thin:@vmtestdctm01:1521:DCTMDB
Connected to database
Utility is going to modify Objects with new docbase ID
Sun Jan 27 19:08:58 CET 2019
-----------------------------------------------------------
Processing tables containing r_object_id column
-----------------------------------------------------------
-------- Oracle JDBC Connection Testing ------
jdbc:oracle:thin:@vmtestdctm01:1521:DCTMDB
Connected to database
...
...
-----------------------------------------------------------
Update the object IDs of the Table: DMC_ACT_GROUP_INSTANCE_R with new docbase ID:18ad5
-----------------------------------------------------------
Processing objectID columns
-----------------------------------------------------------
Getting all ID columns from database
-----------------------------------------------------------

Processing ID columns in each documentum table

Column Name: R_OBJECT_ID
Update the ObjectId columns of the Table: with new docbase ID

Processing ID columns in each documentum table

Column Name: R_OBJECT_ID
Update the ObjectId columns of the Table: with new docbase ID
...
...
-----------------------------------------------------------
Update the object IDs of the Table: DM_XML_ZONE_S with new docbase ID:18ad5
-----------------------------------------------------------
Processing objectID columns
-----------------------------------------------------------
Getting all ID columns from database
-----------------------------------------------------------
Processing ID columns in each documentum table
Column Name: R_OBJECT_ID
Update the ObjectId columns of the Table: with new docbase ID
-----------------------------------------------------------
Updating r_docbase_id of dm_docbase_config_s and dm_docbaseid_map_s...
update dm_docbase_config_s set r_docbase_id = 101077 where r_docbase_id = 101066
update dm_docbaseid_map_s set r_docbase_id = 101077 where r_docbase_id = 101066
Finished updating database values...
-----------------------------------------------------------
-----------------------------------------------------------
Updating the new DocbaseID value in dmi_vstamp_s table
...
...
Updating Data folder...
select file_system_path from dm_location_s where r_object_id in (select r_object_id from dm_sysobject_s where r_object_type = 'dm_location' and object_name in (select root from dm_filestore_s))
Renamed '/app/dctm/product/16.4/data/repo1/replica_content_storage_01/00018aca' to '/app/dctm/product/16.4/data/repo1/replica_content_storage_01/00018ad5
Renamed '/app/dctm/product/16.4/data/repo1/replicate_temp_store/00018aca' to '/app/dctm/product/16.4/data/repo1/replicate_temp_store/00018ad5
Renamed '/app/dctm/product/16.4/data/repo1/streaming_storage_01/00018aca' to '/app/dctm/product/16.4/data/repo1/streaming_storage_01/00018ad5
Renamed '/app/dctm/product/16.4/data/repo1/content_storage_01/00018aca' to '/app/dctm/product/16.4/data/repo1/content_storage_01/00018ad5
Renamed '/app/dctm/product/16.4/data/repo1/thumbnail_storage_01/00018aca' to '/app/dctm/product/16.4/data/repo1/thumbnail_storage_01/00018ad5
select file_system_path from dm_location_s where r_object_id in (select r_object_id from dm_sysobject_s where r_object_type = 'dm_location' and object_name in (select log_location from dm_server_config_s))
Renamed '/app/dctm/product/16.4/dba/log/00018aca' to '/app/dctm/product/16.4/dba/log/00018ad5
select r_object_id from dm_ldap_config_s
Finished updating folders...
-----------------------------------------------------------
-----------------------------------------------------------
Updating the server.ini with new docbase ID
-----------------------------------------------------------
Retrieving server.ini path for docbase: repo1
Found path: /app/dctm/product/16.4/dba/config/repo1/server.ini
Backed up '/app/dctm/product/16.4/dba/config/repo1/server.ini' to '/app/dctm/product/16.4/dba/config/repo1/server.ini_docbaseid_backup'
Updated server.ini file:/app/dctm/product/16.4/dba/config/repo1/server.ini
Docbase ID Migration Utility completed!!!
Sun Jan 27 19:09:52 CET 2019

Start the Docbroker and the Docbase:

$DOCUMENTUM/dba/dm_launch_DocBroker
$DOCUMENTUM/dba/dm_start_repo1
4. After the migration (optional)

Get docbase map from the docbroker:

[dmadmin@vmtestdctm01 ~]$ dmqdocbroker -t vmtestdctm01 -c getdocbasemap
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0000.0185
Targeting port 1489
**************************************************
**     D O C B R O K E R    I N F O             **
**************************************************
Docbroker host            : vmtestdctm01
Docbroker port            : 1490
Docbroker network address : INET_ADDR: 02 5d2 c0a87a01 vmtestdctm01 192.168.122.1
Docbroker version         : 16.4.0000.0248  Linux64
**************************************************
**     D O C B A S E   I N F O                  **
**************************************************
--------------------------------------------
Docbase name        : repo1
Docbase id          : 101077
Docbase description : repo1 repository
...

Check the document created before the migration:
Adapt the r_object_id with the new docbase id : 09018ad58000111b

API> dump,c,09018ad58000111b    
...
USER ATTRIBUTES
  object_name                     : DCTMChangeDocbaseExample.txt
  title                           : DCTM Change Docbase Document Example
  subject                         : DCTM Change Docbase Document Example
...
  r_object_id                     : 09018ad58000111b
...
  i_folder_id                  [0]: 0c018ad580000105
  i_contents_id                   : 06018ad58000050c
  i_cabinet_id                    : 0c018ad580000105
  i_antecedent_id                 : 0000000000000000
  i_chronicle_id                  : 09018ad58000111b
5. Conclusion

After a lot of tests on my VMs, I can say that changing docbase id is reliable on a freshly installed docbase. On the other hand, each time I tried it on a “used” Docbase, I got errors like:

Changing Docbase ID...
Database owner password is read from config.xml
java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (GREPO5.D_1F00272480000139) violated

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
	at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1059)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:522)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:225)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:53)
	at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:943)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1150)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4798)
	at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:4875)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1361)
	at SQLUtilHelper.setSQL(SQLUtilHelper.java:129)
	at SQLUtilHelper.processColumns(SQLUtilHelper.java:543)
	at SQLUtilHelper.processTables(SQLUtilHelper.java:478)
	at SQLUtilHelper.updateDocbaseId(SQLUtilHelper.java:333)
	at DocbaseIDUtil.(DocbaseIDUtil.java:61)
	at MigrationUtil.main(MigrationUtil.java:25)

I didn’t investigate enough on above error, it deserves more time but it wasn’t my priority. Anyway, the tool made a correct rollback.

Now, it is your turn to practice, don’t hesitate to comment this blog to share your own experience and opinion :)
In the next blog, I will try to change the docbase name.

Cet article Documentum – MigrationUtil – 1 – Change Docbase ID est apparu en premier sur Blog dbi services.

PostgreSQL: When wal_level to logical

Sun, 2019-01-27 04:50

wal_level determines the quantity of information written to the WAL. With PostgreSQL 11 the parameter wal_level can have 3 values:
-minimal : only information needed to recover from a crash or an immediate shutdown
-replica : enough data to support WAL archiving and replication
-logical : enough information to support logical decoding.

If we want to use logical decoding, wal_level should be set to logical. Logical decoding is the process of extracting all persistent changes to a database’s tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database’s internal state.
In PostgreSQL, logical decoding is implemented by decoding the contents of the write-ahead log, which describe changes on a storage level, into an application-specific form such as a stream of tuples or SQL statements.

In this blog we are going to see some easy examples which will allow us to better understand this concept.

Before we can use logical decoding the parameter wal_level should be set to logical. As we will create replications slots, the parameter max_replication_slots should also be at least 1.
Below our values for these parameters

postgres=# show max_replication_slots ;
 max_replication_slots
-----------------------
 10
(1 row)

postgres=# show wal_level ;
 wal_level
-----------
 logical
(1 row)

postgres=#

First let’s create a slot. For this we will use the function pg_create_logical_replication_slot()

postgres=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
 slot_name |    lsn
-----------+-----------
 my_slot   | 0/702B658
(1 row)

postgres=#

To inspect the changes at WAL level we can use the function pg_logical_slot_get_changes(). So let’s call this function

postgres=# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
 lsn | xid | data
-----+-----+------
(0 rows)

postgres=#

This above output is expected because there is no change yet in our database
Now let’s do some insert in the database and let’s call again the function pg_logical_slot_get_changes()

postgres=# begin;
BEGIN
postgres=# insert into mytab values (1,'t1');
INSERT 0 1
postgres=# insert into mytab values (2,'t2');
INSERT 0 1
postgres=# commit;
COMMIT

postgres=# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
    lsn    | xid |                                  data
-----------+-----+------------------------------------------------------------------------
 0/703F538 | 582 | BEGIN 582
 0/703F538 | 582 | table public.mytab: INSERT: id[integer]:1 name[character varying]:'t1'
 0/703F5B0 | 582 | table public.mytab: INSERT: id[integer]:2 name[character varying]:'t2'
 0/703F620 | 582 | COMMIT 582
(4 rows)

postgres=#

As expected we can see changes that were made.
Now what happen if we call again the same function?

postgres=# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
 lsn | xid | data
-----+-----+------
(0 rows)

postgres=#

The changes are no longer reported. It’s normal because with the function pg_logical_slot_get_changes(), changes are consumed (will not be returned again). If we want the changes not to be consumed we can use the function pg_logical_slot_peek_changes(). This function behaves like the first one, except that changes are not consumed; that is, they will be returned again on future calls.

postgres=# begin;
BEGIN
postgres=# insert into mytab values (3,'t3');
INSERT 0 1
postgres=# insert into mytab values (4,'t4');
INSERT 0 1
postgres=# commit;
COMMIT                   
postgres=# delete from mytab where id=1;
DELETE 1

postgres=# SELECT * FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);
    lsn    | xid |                                  data
-----------+-----+------------------------------------------------------------------------
 0/703F738 | 583 | BEGIN 583
 0/703F738 | 583 | table public.mytab: INSERT: id[integer]:3 name[character varying]:'t3'
 0/703F838 | 583 | table public.mytab: INSERT: id[integer]:4 name[character varying]:'t4'
 0/703F8A8 | 583 | COMMIT 583
 0/703F8E0 | 584 | BEGIN 584
 0/703F8E0 | 584 | table public.mytab: DELETE: (no-tuple-data)
 0/703F948 | 584 | COMMIT 584
(7 rows)

postgres=# SELECT * FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);
    lsn    | xid |                                  data
-----------+-----+------------------------------------------------------------------------
 0/703F738 | 583 | BEGIN 583
 0/703F738 | 583 | table public.mytab: INSERT: id[integer]:3 name[character varying]:'t3'
 0/703F838 | 583 | table public.mytab: INSERT: id[integer]:4 name[character varying]:'t4'
 0/703F8A8 | 583 | COMMIT 583
 0/703F8E0 | 584 | BEGIN 584
 0/703F8E0 | 584 | table public.mytab: DELETE: (no-tuple-data)
 0/703F948 | 584 | COMMIT 584
(7 rows)

postgres=#

Logical decoding can also be managed using pg_recvlogical included in the PostgreSQL distribution.
Let’s create a slot using pg_recvlogical

[postgres@dbi-pg-essentials_3 PG1]$ pg_recvlogical -d postgres --slot=myslot_2  --create-slot

And let’s start the streaming in a first terminal

[postgres@dbi-pg-essentials_3 PG1]$ pg_recvlogical -d postgres --slot=myslot_2  --start -f -

If we do an insert in the database from a second terminal

postgres=# insert into mytab values (9,'t9');
INSERT 0 1
postgres=#

We will see following in the first terminal

[postgres@dbi-pg-essentials_3 PG1]$ pg_recvlogical -d postgres --slot=myslot_2  --start -f -   
BEGIN 587
table public.mytab: INSERT: id[integer]:9 name[character varying]:'t9'
COMMIT 587
Conclusion

In this blog we have seen that if we want to do logical decoding, we have to set the parameter wal_level to logical. Be aware that setting wal_level to logical can increase the volume of generated WAL. If we just want replication or archiving WALs, the value replica is enough with PostgreSQL.

Cet article PostgreSQL: When wal_level to logical est apparu en premier sur Blog dbi services.

Looping sssd_nss

Thu, 2019-01-24 02:17

Often Linux systems are connected to LDAP via sssd. A high CPU consuming sssd can be seen in top, like below ( e.g. 89.4 % CPU usage):


9020 root 20 0 1296344 466780 333364 R 89.4 0.8 41:20.17 sssd_nss
27227 oracle 20 0 2371676 48320 29732 S 4.3 0.1 27:00.70 oracle

One reason could be default configuration of /etc/nsswitch.conf:


passwd: db sss files
shadow: db sss files
group: db sss files

This configuration causes sssd_nss contacted by every user / group lookup if in local databases nothing is found, which may result in looping sssd_nss.

This /etc/nsswitch.conf configuration may help:


passwd: db files sss
shadow: db files sss
group: db files sss

Now first the local files are queried before sssd is contacted.

What also may help is to clear sssd cache after change of /etc/nsswitch.conf:


systemctl stop sssd
rm -rf /var/lib/sss/db/*
systemctl start sssd

Cet article Looping sssd_nss est apparu en premier sur Blog dbi services.

Red Hat Forum Switzerland 2019

Wed, 2019-01-23 05:11

Today (actually right now) we are a proud sponsor of the Red Hat Forum Switzerland 2019 in Geneva. With around 300 people attending this is quite a huge event and we had a lot of interesting discussions with other partners as well as existing customers or just people we met and didn’t know before.

First of all we focused on Red Hat’ing our marketing crew, of course. Image is everything:

sdr

As expected a lot of discussions and sessions are around cloud, hybrid cloud and containers. OpenShift was a topic everywhere and we really could feel that there is high demand but also many, many questions around that topic. The exhibition area filled quickly during the breaks and that was the great possibility to step into the very interesting discussions:

sdr

Even at the event we have been busy and sometimes it is required to have more than one phone. Interesting comments here:

sdr

This is when Microsoft meets open source:

sdr

Red Hat deserves a big thank you for the organization, everything was well prepared and working, here is our booth:

rptoz

… and here are the dbi services people who attended today:

rptoz

See you at the next Red Hat Forum in Zürich later this year.

Cet article Red Hat Forum Switzerland 2019 est apparu en premier sur Blog dbi services.

SharePoint Application Server Role, Web Server IIS Role Error

Wed, 2019-01-23 03:22
Bypassing SharePoint Server 2013 Prerequisites Installation Error On Windows Server 2016

 

SYMPTOMS

Before running the setup of SharePoint 2013 on Windows Server 2016, the prerequisites as the application server role and the web server role have to be installed and during that process, the following error message appears:

Prerequisite Installation Error

ROOT CAUSE

This error occurs when one or more of the following conditions is true:

  • The product preparation tool does not progress past the configuring application server role, web server role stage.
  • The product preparation tool may be unable to configure and install properly the required windows features for SharePoint.
  • The Application Server Role has been deprecated from Windows Server 2016
WORKAROUND

To workaround this issue, please follow this step:

Method:

Install the following software:

Copy and paste the Windows Server AppFabric software (do not install it) on the C drive and run the following PowerShell command:

C:\>.\WindowsServerAppFabricSetup_x64.exe /i CacheClient,CachingService,CacheAdmin /gac

When the installation is done, reboot the server and install the AppFabric cumulative update Server App Fabric CU and reboot again the Windows Server.

Run the setup.exe from the .iso file to complete the installation wizard.

 

 

 

 

 

 

 

 

 

Cet article SharePoint Application Server Role, Web Server IIS Role Error est apparu en premier sur Blog dbi services.

systemd configurations for Documentum

Tue, 2019-01-22 15:56
systemd configurations for Documentum

systemd has been with us for several years now and has slowly made its way into most Linux distributions. While it has generated much controversy among sysV init hard core, the fact is that it is here to stay and we, Documentum administrators, don’t have our say in this topic. In effect, it does not impact us very much, except that a little translation work is necessary to switch to it, provided that we already went the service way. Most of the time, our custom monolithic script to stop, start and inquiry the status of the several Documentum components can be reused as-is, it is just its invocation that changes. On the other hand, we can take profit of this opportunity to refactor that big script and define separate units for each components. Since systemd lets us define dependencies between components, we can externalize these out of the script, into systemd units. As a result, our stop/start script become slenderer, more readable and easier to maintain. So let’s see how to do all this.

Invocation of the big script

Such a big, monolithic script, let’s call it documentum.sh, is executed by dmadmin and has the typical following layout:

...
start:
launch the docbrokers
start the method server
start the docbases
stop:
shut the docbases down
stop the method server
stop the docbrokers
status:
check the docbrokers
check the method server
check the docbases
...

For simplicity, let’s assume henceforth that we are logged as root when typing all the systemd-related commands below.
To invoke this script from within systemd, let’s create the documentum.service unit:

cat - <<EndOfUnit > /etc/systemd/system/documentum.service
[Unit]
Description=Documentum components controls;
Type=oneshot
RemainAfterExit=yes

ExecStart=sudo -u dmadmin -i /app/dctm/server/dbi/documentum.sh start
ExecStop=sudo -u dmadmin -i /app/dctm/server/dbi/documentum.sh stop
 
[Install]
WantedBy=multi-user.target
EndOfUnit

The clause Type is oneshot because the unit runs commands that terminate, not services.
Unlike real services whose processes keep running after they’ve been started, dm_* scripts terminate after they have done their job, which is to launch some Documentum executables as background processes; thus, RemainAfterExit is needed to tell systemd that the services are still running once started.
ExecStart and ExecStop are obviously the commands to run in order to start, respectively stop the service.
See here for a comprehensive explanation of all the unit’s directives.
Now, activate the service:

systemctl enable documentum.service

This unit has no particular dependencies because all the Documentum-related stuff is self-contained and the needed system dependencies are all available at that point.
On lines 7 and 8, root runs the big script as user dmadmin. Extra care should be taken in the “change user” command so the script is run as dmadmin. It is of paramount importance that sudo be used instead of su. Both are very closely related: a command must be executed as another user, provided the real user has the right to do so (which is the case here because systemd runs as root).

man sudo:
SUDO(8) BSD System Manager's Manual SUDO(8)
 
NAME
sudo, sudoedit — execute a command as another user
 
man su:
SU(1) User Commands SU(1)
 
NAME
su - change user ID or become superuser

However, they behave differently in relation to systemd. With “su – dmadmin -c “, the command gets attached to a session in dmadmin’s slice:

systemd-cgls
 
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
├─user.slice
│ ├─user-618772.slice
│ │ └─session-41.scope
│ │ ├─ 5769 sshd: adm_admin2 [priv
│ │ ├─ 5783 sshd: adm_admin2@pts/
│ │ ├─ 5784 -bash
│ │ ├─11277 systemd-cgls
│ │ └─11278 less
│ └─user-509.slice
│ ├─session-c11.scope
│ │ ├─10988 ./documentum -docbase_name global_registry -security acl -init_file /app/dctm/server/dba/config/global_registry/server.ini
│ └─session-c1.scope
│ ├─6385 ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/server/dba/config/dmtest/server.ini

...

Here, user id 509 is dmadmin. We see that 2 docbase processes are attached to dmadmin’s slice, itself attached to the global user.slice.
With “sudo -u dmadmin -i “, the command gets attached to the system.slice:

system-cgls
 
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
├─user.slice
│ └─user-618772.slice
│ └─session-10.scope
│ ├─4314 sshd: adm_admin2 [priv
│ ├─4589 sshd: adm_admin2@pts/
│ ├─4590 -bash
│ ├─5927 /usr/share/centrifydc/libexec/dzdo service documentum.service start
│ ├─5928 /bin/systemctl start documentum.service
│ ├─5939 /usr/bin/systemd-tty-ask-password-agent --watch
│ ├─5940 /usr/bin/pkttyagent --notify-fd 5 --fallback
│ ├─6219 systemd-cgls
│ └─6220 less
└─system.slice
├─documentum.service
│ ├─5944 /usr/bin/sudo -u dmadmin -i /app/dctm/server/dbi/documentum.sh start
│ ├─5945 /bin/bash /app/dctm/server/dbi/documentum.sh start
│ ├─5975 ./dmdocbroker -port 1489 -init_file /app/dctm/server/dba/Docbroker.ini
│ ├─5991 ./dmdocbroker -port 1491 -init_file /app/dctm/server/dba/Docbrokerdmtest.ini
│ ├─6013 ./documentum -docbase_name global_registry -security acl -init_file /app/dctm/server/dba/config/global_registry/server.ini
│ ├─6023 ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/server/dba/config/dmtest/server.ini
│ ├─6024 sleep 30
│ ├─6055 /app/dctm/server/product/7.3/bin/mthdsvr master 0xfd070016, 0x7fa02da79000, 0x223000 1000712 5 6013 global_registry /app/dctm/server/dba/log
│ ├─6056 /app/dctm/server/product/7.3/bin/mthdsvr master 0xfd070018, 0x7f261269c000, 0x223000 1000713 5 6023 dmtest /app/dctm/server/dba/log

...

Here, user 618772 ran the command “dzdo service documentum.service start” (dzdo is a Centrify command analog to sudo but with privileges checked against Active Directory) to start the documentum.service, which started the command “sudo -u dmadmin -i /app/dctm/server/dbi/documentum start” as defined in the unit and attached its processes under system.slice.
The difference is essential: at shutdown, sessions are closed abruptly, so if a stop/start script is running in it, its stop option will never be invoked.
Processes running under the system.slice on the other hand have their command’s stop option invoked properly so they can cleanly shut down.
This distinction is rarely necessary because generally all the services run as root even though their installation may be owned by some other user. E.g. an apache listening on the default port 80 must run as root. Documentum stuff was not designed to be a service, just background processes running as dmadmin. But thanks to this trick, they can still be managed as services.
At boot time, the unit will be processed and its start commands (there can be many, but here only one for the big script) executed.
It is also possible to invoke the service documentum.service manually:

systemctl start | stop | status documentum.service

The old sysvinit syntax is still available too:

service documentum.service start | stop | status

Thus, everything is in one place and uses a common management interface, which is specially appealing to a system administrator with no particular knowledge of each product installed on each machine under their control, e.g. to become dmadmin and invoke the right dm_* script.
The direct invocation of the unit file is still possible:

/etc/systemd/system/documentum.service start | stop | status

but the service interface is so much simpler.
One remark here: the status clause implemented in the big script above is not the one invoked by the command “systemctl status”:

systemctl status documentum.service
● documentum.service - Documentum Content Server controls for the runtime lifecycle
Loaded: loaded (/etc/systemd/system/documentum.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2018-10-22 14:03:09 CEST; 4min 6s ago
Process: 25388 ExecStop=/bin/su - dmadmin -c /app/dctm/server/dbi/startstop stop (code=exited, status=0/SUCCESS)
Process: 24069 ExecStart=/bin/su - dmadmin -c sh -c 'echo " ** Starting documentum"' (code=exited, status=0/SUCCESS)
Main PID: 924 (code=exited, status=0/SUCCESS)

Instead, the latter just returns the status of the service per se, not of the resources exposed by the service. It is indeed possible to display the current status of those programs in the same output but some special work need to be done for this. Basically, those processes need to periodically push their status to their service by calling systemd-notify; this could be done by a monitoring job for example. See systemd-notify’s man page for more details.
There is no ExecStatus clause in the unit either, although it would make some sense to define a command that asks the service’s processes about its status. We still need some custom script for this.

Splitting the big script

As the full systemd way is chosen, why not introduce a finer service granularity ? To do this, each Documentum component can be extracted from the big script and turned into a service of its own, as illustrated below.
Unit documentum.docbrokers.service

cat - <<EndOfUnit > /etc/systemd/system/documentum.docbrokers.service
[Unit]
Description=Documentum docbrokers controls;
Type=oneshot
RemainAfterExit=yes

# no dependencies;

# there are 2 docbrokers here;
ExecStart=sudo -u dmadmin -i /app/dctm/server/dba/dm_launch_docbroker
ExecStart=sudo -u dmadmin -i /app/dctm/server/dba/dm_launch_docbrokerdmtest
ExecStop=sudo -u dmadmin -i /app/dctm/server/dba/dm_stop_docbroker
ExecStop=sudo -u dmadmin -i /app/dctm/server/dba/dm_stop_docbrokerdmtest
 
[Install]
WantedBy=multi-user.target
EndOfUnit

Now, activate the service:

systemctl enable documentum.docbrokers.service

Lines 10 to 13 call the standard docbroker’s dm_* scripts.

Unit documentum.method-server.service

cat - <<EndOfUnit > /etc/systemd/system/documentum.method-server.service
[Unit]
Description=Documentum method server controls;

After=documentum.docbrokers.service
Requires=documentum.docbrokers.service

Type=oneshot
RemainAfterExit=yes

ExecStart=sudo -u dmadmin -i /app/dctm/server/shared/wildfly9.0.1/server/startMethodServer.sh start
ExecStop=sudo -u dmadmin -i /app/dctm/server/shared/wildfly9.0.1/server/stopMethodServer.sh stop
 
[Install]
WantedBy=multi-user.target
EndOfUnit

Now, activate the service:

systemctl enable documentum.method-server.service

While the dependency with the docbrokers is defined explicitly on line 5 and 6 (see later for an explanation of these clauses), the one with the docbases is a bit ambiguous. Traditionally, the method server is started after the docbases even though, as its name implies, it is a server for the docbases, which are thus its clients. So, logically, it should be started before the docbases, like the docbrokers, and not the other way around. However, the method server executes java code that may use the DfCs and call back into the repository, so the dependency between repositories and method server is two-way. Nevertheless, since it is the docbases that initiate the calls (methods don’t execute spontaneously on the method server), it makes sense to start the method server before the repositories and define a dependency from the latter to the former. This will also simplify the systemd configuration if a passphrase is to be manually typed to start the docbases (see paragraph below).
Lines 11 and 12 call the standard Documentum script for starting and stopping the method server.

Unit documentum.docbases.service

cat - <<EndOfUnit > /etc/systemd/system/documentum.docbases.service
[Unit]
Description=Documentum docbases controls;

After=documentum.docbrokers.service documentum.method-server.service
Requires=documentum.docbrokers.service documentum.method-server.service

Type=oneshot
RemainAfterExit=yes

ExecStart=sudo -u dmadmin -i /app/dctm/server/dba/dm_start_global_registry
ExecStart=sudo -u dmadmin -i /app/dctm/server/dba/dm_start_dmtest
ExecStop=sudo -u dmadmin -i /app/dctm/server/dbi/dm_shutdown_global_registry
ExecStart=sudo -u dmadmin -i /app/dctm/server/dba/dm_shutdown_dmtest
 
[Install]
WantedBy=multi-user.target
EndOfUnit

Now, activate the service:

systemctl enable documentum.docbases.service

Here, the dependencies must be explicitly defined because the docbases need the docbrokers to start. The method server is needed for executing java code requested by the docbases. The After= on line 5 clause says that the current unit documentum.docbases.service waits until the units listed here have been started. The Requires= clause on line 6 says that the current unit documentum.docbases.service cannot start without the other two units so they must all be started successfully, otherwise documentum.docbases.service fails. By default, they start concurrently but the After= clause postpones starting documentum.docbases.service until after the other 2 have started.
Lines 11 to 14 call the standard Documentum script for starting and stopping a docbase.
This alternative does not use the custom script any more but exclusively the ones provided by Documentum; one less thing to maintain at the cost of some loss of flexibility, should any special startup logic be required someday. Thus, don’t bin that big script so quickly, just in case.

Hybrid alternative

The custom monolithic script does everything in one place but lacks the differentiation between components. E.g. the start option starts everything and there is no way to address a single component. An enhanced script, dctm.sh, with the syntax below would be nice:

dctm.sh start|stop|status component

i.e.

dctm.sh start|stop|status docbrokers|docbases|method-server

It could even go as far as differentiating among the repositories and docbrokers:

dctm.sh start|stop|status docbroker:docbroker|docbase:docbase|method-server

A plural keyword syntax could also be used when differentiation is not wanted (or when too lazy to specify the component, or when the component’s exact name is not known/remembered), to collectively address a given type of component:

dctm.sh start|stop|status [--docbrokers|--docbroker docbroker{,docbroker}|--docbases|--docbase docbase{,docbase}|--method-server]

i.e. a list of components can be specified, or all or each of them at once. If none are specified, all of them are addressed. The highlighted target names are keywords while the italicized ones are values. This is a good exercise in parsing command-line parameters, so let’s leave it to to reader !
All these components could be addressed individually either from the corresponding service unit (or from systemd-run, see next paragraph):
Unit documentum.docbrokers.service

...
ExecStart=sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh start --docbrokers
ExecStop=sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh stop --docbrokers
...

Unit documentum.method-server.service

...
ExecStart=sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh start --method-server
ExecStop=sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh stop --method-server
...

Unit documentum.docbases.service

...
ExecStart=sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh start --docbases
ExecStop=sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh stop --docbases
...

As explained above, dctm.sh’s status parameter is not reachable from systemctl but a monitoring agent could put it to good use.
Thus, we have here the granularity of the previous alternative while retaining the flexibility of the monolithic script, e.g. for checking a status (see the next paragraph for another reason to keep the custom script). Each variant has it pros and cons and, as it is often the case, flexibility comes at the cost of complexity.

The case of the missing lockbox passphrase

If a lockbox is in use and a passphrase must be entered interactively by an administrator to start the database, then that service cannot be started by systemd at boot time because at that time the passphrase is still missing from dmadmin’s shared memory. Thus, the docbase start must be delayed until after the passphrase has been loaded. If the service’s start clause is removed and missing, systemd will complain but if we leave it, the start will effectively fail because of the missing lockbox’ passphrase. So, how to exit this dead end ?
A fake start through the clause ExecStart=/bin/true could replace the real start but then how to start the docbases via systemctl once the passphrase has been entered ?
One possible trick is to leave the invocation of the custom script in the service’s start clause but add some logic in that script so it can determine itself how its start clause was invoked. If it was within, say, a 1 minute uptime, then it is obviously an automatic invocation at boot time. The script then aborts and returns false so the service is marked “not started” and can be started manually with no need to first stop it (which would be necessary if it simply returned a 0 exit code). An administrator would then enter the lockbox passphrase, typically with the command below:

sudo -u dmadmin -i dm_crypto_boot -all –passphrase
then, type the passphrase at the prompt

and manually start the service as written above.
A possible implementation of this logic is:

start)):
MAX_BOOT_TIME=60
ut=$(read tot idle < /proc/uptime; echo ${tot%.*})
[ $ut -lt $MAX_BOOT_TIME ] && exit 1

If the service is later stopped and restarted without rebooting, the uptime would be larger than 1 minute and therefore the enhanced custom script dctm.sh (we need this one because only the docbases need to be started, the other components have been already started as services at this point) would do the start itself directly, assuming that the passphrase is now in dmadmin’s shared memory (if it’s not, the start will fail again and the service stay in the same state).
This 1 minute delay can look short but systemd attempts to start as much as possible in parallel, except when dependencies are involved, in which case some serialization is performed. This is another advantage of systemd: a shorter boot time for faster reboots. The fact that most installations run now inside virtual machines makes the reboot even faster. The delay value must not be too large because it is possible that an administrator, who may have done the shutdown themself, is waiting behind their keyboard for the reboot to complete, log in, enter the passphrase and start the service, which will be rejected by the above logic as it considers that it is too soon to do it.

Running a command as a service

systemd makes it possible to run a command as a service, in which case no unit file is necessary. This is an alternative to the missing lockbox passphrase case. An administrator would first load the passphrase in dmadmin’s shared memory and later manually invoke a custom script, with no special logic involving the uptime, as follows:

dzdo systemd-run --unit=dctm.docbases --slice=system.slice --remain-after-exit sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh start --docbases

Such services without unit files are called transient services.
Thus, only the docbrokers and the method server would have their respective unit, while the docbases would be started manually as transient services. The enhanced custom script, dctm.sh, is directly invoked here, not the documentum.docbases.service unit file (there is no need for one any more), with the special command-line argument −−docbases, as discussed in the previous paragraph.
Thanks to the parameter −−slice, the processes with be attached under system.slice and therefore be treated like a service:

├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
├─user.slice
│ └─user-618772.slice
│ └─session-10.scope
│ ├─ 4314 sshd: adm_admin2 [priv
│ ├─ 4589 sshd: adm_admin2@pts/
│ ├─ 4590 -bash
│ ├─15561 systemd-cgls
│ └─15562 systemd-cgls
└─system.slice
├─dctm.docbases.service
│ ├─15347 /usr/bin/sudo -u dmadmin -i /app/dctm/server/dbi/dctm.sh start --docbases
│ ├─15348 /bin/bash /app/dctm/server/dbi/dctm.sh start
│ ├─15378 ./dmdocbroker -port 1489 -init_file /app/dctm/server/dba/Docbroker.ini
│ ├─15395 ./dmdocbroker -port 1491 -init_file /app/dctm/server/dba/Docbrokerdmtest.ini
│ ├─15416 ./documentum -docbase_name global_registry -security acl -init_file /app/dctm/server/dba/config/global_registry/server.ini
│ ├─15426 ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/server/dba/config/dmtest/server.ini

Note how “.service” has been suffixed to the given dynamic unit name dctm.docbases.
The stop and status options are available too for transient services with “systemctl stop|status dctm.docbases.service”.

Useful commands

The following systemd commands can be very useful while troubleshooting and checking the services:

systemctl --all
systemctl list-units --all
systemctl list-units --all --state=active
systemctl list-units --type=service
systemctl list-unit-files
systemctl list-dependencies documentum.docbases.service
systemctl cat documentum.docbases.service
systemctl show documentum.docbases.service
show documentum.docbases.service -p After
show documentum.docbases.service -p Before
systemctl mask ...
systemctl unmask ...
rm -r /etc/systemd/system/bad.service.d
rm /etc/systemd/system/bad.service
 
# don't forget to do this after a unit file has been edited;
systemctl daemon-reload
 
# check the journal, e.g. to verify how the processes are stopped at shutdown and restarted at reboot;
journalctl --merge
journalctl -u documentum.docbases.service
 
# reboot the machine;
/sbin/shutdown -r now

Check systemctl’s man pages for more details.

User services

All the systemd commands can be run as an ordinary user (provided the command-line option −−user is present) and services can be created under a normal account too. The unit files will be stored in the user’s ~/.config/systemd/user directory. The managing interface will be the same; it is even possible to have such user services started automatically at boot time (cf. the lingering option), and stopped at system shut down. Thus, if all we want is a smooth, no brain managing interface for the Documentum processes accessible to the unprivileged product’s administrators as dmadmin, this is a handy feature.

Conclusion

Configuring systemd-style services for Documentum is not such a big a deal once we have a clear idea of what we want.
The main advantage to go the service way is to benefit from a uniform management interface so that any administrator, even without knowledge of the product, can start it, inquiry its status, and stop it. When a passphrase to be entered interactively is in use, there is no real advantage to use a service, except to have the guarantee that its stop sequence will be invoked at shutdown so the repository will be in a consistent state at the end. Actually, for Documentum, especially in the passphrase case, going the service way or staying with a custom script or the standard dm_* scripts is more a matter of IT policies rather than a technical incentive, i.e. the final decision will be more procedural than technical. Nevertheless, having a services’ standard management interface, while still keeping custom scripts for more complicate logic, can be very convenient.

Cet article systemd configurations for Documentum est apparu en premier sur Blog dbi services.

Documentum – xPlore – How to Improve the Search results in DA or D2

Tue, 2019-01-22 07:22

When supporting Documentum, we often got complaining from users like “The search is not working” or “I cannot find my document”.
The first reflex is to verify if the xPlore is working properly. The second reflex is to perform a search test but all is working correctly.
Then we contact user to get more details about his search.
Sometimes the only problem is that user does not use the search properly. Indeed the Documentum search use its “own” language with wildcard and the user just don’t use the right syntax.
Here I propose to set a kind of Rosetta stone which make the link between Human and xPlore language.

Ready to “talk” with the Full Text ?
Let’s start !

Search comprehension:

A word: this is a set of alphanumeric characters in between space characters.

A wildcard character: this is a kind of placeholder represented by a single character, such as an asterisk [ * ], takes the place of any other single character or a string or zero charactere.

 

Search with one word and quotation marks [ ” ” ] When user type
in search field
Dsearch understand Will match Will
NOT
match
“word1″ “word1″: Here Fulltext will search for an exact match of terms contained between the quotation marks word1 word
word11
xword1

 

Search with one word When user type
in search field
Dsearch understands Will match Will
NOT
match
word1 “word1*”
Here FullText add an asterisk [ * ] at the end of the word which matches single character or a string or zero character.
And quotation marks are also automatically added at the beginning and the end of the word. word1
word1A
word14A
word132 word
wo
rword
1word
word4A

 

Search with one word and wildcard question marks [ ? ] When user type
in search field
Dsearch understands Will match Will
NOT
match
word1? “word1?”
Here Fulltext will replace the question mark [ ? ] with zero or one single character.
And quotation marks are also automatically added at the beginning and the end of the word. word1
word11
word12
word1a
word1x
word
word123
xword11

 

Search with one word and wildcard star [ * ] When user type
in search field
Dsearch understands Will match Will
NOT
match
word1* “word1*”
The asterisk [ * ] matches single character or a string or zero character.
And quotation marks are also automatically added at the beginning and the end of the word. word1
word12
word12a
word1x2b
word
word1

 

Search with several words When user type
in search field
Dsearch understands Will match Will
NOT
match
word1 word2 “word1*” OR “word2*”
The [ space ] between two words is translated as “OR“.
The rule “1) search with one word” is applied for each word word1 OR word2
word1A
word14A
word132
word2
word2A
word24A
word232
xword1 OR xword2

 

Search with several words and quotation marks When user type
in search field
Dsearch understands Will match Will
NOT
match
“word1 word2″ “word1 word2″
Here Fulltext will search for an exact match of terms contained within the quotation mark word1 word2 word1 OR word2
word1 AND xword2
xword1 AND word2

 

Search with several words and [+] character When user type
in search field
Dsearch understands Will match Will
NOT
match
word1 + word2 “word1*” AND “word2*”
Here combination of word [ space ][ + ]word is translated as “AND“.
The rule “1) search with one word” is applied for each word
Note: Search can be combined with several [+] word1 +word2 +word3 word1 AND word2
word1 AND word2x
word1x AND word2
word1x AND word2x word1
word2
xword1 AND word2
word1 AND xword2

 

These are the basic search “tips” mostly out of the box, be aware these search behaviors can be customized with some parameters modifications.

My advice is “Abuse quotation marks ¨” ;-)

I hope this has been helpfull !

Cet article Documentum – xPlore – How to Improve the Search results in DA or D2 est apparu en premier sur Blog dbi services.

Two techniques for cloning a repository filestore, part II

Fri, 2019-01-18 03:27

This is part II of a two-part article. In part I, we introduced the Documentum repository file structure and saw a first way to transfer content files from one filesystem to another using the repository to get their full path on disk. We saw how to generate an easy to parallelize set of rsync commands to completely or partially copy a repository’s content files. Here, we’ll show another way to do that which does not imply querying the repository.
Before this, however, we need test data, i.e. sub-directories with files on disk. The good thing is that those don’t have to be linked to documents in the repository, they can just be loose files, even empty ones, scattered over Documentum-style sub-trees, and they are easy to produce.

Creating a test sub-tree

The usual test repository is an out of the box, practically empty one; therefore we need to create a dense Documentum-like sub-tree with lots of directories (up to 128 x 256 x 256 = 8128K ones per filestore, but we won’t go so far) and a few files, say 1 per subdirectory. No need to actually create related documents in a repository since we want to perform the copy of the files from outside the repository. As said above, the number of files is not really important as only their parent directory (and indirectly, its content) is being passed to rsync. A one-file terminal sub-directory would be enough for it to trigger its transfer. Also, their size does not matter either (we don’t want to measure the transfer speed, just to prepare performant rsync statements), so 0-byte files will do perfectly.
The short python program below creates a Documentum-style subdirectory.

#! /usr/bin/python3
import datetime
import time
import sys
import os
import random
import multiprocessing
import multiprocessing.pool
import functools
import traceback
import getopt

# creates a Documentum tree at a given filesystem location using multi-processes for speed;
# Usage:
#   ./make-tree.py root_point
# Example:
#   ./make-tree.py "/home/dmadmin/documentum/data/dmtest/content_storage01/0000c35c/80"

# prerequisite to prevent the error OSError: [Errno 24] Too many open files if too many processes are forked;
# ulimit -n 10000

# 12/2018, C. Cervini, dbi-services;

def Usage():
   print("""Generates a dense Documentum-style sub-tree (i.e. a 3-level subtree with files only in the last one) with 255 x 256 sub-directories and 256 x 256 x 100 empty files under a given sub-directory;
Randomization of number of subdirectories and number of files at each level is possible within given limits;
This is useful to test different tree walking algorithms.
Created directories and files under the root directory have hexadecimal names, i.e. 00, 01, ... ff.
Usage:
   make-tree.py [-h|--help] | [-d|--dry-run] | [-r|--root-dir ]
A dry-run will just print to stdout the command for the nodes to be created instead of actually creating them, which is useful to process them later by some more efficient parallelizing tool;
Example:
   make-tree.py --root-dir "dmtest/content_storage01/0000c35c/80"
If a forest is needed, just invoke this program in a shell loop, e.g.:
   for i in 80 81 82 83 84 85; do
      make-tree.py --root-dir "dmtest/content_storage01/0000c35c/${i}" &
   done
This command will start 6 O/S python processes each creating a subtree named 80 .. 85 under the root directory.
The created 6-tree forest will have the following layout:
dmtest/content_storage01/0000c35c                                                d
   /80                                                                           d
      /00                                                                        d
         /00                                                                     d
            /00                                                                  f
            /01                                                                  f
            ...                                                                ...
            /63                                                                  f
         /01
            /00
            /01
            ...
            /63
         ...
         /ff
            /00
            /01
            ...
            /63
      /01
         /00
            /00
            /01
            ...
            /63
         /01
            /00
            /01
            ...
            /63
         ...
         /ff
            /00
            /01
            ...
            /63
      ...
      /ff
         /00
            /00
            /01
            ...
            /63
         /01
            /00
            /01
            ...
            /63
         ...
         /ff
            /00
            /01
            ...
            /63
   /81
   ...
   /82
   ...
   /83
   ...
   /84
   ...
   /85
   ...
It will contain 6 x 256 x 256 directories and 6 x 256 x 256 x 1 files, unless randomization is requested, see the gp.* parameters;
   """)

# this function cannot be local to make_tree(), though it is only used there, because of the following error:
#     Can't pickle local object 'make_tree..make_files'
# the error prevents it to be invoked as a callback;
# actually, functions used in processes must have been entirely defined prior their usage; this implies that functions cannot be fork processes that calls themselves;
# therefore, the master function make_tree is needed that detaches the processes that execute the functions defined earlier, make_files and make_level;
# moreover, processes in a pool are daemonic and are not allowed to fork other processes;
# we must therefore use a global pool of processes allocated in the master function;
def make_files(dir):
   if gp.bRandomFiles:
      nbFiles = random.randint(gp.minFiles, gp.maxFiles)
   else:
      nbFiles = gp.maxFiles
   for nf in range(nbFiles):
      fullPath = dir + "/" + (gp.fileFormat % nf)
      if gp.bTest:
         print(f"touch {fullPath}")
      else:
         try:
            fd = os.open(fullPath, os.O_CREAT); os.close(fd)
         except:
           traceback.print_exc()
           print("ignoring ...")
   return nbFiles

# ditto;
def make_level(dir):
   """
   create a directory level under dir;
   """
   global gp

   if gp.bRandomSubDirs:
      nbSubDirs = random.randint(gp.minSubDirs, gp.maxSubDirs)
   else:
      nbSubDirs = gp.maxSubDirs
   level_dirs = []
   for nd in range(nbSubDirs):
      subDir = dir + "/" + (gp.dirFormat % nd)
      if gp.bTest:
         print("mkdir", subDir)
      else:
         try:
            os.mkdir(subDir)
         except:
            traceback.print_exc()
            print("ignoring ...")
      level_dirs.append(subDir)
   return level_dirs

# the master function;
# it creates 2 levels of directories and then empty files under the deep-most directories;
def make_tree(root):
   global gp, stats
   sub_level_dirs = []

   # list_dirs contains a list of the values returned by parallelized calls to function make_level, i.e. a list of lists;
   # get_dirs is called as a callback by map_async at job completion;
   def get_dirs(list_dirs):
      global stats
      nonlocal sub_level_dirs
      for l in list_dirs:
         stats.nb_created_dirs += len(l)
         sub_level_dirs.extend(l)

   # nb_files contains a list of the values returned by parallelized calls to function make_files, i.e. a list of numbers;
   # get_nb_files is called as a callback by map_async at job completion;
   def get_nb_files(nb_files):
      global stats
      stats.nb_dirs_at_bottom += len(nb_files)
      stats.nb_created_files += functools.reduce(lambda x, y: x + y, nb_files)
      stats.nb_created_dirs_with_files += functools.reduce(lambda x, y: x + y, [1 if x > 0 else 0 for x in nb_files])

   # callback for error reporting;
   def print_error(error):
      print(error)

   # first directory level;
   level_dirs = make_level(root)
   stats.nb_created_dirs += len(level_dirs)

   # 2nd directory level;
   sub_level_dirs = []
   gp.pool = multiprocessing.pool.Pool(processes = gp.maxWorkers)
   gp.pool.map_async(make_level, level_dirs, len(level_dirs), callback = get_dirs, error_callback = print_error)
   gp.pool.close()
   gp.pool.join()

   # make dummy files at the bottom-most directory level;
   gp.pool = multiprocessing.pool.Pool(processes = gp.maxWorkers)
   gp.pool.map_async(make_files, sub_level_dirs, len(sub_level_dirs), callback = get_nb_files, error_callback = print_error)
   gp.pool.close()
   gp.pool.join()

# -----------------------------------------------------------------------------------------------------------
if __name__ == "__main__":
# main;

   # global parameters;
   # we use a typical idiom to create a cheap namespace;
   class gp: pass
   
   # dry run or real thing;
   gp.bTest = 0
   
   # root directory;
   gp.root = None
   try:
       (opts, args) = getopt.getopt(sys.argv[1:], "hdr:", ["help", "dry-run", "root-dir="])
   except getopt.GetoptError:
      print("Illegal option")
      print("./make-tree.py -h|--help | [-d|--dry-run] | [-r|--root-dir ]")
      sys.exit(1)
   for opt, arg in opts:
      if opt in ("-h", "--help"):
         Usage()
         sys.exit()
      elif opt in ("-d", "--dry-run"):
         gp.bTest = 1
      elif opt in ("-r", "--root-dir"):
         gp.root = arg
   if None == gp.root:
      print("Error: root_dir must be specified")
      print("Use -h or --help for help")
      sys.exit()

   # settings for Documentum;
   # nb sub-directories below the "80";
   gp.maxLevels = 2
   
   # nb tree depth levels;
   gp.maxDepth = gp.maxLevels + 1
   
   # random nb sub-directories in each level;
   # maximum is 256 for Documentum;
   gp.bRandomSubDirs = 0
   gp.minSubDirs = 200
   gp.maxSubDirs = 256
   
   # random nb files in each level;
   # maximum is 256 for Documentum;
   gp.bRandomFiles = 0
   gp.minFiles = 0
   gp.maxFiles = 1
   
   # node names' format;
   gp.dirFormat = "%02x"
   gp.fileFormat = "%02x"
   
   # maximum numbers of allowed processes in the pool; tasks will wait until some processes are available;
   # caution not to choose huge values for disk I/Os are saturated and overall performance drops to a crawl;
   gp.maxWorkers = 40
   
   # again but for the counters;
   class stats: pass
   stats.nb_dirs_at_bottom = 0
   stats.nb_created_files = 0
   stats.nb_created_dirs = 0
   stats.nb_created_dirs_with_files = 0

   # initialize random number generator;
   random.seed()
         
   startDate = datetime.datetime.now()
   print(f"started building a {gp.maxDepth}-level deep tree starting at {gp.root} at {startDate}")
   if not gp.bTest:
      try:
         os.makedirs(gp.root)
      except:
         pass
   make_tree(gp.root)
   print("{:d} created files".format(stats.nb_created_files))
   print("{:d} created dirs".format(stats.nb_created_dirs))
   print("{:d} created dirs at bottom".format(stats.nb_dirs_at_bottom))
   print("{:d} total created dirs with files".format(stats.nb_created_dirs_with_files))
   print("{:d} total created nodes".format(stats.nb_created_dirs + stats.nb_created_files))

   endDate = datetime.datetime.now()
   print(f"Ended building subtree {gp.root} at {endDate}")
   print(f"subtree {gp.root} built in {(endDate - startDate).seconds} seconds, or {time.strftime('%H:%M:%S', time.gmtime((endDate - startDate).seconds))} seconds")

No special attention as been given to the user interface and most settings are hard-coded in the script; their values are currently set to produce a dense (256 x 256 = 64K sub-directories/filestore) Documentum-style directory sub-tree each with just 1 zero-byte file. It is pointless, but possible, to have more files here since we only want to generate a set of rsync commands, one for each non empty subdirectory.
Examples of execution:
Create a dense 64K sub-tree at relative path clone/dmtest/content_storage01/0000c35a/86, with one empty file leaf in each:

./make-tree-dctm4.py --root-dir clone/dmtest/content_storage01/0000c35a/86
started building a 3-level deep tree starting at clone/dmtest/content_storage01/0000c35a/86 at 2018-12-31 18:47:20.066665
65536 created files
65792 created dirs
65536 created dirs at bottom
65536 total created dirs with files
131328 total created nodes
Ended building subtree clone/dmtest/content_storage01/0000c35a/86 at 2018-12-31 18:47:34.612075
subtree clone/dmtest/content_storage01/0000c35a/86 built in 14 seconds, or 00:00:14 seconds

Sequentially create a 6-tree forest starting at relative path clone/dmtest/content_storage01/0000c35a:

for i in 80 81 82 83 84 85; do
./make-tree-dctm4.py --root-dir clone/dmtest/content_storage01/0000c35a/${i}
done
started building a 3-level deep tree starting at clone/dmtest/content_storage01/0000c35a/80 at 2018-12-31 18:41:31.933329
65536 created files
65792 created dirs
65536 created dirs at bottom
65536 total created dirs with files
131328 total created nodes
Ended building subtree clone/dmtest/content_storage01/0000c35a/80 at 2018-12-31 18:41:39.694346
subtree clone/dmtest/content_storage01/0000c35a/80 built in 7 seconds, or 00:00:07 seconds
started building a 3-level deep tree starting at clone/dmtest/content_storage01/0000c35a/81 at 2018-12-31 18:41:39.738200
65536 created files
65792 created dirs
65536 created dirs at bottom
65536 total created dirs with files
131328 total created nodes
Ended building subtree clone/dmtest/content_storage01/0000c35a/81 at 2018-12-31 18:42:14.166057
subtree clone/dmtest/content_storage01/0000c35a/81 built in 34 seconds, or 00:00:34 seconds
...
subtree clone/dmtest/content_storage01/0000c35a/84 built in 22 seconds, or 00:00:22 seconds
started building a 3-level deep tree starting at clone/dmtest/content_storage01/0000c35a/85 at 2018-12-31 18:43:06.644111
65536 created files
65792 created dirs
65536 created dirs at bottom
65536 total created dirs with files
131328 total created nodes
Ended building subtree clone/dmtest/content_storage01/0000c35a/85 at 2018-12-31 18:43:41.527459
subtree clone/dmtest/content_storage01/0000c35a/85 built in 34 seconds, or 00:00:34 seconds

So, the test creation script is quite quick with the current 40 concurrent workers. Depending on your hardware, be careful with that value because the target disk can easily be saturated and stop responding, especially if each tree of a forest is created concurrently.
Just out of curiosity, how long would it take to ls to navigate this newly created forest ?

# clear the disk cache first;
sudo sysctl vm.drop_caches=3
vm.drop_caches = 3
 
time ls -1R clone/dmtest/content_storage01/0000c35a/* | wc -l
1840397
real 2m27,959s
user 0m4,291s
sys 0m19,587s
 
# again without clearing the disk cache;
time ls -1R clone/dmtest/content_storage01/0000c35a/* | wc -l
1840397
real 0m2,791s
user 0m0,941s
sys 0m1,950s

ls is very fast because there is just one leaf file in each sub-tree; it would be whole different story with well filled terminal sub-directories. Also, the cache has a tremendous speed up effect and the timed tests will always take care to clear them before a new run. The command above is very effective for this purpose.

Walking the trees

As written above, the main task before generating the rsync commands is to reach the terminal sub-directories. Let’s see if the obvious “find” command could be used at first, e.g.:

loc_path=/home/documentum/data/dmtest/./content_storage_01
find ${loc_path} -type d | gawk -v FS='/' -v max_level=11 '{if (max_level == NF) print}'
/home/documentum/data/dmtest/./content_storage_01/0000c350/80/00/00
/home/documentum/data/dmtest/./content_storage_01/0000c350/80/00/01
...
/home/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0b
...

find stopped at the last subdirectory, just before the files, as requested.
Can we get rid of the extra-process used by the gawk filter ? Let’s try this:

find ${loc_path} -mindepth 4 -maxdepth 4 -type d
/home/documentum/data/dmtest/./content_storage_01/0000c350/80/00/00
/home/documentum/data/dmtest/./content_storage_01/0000c350/80/00/01
...
/home/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0b
...

We can, good. The -mindepth and -maxdepth parameters, and the -type filter, let us jump directly to the directory level of interest, which is exactly what we want.

The “find” command is very fast; e.g. on the test forest above:

time find ${loc_path} -mindepth 4 -maxdepth 4 -type d | wc -l
458752
 
real 0m10,423s
user 0m0,536s
sys 0m2,450s

10 seconds for the forest’s 458’000 terminal directories, with disk cache emptied beforehand, impressing. If those directories were completely filled, they would contain about 117 millions files, a relatively beefy repository. Thus, find is a valuable tool, also because it is directed to stop before reaching the files. Finally, Documentum’s unusual file layout does not look so weird now, does it ? Let’s therefore use find to generate the rsync commands on the terminal sub-directories it returns:

find $loc_path -mindepth 4 -maxdepth 4 -type d | xargs -i echo "rsync -avzR {} dmadmin@${dest_machine}:{}" > migr

Example of execution:

find $loc_path -mindepth 4 -maxdepth 4 -type d | xargs -i echo "rsync -avzR {} dmadmin@new_host:/some/other/dir/dmtest"
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/ea dmadmin@new_host:/some/other/dir/dmtest
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/5e dmadmin@new_host:/some/other/dir/dmtest
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/aa dmadmin@new_host:/some/other/dir/dmtest
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/cc dmadmin@new_host:/some/other/dir/dmtest
...

Subsequently, the commands in migr could be executed in parallel N at a time, with N a reasonable number that won’t hog the infrastructure.
The same gawk script showed in part I could be used here:

find $loc_path -mindepth 4 -maxdepth 4 -type d | gawk -v nb_rsync=10 -v dest=dmadmin@new_machine:/some/other/place/dmtest 'BEGIN {
print "\#!/bin/bash"
}
{
printf("rsync -avzR %s %s &\n", $0, dest)
if (0 == ++nb_dirs % nb_rsync)
print "wait"
}
END {
if (0 != nb_dirs % nb_rsync)
print "wait"
}' > migr.sh

Output:

rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/ea dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/5e dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/aa dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/cc dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/e5 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/bd dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/1d dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/61 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/39 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/75 dmadmin@new_machine:/some/other/place/dmtest &
wait
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/6d dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/d2 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/8c dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/a1 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/84 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/b8 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/a4 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/27 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/fe dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/e6 dmadmin@new_machine:/some/other/place/dmtest &
wait
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/4f dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/84/ea/82 dmadmin@new_machine:/some/other/place/dmtest &
...
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/86/95/92 dmadmin@new_machine:/some/other/place/dmtest &
wait
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/86/95/20 dmadmin@new_machine:/some/other/place/dmtest &
rsync -avzR /home/documentum/data/dmtest/./content_storage01/0000c35a/86/95/95 dmadmin@new_machine:/some/other/place/dmtest &
wait
 
chmod +x migr.sh
nohup ./migr.sh &

The good thing with rsync in archive mode is that the commands can be interrupted anytime; once relaunched, they will quickly resume the transfers where they left since they work incrementally.
Here, no SQL statements to correct the dm_locations, if needed, because we don’t connect to the repository to get the necessary information. It is not a big deal to produce them in case of need, however, just don’t forget.

Feeling uncomfortable because of the many produced rsync commands ? As in part I, just tell find to stop one level above the terminal sub-directory using “-mindepth 3 -maxdepth 3″ at the cost of some speed decrease, though.

Any faster find alternative ?

As fast as find is when walking a docbase’s directories with lots of filestores, it can still take a long time despite the depth and type limits. Like we asked before for rsync, is there any way to speed it up by running several find commands concurrently on their own directory partition ? Such a find could be launched on sub-directory /home/dmadmin/data/dmtest/filestore_1, and others on /home/dmadmin/data/dmtest/filestore_i, all simultaneously. E.g.:

slice=0; for fs in dmtest/./*; do
(( slice += 1 ))
(
find $fs -mindepth 4 -maxdepth 4 -type d | gawk -v nb_rsync=10 -v dest=dmadmin@new_machine:/some/other/place/dmtest 'BEGIN {
print "\#!/bin/bash"
}
{
printf("rsync -avzR %s %s &\n", $0, dest)
if (0 == ++nb_dirs % nb_rsync)
print "wait"
}
END {
if (0 != nb_dirs % nb_rsync)
print "wait"
}' > migr${slice}.sh
chmod +x migr${slice}.sh
./migr${slice}.sh
) &
done

But in such approach, the degree of parallelism is limited by the number of sub-directories to explore (i.e. here, the number of filestores in the repository). What about the script below instead ?

#! /usr/bin/python3
import datetime
import time
import sys
import os
import multiprocessing
import multiprocessing.pool
import getopt

# walks a Documentum tree at a given filesystem location using multi-processes for speed;
# C. Cervini, dbi-services.com, 12/2018;

def Usage():
   print("""Walks a sub-tree and prints the files' full path;
Usage:
walk-tree.py [-h|--help] | [-p |--print] [-r|--root-dir ] [-v|--verbose] [-m|--max_level]
-p |--print for outputting the files found; default action is do nothing, which is handy for just having a count of objects;
-r|--root-dir is the starting directory to explore;
-v|--verbose for outputting timing and count information; off by default;
-m|--max_level stops recursion at the max_level-th tree level, not included; like find's -maxdepth parameter;
Example:
walk-tree.py --print --root-dir ./dmtest/content_storage01/0000c35d/80
walk-tree.py --verbose --root-dir ./dmtest/content_storage01/0000c35d/80
""")

# this function cannot be local to do_tree(), though it is only used there, because of the following error:
#     Can't pickle local object 'do_tree..do_level'
# the error prevents it to be invoked as a local callback;
# actually, functions used in processes must have been entirely defined prior to their usage; this implies that functions cannot fork processes that calls themselves;
# therefore, the master function do_tree is needed that detaches the processes that execute the functions defined earlier, do_level here;
# moreover, processes in a pool are daemonic and are not allowed to fork other processes;
# we must therefore use a global pool of processes allocated in the master function;
def do_level(dir):
   """
   lists the sub-directores and files under dir;
   """
   global gp
   result = {'nb_files': 0, 'nb_dirs': 0, 'level_dirs': []}
   if gp.max_level > 0 and len() - gp.root_level > gp.max_level:
      #print("too deep, returning")
      if gp.bPrint:
         print(dir)
      return result
   for node in os.listdir(dir):
      fullpath = os.path.join(dir, node)
      if os.path.isdir(fullpath):
         result['nb_dirs'] += 1
         result['level_dirs'].append(fullpath)
      elif os.path.isfile(fullpath):
         if gp.bPrint:
            print(fullpath)
         result['nb_files'] += 1
   return result

# the master function;
def do_tree(root):
   global gp, stats
   sub_level_dirs = []

   # list_dirs contains a list of the values returned by parallelized calls to function do_level, i.e. a list of dictionaries;
   # get_dirs is invoked as a callback by map_async once all the processes in the pool have terminated executing do_level() and returned their result;
   def get_dirs(list_dirs):
      global stats
      nonlocal sub_level_dirs
      for l in list_dirs:
         stats.nb_dirs += len(l['level_dirs'])
         stats.nb_files += l['nb_files']
         stats.nb_dirs_with_files += (1 if l['nb_files'] > 0 else 0)
         sub_level_dirs.extend(l['level_dirs'])

   # callback for error reporting;
   def print_error(error):
      print(error)

   # first directory level;
   level_data = do_level(root)
   stats.nb_files += level_data['nb_files']
   stats.nb_dirs += len(level_data['level_dirs'])
   stats.nb_dirs_with_files += (1 if level_data['nb_files'] > 0 else 0)

   # all the other directory sub-levels;
   while level_data['nb_dirs'] > 0:
      sub_level_dirs = []
      gp.pool = multiprocessing.pool.Pool(processes = gp.maxWorkers)
      gp.pool.map_async(do_level, level_data['level_dirs'], len(level_data['level_dirs']), callback = get_dirs, error_callback = print_error)
      gp.pool.close()
      gp.pool.join()
      level_data['nb_files'] = 0
      level_data['nb_dirs'] = len(sub_level_dirs) 
      level_data['level_dirs'] = sub_level_dirs

# -----------------------------------------------------------------------------------------------------------
if __name__ == "__main__":
   # main;
   # global parameters;
   # we use a typical idiom to create a cheap but effective namespace for the global variables used as execution parameters;
   class gp: pass

   # root directory;
   gp.root = None
   gp.bPrint = False
   gp.bVerbose = False
   gp.root_level = 0
   gp.max_level = 0
   try:
       (opts, args) = getopt.getopt(sys.argv[1:], "hpr:m:v", ["help", "print", "root-dir=", "verbose", "max_level="])
   except getopt.GetoptError:
      print("Illegal option")
      print("./walk-tree.py -h|--help | [-p | --print] [-r|--root-dir ] [-v|--verbose] [-m|--max_level]")
      sys.exit(1)
   for opt, arg in opts:
      if opt in ("-h", "--help"):
         Usage()
         sys.exit()
      elif opt in ("-p", "--print"):
         gp.bPrint = True
      elif opt in ("-r", "--root-dir"):
         gp.root = arg
      elif opt in ("-v", "--verbose"):
         gp.bVerbose = True
      elif opt in ("-m", "--max_level"):
         try:
            gp.max_level = int(arg)
         except:
            print("invalid value for max_level")
            sys.exit()
   if None == gp.root:
      print("Error: root_dir must be specified")
      print("Use -h or --help for help")
      sys.exit()
   gp.root_level = len()

   # maximum numbers of allowed processes; tasks will wait until some processes are available;
   # caution not to choose huge values for disk I/Os are saturated and overall performance drops to a crawl;
   gp.maxWorkers = 50

   # again but for the counters;
   class stats: pass
   stats.nb_files = 0
   stats.nb_dirs = 0
   stats.nb_dirs_with_files = 0

   startDate = datetime.datetime.now()
   if gp.bVerbose:
      print(f"started walking {gp.root} at {startDate}")
   do_tree(gp.root)
   endDate = datetime.datetime.now()
   if gp.bVerbose:
      print("{:d} found files".format(stats.nb_files))
      print("{:d} found dirs".format(stats.nb_dirs))
      print("{:d} total found nodes".format(stats.nb_dirs + stats.nb_files))
      print("{:d} total found dirs with files".format(stats.nb_dirs_with_files))
      print(f"Ended walking subtree {gp.root} at {endDate}")
      print(f"subtree {gp.root} walked in {(endDate - startDate).seconds} seconds, or {time.strftime('%H:%M:%S', time.gmtime((endDate - startDate).seconds))} seconds")

Here, we have a pool of workers which receives tasks to explore sub-directories up to a given depth, just like find’s maxdepth. But is it any faster than a sequential find ? On my test laptop with an external USB 3.1 spinning drive, find gives the best result:

sudo sysctl vm.drop_caches=3; time find dmtest/content_storage01 -mindepth 4 -maxdepth 4 -type d | wc -l
vm.drop_caches = 3
1787628
 
real 35m19,989s
user 0m9,316s
sys 0m43,632s

The python script is very close but lags 2 minutes behind with 35 workers:

sudo sysctl vm.drop_caches=3; time ./walk-tree4.py -v --root-dir dmtest/content_storage01 -m 3
vm.drop_caches = 3
started walking dmtest/content_storage01 at 2019-01-01 17:47:39.664421
0 found files
1797049 found dirs
1797049 total found nodes
0 total found dirs with files
Ended walking subtree dmtest/content_storage01 at 2019-01-01 18:25:18.437296
subtree dmtest/content_storage01 walked in 2258 seconds, or 00:37:38 seconds
real 37m38,996s
user 0m34,138s
sys 1m11,747s

Performance gets worst when the number of concurrent workers is increased, likely because the disk is saturated. It was not designed for such intensive use in the first place. Whether with find or the python script, most of the execution time is spent waiting for the I/Os to complete, with very little time spent in user or system code. Obviously, there is little to optimize here, except switch to a faster disks sub-system, e.g. beginning with an SSD drive. Even a find’s parallel equivalent would not bring any improvement, it would only put more pressure on the disk and be counter-productive. But on real production infrastructures, with large enough disk bandwidth, if may be worth parallelizing if the volume is large, and that’s where the script can make a difference.

So, which one is better ?

Both alternatives generates more or less the same rsync commands to execute later; the only difference is the source of the information to produce those commands: the repository in the first alternative and the filesystem in the second.
The first alternative looks simpler and cleaner because it works from a repository and get its information directly from it. But if one wants to be absolutely, positively sure not to forget any content file, the second alternative is better as it works directly from the filesystem; since no queries are run for each of the contents, a lot of time is saved, even despite the required disk walking. It is ideal when an exact clone is needed, orphans, and possibly garbage, included. Its python script variant is interesting in that it can readily take advantage of the faster I/Os through an easy to set concurrency level.

Cet article Two techniques for cloning a repository filestore, part II est apparu en premier sur Blog dbi services.

Two techniques for cloning a repository filestore, part I

Fri, 2019-01-18 03:26

I must confess that my initial thought for the title was “An optimal repository filestore copy”. Optimal, really ? Relatively to what ? Which variable(s) define(s) the optimality ? Speed/time to clone ? Too dependent on the installed hardware and software, and the available resources and execution constraints. Simplicity to do it ? Too simple a method can result in a very long execution time while complexity can give a faster solution but be fragile, and vice-versa. Besides, simplicity is a relative concept; a solution may look simple to someone and cause nightmares to some others. Beauty ? I like that one but no, too fuzzy too. Finally, I settled for the present title for it is neutral and up to the point. I leave it up to the reader to judge if the techniques are optimal or simple or beautiful. I only hope that they can be useful to someone.
This article has two parts. In each, I’ll give an alternative for copying a repository’s filestores from one filesystem to another. Actually, both techniques are very similar, they just differ in the way the the content files’ path locations are determined. But let’s start.

A few simple alternatives

Cloning a Documentum repository is a well-known procedure nowadays. Globally, it implies to first create a placeholder docbase for the clone and then to copy the meta-data stored in the source database, e.g. through an export/import, usually while the docbase is stopped, plus the document contents, generally stored on disks. If a special storage peripheral is used, such as a Centera CAS or a NAS, there might be a fast, low-level way to clone to content files directly at the device level, check with the manufacturer.
If all we want is an identical copy of the whole contents’ filesystem, the command dd could be used, e.g. supposing that the source docbase and the clone docbase both use for their contents a dedicated filesystem mounted on /dev/sda1 respectively on /dev/sdb1:

sudo dd if=/dev/mapper/vol01 bs=1024K of=/dev/mapper/vol02

The clone’s filesystem /dev/mapper/vol02 could be mounted temporarily on the source docbase’s machine for the copy and later dismounted. If this is not possible, dd can be used over the network, e.g.:

# from the clone machine;
ssh root@source 'dd if=/dev/mapper/vol01 bs=1024K | gzip -1 -' | zcat | dd of=/dev/mapper/vol02
 
# from the source machine as root;
dd if=/dev/mapper/vol01 bs=1024K | gzip -1 - | ssh dmadmin@clone 'zcat | dd of=/dev/mapper/vol02'

dd, and other partition imaging utilities, perform a block by block mirror copy of the source, which is much faster than working at the file level, although it depends on the percentage of used space (if the source filesystem is almost empty, dd will spend most of its time copying unused blocks, which is useless. Here, a simple file by file copy would be more effective). If it is not possible to work at this low level, e.g. filesystems are not dedicated to repositories’ contents or the types of the filesystems differ, then a file by file copy is required. Modern disks are quite fast, especially for deep-pocket companies using SSD, and so file copy operations should be acceptably quick. A naive command such as the one below could even be used to copy the whole repository’s content (we suppose that the clone will be on the same machine):

my_docbase_root=/data/Documentum/my_docbase
dest_path=/some/other/or/identical/path/my_docbase
cp -rp ${my_docbase_root} ${dest_path}/../.

If $dest_path differs from $my_docbase_root, don’t forget to edit the dm_filestore’s dm_locations of the clone docbase accordingly.

If confidentiality is requested and/or the copy occurs across a network, scp is recommended as it also encrypts the transferred data:

scp -rp ${my_docbase_root} dmadmin@${dest_machine}:${dest_path}/../.

The venerable tar command could also be used, on-the-fly and without creating an intermediate archive file, e.g.:

( cd ${my_docbase_root}; tar cvzf - * ) | ssh dmadmin@${dest_machine} "(cd ${dest_path}/; tar xvzf - )"

Even better, the command rsync could be used as it is much more versatile and efficient (and still secure too if configured to use ssh, which is the default), especially if the copy is done live several times in advance during the days preceding the kick-off of the new docbase; such copies will be performed incrementally and will execute quickly, providing an easy way to synchronize the copy with the source. Example of use:

rsync -avz --stats ${my_docbase_root}/ dmadmin@{dest_machine}:${dest_path}

The trailing / in the source path means copy the content of ${loc_path} but not the directory itself as we assumed it already exists.
Alternatively, we can include the directory too:

rsync -avz --stats ${my_docbase_root} dmadmin@{dest_machine}:${dest_path}/../.

If we run it from the destination machine, the command changes to:

rsync -avz --stats dmadmin@source_machine:${my_docbase_root}/ ${dest_path}/.

The –stats option is handy to obtain a summary of the transferred files and the resulting performance.

Still, if the docbase is large and contains millions to hundreds of millions of documents, copying them to another machine can take some time. If rsync is used, repeated executions will just copy over modified or new documents and optionally remove the deleted ones if the archiving mode is requested (the -a option above) but the first run will take time anyway. Logically, reasonably taking advantage of the available I/O bandwidth by having several rsync running at once should reduce the time to clone the filestore, shouldn’t it ? Is it possible to apply here a divide-and-conquer technique and process each part simultaneously ? It is, and here is how.

How Documentum stores the contents on disk

Besides the sheer volume of documents and possibly the limited network and disk I/O bandwidth, one reason the copy can take a long time, independently from the tools used, is the peculiar way Documentum stores its contents on disk, with all the content files exclusively at the bottom of a 6-level deep sub-tree with the following structure (assuming $my_docbase_root has the same value as above; the letter at column 1 means d for directory and f for file):

cd $my_docbase_root
d filestore_1
d <docbase_id>, e.g. 0000c350
d 80 starts with 80 and increases by 1 up to ff, for a total of 2^7 = 128 sub-trees;
d 00 up to 16 ^ 16 = 256 directories directly below 80, from 00 to ff
d 00 again, up to 256 directories directly below, from 00 to ff, up to 64K total subdirectories at this level; let's call these innermost directories "terminal directories";
f 00[.ext] files are here, up to 256 files, from 00 to ff per terminal directory
f 01[.ext] f ...
f ff[.ext] d 01
f 00[.ext] f 01[.ext] f ...
f ff[.ext] d ...
d ff
f 00[.ext] f 01[.ext] f ...
f ff[.ext] d 01
d ...
d ff
d 81
d 00
d 01
d ...
d ff
d ...
d ff
d 00
d 01
d ...
d ff
f 00[.ext] f 01[.ext] f ...
f ff[.ext]

The content files on disk may have an optional extension and are located exclusively at the extremities of their sub-tree; there are no files in the intermediate sub-directories. Said otherwise, the files are the leaves of a filestore directory tree.
All the nodes have a 2-digit lowercase hexadecimal name, from 00 to ff, possibly with holes in the sequences when sub-directories or deleted documents have been swept by the DMClean job. With such a layout, each filestore can store up to (2^7).(2^8).(2^8).(2^8) files, i.e. 2^31 files or a bit more than 2.1 billions files. Any number of filestores can be used for virtually an “unlimited” number of content files. However, since each content object has an 16-digit hexadecimal id whose only last 8 hexadecimal digits are really distinctive (and directly map to the filesystem path of the content file, see below), a docbase can effectively contain “only” up to 16^8 files, i.e. 2^32 content files or slightly more than 4.2 billions files distributed among all the filestores. Too few ? There is hope, aside from spawning new docbases. The knowledge base note here explains how “galactic” r_object_id are allocated if more than 4 billions documents are present in a repository, so it should be possible to have literally gazillions of documents in a docbase. It is not clear though whether this galactic concept is implemented yet or whether it has ever been triggered once, so let us stay with our feet firmly on planet Earth for the time being.

It should be emphasized that such a particular layout in no way causes a performance penalty in accessing the documents’contents from within the repository because their full path can easily be computed by the content server out of their dm_content.data_ticket attribute (a signed decimal number), e.g. as shown with the one-liner:

data_ticket=-2147440384
echo $data_ticket | gawk '{printf("%x\n", $0 + 4294967296)}'
8000a900

or, more efficiently entirely in the bash shell:

printf "%x\n" $(($data_ticket + 4294967296))
8000a900

This value is now split apart by groups of 2 hex digits with a slash as separator: 80/00/a9/00
To compute the full path, the filestore’s location and the docbase id still need to be prepended to the above partial path, e.g. ${my_docbase_root}/filestore_01/0000c350/80/00/a9/00. Thus, knowing the r_object_id of a document, we can find its content file on the filesystem as shown (or, preferably, using the getpath API function) and knowing the full path of a content file makes it possible to find back the document (or documents as the same content can be shared among several documents) in the repository it belongs to. To be complete, the explanation still needs the concept of filestore and its relationship with a location but let’s stop digressing (check paragraph 3 below for a hint) and focus back to the subject at hand. We have now enough information to get us started.

As there are no files in the intermediate levels, it is necessary to walk the entire tree to reach them and start their processing, which is very time-consuming. Depending on your hardware, a ‘ls -1R’ command can takes hours to complete on a set of fairly dense sub-trees. A contrario, this is an advantage for processing through rsync because rsync is able to create all the necessary sub-path levels (aka “implied directories” in rsync lingo) if the -R|–relative option is provided, as if a “mkdir -p” were issued; thus, in order to optimally copy an entire filestore, it would be enough to rsync only the terminal directories, once identified, and the whole sub-tree would be recreated implicitly. In the illustration above, the rsync commands for those paths are:

cd ${my_docbase_root}
rsync -avzR --stats filestore_01/80/00/00 dmadmin@{dest_machine}:${dest_path}
rsync -avzR --stats filestore_01/80/00/01 dmadmin@{dest_machine}:${dest_path}
rsync -avzR --stats filestore_01/80/00/ff dmadmin@{dest_machine}:${dest_path}
rsync -avzR --stats filestore_01/ff/ff/00 dmadmin@{dest_machine}:${dest_path}

In rsync ≥ v2.6.7, it is even possible to restrict the part within the source full path that should be copied remotely, so no preliminary cd is necessary, e.g.:

rsync -avzR --stats ${my_docbase_root}/./filestore_01/80/00/00 dmadmin@{dest_machine}:${dest_path}

Note the /./ path component, it marks the start of the relative path to reproduce remotely. This command will create the directory ${dest_path}/filestore_01/80/00/00 on the remote host and copy its content there.
Path specification can be quite complicated, so use the -n|–dry-run and -v|–verbose (or even -vv for more details) options to have a peek at rsync’s actions before they are applied.

With the -R option, we get to transfer only the terminal sub-directories AND their original relative paths, efficiency and convenience !
We potentially replace millions of file by file copy commands with only up to 128 * 64K directory copy commands per filestore, which is much more concise and efficient.

However, if there are N content files to transfer, at least ⌈N/256⌉ such rsync commands will be necessary, e.g. a minimum of 3’900 commands for 1 million files, subject of course to their distribution in the sub-tree (some lesser dense terminal sub-directories can contain less than 256 files so more directories and hence commands are required). It is not documented how the content server distributes them over the sub-directories and there is no balancing job that relocates the content files in order to reduce the number of terminal directories by increasing the density of the left ones and removing the emptied ones. All this is quite sensible because, while it may matter to us, it is a non-issue to the repository.
Nevertheless, on the bright side, since a tree is by definition acyclic, rsync transfers won’t overlap and therefore can be parallelized without synchronization issues, even when intermediate “implied directories”‘s creations are requested simultaneously by 2 or more rsync commands (if rsync performs an operation equivalent to “mkdir -p”, possible errors due to race conditions during concurrent execution can be simply ignored since the operation is idempotent).

Empty terminal paths can be skipped without fear because they are not referenced in the docbase (only content files are) and hence their absence from the copy cannot introduce inconsistencies.

Of course, in the example above, those hypothetical 3900 rsync commands won’t be launched at once but in groups of some convenient value depending on the load that the machines can endure or the application’s response time degradation if the commands are executed during the normal work hours. Since we are dealing with files and possibly the network, the biggest bottleneck will be the I/Os and care should be exercised not to saturate them. When dedicated hardware such as high-speed networked NAS with SSD drives are used, this is less of a problem and more parallel rsync instances can be started, but such expensive resources are often shared across a company so that someone might still be impacted at one point. I for one remember one night as I was running one data-intensive DQL query in ten different repositories at once and users were suddenly complaining that they couldn’t work their Word documents any more because response time fell down to a crawl. How was that possible ? What was the relationship between DQL queries in several repositories and a desktop program ? Well, the repositories used Oracle databases whose datafiles were stored on a NAS also used as a host for networked drives mounted on desktop machines. Evidently, that configuration was somewhat sloppy but one never knows how things are configured at a large company, so be prepared for the worst and set up the transfer so that they can be easily suspended, interrupted and resumed. The good thing with rsync in archive mode is that the transfers can be resumed where they left off at a minimal cost just by relaunching the same commands with no need to compute a restarting point.

It goes without saying that setting up public key authentication or ssh connection sharing is mandatory when rsync-ing to a remote machine in order to suppress the thousands of authentication requests that will pop up during the batch execution.

But up to 128 * 64K rsync commands per filestore : isn’t that a bit too much ?

rsync performs mostly I/Os operations, reading and writing the filesystems, possibly across a network. The time spent in those slow operations (mostly waiting for them to complete, actually) by far outweighs the time spent launching the rsync processes and executing user or system code, especially if the terminal sub-directories are densely filled with large files. Moreover, if the DMClean jobs has been run ahead of time, this 128 * 64K figure is a maximum, it is only reached if all of the terminal sub-directories are not empty.
Still, rsync has some cleverness of its own in processing the files to copy so why not let it do its job ? Is it possible to reduce their number ? Of course, by just stopping one level before the terminal sub-directories, at their parents’ level. From there, up to 128 * 256 rsync commands are necessary, down from 128 * 64K commands. rsync would then explore itself the up to 64K terminal directories below, hopefully more efficiently than when explicitly told so. For sparse sub-directories or small docbases, this could be more efficient. If so, what would be the cut-off point ? This is a complex question depending on so many factors that there is no other way to answer it than to experiment with several situations. A few informal tests show that copying terminal sub-directories with up to 64K rsync commands is about 40% faster than copying their parent sub-directories. If optimality is defined as speed, then the “64k” variant is the best; if it is defined as compactness, then the “256” variant is the best. One explanation for this could be that the finer and simpler the tasks to perform, the quicker they terminate and free up processes to be reused. Or maybe rsync is overwhelmed by the up to 64K sub-directories to explore and is not so good at that and needs some help. The scripts in this article allow experimenting with the “256” variant.

To summarize up to this point, we will address the cost of navigating the special Documentum sub-tree layout by walking the location sub-trees up to the last directory level (or up to 2 levels above if so requested) and generate efficient rsync commands that can easily be parallelized. But before we start, how about asking the repository about its contents ? As it keeps track of it, wouldn’t this alternative be much easier and faster than navigating complex directory sub-trees ? Let’see.

Alternate solution: just ask the repository !

Since a repository obviously knows where its content files are stored on disks, it makes sense to get this information directly from it. In order to be sure to include all the possible renditions as well, we should query dmr_content instead of dm_document(all) (note that the DQL function MFILE_URL() returns those too, so that a “select MFILE_URL(”) from dm_document(all)” could also be used here). Also, unless the dmDMClean job is run beforehand, dmr_content includes orphan contents as well, so this point must be clarified ahead. Anyway, by querying dmr_content we are sure not to omit any content, orphans or not.
The short python/bash-like pseudo-code shows how we could do it:

for each filestore in the repository:
(
   for each content in the filestore:
      get its path, discard its filename;
      add the path to the set of paths to transfer;
   for each path in the set of paths:
      generate an rsync -avzR --stat command;
) &

Line 4 just gets the terminal sub-directories, while line 5 ensures that they are unique in order to avoid rsync-ing the same path multiple times. We use sets here to guarantee distinct terminal path values (set elements are unique in the set).
Line 7 outputs the rsync commands for the terminal sub-directories and the destination.
Even though all filestores are processed concurrently, there could be millions of contents in each filestore and such queries could take forever. However, if we run several such queries in parallel, each working on its own partition (i.e. a non-overlapping subset of the whole such that their union is equal to the whole), we could considerably speed it up. Constraints such “where r_object_id like ‘%0′”, “where r_object_id like ‘%1′”, “where r_object_id like ‘%2′”, .. “where r_object_id like ‘%f'” can slice up the whole set of documents into 16 more or less equally-sized subsets (since the r_object_id is essentially a sequence, its modulo 16 or 256 distribution is uniform), which can then be worked on independently and concurrently. Constraints like “where r_object_id like ‘%00′” .. “where r_object_id like ‘%ff'” can produce 256 slices, and so on.
Here is a short python 3 program that does all this:

#!/usr/bin/env python

# 12/2018, C. Cervini, dbi-services;
 
import os
import sys
import traceback
import getopt
from datetime import datetime
import DctmAPI
import multiprocessing
import multiprocessing.pool

def Usage():
   print("""Purpose:
Connects as dmadmin/xxxx to a given repository and generates rsync commands to transfer the given filestores' contents to the given destination;
Usage:
   ./walk_docbase.py -h|--help | -r|--repository  [-f|--filestores [{,}]|all] [-d|--dest ] [-l|--level ]
Example:
   ./walk_docbase.py --docbase dmtest
will list all the filestores in docbase dmtest and exit;
   ./walk_docbase.py --docbase dmtest --filestore all --dest dmadmin@remote-host:/documentum/data/cloned_dmtest
will transfer all the filestores' content to the remote destination in /documentum/data, e.g.:
   dm_location.path = /data/dctm/dmtest/filestore_01 --> /documentum/data/cloned_dmtest/filestore_01
if dest does not contain a file path, the same path as the source is used, e.g.:
   ./walk_docbase.py --docbase dmtest --filestore filestore_01 --dest dmadmin@remote-host
will transfer the filestore_01 filestore's content to the remote destination into the same directory, e.g.:
   dm_location.path = /data/dctm/dmtest/filestore_01 --> /documentum/dctm/dmtest/filestore_01
In any case, the destination root directory, if any is given, must exist as rsync does not create it (although it creates the implied directories);
Generated statements can be dry-tested by adding the option --dry-run, e.g.
rsync -avzR --dry-run /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0c dmadmin@dmtest:/home/dctm/dmtest
Commented out SQL statements to update the dm_location.file_system_path for each dm_filestore are output if the destination path differs from the source's;
level is the starting sub-directory level that will be copied by rsync;
Allowd values for level are 0 (the default, the terminal directories level), -1 and -2;
level -1 means the sub-directory level above the terminal directories, and so on;
Practically, use 0 for better granularity and parallelization;
""")

def print_error(error):
   print(error)

def collect_results(list_paths):
   global stats
   for s in list_paths:
      stats.paths = stats.paths.union(s)

def explore_fs_slice(stmt):
   unique_paths = set()
   lev = level
   try:
      for item in DctmAPI.select_cor(session, stmt):
         fullpath = DctmAPI.select2dict(session, f"execute get_path for '{item['r_object_id']}'")
         last_sep = fullpath[0]['result'].rfind(os.sep)
         fullpath = fullpath[0]['result'][last_sep - 8 : last_sep]
         for lev in range(level, 0):
            last_sep = fullpath.rfind(os.sep)
            fullpath = fullpath[ : last_sep]
         unique_paths.add(fullpath)
   except Exception as e:
      print(e)
      traceback.print_stack()
   DctmAPI.show(f"for stmt {stmt}, unique_paths={unique_paths}")
   return unique_paths

# --------------------------------------------------------
# main;
if __name__ == "__main__":
   DctmAPI.logLevel = 0
 
   # parse the command-line parameters;
   # old-style for more flexibility is not needed here;
   repository = None
   s_filestores = None
   dest = None
   user = ""
   dest_host = ""
   dest_path = ""
   level = 0
   try:
       (opts, args) = getopt.getopt(sys.argv[1:], "hr:f:d:l:", ["help", "docbase=", "filestore=", "destination=", "level="])
   except getopt.GetoptError:
      print("Illegal option")
      print("./graph-stats.py -h|--help | [-r|--repository ] [-f|--filestores [{,}]|all] [-d|--dest ][-l|--level ]")
      sys.exit(1)
   for opt, arg in opts:
      if opt in ("-h", "--help"):
         Usage()
         sys.exit()
      elif opt in ("-r", "--repository"):
         repository = arg
      elif opt in ("-f", "--filestores"):
         s_filestores = arg
      elif opt in ("-d", "--dest"):
         dest = arg
         p_at = arg.rfind("@")
         p_colon = arg.rfind(":")
         if -1 != p_at:
            user = arg[ : p_at]
            if -1 != p_colon:
               dest_host = arg[p_at + 1 : p_colon]
               dest_path = arg[p_colon + 1 : ]
            else:
               dest_path = arg[p_at + 1 : ]
         elif -1 != p_colon:
            dest_host = arg[ : p_colon]
            dest_path = arg[p_colon + 1 : ]
         else:
            dest_path = arg
      elif opt in ("-l", "--level"):
         try:
            level = int(arg)
            if -2 > level or level > 0:
               print("raising")
               raise Exception()
         except:
            print("level must be a non positive integer inside the interval (-2,  0)")
            sys.exit()
   if None == repository:
      print("the repository is mandatory")
      Usage()
      sys.exit()
   if None == dest or "" == dest:
      if None != s_filestores:
         print("the destination is mandatory")
         Usage()
         sys.exit()
   if None == s_filestores or 'all' == s_filestores:
      # all filestores requested;
      s_filestores = "all"
      filestores = None
   else:
      filestores = s_filestores.split(",")
 
   # global parameters;
   # we use a typical idiom to create a cheap namespace;
   class gp: pass
   gp.maxWorkers = 100
   class stats: pass

   # connect to the repository;
   DctmAPI.show(f"Will connect to docbase(s): {repository} and transfer filestores [{s_filestores}] to destination {dest if dest else 'None'}")

   status = DctmAPI.dmInit()
   session = DctmAPI.connect(docbase = repository, user_name = "dmadmin", password = "dmadmin")
   if session is None:
      print("no session opened, exiting ...")
      exit(1)

   # we need the docbase id in hex format;
   gp.docbase_id = "{:08x}".format(int(DctmAPI.dmAPIGet("get,c,docbaseconfig,r_docbase_id")))

   # get the requested filestores' dm_locations;
   stmt = 'select fs.r_object_id, fs.name, fs.root, l.r_object_id as "loc_id", l.file_system_path from dm_filestore fs, dm_location l where {:s}fs.root = l.object_name'.format(f"fs.name in ({str(filestores)[1:-1]}) and " if filestores else "")
   fs_dict = DctmAPI.select2dict(session, stmt)
   DctmAPI.show(fs_dict)
   if None == dest:
      print(f"filestores in repository {repository}")
      for s in fs_dict:
         print(s['name'])
      sys.exit()

   # filestores are processed sequentially but inside each filestore, the contents are queried concurrently;
   for storage_id in fs_dict:
      print(f"# rsync statements for filestore {storage_id['name']};")
      stats.paths = set()
      stmt = f"select r_object_id from dmr_content where storage_id = '{storage_id['r_object_id']}' and r_object_id like "
      a_stmts = []
      for slice in range(16):
         a_stmts.append(stmt + "'%0{:0x}'".format(slice))
      gp.path_pool = multiprocessing.pool.Pool(processes = 16)
      gp.path_pool.map_async(explore_fs_slice, a_stmts, len(a_stmts), callback = collect_results, error_callback = print_error)
      gp.path_pool.close()
      gp.path_pool.join()
      SQL_stmts = set()
      for p in stats.paths:
         last_sep = storage_id['file_system_path'].rfind(os.sep)
         if "" == dest_path:
            dp = storage_id['file_system_path'][ : last_sep] 
         else:
            dp = dest_path
         # note the dot in the source path: relative implied directories will be created from that position; 
         print(f"rsync -avzR {storage_id['file_system_path'][ : last_sep]}/.{storage_id['file_system_path'][last_sep : ]}/{str(gp.docbase_id)}/{p} {(user + '@') if user else ''}{dest_host}{':' if dest_host else ''}{dp}")
         if storage_id['file_system_path'][ : last_sep] != dest_path:
            # dm_location.file_system_path has changed, give the SQL statements to update them in clone's schema;
            SQL_stmts.add(f"UPDATE dm_location SET file_system_path = REPLACE(file_system_path, '{storage_id['file_system_path'][ : last_sep]}', '{dest_path}') WHERE r_object_id = '{storage_id['loc_id']}';")
      # commented out SQL statements to run before starting the repository clone;
      for stmt in SQL_stmts:
         print(f"# {stmt}")

   status = DctmAPI.disconnect(session)
   if not status:
      print("error while  disconnecting")

On line 10, the module DctmAPI is imported; such module was presented in a previous article (see here) but I include an updated version at the end of the present article.
Note the call to DctmAPI.select_cor() on line 51; this is a special version of DctmAPI.select2dict() where _cor stands for coroutine; actually, in python, it is called a generator but it looks very much like a coroutine from other programming languages. Its main interest is to separate navigating through the result set from consuming the returned data, for more clarity; also, since the the data are consumed one row at a time, there is no need to read them all into memory at once and pass them to the caller, which is especially efficient here where we can potentially have millions of documents. DctmAPI.select2dict() is still available and used when the expected result set is very limited, as for the list of dm_locations on line 154. By the way, DctmAPI.select2dict() invokes DctmAPI.select_cor() from within a list constructor, so they share that part of the code.
On line 171, function map_async is used to start 16 concurrent calls to explore_fs_slice on line 47 per filestore (the r_object_id % 16 expressed in DQL as r_object_id like ‘%0′ .. ‘%f’), each in its own process. That function repeatedly gets an object_id from the coroutine above and calls the administrative method get_path on it (we could query the dm_content.data_ticket and compute ourselves the file path but would it be any faster ?); the function returns a set of unique paths for its slice of ids. map_async then waits until all the processes terminate. Their result is collected by the callback collect_results starting on line 42; its parameter, list_paths, is a list of sets received from map_async (which received the sets from the terminating concurrent invocation of explore_fs_slice and put them in a list) that are further made unique by union-ing them into a global set. Starting on line 175, this set is iterated to generate the rsync commands.
Example of execution:

./walk-docbase.py -r dmtest -f all -d dmadmin@dmtest:/home/dctm/dmtest | tee clone-content
# rsync statements for filestore filestore_01;
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0c dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/02 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/06 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0b dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/04 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/09 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/01 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/03 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/07 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/00 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/05 dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0a dmadmin@dmtest:/home/dctm/dmtest
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/08 dmadmin@dmtest:/home/dctm/dmtest
# UPDATE dm_location SET file_system_path = REPLACE(file_system_path, '/home/dmadmin/documentum/data/dmtest', '/home/dctm/dmtest') WHERE r_object_id = '3a00c3508000013f';
# rsync statements for filestore thumbnail_store_01;
# rsync statements for filestore streaming_store_01;
# rsync statements for filestore replicate_temp_store;
# rsync statements for filestore replica_filestore_01;

This execution generated the rsync commands to copy all the dmtest repository’s filestores to the remote host dmtest’s new location /home/dctm/dmtest. As the filestores’ dm_location has changed, an SQL statement (to be taken as an example because it is for an Oracle RDBMS; the syntax may differ in another RDBMS) has been generated too to accommodate the new path. We do this in SQL because the docbase clone will still be down at this time and the change must be done at the database level.
The other default filestores in the test docbase are empty and so no rsync are necessary for them; normally, the placeholder docbase already has initialized their sub-tree.
Those rsync commands could be executed in parallel, say, 10 at a time, by launching them in the background with “wait” commands inserted in between, like this:

./walk-docbase.py -r dmtest -f all -d dmadmin@dmtest:/home/dctm/dmtest | gawk -v nb_rsync=10 'BEGIN {
print "\#!/bin/bash"
nb_dirs = 0
}
{
print $0 " &"
if (!$0 || match($0, /^#/)) next
if (0 == ++nb_dirs % nb_rsync)
print "wait"
}
END {
if (0 != nb_dirs % nb_rsync)
print "wait"
}' | tee migr.sh
#!/bin/bash
# rsync statements for filestore filestore_01; &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/08 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/02 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/04 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/03 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/06 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/01 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0a dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/07 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0c dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/05 dmadmin@dmtest:/home/dctm/dmtest &
wait
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/00 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/09 dmadmin@dmtest:/home/dctm/dmtest &
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00/0b dmadmin@dmtest:/home/dctm/dmtest &
# UPDATE dm_location SET file_system_path = REPLACE(file_system_path, '/home/dmadmin/documentum/data/dmtest', '/home/dctm/dmtest') WHERE r_object_id = '3a00c3508000013f';
# rsync statements for filestore thumbnail_store_01;
# rsync statements for filestore streaming_store_01;
# rsync statements for filestore replicate_temp_store;
# rsync statements for filestore replica_filestore_01;
wait

Even though there may be a count difference of up to 255 files between some rsync commands, they should complete roughly at the same time so that nb_rsync commands should be running at any time. If not, i.e. if the transfers frequently wait for a few long running rsync to complete (it could happen with huge files), it may be worth using a task manager that makes sure the requested parallelism degree is respected at any one time throughout the whole execution.
Let’s now make the generated script executable and launch it:

chmod +x migr.sh
time ./migr.sh

The parameter level lets one choose the levels of the sub-directories that rsync will copy, 0 (the default) for terminal sub-directories, -1 for the level right above them and -2 for the level above those. As discussed, the lesser the level, the lesser rsync commands are necessary, e.g. up to 128 * 64K for level = 0, up to 128 * 256 for level = -1 and up to 128 for level = -2.
Example of execution:

./walk-docbase.py -r dmtest -d dd --level -1
# rsync statements for filestore filestore_01;
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80/00 dmadmin@dmtest:/home/dctm/dmtest &
# UPDATE dm_location SET file_system_path = REPLACE(file_system_path, '/home/dmadmin/documentum/data/dmtest', 'dd') WHERE r_object_id = '3a00c3508000013f';
# rsync statements for filestore thumbnail_store_01;
# rsync statements for filestore streaming_store_01;
# rsync statements for filestore replicate_temp_store;
# rsync statements for filestore replica_filestore_01;

And also:

./walk-docbase.py -r dmtest -d dd --level -2
# rsync statements for filestore filestore_01;
rsync -avzR /home/dmadmin/documentum/data/dmtest/./content_storage_01/0000c350/80 dmadmin@dmtest:/home/dctm/dmtest &
# UPDATE dm_location SET file_system_path = REPLACE(file_system_path, '/home/dmadmin/documentum/data/dmtest', 'dd') WHERE r_object_id = '3a00c3508000013f';
# rsync statements for filestore thumbnail_store_01;
# rsync statements for filestore streaming_store_01;
# rsync statements for filestore replicate_temp_store;
# rsync statements for filestore replica_filestore_01;

So, this first solution looks quite simple eventhough it initially puts a little, yet tolerable stress on the docbase. The python script connects to the repository and generates the required rsync commands (with an user-selectable compactness level) and a gawk filter prepares an executable with those statements launched in parallel N (user-selectable) at a time.
Performance-wise, its not so good because all the contents must be queried for their full path, and that’s a lot of queries for a large repository.

All this being said, let’s see now if a direct and faster, out of the repository filesystem copy procedure can be devised. Please, follow the rest of this article in part II. The next paragraph just lists the latest version of the module DctmAPI.py.

DctmAPI.py revisited
"""
This module is a python - Documentum binding based on ctypes;
requires libdmcl40.so/libdmcl.so to be reachable through LD_LIBRARY_PATH;
C. Cervini - dbi-services.com - december 2018

The binding works as-is for both python2 amd python3; no recompilation required; that's the good thing with ctypes compared to e.g. distutils/SWIG;
Under a 32-bit O/S, it must use the libdmcl40.so, whereas under a 64-bit Linux it must use the java backed one, libdmcl.so;

For compatibility with python3 (where strings are now unicode ones and no longer arrays of bytes, ctypes strings parameters are always converted to unicode, either by prefixing them
with a b if litteral or by invoking their encode('ascii', 'ignore') method; to get back to text from bytes, b.decode() is used;these works in python2 as well as in python3 so the source is compatible with these two versions of the language;
"""

import os
import ctypes
import sys, traceback

# use foreign C library;
# use this library in eContent server = v6.x, 64-bit Linux;
dmlib = 'libdmcl.so'

dm = 0

class getOutOfHere(Exception):
   pass

def show(mesg, beg_sep = False, end_sep = False):
   "displays the message msg if allowed"
   if logLevel > 0:
      print(("\n" if beg_sep else "") + repr(mesg), ("\n" if end_sep else ""))

def dmInit():
   """
   initializes the Documentum part;
   returns True if successfull, False otherwise;
   dmAPI* are global aliases on their respective dm.dmAPI* for some syntaxic sugar;
   since they already have an implicit namespace through their dm prefix, dm.dmAPI* would be redundant so let's get rid of it;
   returns True if no error, False otherwise;
   """

   show("in dmInit()")
   global dm

   try:
      dm = ctypes.cdll.LoadLibrary(dmlib);  dm.restype = ctypes.c_char_p
      show("dm=" + str(dm) + " after loading library " + dmlib)
      dm.dmAPIInit.restype    = ctypes.c_int;
      dm.dmAPIDeInit.restype  = ctypes.c_int;
      dm.dmAPIGet.restype     = ctypes.c_char_p;      dm.dmAPIGet.argtypes  = [ctypes.c_char_p]
      dm.dmAPISet.restype     = ctypes.c_int;         dm.dmAPISet.argtypes  = [ctypes.c_char_p, ctypes.c_char_p]
      dm.dmAPIExec.restype    = ctypes.c_int;         dm.dmAPIExec.argtypes = [ctypes.c_char_p]
      status  = dm.dmAPIInit()
   except Exception as e:
      print("exception in dminit(): ")
      print(e)
      traceback.print_stack()
      status = False
   finally:
      show("exiting dmInit()")
      return True if 0 != status else False
   
def dmAPIDeInit():
   """
   releases the memory structures in documentum's library;
   returns True if no error, False otherwise;
   """
   status = dm.dmAPIDeInit()
   return True if 0 != status else False
   
def dmAPIGet(s):
   """
   passes the string s to dmAPIGet() method;
   returns a non-empty string if OK, None otherwise;
   """
   value = dm.dmAPIGet(s.encode('ascii', 'ignore'))
   return value.decode() if value is not None else None

def dmAPISet(s, value):
   """
   passes the string s to dmAPISet() method;
   returns TRUE if OK, False otherwise;
   """
   status = dm.dmAPISet(s.encode('ascii', 'ignore'), value.encode('ascii', 'ignore'))
   return True if 0 != status else False

def dmAPIExec(stmt):
   """
   passes the string stmt to dmAPIExec() method;
   returns TRUE if OK, False otherwise;
   """
   status = dm.dmAPIExec(stmt.encode('ascii', 'ignore'))
   return True if 0 != status else False

def connect(docbase, user_name, password):
   """
   connects to given docbase as user_name/password;
   returns a session id if OK, None otherwise
   """
   show("in connect(), docbase = " + docbase + ", user_name = " + user_name + ", password = " + password) 
   try:
      session = dmAPIGet("connect," + docbase + "," + user_name + "," + password)
      if session is None or not session:
         raise(getOutOfHere)
      else:
         show("successful session " + session)
         show(dmAPIGet("getmessage," + session).rstrip())
   except getOutOfHere:
      print("unsuccessful connection to docbase " + docbase + " as user " + user_name)
      session = None
   except Exception as e:
      print("Exception in connect():")
      print(e)
      traceback.print_stack()
      session = None
   finally:
      show("exiting connect()")
      return session

def execute(session, dql_stmt):
   """
   execute non-SELECT DQL statements;
   returns TRUE if OK, False otherwise;
   """
   show("in execute(), dql_stmt=" + dql_stmt)
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)
      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         raise(getOutOfHere)
      status = True
   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in execute():")
      print(e)
      traceback.print_stack()
      status = False
   finally:
      show(dmAPIGet("getmessage," + session).rstrip())
      show("exiting execute()")
      return status

def select2dict(session, dql_stmt, attr_name = None):
   """
   execute the DQL SELECT statement passed in dql_stmt and returns an array of dictionaries (one per row) into result;
   attributes_names is the list of extracted attributes (the ones in SELECT ..., as interpreted by the server); if not None, attribute namea are appended to it, otherwise nothing is returned;
   """
   show("in select2dict(), dql_stmt=" + dql_stmt)
   return list(select_cor(session, dql_stmt, attr_name))

def select_cor(session, dql_stmt, attr_name = None):
   """
   execute the DQL SELECT statement passed in dql_stmt and return one row at a time;
   coroutine version;
   if the optional attributes_names is not None, it contains an appended list of attributes returned by the result set, otherwise no names are returned;
   return True if OK, False otherwise;
   """
   show("in select_cor(), dql_stmt=" + dql_stmt)

   status = False
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)

      # iterate through the result set;
      row_counter = 0
      if None == attr_name:
         attr_name = []
      width = {}
      while dmAPIExec("next," + session + "," + query_id):
         result = {}
         nb_attrs = dmAPIGet("count," + session + "," + query_id)
         if nb_attrs is None:
            show("Error retrieving the count of returned attributes: " + dmAPIGet("getmessage," + session))
            raise(getOutOfHere)
         nb_attrs = int(nb_attrs) 
         for i in range(nb_attrs):
            if 0 == row_counter:
               # get the attributes' names only once for the whole query;
               value = dmAPIGet("get," + session + "," + query_id + ",_names[" + str(i) + "]")
               if value is None:
                  show("error while getting the attribute name at position " + str(i) + ": " + dmAPIGet("getmessage," + session))
                  raise(getOutOfHere)
               attr_name.append(value)
               if value in width:
                  width[value] = max(width[attr_name[i]], len(value))
               else:
                  width[value] = len(value)

            is_repeating = dmAPIGet("repeating," + session + "," + query_id + "," + attr_name[i])
            if is_repeating is None:
               show("error while getting the arity of attribute " + attr_name[i] + ": " + dmAPIGet("getmessage," + session))
               raise(getOutOfHere)
            is_repeating = int(is_repeating)

            if 1 == is_repeating:
               # multi-valued attributes;
               result[attr_name[i]] = []
               count = dmAPIGet("values," + session + "," + query_id + "," + attr_name[i])
               if count is None:
                  show("error while getting the arity of attribute " + attr_name[i] + ": " + dmAPIGet("getmessage," + session))
                  raise(getOutOfHere)
               count = int(count)

               for j in range(count):
                  value = dmAPIGet("get," + session + "," + query_id + "," + attr_name[i] + "[" + str(j) + "]")
                  if value is None:
                     value = "null"
                  #result[row_counter] [attr_name[i]].append(value)
                  result[attr_name[i]].append(value)
            else:
               # mono-valued attributes;
               value = dmAPIGet("get," + session + "," + query_id + "," + attr_name[i])
               if value is None:
                  value = "null"
               width[attr_name[i]] = len(attr_name[i])
               result[attr_name[i]] = value
         if 0 == row_counter:
            show(attr_name.append)
         yield result
         row_counter += 1
      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         show("Error closing the query collection: " + dmAPIGet("getmessage," + session))
         raise(getOutOfHere)

      status = True

   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in select2dict():")
      print(e)
      traceback.print_stack()
      status = False
   finally:
      return status

def select(session, dql_stmt, attribute_names):
   """
   execute the DQL SELECT statement passed in dql_stmt and outputs the result to stdout;
   attributes_names is a list of attributes to extract from the result set;
   return True if OK, False otherwise;
   """
   show("in select(), dql_stmt=" + dql_stmt)
   try:
      query_id = dmAPIGet("query," + session + "," + dql_stmt)
      if query_id is None:
         raise(getOutOfHere)

      s = ""
      for attr in attribute_names:
         s += "[" + attr + "]\t"
      print(s)
      resp_cntr = 0
      while dmAPIExec("next," + session + "," + query_id):
         s = ""
         for attr in attribute_names:
            value = dmAPIGet("get," + session + "," + query_id + "," + attr)
            if "r_object_id" == attr and value is None:
               raise(getOutOfHere)
            s += "[" + (value if value else "None") + "]\t"
            show(str(resp_cntr) + ": " + s)
         resp_cntr += 1
      show(str(resp_cntr) + " rows iterated")

      err_flag = dmAPIExec("close," + session + "," + query_id)
      if not err_flag:
         raise(getOutOfHere)

      status = True
   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
      status = False
   except Exception as e:
      print("Exception in select():")
      print(e)
      traceback.print_stack()
      print(resp_cntr); print(attr); print(s); print("[" + value + "]")
      status = False
   finally:
      show("exiting select()")
      return status

def walk_group(session, root_group, level, result):
   """
   recursively walk a group hierarchy with root_group as top parent;
   """

   try:
      root_group_id = dmAPIGet("retrieve," + session + ",dm_group where group_name = '" + root_group + "'")
      if 0 == level:
         if root_group_id is None:
            show("Cannot retrieve group [" + root_group + "]:" + dmAPIGet("getmessage," + session))
            raise(getOutOfHere)
      result[root_group] = {}

      count = dmAPIGet("values," + session + "," + root_group_id + ",groups_names")
      if "" == count:
         show("error while getting the arity of attribute groups_names: " + dmAPIGet("getmessage," + session))
         raise(getOutOfHere)
      count = int(count)

      for j in range(count):
         value = dmAPIGet("get," + session + "," + root_group_id + ",groups_names[" + str(j) + "]")
         if value is not None:
            walk_group(session, value, level + 1, result[root_group])

   except getOutOfHere:
      show(dmAPIGet("getmessage," + session).rstrip())
   except Exception as e:
      print("Exception in walk_group():")
      print(e)
      traceback.print_stack()

def disconnect(session):
   """
   closes the given session;
   returns True if no error, False otherwise;
   """
   show("in disconnect()")
   try:
      status = dmAPIExec("disconnect," + session)
   except Exception as e:
      print("Exception in disconnect():")
      print(e)
      traceback.print_stack()
      status = False
   finally:
      show("exiting disconnect()")
      return status

Highlighted are the main changed lines that added a generator-based select() function yielding a dictionary for each row from the result set and changed select2dict() to use it.

Cet article Two techniques for cloning a repository filestore, part I est apparu en premier sur Blog dbi services.

Pages