ALMA NA STE Maintenance Documentation (pragmatic version)


STE's in NA

  • STE: Standard Test Environment. Despite the name, it refers to the ALMA environment that is also used in production.
  • GNS: General Network Services. In charge of the network services within the STE. These are managed by a DHCP service, which provides the "static" configuration and environment for the whole ALMA STE.
  • GAS: General ALMA Services. The machines where the ALMA software actually runs; every ALMA component and container is deployed on them.
  • ARCH: Archive, where the ALMA Oracle DB is deployed.
  • Support: Acts as the LDAP server and data repository.
  • TMCDB: Telescope Monitor and Configuration Database, in charge of keeping the ALMASW configuration.

AOC STE

Machines listed with an IP address are accessible from outside the STE network.
  • NRAO Network
    • AOC
      • fw: ste-its-pix.aoc.nrao.edu (146.88.7.17)
        • gns: ste-its.aoc.nrao.edu (146.88.7.16)
        • accessible within the STE
          • gas01, gas02, gas03, gas04, gas05, gas06
      • *alma-support.aoc.nrao.edu (146.88.7.33)
      • *alma-arch01.aoc.nrao.edu (146.88.7.15)
* Maintained by NRAO IT

CV STE

Machines listed with an IP address are accessible from outside the STE network.
  • NRAO Network
    • CV
      • fw: ste-its-pix.cv.nrao.edu (10.12.97.88)
        • gns: ste-its.aoc.nrao.edu (10.12.97.89)
        • accessible within the STE
          • gas01, gas02, gas03, gas04, arch01, support
          • simulation nodes
            • cob-cc, cob-cdpm-dev (currently down), cob-cpn-01, cob-cpn-02, cob-cpn-03, cob-cpn-04
          • development nodes
            • cob-cc-dev, cob-cdpm-dev, cob-cpn-01-dev, cob-cpn-02-dev

Local NA Backup

Deployment

I strongly recommend doing this on GNS.

ACS

Download binaries

The official ACS repository is located at ftp://ftp.eso.org/restricted/Releases/. It contains the ACS binaries built for different OS versions (RHEL5.5, RHEL6.5, RHEL6.6). The ACS binaries consist of three software packages, distributed as TAR files for the x86 and x86_64 architectures:
  • ACS-{VERSION}-ExtProd-{DATE}-{ARCH}-{OS}: Contains all the needed third-party binaries.
  • ACS-{VERSION}-ARCHIVE-{DATE}-{ARCH}-{OS}: Contains the ARCHIVE subsystem binaries.
  • ACS-{VERSION}--{DATE}-{ARCH}-{OS}: The actual ACS binaries.
The following command downloads all the builds of a release, e.g. ACS-2015_2; expect it to take a couple of hours.
  • wget -r --user="user@login" --password="mypasswd" ftp://ftp.eso.org/restricted/Releases/ACS-2015_2

Deploy

After extracting the three tarballs (same OS and architecture) into the same directory, you will find a directory named "alma/ACS-2014.6_20141114-NOLGPL", which contains the ACS installation. Within the alma directory:
  • create a symlink as e.g.: ln -s ACS-2014.6_20141114-NOLGPL ACS-2014.6
  • tar the merged alma/ directory containing ACS-2014.6_20141114-NOLGPL # for speed, do not compress: transferring one big file between machines is faster than running scp or rsync on millions of tiny files
  • copy the file to the root account on the STE
  • untar the file and move the content to /alma/
  • /alma/RTI and /alma/casapy-{version} should exist within /alma; by default there is a backup of these two directories, refer to the backup section.
    • within ACS-2014.6_20141114-NOLGPL or ACS-2014.6, create a symlink for both directories:
      • ln -s ../RTI .
      • ln -s ../casa-{VERSION} casa
      • update casa data (with a period char at the end): cd casa/data && svn update alma catalogs ephemerides geodetic gui && svn revert -R alma catalogs ephemerides geodetic gui
  • to enable the ACS deployment, /alma/ACS-current must point to the desired version (e.g. ACS-2014.6) on all gas machines; this must be done as root.
    • run this as root on gns and on every gas* machine: cd /alma && rm ACS-current && ln -s ACS-{version} ACS-current
      • or run it once from gns: salt 'gas*' cmd.run 'cd /alma && rm /alma/ACS-current && ln -s ACS-{version} ACS-current'
  • rsync the current installation, e.g. ACS-2014.6, to all gas machines: rsync -av --progress origin gas01:destination (the exact paths depend on whether the destination folder already exists)
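
The symlink switch described above can be sketched as a small helper. This is only an illustration: switch_acs_current is a hypothetical name (not an ALMA tool), it assumes an ln that supports -n (GNU coreutils does), and the demonstration runs in a throwaway directory instead of the real /alma.

```shell
#!/bin/sh
# Sketch: repoint ACS-current at a given ACS version directory.
# switch_acs_current is a hypothetical helper, not an ALMA tool.
switch_acs_current() {
    alma_root=$1     # e.g. /alma
    acs_version=$2   # e.g. ACS-2014.6
    # Refuse to create a dangling link if the target is missing.
    if [ ! -d "$alma_root/$acs_version" ]; then
        echo "missing $alma_root/$acs_version" >&2
        return 1
    fi
    # -n replaces the existing link itself instead of descending into it.
    ( cd "$alma_root" && ln -sfn "$acs_version" ACS-current )
}

# Demonstration in a temporary directory (assumption: never the real /alma).
sandbox=$(mktemp -d)
mkdir "$sandbox/ACS-2014.6_20141114-NOLGPL"
ln -s ACS-2014.6_20141114-NOLGPL "$sandbox/ACS-2014.6"
switch_acs_current "$sandbox" ACS-2014.6
readlink "$sandbox/ACS-current"
rm -rf "$sandbox"
```

Compared with rm followed by ln -s, replacing the link in one step avoids a moment where ACS-current does not exist, and the directory check avoids leaving a broken link behind after a typo in the version.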

CV CORR Nodes

On the CV corr nodes (e.g. real-time and diskless nodes), a node can have another ACS/ALMASW installation done by the developers; the procedure is the same as the one described above. Make sure that, within the ACS folder, rtai points to /usr/realtime and rtlinux points to /usr/src/kernels/linux. Both /usr/realtime and /usr/src/kernels/linux are contained in the ramdisk images, and they are backed up as well.

Remember that diskless machines mount the ALMASW from GNS through NFS: alma, alma64, almabuild and almadev are mounted. To see what is mounted where, run in a terminal: mount | grep -i alma

This CV development environment is described at https://ictwiki.alma.cl/twiki/bin/view/Control/NtcDevelopmentTestEnvironment.

ALMASW

I strongly recommend doing this on GNS. IRM provides a twiki https://ictwiki.alma.cl/twiki/bin/view/IRM/ConfigChangesRelNotes to track all the configuration changes needed for a specific ALMA software deployment.

Deployment

as root:
  • Download the needed binary from http://buildfarm.osf.alma.cl:8181/download.html and, as root, untar ALMA-{BRANCH}-B-{DATE}.tar.bz2 in /alma/ (preserve permissions). The ALMASW is installed inside its ACS version because it is packaged within the corresponding ACS folder. In Socorro, builds are stored at alma-build.aoc.nrao.edu:export/home/alma-build/buildfarm_cl/{BRANCH} and are downloaded daily from Chile.
  • for practical purposes, let's assume that the ACS version is 2015.02 and the build name is 2015-02-B-2015-02-16-01-00-00.tar.bz2; once uncompressed, the build will be located at /alma/ACS-2015.02/
    • the binaries are located in two folders:
      • ACSSW-2015-02-B-2015-02-16-01-00-00
      • acsdata-2015-02-B-2015-02-16-01-00-00
    • ACSSW and acsdata are probably already symlinks pointing to another ALMASW build. This is how ACS manages which software version is used: the build that ACSSW and acsdata point to is the one currently in use.
    • /alma/ste/etc/sitename contains the STE name (cat /alma/ste/etc/sitename); in this case let's assume that the location name ({SITE_NAME}) is AOC
    • configuration files:
      • these can be copied from a previous installation or from a backup to the wanted installation:
        • at acsdata/config
          • archiveConfig.properties.{SITE_NAME}: contains the necessary configuration to setup the connection to the archive/oracle instance
            • archiveConfig.properties and archiveConfig.properties.STE must point to this file (a symlink)
          • scheduling.properties.{SITE_NAME}: contains the necessary configuration to setup the connection to the archive/oracle instance for scheduling
            • scheduling.properties and scheduling.properties.STE must point to this file (a symlink)
          • tnsnames.ora: is the actual connection string in order to establish the connection to oracle
        • at ACSSW/config
          • ExecConfig-{SITE_NAME}.xml: defines the subsystems that OMC will offer to start up.
    • Once everything is done at GNS for the wanted installation, at e.g.: /alma/ACS-2015.02/ (switch to the software version installed):
      • Remove previous symlinks: rm ACSSW && rm acsdata
      • Create the new symlinks:
        • ln -s ACSSW-2015-02-B-2015-02-16-01-00-00 ACSSW
        • ln -s acsdata-2015-02-B-2015-02-16-01-00-00 acsdata
    • as root@gns: rsync the current installation to all gas machines: rsync -av --progress origin gas01:destination (the exact paths depend on whether the destination folder already exists)
      • e.g.: salt 'gas*' cmd.run 'rsync -a gns:/alma/ACS-{version} /alma/ACS-{version}' # this command makes every gas machine rsync from gns
    • confirm the Java version to be used; at this point it should be Java 1.7. How to switch versions is documented at https://ictwiki.alma.cl/twiki/bin/view/Main/ACS12Java7Update
    • log out and log back in as a non-root user, e.g. almamgr; the prompt will display the currently deployed version, indicating the ACS version, the build version, the antennas, the installed patches and the configuration name.
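
The ACSSW/acsdata switch from the steps above can be sketched as follows. This is a minimal illustration under assumptions: activate_build is a hypothetical helper (not part of ALMASW), it relies on an ln that supports -n, and the demonstration uses a temporary directory instead of /alma/ACS-2015.02.

```shell
#!/bin/sh
# Sketch: point ACSSW and acsdata at one build inside an ACS directory.
# activate_build is a hypothetical helper, not part of ALMASW.
activate_build() {
    acs_dir=$1      # e.g. /alma/ACS-2015.02
    build_name=$2   # e.g. 2015-02-B-2015-02-16-01-00-00
    # Check both halves of the build before touching any symlink.
    for prefix in ACSSW acsdata; do
        if [ ! -d "$acs_dir/$prefix-$build_name" ]; then
            echo "missing $acs_dir/$prefix-$build_name" >&2
            return 1
        fi
    done
    # Replace both links; -n keeps ln from following the old link.
    for prefix in ACSSW acsdata; do
        ( cd "$acs_dir" && ln -sfn "$prefix-$build_name" "$prefix" ) || return 1
    done
}

# Demonstration in a temporary directory (assumption: never the real /alma).
sandbox=$(mktemp -d)
demo_build=2015-02-B-2015-02-16-01-00-00
mkdir "$sandbox/ACSSW-$demo_build" "$sandbox/acsdata-$demo_build"
activate_build "$sandbox" "$demo_build"
readlink "$sandbox/ACSSW"
readlink "$sandbox/acsdata"
rm -rf "$sandbox"
```

Checking both directories first means a half-extracted build cannot leave ACSSW and acsdata pointing at different builds.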

Useful CMD's and EnvVars

CMD's:
  • ALMA
    • showRunningVersion: get the version currently deployed
    • getHosts: list all the hosts configured for the STE
  • RHEL
    • uname -a
    • lsb_release -a
    • top
    • iostat
    • w
    • netstat -npl[t]
    • uptime
    • date
EnvVars:
  • ALMA
    • ACSROOT: /alma/ACS-current/ACSSW
    • ACSDATA: /alma/ACS-current/acsdata
    • TMCDB_CONFIGURATION_NAME: the TMCDB configuration to use
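
A quick way to see which of these variables are set in the current session is sketched below. report_alma_env is a hypothetical helper name, not an ALMA command; it only prints what the shell already knows.

```shell
#!/bin/sh
# Sketch: print the ALMA variables listed above, flagging unset ones.
# report_alma_env is a hypothetical helper name.
report_alma_env() {
    for name in ACSROOT ACSDATA TMCDB_CONFIGURATION_NAME; do
        # eval gives a portable indirect expansion; the names are fixed above.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
        fi
    done
}

report_alma_env
```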

Patches

  • as almamgr
    1. checkout the module
    2. generate the integration root: getTemplateForDirectory INTROOT <name_of_the_patch>
    3. export the introot variable: export INTROOT=<absolute_path_of_the_introot_generated>
    4. re-source the ACS profile in order to install the patch into the introot: source $ACSROOT/config/.acs/.bash_profile.acs
    5. Within your module: "make build" for high-level modules; if there is no "build" target in the makefile, run "make clean all install"
    6. go to the introot and edit the "responsible" file
    7. copy your introot into $ACSROOT/intlist/
    8. add the name of the introot to $ACSROOT/intlist
    9. logout and re login
    10. synchronize $ACSROOT/intlist across all STE GAS machines.

Patches are only picked up by the system after an FSR; that is, if a patch is installed while the system is online, an FSR is needed.

Database

It is often necessary to update the ALMA database or to change some configuration in the TMCDB; these changes are required by the software version.

Schema update

The procedure is well documented at https://ictwiki.alma.cl/twiki/bin/view/CommonInf/OracleSchemaUpdateHistory. For the NA STE, all the delta updates up to 2014-06 have been applied. DB users and passwords are configured in the archiveConfig.properties file. Example of update instructions: https://ictwiki.alma.cl/twiki/bin/view/CommonInf/OracleUpdateInstructions2015_01.

TMCDB

The name of the TMCDB configuration and startup scenario to use is defined in /alma/ste/etc/sites/tmcdb.{STE}.env ({STE} is the STE location name). Before changing anything in the database, it is always good to have a copy of it; a copy of ALMA-2014-06-AOC1_04152015.xml is already backed up. To make such a backup, run TMCDB Explorer as almamgr within the STE, search for the configuration and export it to a file; restoring a backup follows the same procedure.

Scheduling

Often not needed, but sometimes Scheduling needs to be updated as well. The update consists of dropping the database tables of the scheduling account (defined in scheduling.properties):
  • as the oracle user on the archive machine (arch01 within the CV STE, or alma-arch01.aoc.nrao.edu)
    • connect first as root, then switch to oracle user: su - oracle
  • connect to the database: sqlplus / as sysdba
  • with the user and password defined in the scheduling.properties file: conn {scheduling schema}
  • execute:
    • select table_name from user_tables;
      • drop all the listed tables; this is safe, because these tables are created automatically at runtime by the scheduling software
        • drop table {table name} cascade constraints
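
When the table list is long, the drop statements can be generated instead of typed. The sketch below assumes the table names were saved one per line; make_drop_script is a hypothetical helper and the table names in the demonstration are made up. Review the generated statements before pasting them into sqlplus.

```shell
#!/bin/sh
# Sketch: turn a list of table names (one per line) into DROP statements.
# make_drop_script is a hypothetical helper; it only prints SQL, it runs nothing.
make_drop_script() {
    while IFS= read -r table; do
        # Skip blank lines that a saved listing may contain.
        [ -n "$table" ] || continue
        printf 'drop table %s cascade constraints;\n' "$table"
    done
}

# Hypothetical table names, for illustration only.
printf '%s\n' EXAMPLE_TABLE_A EXAMPLE_TABLE_B | make_drop_script
```

The helper does no quoting, so it is only suitable for plain identifiers such as the ones user_tables normally returns.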

LDAP

Server side

The LDAP service runs on support at ste-corr.cv.nrao.edu for the CV STE and on alma-support.aoc.nrao.edu for the AOC STE. The configuration files are located at:
  • /etc/openldap
  • /var/lib/ldap
  • /etc/ldap.conf
These configurations and the actual LDAP tree are backed up as of 08.08.2015.

The daemon is managed with:
  • service ldap stop # can throw an error, that's fine
  • service ldap start
  • service ldap status

Client side

The configuration is located at /etc/ldap.conf. The "uri" field supports more than one LDAP server and uses the first one available. Check that /etc/nsswitch.conf lists "files ldap" for the passwd, shadow, group, protocols, netgroup and automount entries.
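
The nsswitch check can be scripted roughly as below. This is a sketch, not an official tool: missing_ldap_sources is a hypothetical helper that prints every database from the list above whose line does not mention ldap, and it assumes a sed and grep that accept the options used here (the GNU versions on RHEL do).

```shell
#!/bin/sh
# Sketch: report which nsswitch databases do not list "ldap" as a source.
# missing_ldap_sources is a hypothetical helper name.
missing_ldap_sources() {
    conf=$1   # normally /etc/nsswitch.conf
    for db in passwd shadow group protocols netgroup automount; do
        # Strip comments, keep the line for this database, look for the word ldap.
        if ! sed -n "s/#.*//; /^[[:space:]]*$db:/p" "$conf" | grep -qw ldap; then
            echo "$db"
        fi
    done
}

# Run against the real file only when it exists on this machine.
if [ -f /etc/nsswitch.conf ]; then
    missing_ldap_sources /etc/nsswitch.conf
fi
```

No output means every listed database already has ldap configured.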

Oracle

Because the Oracle machine is not trivial to configure, and recovering from failures related to Oracle management is not trivial either, a VM is provided as a backup solution (work in progress).
  • as root, these services should not show any errors:
    • Oracle web management interface: service emora { start | stop | restart | status } # optional
    • Oracle ASM daemon: service asmora { start | stop | restart | status }
    • Oracle DB: service dbora { start | stop | restart | status }
    • Oracle ASM kernel module: service oracleasm { start | stop | restart | status }
    • NGAS daemon: service ngamsServer1 { start | stop | restart | status }
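
Checking all of these in one go could look like the sketch below. It assumes that each init script returns a non-zero exit status when its service is unhealthy, which is not guaranteed for custom scripts such as dbora. check_services and fake_service are hypothetical names; fake_service only stands in for the real service command so that the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: query the status of several services and summarize the result.
# check_services is a hypothetical helper; $1 is the command used to query.
check_services() {
    runner=$1; shift
    failed=0
    for name in "$@"; do
        if "$runner" "$name" status >/dev/null 2>&1; then
            echo "$name: ok"
        else
            echo "$name: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Stand-in for "service": pretends that dbora is down (illustration only).
fake_service() { [ "$1" != dbora ]; }

check_services fake_service asmora dbora oracleasm ngamsServer1 \
    || echo "at least one service needs attention"
```

On the archive machine the call would be, as root: check_services service asmora dbora oracleasm ngamsServer1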

STE Firewall

The ALMA STE firewall is a Cisco PIX 506E, accessible through SSH for remote configuration; the SSH connection uses protocol version 1 (see man ssh).

Administration interface:
  • at gns as root: minicom
  • enable # enter administration mode
  • show interface # list the configured interfaces
  • show name # list the name aliases for IP addresses
  • show access-list {id} # list the access list, i.e. the actual rules
  • configure terminal # enter the configuration terminal
  • write mem # save the changes made
  • write terminal # print the current configuration to stdout
The configuration command line is the same as on any Cisco PIX; for how to add or remove rules, etc., refer to the online Cisco documentation.

Useful and legacy resources

-- AlexisTejeda - 2015-04-17
Topic revision: r14 - 2016-02-04, AndyHale