ESGFNode|ConfiguringMetricsService
Wiki Reorganisation |
---|
This page has been classified for reorganisation. It has been given the category REVISE. |
This page contains useful content but needs revision. It may contain out of date or inaccurate content. |
The ESGF metrics service has the following components:
- Access logging filter: adds logs to the metrics database.
- Access log service: retrieves logs from the metrics database.
- Metrics database: implemented in the node database.
The TDS access logging filter intercepts file accesses to the TDS webapp, and logs the access to the metrics database. The filter is implemented in the jar files in $CATALINA_HOME/webapps/thredds/WEB-INF/lib:
esgf-node-manager-common-<version>.jar
esgf-node-manager-filters-<version>.jar
commons-dbutils-<version>.jar
commons-dbcp-<version>.jar
commons-pool-<version>.jar
The following configuration should be placed in $CATALINA_HOME/webapps/thredds/WEB-INF/web.xml, immediately after the authorization filter configuration (the ordering is important). Replace DATABASE_USER and DATABASE_PASSWORD with the credentials for your postgres database. Note: it is a good idea to use the same user and password that created the metrics database (see esgf_node_manager_initialize below):
<!-- -->
<!-- web.xml entry for the esg node filter -->
<!-- -->
<filter>
<filter-name>AccessLoggingFilter</filter-name>
<filter-class>esg.node.filters.AccessLoggingFilter</filter-class>
<init-param>
<param-name>db.driver</param-name>
<param-value>org.postgresql.Driver</param-value>
</init-param>
<init-param>
<param-name>db.protocol</param-name>
<param-value>jdbc:postgresql:</param-value>
</init-param>
<init-param>
<param-name>db.host</param-name>
<param-value>localhost</param-value>
</init-param>
<init-param>
<param-name>db.port</param-name>
<param-value>5432</param-value>
</init-param>
<init-param>
<param-name>db.database</param-name>
<param-value>esgcet</param-value>
</init-param>
<init-param>
<param-name>db.user</param-name>
<param-value>DATABASE_USER</param-value>
</init-param>
<init-param>
<param-name>db.password</param-name>
<param-value>DATABASE_PASSWORD</param-value>
</init-param>
<init-param>
<param-name>extensions</param-name>
<param-value>.nc</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>AccessLoggingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Note: The parameters defined in the init-param elements above may also be defined in the ESGF properties file $ESG_HOME/esgf.properties ($ESG_HOME defaults to /esg). Parameters defined in web.xml override values in esgf.properties.
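The override rule and the way the db.* values fit together can be illustrated with a short Python sketch. It is not the filter's actual code: the property values are made up, the assumption that esgf.properties uses the same key names as the init-param elements is mine, and the URL assembly simply follows the standard PostgreSQL JDBC form jdbc:postgresql://host:port/database.

```python
# Hypothetical values standing in for $ESG_HOME/esgf.properties.
# Assumption: the properties file uses the same key names as web.xml.
properties_file = {
    "db.host": "db.example.org",
    "db.port": "5432",
    "db.database": "esgcet",
    "db.user": "dbsuper",
}

# Values taken from the init-param elements in web.xml.
web_xml_params = {
    "db.protocol": "jdbc:postgresql:",
    "db.host": "localhost",
    "db.port": "5432",
    "db.database": "esgcet",
}


def effective_params(properties, init_params):
    """Merge both sources; web.xml init-params win over esgf.properties."""
    merged = dict(properties)
    merged.update(init_params)
    return merged


def jdbc_url(params):
    """Assemble a JDBC URL of the form jdbc:postgresql://host:port/database."""
    return "{}//{}:{}/{}".format(
        params["db.protocol"],
        params["db.host"],
        params["db.port"],
        params["db.database"],
    )


params = effective_params(properties_file, web_xml_params)
print(jdbc_url(params))
```

Here db.host comes from web.xml because it is defined in both places, while db.user falls back to the properties file because web.xml does not set it.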
The access log service is part of the esgf-node-manager webapp. It is distributed as a WAR file: esgf-node-manager.war, and may be downloaded and installed with the esg-node installer script. The service should be configured to limit visibility to selected users (see below).
A Python client is provided. It is installed with esg-node. Alternatively, if the Python easy_install script is available (e.g., if the ESG publisher package is already installed), install it with:
easy_install -f http://www-pcmdi.llnl.gov/dist/externals 'esgf_node_manager>=0.2.0'
The client requires a proxy certificate as obtained from myproxy-logon. To run:
esgf_accesslog --service-url https://host.domain.gov/esgf-node-manager/accesslog starttime:endtime
The esgf_node_manager_initialize script creates the database schema for the metrics service. Note: use the same DATABASE_USER and DATABASE_PASSWORD as in the TDS filter configuration:
esgf_node_manager_initialize -c --dburl DATABASE_USER:DATABASE_PASSWORD@DATABASE_HOST:5432/esgcet
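To check that the schema script and the TDS filter point at the same database, it can help to split a --dburl value into the db.* parameters used in web.xml. The following is a minimal sketch, assuming the value has the form user:password@host:port/database; the helper name split_dburl and the sample credentials are hypothetical and not part of esgf_node_manager.

```python
from urllib.parse import urlsplit


def split_dburl(dburl):
    """Split 'user:password@host:port/database' into filter-style parameters."""
    # Prefix a scheme so the standard URL parser recognises the authority part.
    parts = urlsplit("postgresql://" + dburl)
    return {
        "db.user": parts.username,
        "db.password": parts.password,
        "db.host": parts.hostname,
        "db.port": str(parts.port) if parts.port else None,
        "db.database": parts.path.lstrip("/"),
    }


# Made-up credentials for illustration only.
settings = split_dburl("myname:secret@localhost:5432/esgcet")
for name, value in sorted(settings.items()):
    print(name, "=", value)
```

The resulting db.user and db.password should match the values given in the AccessLoggingFilter configuration.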
The access log service must be secured to limit the visibility of the metrics logs to selected users. All files should be owned by the user that runs tomcat, usually 'tomcat'.
- Add a new resource to conf/server.xml. In the GlobalNamingResources element, add:
<Resource name="MetricsUserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/metrics-users.xml" />
- List the distinguished names (DNs) of the users who may read the metrics accesslog service in conf/metrics-users.xml. Note that the ordering of the DN components is important. Tomcat must be restarted for any changes to take effect.
<?xml version='1.0' encoding='utf-8'?>
- Use the metrics-users.xml database for this webapp. Add the file conf/Catalina/localhost/esgf-node-manager.xml:
<?xml version="1.0" encoding="UTF-8"?>
- When the service is called, use client authentication over SSL. Add the following at the end of webapps/esgf-node-manager/WEB-INF/web.xml, inside the web-app element:
<login-config>
  <auth-method>CLIENT-CERT</auth-method>
  <realm-name>Access Logger</realm-name>
</login-config>
<security-role>
  <role-name>metrics</role-name>
</security-role>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Access Logger</web-resource-name>
    <url-pattern>/accesslog/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>metrics</role-name>
  </auth-constraint>
  <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>
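For the user list in conf/metrics-users.xml, a small script can generate the file from a list of DNs. This is only a sketch: it assumes the file follows Tomcat's standard tomcat-users layout (a role element plus one user element per DN, with the DN as the username), and the DN shown is invented. The DN string must match what Tomcat reports for the client certificate, including the order of its components.

```python
import xml.etree.ElementTree as ET


def build_metrics_users(dns, role="metrics"):
    """Return tomcat-users XML granting `role` to each distinguished name."""
    root = ET.Element("tomcat-users")
    ET.SubElement(root, "role", rolename=role)
    for dn in dns:
        # With CLIENT-CERT authentication the certificate subject DN acts as
        # the username; the password attribute is a placeholder.
        ET.SubElement(root, "user", username=dn, password="null", roles=role)
    return ET.tostring(root, encoding="unicode")


# Hypothetical DN for illustration only.
allowed = ["CN=Jane Doe, OU=Example Unit, O=Example Org, C=US"]
xml_text = build_metrics_users(allowed)
print("<?xml version='1.0' encoding='utf-8'?>")
print(xml_text)
```

The printed output can be saved as conf/metrics-users.xml, owned by the user that runs tomcat.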
The following query can be used to get user, file and volume download metrics:
-- Monthly download metrics: downloads, distinct files, distinct users and volume in GB.
SELECT
  EXTRACT(YEAR FROM (TIMESTAMP WITH TIME ZONE 'epoch' + fixed_log.date_fetched * INTERVAL '1 second')) AS year,
  EXTRACT(MONTH FROM (TIMESTAMP WITH TIME ZONE 'epoch' + fixed_log.date_fetched * INTERVAL '1 second')) AS month,
  count(*) AS downloads,
  count(DISTINCT url) AS files,
  count(DISTINCT user_id_hash) AS users,
  to_char(sum(fixed_log.size)/1024/1024/1024, '9,999,999.99') AS gb
FROM (
  -- One row per (file, user): repeated fetches of a file by the same user count once.
  SELECT
    file.url,
    log.user_id_hash,
    max(log.date_fetched) AS date_fetched,
    max(file.size) AS size
  FROM
    esgf_node_manager.access_logging AS log
    JOIN file_version AS file
      -- Reduce the logged URL to its cmip5/... path before matching the file URL.
      -- The dot is written \\. so that it stays escaped inside the E'' string.
      ON (log.url LIKE '%.nc' AND regexp_replace(log.url, E'^.*/(cmip5/.*\\.nc)$', E'\\1') = file.url)
  WHERE log.success AND log.duration > 1000
  GROUP BY file.url, log.user_id_hash
) AS fixed_log
GROUP BY year, month
ORDER BY year, month;
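The logic of the query can be easier to follow as a small Python sketch over in-memory rows. It mirrors the two stages (deduplicate per file and user, then group by month) but is not part of the ESGF tooling: the function names and sample rows are invented, and months are computed in UTC, whereas the SQL uses the database session's time zone.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Same pattern as the query's regexp_replace: keep the path from "cmip5/" onward.
CMIP5_PATH = re.compile(r"^.*/(cmip5/.*\.nc)$")


def normalise_url(url):
    """Return the cmip5-relative path, or None if the URL does not match."""
    match = CMIP5_PATH.match(url)
    return match.group(1) if match else None


def monthly_metrics(log_rows, file_sizes):
    # Inner query: one entry per (file, user), keeping the latest fetch time.
    latest = {}
    for row in log_rows:
        if not row["success"] or row["duration"] <= 1000:
            continue
        path = normalise_url(row["url"])
        if path not in file_sizes:  # also skips URLs that did not match
            continue
        key = (path, row["user_id_hash"])
        latest[key] = max(latest.get(key, 0), row["date_fetched"])

    # Outer query: group the deduplicated entries by (year, month) in UTC.
    buckets = defaultdict(
        lambda: {"downloads": 0, "files": set(), "users": set(), "bytes": 0}
    )
    for (path, user), fetched in latest.items():
        when = datetime.fromtimestamp(fetched, tz=timezone.utc)
        bucket = buckets[(when.year, when.month)]
        bucket["downloads"] += 1
        bucket["files"].add(path)
        bucket["users"].add(user)
        bucket["bytes"] += file_sizes[path]

    return {
        month: {
            "downloads": b["downloads"],
            "files": len(b["files"]),
            "users": len(b["users"]),
            "gb": b["bytes"] / 1024 ** 3,
        }
        for month, b in sorted(buckets.items())
    }


# Made-up sample data: file sizes in bytes, date_fetched in epoch seconds.
BASE = "http://host.example.org/thredds/fileServer/data/"
sizes = {"cmip5/a/x.nc": 2 * 1024 ** 3, "cmip5/b/y.nc": 1024 ** 3}
rows = [
    {"url": BASE + "cmip5/a/x.nc", "user_id_hash": "u1", "date_fetched": 1325376000, "success": True, "duration": 2000},
    {"url": BASE + "cmip5/a/x.nc", "user_id_hash": "u1", "date_fetched": 1325462400, "success": True, "duration": 2000},
    {"url": BASE + "cmip5/a/x.nc", "user_id_hash": "u2", "date_fetched": 1328054400, "success": True, "duration": 5000},
    {"url": BASE + "cmip5/b/y.nc", "user_id_hash": "u1", "date_fetched": 1325379600, "success": True, "duration": 500},
    {"url": BASE + "cmip5/b/y.nc", "user_id_hash": "u2", "date_fetched": 1328054460, "success": False, "duration": 3000},
]
metrics = monthly_metrics(rows, sizes)
for month, values in metrics.items():
    print(month, values)
```

In this sample the two fetches of the same file by user u1 collapse into a single download, and the short or failed transfers are dropped, which is the effect of the inner query's GROUP BY and WHERE clauses.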