
[v6r17] Introduce meta-filters in TS #3180

Merged
8 commits merged on Nov 30, 2016


arrabito
Contributor

This PR introduces 'meta-filters' in the TS (Transformation System).
All the details are explained in the forum:
https://groups.google.com/forum/?hl=en#!topic/diracgrid-develop/BAQYOgLNCAI
As I explained in my last post, although I already have some improvements to the code in mind, it should be safe to merge this PR, since the new functionality is optional. Moreover, it has already been tested successfully in the CTA production setup, and the unit and integration tests pass.

@arrabito changed the title from "introduce meta-filters in TS" to "[v6r17] Introduce meta-filters in TS" on Nov 21, 2016
@arrabito
Contributor Author

With the last commit I've moved the logic from the client to the server side as explained in the first point of the work plan:
https://groups.google.com/forum/?hl=en#!topic/diracgrid-develop/BAQYOgLNCAI

The integration test still works, but I've removed the unit tests introduced with the first commit, since they are no longer valid.

@coveralls

Coverage decreased (-0.02%) to 16.877% when pulling 46bd2e9 on arrabito:TSwithFilters into bf52c4c on DIRACGrid:rel-v6r17.

@arrabito
Contributor Author

arrabito commented Nov 23, 2016

Concerning the three other points of the work plan, detailed here:
https://groups.google.com/forum/?hl=en#!topic/diracgrid-develop/BAQYOgLNCAI
namely:

2. "Changing FileMetadata"
   I agree that things should be kept as simple as possible, so I would keep the current implementation, where a file, once attached to a transformation, remains attached to it until the transformation is deleted.

3. "Add the possibility to create a transformation without any inputdataquery and to add an inputdataquery afterwards"
   I've verified that this is already possible. Here is a working example:

```python
import json
from DIRAC.TransformationSystem.Client.Transformation import Transformation

transformation = Transformation()
# Meta-query: particle == 'gamma_diffuse' and zenith <= 20
MDdict1b = {'particle':'gamma_diffuse', 'zenith':{"<=": 20}}
mqJson1b = json.dumps( MDdict1b )
# .....
transformation.setFileMask(mqJson1b)
transformation.addTransformation()
```

4. "Collect your suggestions about how to avoid multiple FileCatalog instantiations"
   Now that the logic of the filter methods has moved to the server side (TransformationDB), the FileCatalog is no longer instantiated in the TransformationClient. It is, however, instantiated in the TransformationDB, as was already the case for, e.g., the addDirectory method.

So, in principle, the work plan is complete. Let me know your comments.
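The file mask in the point 3 example is a JSON string encoding a meta-query dictionary. A minimal sketch of the round trip, using only the standard `json` module (no DIRAC import); it assumes, as the `json.loads( fileMask )` excerpt reviewed in this PR suggests, that the server side simply decodes the string back into a dictionary:

```python
import json

# Meta-query from the example: equality on 'particle', range condition on 'zenith'
MDdict1b = {'particle': 'gamma_diffuse', 'zenith': {"<=": 20}}

# Client side: the mask handed to setFileMask is a JSON string
# (sort_keys only makes the printed string deterministic)
mqJson1b = json.dumps(MDdict1b, sort_keys=True)
print(mqJson1b)  # {"particle": "gamma_diffuse", "zenith": {"<=": 20}}

# Server side: decoding gives back the original query dictionary
mqDict = json.loads(mqJson1b)
assert mqDict == MDdict1b
```

Operators such as `"<="` survive the round trip as plain string keys, so the query can be stored as a transformation parameter and interpreted later.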

```diff
-WRITE_METHODS = FileCatalogClientBase.WRITE_METHODS + [ "addFile", "removeFile" ]
+WRITE_METHODS = FileCatalogClientBase.WRITE_METHODS + [ "addFile", "removeFile", "setMetadata" ]
+
+NO_LFN_METHODS = [ "setMetadata" ]
```
Contributor


Why is setMetadata in NO_LFN_METHODS? It takes an LFN as its first argument, doesn't it?

Contributor Author


I admit I'm not sure about the definition of 'NO_LFN_METHODS'; I put it here by analogy with the FileCatalogClient.

Contributor

@chaen Nov 25, 2016


It is in the FileCatalog documentation:

```
The names of those methods are reported by the plug-ins as "no_lfn" methods
in the getInterfaceMethods() call. For those methods there is obviously no
additional check of the structure of the LFNs argument and no corresponding
processing of the results
```

Contributor Author


OK, but in any case we should be consistent across the different catalog plugins, shouldn't we?

Contributor


You are both right. setMetadata should be a normal "Write" method following the LFN conventions, which is not currently the case in FileCatalogClient. I suggest merging this PR as it is and, in the next release, reviewing the metadata-related methods consistently for all the catalogs. I will open an issue so that we don't forget it.
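To make the "no_lfn" distinction from the quoted docstring concrete, here is a simplified, hypothetical dispatcher. `call_catalog_method`, `normalize_lfns` and the fake plug-in functions are illustrative names, not DIRAC's FileCatalog API, and the real FileCatalog hands the whole LFN dictionary to each plug-in rather than looping per LFN as this sketch does:

```python
NO_LFN_METHODS = ["setMetadata"]  # as in the excerpt under review


def normalize_lfns(lfns):
    """Accept a single LFN, a list/tuple of LFNs or a dict keyed by LFN; return a dict."""
    if isinstance(lfns, str):
        return {lfns: True}
    if isinstance(lfns, (list, tuple)):
        return dict.fromkeys(lfns, True)
    if isinstance(lfns, dict):
        return lfns
    raise TypeError("Wrong type for LFNs argument: %s" % type(lfns).__name__)


def call_catalog_method(method_name, func, first_arg, *args):
    """Hypothetical dispatcher illustrating LFN vs. no_lfn methods."""
    if method_name in NO_LFN_METHODS:
        # no_lfn method: no check of the argument structure, no result processing
        return func(first_arg, *args)
    # LFN method: normalize the argument and build a per-LFN Successful/Failed result
    successful, failed = {}, {}
    for lfn in normalize_lfns(first_arg):
        try:
            successful[lfn] = func(lfn, *args)
        except Exception as exc:  # sketch only: DIRAC code returns S_ERROR instead
            failed[lfn] = str(exc)
    return {"Successful": successful, "Failed": failed}


def fake_is_file(lfn):
    """Hypothetical plug-in function: files end in '.dat', relative paths are errors."""
    if not lfn.startswith("/"):
        raise ValueError("not an absolute LFN")
    return lfn.endswith(".dat")


print(call_catalog_method("isFile", fake_is_file, ["/vo/a.dat", "/vo/dir", "bad"]))
print(call_catalog_method("setMetadata", lambda path, md: (path, md), "/vo/dir", {"zenith": 20}))
```

With `setMetadata` listed as a no_lfn method, its path argument bypasses the normalization and the per-path result structure, which is the inconsistency raised in this thread.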

if addFiles and fileMask:
self.__addExistingFiles( transID, connection = connection )
mqDict = json.loads( fileMask )
res = catalog.findFilesByMetadata( mqDict )
Contributor


You should test res['OK']

Contributor Author


ok.

Contributor Author


Done in commit: 36b3a10
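The res['OK'] convention referred to here: DIRAC calls return a dictionary that carries 'Value' only on success and 'Message' only on failure, so 'Value' must not be read before 'OK' is checked. A minimal sketch with simplified local stand-ins for S_OK/S_ERROR (the real ones live in DIRAC and may carry extra keys) and a hypothetical `find_files_by_metadata` in place of the catalog call:

```python
def S_OK(value=None):
    """Simplified stand-in for DIRAC's S_OK return structure."""
    return {"OK": True, "Value": value}


def S_ERROR(message=""):
    """Simplified stand-in for DIRAC's S_ERROR return structure."""
    return {"OK": False, "Message": message}


def find_files_by_metadata(query):
    """Hypothetical catalog call: fails on an empty query, else returns two LFNs."""
    if not query:
        return S_ERROR("Empty meta-query")
    return S_OK(["/vo/data/file1", "/vo/data/file2"])


def count_matching_files(query):
    res = find_files_by_metadata(query)
    # On failure there is no 'Value' key: res['Value'] would raise KeyError
    if not res["OK"]:
        return res  # propagate the error unchanged to the caller
    return S_OK(len(res["Value"]))


print(count_matching_files({"particle": "gamma_diffuse"}))
print(count_matching_files({}))
```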

# Add the files to the transformations
fileIDs = []
for lfn in filesToAdd:
if lfnFileIDs.has_key( lfn ):
Contributor


Use `if lfn in lfnFileIDs` instead of has_key.
Even better, and much faster: take the set intersection of filesToAdd and lfnFileIDs.

Contributor Author


ok
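A sketch of the two suggestions with hypothetical sample data. `has_key` no longer exists in Python 3, so `in` is the portable form. The intersection variant drops duplicate LFNs and does not preserve the order of `filesToAdd`, which may or may not matter to the caller:

```python
def file_ids_in(filesToAdd, lfnFileIDs):
    """First suggestion: 'in' membership test instead of has_key()."""
    return [lfnFileIDs[lfn] for lfn in filesToAdd if lfn in lfnFileIDs]


def file_ids_intersection(filesToAdd, lfnFileIDs):
    """Second suggestion: set intersection (iterating a dict yields its keys)."""
    common = set(filesToAdd).intersection(lfnFileIDs)
    return [lfnFileIDs[lfn] for lfn in common]


filesToAdd = ["/vo/a", "/vo/b", "/vo/c"]
lfnFileIDs = {"/vo/a": 1, "/vo/c": 3, "/vo/d": 4}
print(file_ids_in(filesToAdd, lfnFileIDs))
print(sorted(file_ids_intersection(filesToAdd, lfnFileIDs)))
```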

@@ -479,6 +494,8 @@ def __deleteTransformationParameters( self, transID, parameters = None, connecti

def addFilesToTransformation( self, transName, lfns, connection = False ):
""" Add a list of LFNs to the transformation directly """
gLogger.info( "TransformationDB.addFilesToTransformation: Attempting to add %s files." % lfns )
Contributor


len(lfns)? The message says "%s files" but prints the whole list.

Contributor Author


ok

res = catalog.getFileUserMetadata( lfn )
if not res['OK']:
failed[lfn] = res['Message']
return S_OK( {'Successful':successful, 'Failed':failed } )
Contributor


Why do you return after the first failure?

Contributor Author


You're right. Changed with:

```python
if not res['OK']:
  gLogger.error( "Failed to getFileUserMetadata for file", "%s: %s" % ( lfn, res['Message'] ) )
  failed[lfn] = res['Message']
  continue
```
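The corrected loop as a self-contained sketch: the standard `logging` module stands in for gLogger, and `fake_get_metadata` is a hypothetical replacement for `catalog.getFileUserMetadata`. Every LFN is processed and failures are collected instead of ending the loop:

```python
import logging


def get_user_metadata(lfns, get_file_user_metadata):
    """Collect user metadata per LFN; record failures and keep going."""
    successful, failed = {}, {}
    for lfn in lfns:
        res = get_file_user_metadata(lfn)
        if not res["OK"]:
            logging.error("Failed to getFileUserMetadata for file %s: %s", lfn, res["Message"])
            failed[lfn] = res["Message"]
            continue  # move on to the next LFN instead of returning
        successful[lfn] = res["Value"]
    return {"OK": True, "Value": {"Successful": successful, "Failed": failed}}


def fake_get_metadata(lfn):
    """Hypothetical catalog stand-in: only /vo/good/* files have metadata."""
    if lfn.startswith("/vo/good/"):
        return {"OK": True, "Value": {"particle": "gamma_diffuse"}}
    return {"OK": False, "Message": "No such file"}


result = get_user_metadata(["/vo/good/1", "/vo/bad/2", "/vo/good/3"], fake_get_metadata)
print(result["Value"]["Failed"])
```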

gLogger.info( "setMetadata: Attempting to set metadata %s to %s" % (usermetadatadict, path) )
successful = {}
failed = {}
if isinstance( path, dict ):
Contributor


I am not sure what you are trying to achieve with this test.

Contributor Author


I found out why I had introduced this test: I had a type-mismatch problem because I was using

```python
@checkCatalogArguments
def setMetadata( self, path, metadatadict ):
```

in Resources/Catalog/TSCatalogClient.

So you are right, the test is not necessary.

filesToAdd = []

catalog = FileCatalog()
isFile = catalog.isFile( path )['Value']['Successful'][path]
Contributor


Either test res['OK'] or use the .get syntax
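Both options for the `isFile` lookup, sketched against hand-written result dictionaries. The shape assumed here (`{'OK', 'Value': {'Successful', 'Failed'}}`) is the usual bulk return structure of the DIRAC FileCatalog; the helper names are hypothetical. The `.get` form is shorter but silently loses the error message:

```python
def is_file_checked(res, path):
    """Option 1: test res['OK'] and the per-path Failed dict explicitly.
    Returns a tuple (isFile, errorMessage)."""
    if not res["OK"]:
        return None, res["Message"]
    if path in res["Value"]["Failed"]:
        return None, res["Value"]["Failed"][path]
    return res["Value"]["Successful"][path], None


def is_file_get(res, path):
    """Option 2: chained .get(); yields None if any level is missing."""
    return res.get("Value", {}).get("Successful", {}).get(path)


ok_res = {"OK": True, "Value": {"Successful": {"/vo/a.dat": True}, "Failed": {}}}
failed_res = {"OK": True, "Value": {"Successful": {}, "Failed": {"/vo/x": "No such file"}}}
error_res = {"OK": False, "Message": "Catalog unreachable"}

print(is_file_checked(error_res, "/vo/a.dat"), is_file_get(error_res, "/vo/a.dat"))
```

The one-liner in the excerpt, `catalog.isFile( path )['Value']['Successful'][path]`, would instead raise a KeyError for both `failed_res` and `error_res`.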


if not res['OK']:
failed[path] = res['Message']
return S_OK( {'Successful':successful, 'Failed':failed } )
Contributor


I think I understand now. You only take the first element of the directory, but still return it as a multi-input structure?

Contributor Author


I'm not sure I understand your question. But one thing I have to check: I assumed that the corresponding setMetadata method in the FileCatalogDB takes as input either a directory or a single file, but not a list of files.

Contributor


Yes, that is my question. But your method only treats one element, and still returns a dictionary of Successful / Failed.

Contributor Author


Yes, the method treats one element, as in the FileCatalogDB: either a file or a directory. The method returns a Successful / Failed dictionary because, in this case, the effect of setMetadata is to add to the transformations all the files that match the transformations' query conditions, so setMetadata acts on a set of files...
Does it make sense in this case to return a Successful / Failed dictionary?

Contributor


I don't think so, because the Failed/Successful dicts are indexed by the input path. Since there is only one, I don't think this extra level is worth having.

Contributor Author


OK, I've just changed it in commit:
9faad26
with some other minor changes.
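The convention discussed in this thread, sketched with simplified local S_OK/S_ERROR stand-ins and a hypothetical operation `op`: the bulk form keys Successful/Failed by each input path, while a method taking a single path can return the operation's own result, since the extra level would only ever hold one key:

```python
def S_OK(value=None):
    """Simplified stand-in for DIRAC's S_OK."""
    return {"OK": True, "Value": value}


def S_ERROR(message=""):
    """Simplified stand-in for DIRAC's S_ERROR."""
    return {"OK": False, "Message": message}


def bulk_result(paths, op):
    """Bulk convention: Successful/Failed indexed by each input path."""
    successful, failed = {}, {}
    for path in paths:
        res = op(path)
        if res["OK"]:
            successful[path] = res["Value"]
        else:
            failed[path] = res["Message"]
    return S_OK({"Successful": successful, "Failed": failed})


def single_result(path, op):
    """Single input: return op's S_OK/S_ERROR as-is, without the extra level."""
    return op(path)


def op(path):
    """Hypothetical: number of files added to transformations for this path."""
    return S_OK(3) if path.startswith("/vo/") else S_ERROR("Invalid path")


print(bulk_result(["/vo/d", "x"], op))
print(single_result("/vo/d", op))
```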

metadatadict.update( usermetadatadict )
gLogger.info( 'Filter file with metadata:', metadatadict )
fileTrans = self._filterFileByMetadata( metadatadict )
if not ( fileTrans ):
Contributor


The parentheses are not needed.

Contributor Author


ok

path = [path]
else:
res = catalog.findFilesByMetadata( metadatadict, path )
path = res['Value']
Contributor


You should test res['OK'].

successful[path] = False
elif isFile:
filesToAdd.append( path )
path = [path]
Contributor


Why do you do that?

# Add the files to the transformations
gLogger.info( 'Files to add to transformations:', filesToAdd )
if filesToAdd:
for transID, lfns in transFiles.items():
Contributor


Use iteritems() here.

Contributor Author


ok

for transID, lfns in transFiles.items():
res = self.addFilesToTransformation( transID, lfns )
if not res['OK']:
for lfn in lfns:
Contributor


Why not return res?

failed[lfn] = res['Message']
else:
for lfn in lfns:
successful[lfn] = True
Contributor


A speedup:
successful = dict.fromkeys(lfns, True)
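A sketch of the loop using `dict.fromkeys`, with a hypothetical `fake_add` in place of `self.addFilesToTransformation`. One detail worth noting: inside a loop over several transformations, `successful.update(dict.fromkeys(lfns, True))` keeps the entries from earlier iterations, whereas the plain assignment `successful = dict.fromkeys(lfns, True)` would keep only the last transformation's LFNs:

```python
def add_files_per_transformation(transFiles, add_files_to_transformation):
    """Add LFNs transformation by transformation; record the outcome per LFN."""
    successful, failed = {}, {}
    for transID, lfns in transFiles.items():  # iteritems() in the Python 2 code base
        res = add_files_to_transformation(transID, lfns)
        if not res["OK"]:
            failed.update(dict.fromkeys(lfns, res["Message"]))
        else:
            # update() accumulates across transformations instead of overwriting
            successful.update(dict.fromkeys(lfns, True))
    return {"Successful": successful, "Failed": failed}


def fake_add(transID, lfns):
    """Hypothetical stand-in: transformation 2 rejects every file."""
    if transID == 2:
        return {"OK": False, "Message": "Transformation 2 is not active"}
    return {"OK": True, "Value": len(lfns)}


result = add_files_per_transformation({1: ["/vo/a", "/vo/b"], 2: ["/vo/c"], 3: ["/vo/d"]}, fake_add)
print(result)
```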

Contributor

@chaen left a comment


I can't judge the logic of the implementation, but I've added a couple of coding suggestions.

typeDict = res['Value']['FileMetaFields']
typeDict.update( res['Value']['DirectoryMetaFields'] )

for transID, query in queries:
Contributor


queries.iteritems()? Or is queries a list of tuples?

Contributor Author


It's a list of tuples

gLogger.info( "Apply query %s to metadata %s" % ( mq.getMetaQuery(), metadatadict ) )
res = mq.applyQuery(metadatadict)
if not res['OK']:
gLogger.error( "Error in applying query: %s" % res['Message'] )
Contributor


return res?

Contributor Author


Yes. Done in commit: 36b3a10.

res = mq.applyQuery(metadatadict)
if not res['OK']:
gLogger.error( "Error in applying query: %s" % res['Message'] )
elif res['Value'] == True:
Contributor


== True is not needed

Contributor Author


ok
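For readers new to the meta-filter idea, a self-contained sketch of the matching step: given a file's metadata and the transformations' queries as a list of (transID, query) tuples, keep the transformations whose query the file satisfies. `apply_query` is a hypothetical, simplified stand-in for DIRAC's MetaQuery.applyQuery (which returns an S_OK structure rather than a bare bool), and its operator table is an assumption; the real class may support a different set:

```python
import operator

# Hypothetical operator table; DIRAC's MetaQuery may support a different set
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def apply_query(query, metadata):
    """Return True if 'metadata' satisfies every condition of 'query'."""
    for key, condition in query.items():
        if key not in metadata:
            return False
        value = metadata[key]
        if isinstance(condition, dict):
            # e.g. {"<=": 20}: every operator in the dict must hold
            for op, ref in condition.items():
                if not OPERATORS[op](value, ref):
                    return False
        elif value != condition:
            # a plain value means equality
            return False
    return True


def matching_transformations(queries, metadata):
    """queries is a list of (transID, query) tuples, as in the excerpt above."""
    # The boolean is used directly: no '== True' needed
    return [transID for transID, query in queries if apply_query(query, metadata)]


queries = [
    (1, {"particle": "gamma_diffuse", "zenith": {"<=": 20}}),
    (2, {"particle": "proton"}),
    (3, {"zenith": {">": 10, "<": 30}}),
]
print(matching_transformations(queries, {"particle": "gamma_diffuse", "zenith": 20}))
```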

CatalogURL = Transformation/TransformationManager
}

...
Contributor


Originally this test was meant for the TransformationSystem only. The test you are adding here also assumes that the DFC is up and running. I think you should move this new test to a separate module (this would also simplify managing the CI infrastructure).

Contributor Author


OK. I've put it in a separate module. Please check that the module name and location are appropriate, or suggest better ones.

self.fc = FileCatalog()
self.dm = DataManager()
self.metaCatalog = 'DIRACFileCatalog'
gLogger.setLevel( 'INFO' )
Contributor


Inside a test, please always use "DEBUG"

@@ -479,6 +494,8 @@ def __deleteTransformationParameters( self, transID, parameters = None, connecti

def addFilesToTransformation( self, transName, lfns, connection = False ):
""" Add a list of LFNs to the transformation directly """
gLogger.info( "TransformationDB.addFilesToTransformation: Attempting to add %s files." % lfns )
gLogger.info( "TransformationDB.addFilesToTransformation: to Transformations: %s" % transName )
Contributor


Maybe you want to merge the last two gLogger calls.
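Both logging suggestions for addFilesToTransformation (log the count, and use a single call) combined, sketched with the standard `logging` module as a stand-in for gLogger, whose own call signature differs. `add_files_log_message` is a hypothetical helper that only exists to keep the message testable:

```python
import logging

logger = logging.getLogger("TransformationDB")


def add_files_log_message(transName, lfns):
    """Build one log line with the number of LFNs, not the full list."""
    return ("TransformationDB.addFilesToTransformation: "
            "Attempting to add %d files to transformation %s" % (len(lfns), transName))


logging.basicConfig(level=logging.INFO)
logger.info(add_files_log_message(42, ["/vo/a", "/vo/b"]))
```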

@coveralls

Coverage decreased (-0.04%) to 16.864% when pulling 7a51fc1 on arrabito:TSwithFilters into bf52c4c on DIRACGrid:rel-v6r17.

@fstagni changed the title from "[v6r17] Introduce meta-filters in TS" to "[WIP] [v6r17] Introduce meta-filters in TS" on Nov 28, 2016
@coveralls

Coverage decreased (-0.03%) to 16.871% when pulling 24b3b96 on arrabito:TSwithFilters into bf52c4c on DIRACGrid:rel-v6r17.

@coveralls

Coverage decreased (-0.03%) to 16.871% when pulling 7bd8959 on arrabito:TSwithFilters into bf52c4c on DIRACGrid:rel-v6r17.

@fstagni
Contributor

fstagni commented Nov 28, 2016

IMHO this looks OK. I'll let Chris make the final comments, if any. I'm removing the "WIP".

@fstagni changed the title from "[WIP] [v6r17] Introduce meta-filters in TS" to "[v6r17] Introduce meta-filters in TS" on Nov 28, 2016
@coveralls

Coverage decreased (-0.03%) to 16.872% when pulling 9faad26 on arrabito:TSwithFilters into bf52c4c on DIRACGrid:rel-v6r17.

@fstagni
Contributor

fstagni commented Nov 29, 2016

OK Luisa, I think that code-wise there are no more big changes to make. The only thing left to add is documentation, still in this PR.

@arrabito
Contributor Author

OK, do you mean documentation about how to use this functionality (and how it works), to be added to the following page?
http://dirac.readthedocs.io/en/rel-v6r15/AdministratorGuide/Systems/Transformation/index.html

@fstagni
Contributor

fstagni commented Nov 29, 2016

Yes, for example.

@arrabito
Contributor Author

OK. Just one question: I don't remember exactly why, but the documentation at
http://dirac.readthedocs.io/en/rel-v6r15/AdministratorGuide/Systems/Transformation/index.html
or even at
http://dirac.readthedocs.io/en/rel-v6r16/AdministratorGuide/Systems/Transformation/index.html
is slightly different from the one in the master branch of DIRACDocs on GitHub. Should I start from the latter anyway?

@fstagni
Contributor

fstagni commented Nov 29, 2016 via email

@coveralls

Coverage decreased (-0.03%) to 16.871% when pulling 41ff42c on arrabito:TSwithFilters into bf52c4c on DIRACGrid:rel-v6r17.

@arrabito
Contributor Author

OK, thank you. I've added the new documentation and tried to synchronize the two branches.

@fstagni
Contributor

fstagni commented Nov 30, 2016

Very good documentation. Review OK.

@atsareg merged commit e941862 into DIRACGrid:rel-v6r17 on Nov 30, 2016
@arrabito deleted the TSwithFilters branch on August 8, 2017 15:14
5 participants