Full-blown importer stack for importing almost any data into your application. Can be used for exports, too.
This library implements some high-level functionality based on the great Ddeboer Data Import library. Although the Data-Import library offers a great toolkit for implementing a data import/export process quickly and cleanly, there is still a lot of work to do before a full-blown importer is set up for your application. This library helps you with that.
If you are developing a Symfony2 project, you may want to use the comfortable configuration of the ImportEngineBundle.
- A Storage Abstraction-layer that supports nice features like automatic delimiter discovery or processing compressed files. Currently these storages are supported:
- Structured Files
- CSV, XML, JSON, Excel
- File may be compressed
- Doctrine2 queries
- Service endpoints
- Storage Provisioning. Provide a list of possible storage containers for your import, e.g. local files, remote files, uploaded files, database connections, service endpoints and more.
- A mapping sub-system, for building various mappings for your import: field-field, field-converter-field, field-converter-object and more.
- Automatic mapping into object trees using the JMSSerializer
- Source (read) and Target (write) validation using Symfony Validation. Annotations can be used.
- Integrated Eventsystem using Symfony EventDispatcher
- Keeps almost all of the flexibility offered by the Ddeboer Data Import library.
- Well-tested code.
This library is available on Packagist. The recommended way to install it is through Composer:
$ composer require mathielen/import-engine
Then include Composer’s autoloader:
require_once 'vendor/autoload.php';
If you want to use Excel files, please also make sure to include phpoffice/phpexcel in your project:
$ composer require phpoffice/phpexcel
Using the *Provider facilities lets the importer system figure out which format a file has and which abstraction classes should be used.
$service = new TestEntities\Dummy(); //your domain service
$fileStorageProvider = new Mathielen\ImportEngine\Storage\Provider\FileStorageProvider();
$storageSelection = $fileStorageProvider->select('tests/metadata/testfiles/flatdata.csv');
$sourceStorage = $fileStorageProvider->storage($storageSelection);
$targetStorage = new Mathielen\ImportEngine\Storage\ServiceStorage(array($service, 'onNewData'));
$importer = Mathielen\ImportEngine\Importer\Importer::build($targetStorage);
$import = Mathielen\ImportEngine\Import\Import::build($importer, $sourceStorage);
$importRunner = new Mathielen\ImportEngine\Import\Run\ImportRunner();
$importRunner->run($import);
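The target in the example above is a plain PHP callable, `array($service, 'onNewData')`. As a rough illustration of what such a domain service could look like, here is a standalone sketch that does not use Import-Engine at all. The class name `AddressService` and the assumption that the callable receives one row (as an array) per call are made up for this example.

```php
<?php
// Hypothetical domain service; the class name and the assumption that the callable
// receives one row (as an array) per call are made up for this illustration.
class AddressService
{
    public $received = array();

    public function onNewData(array $row)
    {
        $this->received[] = $row;
    }
}

$service = new AddressService();
$callable = array($service, 'onNewData'); // the same callable form passed to ServiceStorage

// Simulate what a writer would do: call the callable once per row.
foreach (array(array('name' => 'James Bond'), array('name' => 'Miss Moneypenny')) as $row) {
    call_user_func($callable, $row);
}

echo count($service->received), " rows received\n"; // prints "2 rows received"
```

Keeping the service free of any importer-specific types makes it easy to test on its own before wiring it into an import.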
Have a look at: https://github.com/mathielen/import-engine/tree/master/tests/functional/Mathielen/ImportEngine
- An Importer is the basic definition of the whole import-process. It says what may be imported and where to. It consists of:
- (optional) A StorageProvider, that represents a "virtual file system" for selecting a SourceStorage
- (optional) A SourceStorage that may be a file, a database table, an array, an object-tree, etc
- A TargetStorage that may be a file, a database table, an array, an object-tree, etc
- A Mapping, which may contain converters, field-mappings, etc
- A Validation, that may contain validation-rules for data read from the SourceStorage and/or validation-rules for data that will be written to the TargetStorage.
- An Eventsystem for implementing detailed Logging or other interactions within the process.
- An Import is a specific definition of the import-process. It uses the Importer and has all the specific information that is required for processing the data: a specific SourceStorage and a Mapping.
- The ImportRunner is used to process the Import.
- Every run of an Import is represented by an ImportRun
StorageProviders represent a "virtual file system" for selecting a storage that can be used as a source or target of the import.
The FinderFileStorageProvider uses the Symfony Finder Component as a collection of possible files that can be imported.
use Symfony\Component\Finder\Finder;
use Mathielen\ImportEngine\Storage\Provider\FinderFileStorageProvider;
$finder = Finder::create()
->in('tests/metadata/testfiles')
->name('*.csv')
->name('*.tab')
->size('>0K')
;
$ffsp = new FinderFileStorageProvider($finder);
You can use specific Doctrine queries or just entity class names (the query will then be a SELECT * FROM that entity) as possible source storages.
use Mathielen\ImportEngine\Storage\Provider\DoctrineQueryStorageProvider;
$em = ... //Doctrine2 EntityManager
$qb = $em->createQueryBuilder()
->select('a')
->from('MySystem\Entity\Address', 'a')
->andWhere('a.id > 10')
;
$queries = array(
'MySystem\Entity\MyEntity',
$qb
);
$desp = new DoctrineQueryStorageProvider($em, $queries);
You can use a Provider to facilitate a File-Upload.
use Mathielen\ImportEngine\Storage\Provider\UploadFileStorageProvider;
$ufsp = new UploadFileStorageProvider('/tmp'); //path to where the uploaded files will be transferred to
FileStorageProviders may use StorageFactories for constructing Storage objects. By default the FormatDiscoverLocalFileStorageFactory is used. This StorageFactory uses a MimeTypeDiscoverStrategy to determine the mime-type of the selected file and uses it to create the correct storage-handler. You can change or extend this behavior. There is a CsvAutoDelimiterFormatFactory that you can use to automatically guess the correct delimiter of a CSV file.
use Mathielen\ImportEngine\Storage\Format\Factory\CsvAutoDelimiterFormatFactory;
use Mathielen\ImportEngine\Storage\Factory\FormatDiscoverLocalFileStorageFactory;
use Mathielen\ImportEngine\Storage\Format\Discovery\MimeTypeDiscoverStrategy;
$ffsp = ...
$ffsp->setStorageFactory(
new FormatDiscoverLocalFileStorageFactory(
new MimeTypeDiscoverStrategy(array(
'text/plain' => new CsvAutoDelimiterFormatFactory()
))));
This way any file that has the text/plain mime-type will be passed to the CsvAutoDelimiterFormatFactory to determine the delimiter.
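To illustrate the general idea behind delimiter discovery, here is a minimal standalone sketch. It is not the library's implementation: it simply checks which candidate delimiter splits every sampled line into the same number of columns. The function name `guessCsvDelimiter`, the candidate list and the sample size are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper (not part of Import-Engine): guess the delimiter of CSV text
 * by checking which candidate splits every sampled line into the same number of columns.
 */
function guessCsvDelimiter($text, array $candidates = array(',', ';', "\t", '|'), $sampleLines = 5)
{
    // use the first non-empty lines as the sample
    $lines = array_slice(array_values(array_filter(preg_split('/\R/', $text), 'strlen')), 0, $sampleLines);

    $best = null;
    $bestColumns = 1;
    foreach ($candidates as $delimiter) {
        $counts = array();
        foreach ($lines as $line) {
            $counts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        // a plausible delimiter yields the same column count (> 1) on every line;
        // among plausible candidates, prefer the one producing the most columns
        if ($counts && count(array_unique($counts)) === 1 && $counts[0] > $bestColumns) {
            $best = $delimiter;
            $bestColumns = $counts[0];
        }
    }

    return $best; // null if no candidate fits
}

var_dump(guessCsvDelimiter("a;b;c\n1;2;3\n")); // string(1) ";"
```

A heuristic like this can be fooled by free-text columns that contain the delimiter, so it is best treated as a default that the user can override.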
A storage is a container of data. Each storage provides its own reader and writer implementation.
use Mathielen\ImportEngine\Storage\ArrayStorage;
use Mathielen\ImportEngine\Storage\DoctrineStorage;
use Mathielen\ImportEngine\Storage\LocalFileStorage;
use Mathielen\ImportEngine\Storage\Format\CsvFormat;
use Mathielen\ImportEngine\Storage\ServiceStorage;
$em = ... //Doctrine2 EntityManager
$service = ... //your domain service
$array = array(1,2,3);
$storage = new ArrayStorage($array);
$storage = new DoctrineStorage($em, 'MyEntities\Entity');
$storage = new LocalFileStorage('tests/metadata/testfiles/flatdata.csv', new CsvFormat());
$storage = new ServiceStorage(array($service, 'myMethod')); //callable
You can get the source and target validation errors with:
$import = ...
$import->importer()->validation()->getViolations();
use Mathielen\ImportEngine\Validation\ValidatorValidation;
use Mathielen\DataImport\Filter\ClassValidatorFilter;
use Symfony\Component\Validator\Constraints\NotBlank;
use Symfony\Component\Validator\Constraints\Regex;
$validator = ... //Symfony Validator
$validation = ValidatorValidation::build($validator)
->addSourceConstraint('salutation', new NotBlank()) //source field 'salutation' should not be empty
->addSourceConstraint('zipcode', new Regex("/^[0-9]{5}$/")) //source field 'zipcode' should be exactly 5 digits
;
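The comment on the zipcode constraint asks for five digits. Symfony's Regex constraint tests the value with `preg_match`, so whether longer values pass depends on whether the pattern is anchored. The plain-PHP sketch below does not use the validator at all; it only compares an unanchored and an anchored pattern on a few made-up sample values.

```php
<?php
$unanchored = '/[0-9]{5}/';   // matches any value that CONTAINS five consecutive digits
$anchored = '/^[0-9]{5}$/';   // matches only values that consist of five digits

$values = array('12345', '123456', 'AB12345', '1234'); // sample values, made up for this sketch
$result = array();
foreach ($values as $value) {
    $result[$value] = array(
        'unanchored' => preg_match($unanchored, $value) === 1,
        'anchored' => preg_match($anchored, $value) === 1,
    );
}

// '123456' and 'AB12345' pass the unanchored pattern but fail the anchored one
print_r($result);
```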
You can use the ClassValidatorFilter to map the data to an object-tree and validate the objects (using annotations or otherwise configured validation rules). For this you must provide an ObjectFactory. There is a JmsSerializerObjectFactory you may want to use.
use Mathielen\ImportEngine\Validation\ValidatorValidation;
use Mathielen\DataImport\Filter\ClassValidatorFilter;
use Mathielen\DataImport\Writer\ObjectWriter\JmsSerializerObjectFactory;
$validator = ... //Symfony Validator
$jms_serializer = ...
$objectFactory = new JmsSerializerObjectFactory(
'Entity\Address',
$jms_serializer);
$validation = ValidatorValidation::build($validator)
->setTargetValidatorFilter(new ClassValidatorFilter($validator, $objectFactory));
use Mathielen\ImportEngine\Importer\Importer;
use Mathielen\ImportEngine\Storage\ArrayStorage;
$ffsp = ...
$validation = ...
$targetStorage = ...
$array = array(1,2,3);
$importer = Importer::build($targetStorage)
->setSourceStorage(new ArrayStorage($array))
->validation($validation)
;
You can either use a StorageProvider (see above) and set the selection-id or you can use a specific Storage-Handler directly:
use Mathielen\ImportEngine\Storage\ArrayStorage;
use Mathielen\ImportEngine\Storage\LocalFileStorage;
use Mathielen\ImportEngine\Import\Import;
use Mathielen\ImportEngine\Importer\Importer;
use Mathielen\ImportEngine\Storage\Format\CsvFormat;
$targetArray = array();
$importer = Importer::build(new ArrayStorage($targetArray));
$import = Import::build(
$importer,
new LocalFileStorage(new \SplFileObject(__DIR__ . '/../../../metadata/testfiles/flatdata.csv'), new CsvFormat())
);
Also see the original Data Import documentation.
$import = ...
$import->mappings()
->add('foo', 'fooloo')
->add('baz', array('some' => 'else'))
;
There are some built-in field-level converters available:
- upperCase
- lowerCase
- @TODO
$import = ...
$import->mappings()
->add('SALUTATION_FIELD', 'salutation', 'upperCase')
;
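Conceptually, a field mapping renames a source field and optionally runs a named converter on its value. The following standalone sketch illustrates that idea in plain PHP. It does not reproduce how Import-Engine applies mappings internally; the helper `applyFieldMappings` and the `$converters` registry are hypothetical names introduced for this example.

```php
<?php
// Hypothetical stand-in for a converter registry: converter name => callable.
$converters = array(
    'upperCase' => 'strtoupper',
    'lowerCase' => 'strtolower',
);

/**
 * Hypothetical helper: rename fields and run an optional named converter on the value.
 * Each mapping is array(sourceField, targetField, converterName|null).
 */
function applyFieldMappings(array $row, array $mappings, array $converters)
{
    foreach ($mappings as $mapping) {
        list($from, $to, $converter) = $mapping;
        if (!array_key_exists($from, $row)) {
            continue; // nothing to map
        }
        $value = $row[$from];
        if ($converter !== null) {
            $value = call_user_func($converters[$converter], $value);
        }
        unset($row[$from]);
        $row[$to] = $value;
    }

    return $row;
}

$row = applyFieldMappings(
    array('SALUTATION_FIELD' => 'Mr.', 'foo' => 'bar'),
    array(
        array('SALUTATION_FIELD', 'salutation', 'upperCase'),
        array('foo', 'fooloo', null),
    ),
    $converters
);
// $row is now array('salutation' => 'MR.', 'fooloo' => 'bar')
```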
More complex converters must first be registered with the importer before you can select them in your import.
use Mathielen\ImportEngine\Mapping\Converter\Provider\DefaultConverterProvider;
use Ddeboer\DataImport\ValueConverter\CallbackValueConverter;
use Mathielen\ImportEngine\Import\Import;
use Mathielen\ImportEngine\Storage\ArrayStorage;
use Mathielen\ImportEngine\Importer\Importer;
$converterProvider = new DefaultConverterProvider();
$converterProvider
->add('salutationToGender', new CallbackValueConverter(function ($item) {
switch ($item) {
case 'Mr.': return 'male';
case 'Miss':
case 'Mrs.': return 'female';
}
}));
$targetStorage = ...
$importer = Importer::build($targetStorage);
$importer
->transformation()
->setConverterProvider($converterProvider);
$array = array();
$import = Import::build($importer, new ArrayStorage($array)); //keep the Import itself in $import
$import->mappings()
->add('salutation', 'gender', 'salutationToGender')
;
As with the field-level converters, you have to register your converters first.
use Mathielen\ImportEngine\Mapping\Converter\Provider\DefaultConverterProvider;
use Ddeboer\DataImport\ItemConverter\CallbackItemConverter;
use Mathielen\ImportEngine\Import\Import;
use Mathielen\ImportEngine\Storage\ArrayStorage;
use Mathielen\ImportEngine\Importer\Importer;
$converterProvider = new DefaultConverterProvider();
$converterProvider
->add('splitNames', new CallbackItemConverter(function ($item) {
list($firstname, $lastname) = array_pad(explode(' ', $item['name'], 2), 2, null); //split at the first space only; pad for single-word names
$item['first_name'] = $firstname;
$item['lastname'] = $lastname;
return $item;
}));
$targetStorage = ...
$importer = Importer::build($targetStorage);
$importer
->transformation()
->setConverterProvider($converterProvider);
$array = array();
$import = Import::build($importer, new ArrayStorage($array)); //keep the Import itself in $import
$import->mappings()
->add('fullname', null, 'splitNames')
;
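The callback handed to CallbackItemConverter is an ordinary PHP callable, so it can be tried on its own before it is registered. The sketch below is a variant of the callback above that runs without Import-Engine. The field names follow the example, while the sample rows and the padding for single-word names are additions made for this illustration.

```php
<?php
// A variant of the splitNames callback, tried in isolation.
// Field names 'name', 'first_name' and 'lastname' follow the example; the sample rows are made up.
$splitNames = function (array $item) {
    // split at the first space only; pad so a single-word name does not raise a notice
    list($firstname, $lastname) = array_pad(explode(' ', $item['name'], 2), 2, null);
    $item['first_name'] = $firstname;
    $item['lastname'] = $lastname;

    return $item;
};

$a = $splitNames(array('name' => 'James Bond'));       // first_name 'James', lastname 'Bond'
$b = $splitNames(array('name' => 'Anna Maria Smith')); // first_name 'Anna', lastname 'Maria Smith'
$c = $splitNames(array('name' => 'Cher'));             // first_name 'Cher', lastname null
```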
To run a configured Import you need an ImportRunner. Internally the ImportRunner builds a workflow and runs it. You can change how the workflow is built by supplying a different WorkflowFactory.
use Symfony\Component\EventDispatcher\EventDispatcher;
use Mathielen\ImportEngine\Import\Run\ImportRunner;
use Mathielen\ImportEngine\Import\Workflow\DefaultWorkflowFactory;
use Mathielen\ImportEngine\ValueObject\ImportConfiguration;
use Mathielen\ImportEngine\Storage\LocalFileStorage;
use Mathielen\ImportEngine\Storage\Format\CsvFormat;
use Mathielen\ImportEngine\Importer\ImporterRepository;
$import = ...
$importRunner = new ImportRunner(new DefaultWorkflowFactory(new EventDispatcher()));
//sneak peek at a row
$previewData = $importRunner->preview($import);
//don't really write, just validate
$importRun = $importRunner->dryRun($import);
//do the import
$importRun = $importRunner->run($import);
If you use the DefaultWorkflowFactory with your ImportRunner you get basic statistics from dryRun() and run() invocations.
$import = ...
$importRunner = ...
$importRun = $importRunner->dryRun($import);
$stats = $importRun->getStatistics();
/*
Array
(
[processed] => 1
[written] => 1
[skipped] => 0
[invalid] => 0
)
*/
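Assuming the statistics come back as an array shaped like the one above, they are easy to turn into a log line. The helper `summarizeStatistics` below is hypothetical and not part of Import-Engine; it only works on a plain array and fills in zero for any missing counter.

```php
<?php
/**
 * Hypothetical helper: turn a statistics array shaped like the one above into a one-line summary.
 */
function summarizeStatistics(array $stats)
{
    // default every counter to 0 so partially filled arrays do not raise notices
    $stats += array('processed' => 0, 'written' => 0, 'skipped' => 0, 'invalid' => 0);
    $ratio = $stats['processed'] > 0 ? 100 * $stats['written'] / $stats['processed'] : 0;

    return sprintf(
        '%d processed, %d written (%.1f%%), %d skipped, %d invalid',
        $stats['processed'],
        $stats['written'],
        $ratio,
        $stats['skipped'],
        $stats['invalid']
    );
}

echo summarizeStatistics(array('processed' => 1, 'written' => 1, 'skipped' => 0, 'invalid' => 0)), "\n";
// prints "1 processed, 1 written (100.0%), 0 skipped, 0 invalid"
```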
You can interact with the running import via the Symfony EventDispatcher.
use Symfony\Component\EventDispatcher\EventDispatcher;
use Mathielen\ImportEngine\Import\Run\ImportRunner;
use Mathielen\DataImport\Event\ImportProcessEvent;
use Mathielen\DataImport\Event\ImportItemEvent;
use Mathielen\ImportEngine\Import\Workflow\DefaultWorkflowFactory;
$myListener = function ($event) {
if ($event instanceof ImportItemEvent) {
$currentResult = $event->getCurrentResult(); //readonly access to current result in the process (might be false)
}
};
$eventDispatcher = new EventDispatcher();
$eventDispatcher->addListener(ImportProcessEvent::AFTER_PREPARE, $myListener);
$eventDispatcher->addListener(ImportItemEvent::AFTER_READ, $myListener);
$eventDispatcher->addListener(ImportItemEvent::AFTER_FILTER, $myListener);
$eventDispatcher->addListener(ImportItemEvent::AFTER_CONVERSION, $myListener);
$eventDispatcher->addListener(ImportItemEvent::AFTER_CONVERSIONFILTER, $myListener);
$eventDispatcher->addListener(ImportItemEvent::AFTER_VALIDATION, $myListener);
$eventDispatcher->addListener(ImportItemEvent::AFTER_WRITE, $myListener);
$eventDispatcher->addListener(ImportProcessEvent::AFTER_FINISH, $myListener);
$workflowFactory = new DefaultWorkflowFactory($eventDispatcher);
$importRunner = new ImportRunner($workflowFactory);
$import = ...
$importRunner->run($import);
Import-Engine is released under the MIT license. See the LICENSE file for details.