The ContentConnector dramatically reduces the complexity of the database creation, data retrieval, and database maintenance processes. The product upload process is also automated for customers who request content on specific products and utilize 1WS's product mapping services.
To use the ContentConnector, the server that will host the application needs the following environment:
The ContentConnector requires JRE version 1.8 or higher.
ContentConnector-2.2.28 | JDK 11.0.13 | Compatible |
| JDK 17.0.7 | Compatible |
| JDK 20.0.1 | Compatible |
| JDK 21.0.3 | Compatible |
| JDK 22.0.1 | Compatible |
| Open JDK 22.0.1 | Compatible |
The ContentConnector currently supports most common installations.
It is recommended to schedule the ContentConnector to run at least twice a day at different times, as download.zip file delivery times vary for each distribution cycle. The ContentConnector automatically detects new data and will only perform the database maintenance and any related tasks as necessary.
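For example, on Linux the scheduling could be done with cron. The entries below are purely illustrative; the install path, jar version, and run times are placeholders, and the command syntax itself is described in the ContentConnector commands section later in this guide:
# Illustrative crontab entries: run the connector twice a day at different times.
0 6 * * *  cd /opt/ContentConnector && java -jar contentConnector-2.2.3.x.jar datasource configuration/ContentConnector.xml import
0 18 * * * cd /opt/ContentConnector && java -jar contentConnector-2.2.3.x.jar datasource configuration/ContentConnector.xml import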
Customers who maintain a defined catalog and will be requesting specific SKUs from 1WS (a.k.a. DataSource Pull Mode) are required to send specific product information to 1WS in the form of Manufacturer (MFC) and Product (PRC) files.
The ContentConnector automatically generates and transfers these files to the customer’s FTP account using a table called cds_sku_request, which is structured as follows:
Ref | Column Name | Type | Description |
1 | CustomerPartNumber | VARCHAR(40) | Customer part number |
2 | GTIN | VARCHAR(128) | UPC/EAN code (optional) |
3 | ManufacturerCode | VARCHAR(40) | Customer manufacturer code |
4 | ManufacturerName | VARCHAR(255) | Customer manufacturer name |
5 | ManufacturerPartNumber | VARCHAR(40) | Manufacturer part number |
6 | ProductDescription | VARCHAR(255) | Product description |
7 | IsActive | BIT | SKU status (1=active; 0=inactive) |
8 | RequestStatus | BIT | SKU request status (0=not yet requested; -1=requested) |
9 | FirstInserted | DATETIME | Date/time SKU was first inserted |
10 | Checksum | VARCHAR(32) | A hash code used to detect changes to columns 2-6 |
11 | FirstRequestDate | DATETIME | Date/time SKU was first requested |
12 | LastRequestDate | DATETIME | Date/time SKU was last requested |
The cds_sku_request table is only generated if the UploadProducts option is enabled in ContentConnector.xml, and the application is run in CREATE mode. This option is disabled by default.
The customer is responsible for inserting new records, updating existing records, and providing values for fields 1-9: RequestStatus should be set to 0 and FirstInserted should contain the system timestamp. All other fields are managed by the ContentConnector. The ContentConnector automatically detects new records and changes to existing records using the Checksum column, then generates and transfers the PRC/MFC file set as necessary. By default, the PRC/MFC file set extension starts at "001" and increments to "999" before resetting to "001". The extension counter is persisted in the .prcmfcCounter text file and can be edited manually if necessary; this file is automatically generated the first time the ContentConnector generates a PRC/MFC file set.
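For illustration, a new SKU request could be inserted as follows. The values are placeholders and the exact SQL dialect depends on your database; the column names are taken from the table above:
-- Illustrative only: the customer inserts the SKU to be requested,
-- with RequestStatus = 0 and FirstInserted set to the current system timestamp.
INSERT INTO cds_sku_request
    (CustomerPartNumber, GTIN, ManufacturerCode, ManufacturerName,
     ManufacturerPartNumber, ProductDescription, IsActive, RequestStatus, FirstInserted)
VALUES
    ('CUST-12345', '0012345678905', 'HP', 'Hewlett-Packard',
     'Q1234A', 'Example product description', 1, 0, CURRENT_TIMESTAMP);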
The PRC/MFC files are stored locally in the directory specified in the LocalUploadPath setting (ContentConnector.xml). By default, this value is set to "data/DataSource/Upload".
The ContentConnector will generate and upload an MFC/PRC file set when running with the IMPORT command or the legacy FULL or INCREMENT commands (not with the IMPORTNODOWNLOAD, FULLNODOWNLOAD, or INCREMENTNODOWNLOAD commands). Uploads can also be triggered explicitly by calling the ContentConnector in UPLOAD mode, as shown below.
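For example (the jar version is a placeholder; the full command syntax is described in the ContentConnector commands section later in this guide):
java -jar contentConnector-2.2.3.x.jar datasource configuration\ContentConnector.xml upload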
Create a blank database on the server that will host the DataSource data. The database will contain both the repository and the staging tables. The following examples reference a database named "DataSource".
Once the database is created, create a user for the DataSource database or grant access privileges to an existing user. The database user will need table creation, read, write, and delete privileges. Depending on the database, the user may also need special privileges to truncate tables and perform bulk inserts.
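As a minimal sketch for SQL Server (the user name and password are placeholders; adapt the statements for MySQL, Oracle, or PostgreSQL as needed):
-- Illustrative only: create the database, a login, and a user with broad rights on it.
CREATE DATABASE DataSource;
CREATE LOGIN datasource_user WITH PASSWORD = 'ChangeMe!';
USE DataSource;
CREATE USER datasource_user FOR LOGIN datasource_user;
-- db_owner is the simplest way to cover table creation, read/write/delete,
-- TRUNCATE, and bulk inserts; a narrower set of grants can be used instead.
ALTER ROLE db_owner ADD MEMBER datasource_user;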
For the DataSource service, the ContentConnector requires two configuration files: ContentConnector.xml and DataSource\DataSource-DATABASE.xml.
The DataSource-DATABASE.xml file contains platform-dependent SQL instructions.
The configuration/ContentConnector.xml file contains the setup information for the ContentConnector. Locate and edit the ContentConnector.xml file and make the following edits to align the behavior of the application to your needs:
Option | Option Overview | Comments | Related Tables |
All Services | | | |
KeepArchivedZipsFor days="30" | Delete archived 'download.zip' files after they have been kept for the specified number of days | If the tag is not present (or the value of the 'days' attribute is less than 0), the archived downloads will be kept. | |
IndexCompression type="ROW" | Sets the data compression type ('ROW', 'PAGE', or 'NONE') for all indexes that do not have their own compressionType attribute | Only supported by SQL Server 2008+ Enterprise Edition or Developer Edition | |
TableCompression type="PAGE" | Sets the data compression type ('ROW', 'PAGE', or 'NONE') for all tables that do not have their own compressionType attribute | Only supported by SQL Server 2008+ Enterprise Edition or Developer Edition | |
UseTruncate | TRUNCATE will be used instead of DELETE to clear tables | TRUNCATE is faster and uses less transaction log space, but breaks replication | |
UpdateStatisticsOnBulkInsert | Update table statistics after a bulk insert | Can clean up table structures after a bulk insert | |
ConnectorStatus | Provides the status of each ContentConnector run | Keep enabled | cds_ConnectorStatus |
JVMProps | Display JVM properties in the log file | For debugging purposes | |
DisplayInsertRecordCount | Display the number of records that were bulk inserted in the log file | For debugging purposes. This causes an extra select. | |
NoMySQLCharacterSetSupport | Do not send the character set property to MySQL | Certain older versions of MySQL did not support character sets for bulk inserts for multiple languages | |
DisableTabLock | Do not use TABLOCK during SQL Server bulk inserts | This can prevent table locking, but also slows down inserts | |
OracleUseUnixNewLine | Use Unix new lines for Oracle bulk inserts | Can be necessary if bulk inserts fail because of line endings | |
DataSource Options | | | |
KeepACK | Controls the automatic trigger for delivery of the next increment download.zip file once the current file has been processed successfully by the ContentConnector | Enable only if the delivery of the new increment file should not be triggered automatically; it can then be triggered manually by deleting the dataout.txt file from the ack folder on your account's FTP. | |
SearchableAttributes | Loads searchable attributes: either the full set, or Product Line and Model only if the full set is not part of the contract | Keep enabled | cds_Atr |
NewUNSPSC | Loads UNSPSC data | Enable only when your account is set up to receive this data. | cds_UNSPSC_Versioned_Commodities |
UNSPSC | Legacy option for UNSPSC data | Keep disabled | cds_UNSPSC |
Metamap | Loads the data that links Customer SKUs to 1WS ProdIDs, as well as UPC codes and Distributors' Part Numbers, when the account is set up to receive these options | Keep enabled | cds_Metamap |
NVARCHAR | This option should be set for SQL Server UTF-8/UTF-16 encoding (double-byte support) | Enable when needed | |
RelatedProducts | Legacy option for ContentConnector 2.2.3.4 or later. Loads Related Products data (RP Links and/or RP Rules) | Enable only when your account is set up to receive Related Products Links and/or Related Products Rules | cds_Acc_Updates |
RelatedProductsLinks | Loads Related Products Links, i.e. a link between an Accessory SKU and a Parent SKU | Enable only when your account is set up to receive OEM Related Products | cds_Acc_Links |
RelatedProductsRules | Loads Related Products Rules, i.e. a link between an Accessory SKU and a Parent Product Line and Model | Enable only when your account is set up to receive Compatibility Rules | cds_AccCompat |
DigitalContent | Loads digital content. The list of digital content options is available in the config file in the MediaType section | Enable only when your account is set up to receive this data. | cds_DigContent |
DigitalContentMeta | Loads Images Meta Attributes data, i.e. Image Size, Image Type, Image Angle, etc. | Enable only when your account is set up to receive this data. | cds_DigContent_Meta |
DigitalContentText | Loads digital content text, an alternative delivery method for all text components existing in Digital Content | Enable only when your account is set up to receive this data. | cds_DigContent_Text |
DownloadDigitalContent | Downloads the digital content files | Enable if you would like the digital content to be downloaded. You may choose to disable this option if you would like to use the direct links. | |
ProcessThumbnails | Generates thumbnails. Thumbnails are defined in the config file in the Thumbnails section. | Disable if reducing images to thumbnails is not needed | |
MainSpecs | Loads Main Specifications | Enable only when your account is set up to receive this data. | cds_Mspecxx |
ExtendedSpecs | Loads Extended Specifications | Enable only when your account is set up to receive this data. | cds_Especxx |
MarketingText | Legacy option from when marketing text was not delivered in digital content | Keep disabled | cds_Mktxx |
MarketingTextDigContent | Loads the marketing text that is delivered in digital content | Keep enabled | The digital content tables |
Atomic | Loads Atomic Data | Keep disabled | cds_Atr_full |
CatalogInfo | Loads the Catalog information, which includes all SKUs with their current status, last delivery date, etc. | Keep enabled | cds_Catalog |
Lifecycle | Loads Products Lifecycle data (Mfg Release Date, Mfg Discontinue Date) | Keep enabled | cds_Lifecycle |
Categorization | Loads Alternative Categorization | Keep enabled | cds_Cct_Categories |
CheckMissedIncrement | Checks that there is no missing data. If any data is missing, the ContentConnector will exit with the error "An increment has been missed" | Keep enabled | |
CheckIsFullDownload | Legacy option for ContentConnector 2.2.3.4 or later. Checks whether the delivered file is a full file when the ContentConnector runs in increment mode. If the Connector detects a full file, it automatically switches to processing it in full mode | Keep enabled | |
SecureFTP | Downloads the files via SFTP (FTP over SSH) | Enable if SFTP is used | |
External:AlternativeSearchableAttributes | Loads the external files with Alternative Searchable Attributes | Enable if your account is set up with the full set of searchable attributes and you would like to get Alternative Searchable Attributes | cds_Asa_Cat |
External:AttributeDisplayOrder | Loads a file that contains the logical ordering of searchable attributes, including those that are not part of the alternative set | Enable if needed, along with the AlternativeSearchableAttributes option | cds_Atr_Logical_Order |
External:LanguageMapping | Loads a file that maps DS language codes to ISO codes | Enable if needed | cds_ISO_Languages |
UploadProducts | Uploads the products (generates and transfers the PRC/MFC files) | Enable only when your account is set up for requesting SKUs and downloading from the same account | |
StageOnly | Only load the staging tables for an increment. | Does not apply the update policies. Production tables will be empty. | |
KeepStage | Do not truncate the staging tables at the end of the Connector run. | Staging tables will still be truncated at the beginning. | |
PIMFeed | Loads PIM feed | Enable only when your account is set up to receive this data. | cds_Pim_Atr |
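Options in this table are typically enabled by uncommenting the corresponding <Option> element in ContentConnector.xml and disabled by commenting it out (the <Option> syntax also appears later in this guide in the tiedToOption description). As an illustrative sketch only, assuming SecureFTP should be on and UNSPSC off:
<!-- Illustrative only: uncomment an option to enable it, comment it out to disable it -->
<Option>SecureFTP</Option>
<!--Option>UNSPSC</Option-->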
Uncomment the lines containing the required languages.
For example, if your DataSource account is set up to receive data in English with Imperial units of measure, the following language parameter would need to be uncommented:
<Language DSFileCode="ez" ISOCode="" TableCode="ez" Create="True" Load="True"/>
If your account is set up with English and French with Metric units of measure, then the following parameters would need to be enabled:
<Language DSFileCode="fr" ISOCode="" TableCode="fr" Create="True" Load="True"/>
<Language DSFileCode="en" ISOCode="" TableCode="en" Create="True" Load="True"/>
Locate the element <DataExchanges> in the configuration file and uncomment the block that corresponds to your database and character encoding.
If your account is set up with SQL Server and Windows encoding, and your code page is other than 1252, add the Encoding attribute with the corresponding value. You may find the code per language in the Encoding/Codepage Documentation section in the configuration file.
For example, for the Japanese language:
<DataExchanges ServiceName="DataSource" LocalDownloadPath="data/DataSource/zipped/download.zip"
LocalUnzippedPath="data/DataSource/unzipped" ImageRepository="data/DataSource/digitalcontent"
LocalUploadPath="data/DataSource/Upload"
ThumbnailRepository="data/DataSource/digitalcontent" DSFTPMinutesBetweenAttempts="5" DSFTPMaxAttempts="3" Encoding="932" > <!-- Not UTF -->
...
...
</DataExchanges>
Please customize the following:
Attribute | Replace with |
LocalDownloadPath | The location where you would like the files to be downloaded to (example: C:\DataSource\data\zipped\) |
LocalUnzippedPath | The location where you would like the files to be extracted/uncompressed to (example: C:\DataSource\data\unzipped) |
LocalUploadPath | The local directory where PRC/MFC files will be stored before being transferred to 1WS. This setting is intended for customers who request specific SKUs from 1WS (i.e. “PULL mode” subscribers). |
ImageRepository | The location where you would like the images to be copied to. This can be a folder that is directly visible by your web-application through a virtual directory. (example: C:\DataSource\data\images) |
ThumbnailRepository | The location where any thumbnails will be stored. Often, this is the same as ImageRepository. |
<DS …Login | Your DataSource Account FTP username. |
<DS … Password | Your DataSource Account FTP password. |
The DigitalContent section provides all available digital content types.
<DigitalContent Download="NewOrChangedSinceLastRun" NbThreads="10" ConnectionAttempts="20">
<MediaTypeRestriction>
<MediaType ID="1" Directory="JPG_200x150" Description="Standard image (200x150)">
<Thumbnails>
<Thumbnail ID="1"/>
</Thumbnails>
</MediaType>
<MediaType ID="2" Directory="JPG_400x300" Description="Large image (400x300)"/>
<MediaType ID="3" Directory="SSA" Description="Multi-angle images"/>
<MediaType ID="4" Directory="MARKETING_TEXT" Description="Localized marketing text"/>
<MediaType ID="5" Directory="KEY_SELLING_POINTS" Description="Key selling points"/>
<MediaType ID="10" Directory="WHATS_IN_THE_BOX" Description="What's in the Box"/>
<MediaType ID="11" Directory="PRODUCT_DATA_SHEET" Description="Product data sheet"/>
<MediaType ID="12" Directory="USER_MANUAL" Description="User manual"/>
<MediaType ID="13" Directory="QUICK_START_GUID" Description="Quick start guide"/>
<MediaType ID="14" Directory="PRODUCT_FEATURES" Description="Product features"/>
<MediaType ID="15" Directory="CCS" Description="CCS Product Images">
<Thumbnails>
<Thumbnail ID="1" Width="200" Height="150"/>
</Thumbnails>
</MediaType>
<MediaType ID="17" Directory="MANUFACTURER_LOGOS" Description="Manufacturer Logos"/>
</MediaTypeRestriction>
<MimeTypeRestriction>
<MimeType MimeID="image/gif" Extension="gif"/>
<MimeType MimeID="image/jpeg" Extension="jpg"/>
<MimeType MimeID="image/x-photoshop" Extension="psd"/>
<MimeType MimeID="application/x-photoshop" Extension="psd"/>
<MimeType MimeID="application/xml" Extension="xml"/>
<MimeType MimeID="application/pdf" Extension="pdf"/>
</MimeTypeRestriction>
</DigitalContent>
<Thumbnails>
<Thumbnail ID="1" Directory="JPG_75x75" Width="75" Height="75"/>
</Thumbnails>
The MediaTypeRestriction lists the media types that will be downloaded. MediaType ID matches the MediaTypeID in the database.
The Directory attribute is where each media type is stored; it is created under the \data\DataSource\digitalcontent folder when the ContentConnector runs. Each directory is relative to the ImageRepository directory.
The ContentConnector also supports additional thumbnail generation. A 75x75 thumbnail is provided for all images as part of the CCS Product Image media type.
Thumbnails in sizes other than 75x75 can be generated by including a Thumbnails section within the relevant media type, and multiple Thumbnails sections can be defined. Each Thumbnails section references an actual thumbnail size and directory by ID. By default, the Thumbnails section is configured to create a 75x75 image from the 200x150 image (as shown above); see the sketch below for an additional size.
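As an illustrative sketch built from the syntax above (the second thumbnail ID, directory name, and size are assumptions, not shipped defaults), an additional 100x100 thumbnail could be defined globally and then referenced from a media type:
<!-- Illustrative only: define an additional thumbnail size by ID... -->
<Thumbnails>
    <Thumbnail ID="1" Directory="JPG_75x75" Width="75" Height="75"/>
    <Thumbnail ID="2" Directory="JPG_100x100" Width="100" Height="100"/>
</Thumbnails>
<!-- ...and reference it from the media type whose images should be reduced -->
<MediaType ID="2" Directory="JPG_400x300" Description="Large image (400x300)">
    <Thumbnails>
        <Thumbnail ID="2"/>
    </Thumbnails>
</MediaType>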
MimeTypeRestriction is similar to MediaTypeRestriction. It limits the mime types the ContentConnector will download.
The Schema element references the database-specific configuration file located in the configuration/DataSource directory. Uncomment the schema that corresponds to your database and comment out all other schemas as necessary.
<!-- Enable one of the following options -->
<Schema definition="configuration/DataSource/DataSource-SQLServer.xml"/>
<!--Schema definition="configuration/DataSource/DataSource-MySQL.xml"/-->
<!--Schema definition="configuration/DataSource/DataSource-Oracle.xml"/-->
<!--Schema definition="configuration/DataSource/DataSource-PostGreSql.xml"/-->
Locate the ServerURL field. Depending on your database, you may need to specify your server address, port number, and database name. Replace all fields that are in all capital letters with the information for your own environment.
If you are running multiple instances of SQL Server, your server address may be something like “MYSERVER\MYINSTANCE”.
Next, locate the ProductDataDatabase field. Replace DATABASE with the name of your database.
The stagingTablePrefix attribute tells the application how to prefix the tables that store increment data; those tables are then synchronized with the main tables (which have no prefix). You may change this entry to suit your naming conventions, as in the example below.
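For example (mirroring the Databases examples shown later in this section), keeping the default "in_" prefix means the staging copy of cds_Catalog is created as in_cds_Catalog:
<ProductDataDatabase name="DataSource" repositoryTablePrefix="" stagingTablePrefix="in_"/>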
Locate the Login section and enter the user ID and password for a user with table creation/deletion/update rights on the database created.
In order to speed up the update in Oracle, there is an option to turn validation off. When constraints are added or enabled, this option will tell Oracle not to verify the existing data. However, any future data operations will be affected by the constraints. To enable this option, add the line <NoValidate>true</NoValidate> just after the Loader element in the Databases section.
The ContentConnector uses a library called Log4j, which allows application events to be distributed in several ways (email alerts, HTML report, Windows event log, web service, database, etc.) and at a configurable set of levels (FATAL, ERROR, WARN, INFO, DEBUG). The monitoring configuration file, found at logs/log4j-DataSource.properties, specifies where the notifications should be sent and establishes minimum levels for the notification events.
The defaults are as follows:
You may change those settings to suit your particular logging needs; in any case, you will need to modify log4j-DataSource.properties to reference local file paths.
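As a purely illustrative Log4j 1.x sketch (the appender name, file path, and level are assumptions, not the shipped defaults), the kind of settings found in log4j-DataSource.properties looks like this:
# Illustrative only: write INFO-level and above events to a local log file.
log4j.rootLogger=INFO, LOGFILE
log4j.appender.LOGFILE=org.apache.log4j.FileAppender
log4j.appender.LOGFILE.File=logs/log.txt
log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.LOGFILE.layout.ConversionPattern=%d %p %m%n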
When running the ContentConnector through a proxy server, you will need to configure extra optional settings in ContentConnector.xml.
Insert the following text at the very end of the <DataExchanges> section and modify to fit your proxy definition:
<Proxy Host="proxy.mycompany.com" Port="3000" Username="" Password=""/>
If the ContentConnector and the database are located on different servers, follow the recommendations below.
MySQL, SQLServer, and Oracle support loading the database from a remote workstation or server.
MySQL
<!-- MySQL -->
<Databases Type="MySQL4.0CompatibilityMode">
<Driver>com.mysql.jdbc.Driver</Driver>
<ServerURL>jdbc:mysql://testserver:11533/ConnectorTest</ServerURL>
<ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
<Login User="xxx" Password="xxx"/>
<TableType>MyISAM</TableType>
<Loader>Local</Loader>
</Databases>
SQLServer
<Databases Type="SQLServer2000">
<Driver>com.microsoft.sqlserver.jdbc.SQLServerDriver</Driver>
<ServerURL>jdbc:sqlserver://testserver:1433;database=ConnectorTest</ServerURL>
<ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
<Login User="connector" Password="connector2"/>
<Loader>BCP</Loader>
</Databases>
Oracle
<Databases Type="Oracle9i">
<Driver>oracle.jdbc.driver.OracleDriver</Driver>
<ServerURL>jdbc:oracle:thin:@remotehost:1521:ConnectorTest</ServerURL>
<ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
<Login User="ConnectorTest" Password="xxx"/>
<Loader>Native</Loader>
<NoValidate>true</NoValidate>
</Databases>
The corresponding tnsnames.ora entry looks like:
ConnectorTest =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = remotehost)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = ConnectorTest)
)
)
To test tnsnames.ora, log into sqlplus using the SID. If this works, the ContentConnector should be able to connect as well.
sqlplus ConnectorTest@XE
The ContentConnector has the ability to load external files during the maintenance process. "External files" are data files that are not delivered within download.zip file. They can originate from 1WS or your own organization.
Preset External DataSets are defined and can be activated in the configuration/ContentConnector.xml file. The following External Datasets are part of the standard release but are not activated by default:
Option to activate | Description |
External:AlternativeSearchableAttributes | Loads the alternative searchable attributes tables (cds_asa*). |
External:AttributeDisplayOrder | Loads a file that contains the logical ordering of searchable attributes, including those that are not part of the alternative set (cds_Atr_Logical_Order table). |
External:LanguageMapping | Loads a file that maps DS language codes to ISO codes (cds_ISO_Languages table). |
Three steps need to be taken to use external datasets:
In the configuration/ContentConnector.xml file you will notice an <ExternalDataSet> tag with <ExternalDataSetFile> children for preset datasets. You will need to define a new <ExternalDataSetFile> to load your data.
The example below illustrates how to load 1WS's Alternative Searchable Attributes:
<ExternalDataSet>
<ExternalDataSetFile
description="1WS Alternative attributes"
remoteFileName="cds_asa.zip"
compressed="true"
localFolder="cds_asa"
tiedToOption="External:AlternativeSearchableAttributes">
<ExternalDataSetFileHTTP HTTPServerURL="https://contentsolutions.1worldsync.com/support/datasource/asa/cds_asa.zip">
</ExternalDataSetFileHTTP>
</ExternalDataSetFile>
</ExternalDataSet>
description | The dataset name as it will appear in the connector logs |
remoteFileName | Name of the remote file; this will also be the name of the local file |
compressed | "true" if the file needs to be unzipped after download, "false" otherwise. Note that you will need to set compressed to "true" if you intend to download a zip file that itself contains multiple files. |
localFolder | Location where the file will be downloaded locally. This is relative to the location defined in ContentConnector.xml by the <DataExchanges> LocalDownloadPath attribute. |
tiedToOption | This external dataset will be active only if the corresponding option is set in ContentConnector.xml, for example <Option>External:AlternativeSearchableAttributes</Option> |
<ExternalDataSetFileHTTP> | If the remote file is accessible via a public HTTP URL, use this tag and populate the attribute described below |
HTTPServerURL | Public URL from which the file can be downloaded |
<ExternalDataSetFileFTP> | If the remote file is accessible via FTP or SFTP, use this tag and populate the attributes described below |
FTPServer | Server name or IP address |
FTPLogin | User name |
FTPPassword | Password |
FTPRemoteFolder | Remote folder in which the file resides |
FTPSecure | "true" if SFTP access is to be used to connect (note that the remote server will need to be added to your SSH known_hosts file); "false" if standard FTP access is to be used |
Each file that needs to be loaded into the database needs a corresponding table created. Follow the syntax used to create the classic DataSource tables in DataSource-XXX.xml. Note that the fileName attribute needs to be prefixed with the "external/" folder name, as this is the location under LocalUnzippedPath that all external files are placed/unzipped into, followed by the value given in localFolder in the ExternalDataSet. Use the same tiedToOption value that was defined in the ExternalDataSet.
Example:
<DBTable name="cds_Asa_Cat"
localized="false"
fileName="external/cds_asa/cds_Asa_Cat.txt"
tiedToOption="External:AlternativeSearchableAttributes>
...
</DBTable>
Maintenance policies are located at the end of the DataSource-XXX.xml file. They define how data is synchronized between the staging tables and the main tables and end up being translated into SQL statements in all supported database dialects.
If the data that is loaded is always a full dataset (not incremental) then just add a policy that does a truncate/insert all between staging and main:
<UpdatePolicyMethod tableName="cds_Asa_Cat" methodName="truncateThenCopy"
description="Full reload of cds_Asa_Cat table ">
<UpdatePolicyArg name="DUMMY" value="REMOVE"/>
</UpdatePolicyMethod>
If more complex logic needs to be implemented, other policies exist, but they are more complicated to use. You may want to consult 1WS if you intend to use such policies.
updateExistingAndInsertNew |
Updates records in main that also exist in stage and that have changed (changes concern all columns except the identifying columns), and inserts records from stage that do not exist in main. On some databases (Oracle, SQL Server), the generated SQL leverages the MERGE keyword; on others it uses comparisons between all column values to identify changes.
Specify the column(s) that are used to uniquely identify records: <UpdatePolicyArg name="column" value="ID"/> |
deleteExistingAndInsertNew |
Deletes all records from main that are present in stage and re-inserts them from stage. |
mergeTablesWithSwap |
This method is functionally identical to deleteExistingAndInsertNew but it instead relies on table swaps to speed up the synchronization of record-heavy tables. Table swap operations create temporary copies of destination tables to perform only insert operations (no deletes, no updates) restricted to the data that needs to be kept. This data can come from the existing main table or the staging table. After all inserts are performed, the swap table replaces the main table, replicating foreign keys and indexes in the process. Since the original main table is dropped, mergeTablesWithSwap cannot be used on tables that are part of materialized views.
Specify the column(s) that are used to uniquely identify records: <UpdatePolicyArg name="column" value="ID"/> |
insertNew |
Inserts records in main that are only present in staging
Specify the column(s) that are used to uniquely identify records: <UpdatePolicyArg name="column" value="ID"/> |
replaceTableWithSwap |
Performs a reload of the main table by using table swaps. This should be used when foreign keys point to the main table. |
truncateThenCopy |
Performs a reload of the main table by truncating and reloading it. This should be used when no foreign keys point to the main table. |
deleteObsoleteContent |
Deletes all content from the repository which we know will not be used. The method takes four arguments (deleteType, column, stageTable, and stageColumn), as shown in the example below:
Example: <UpdatePolicyMethod tableName="cds_Distivoc" methodName="deleteObsoleteContent" description="Remove stale metamapping contributor information"> <UpdatePolicyArg name="deleteType" value="WhenOnlyInProd"/> <UpdatePolicyArg name="column" value="ID"/> <UpdatePolicyArg name="stageTable" value="cds_Distivoc"/> <UpdatePolicyArg name="stageColumn" value="ID"/> </UpdatePolicyMethod> |
deleteOrphanedContent |
Deletes all content from the repository when records aren't being used by another table in the repository.
Example: remove all records from cds_DigContent_Regions that cds_DigContent_Region_Links aren't using anymore:
<UpdatePolicyMethod tableName="cds_DigContent_Regions" methodName="deleteOrphanedContent" description="Remove unused regions"> <UpdatePolicyArg name="column" value="RegionCode"/> <UpdatePolicyArg name="repositoryTable" value="cds_DigContent_Region_Links"/> <UpdatePolicyArg name="repositoryColumn" value="RegionCode"/> </UpdatePolicyMethod> |
Once the three steps above have been completed, you will need to re-run the connector in create mode to get it to create the missing tables, including staging tables (prefixed with "in_").
It is also important to know that external datasets are fetched, even when running the connector in "nodownload" mode (fullnodownload or incrementnodownload keywords).
Since release 2.2.0.x, the ContentConnector has the ability to download files via SFTP (FTP over SSH). This protocol allows for encrypted transfers; note that it is different from FTPS which isn't supported.
To activate SFTP transfers, you will need to activate the option and point the connector to the 1worldsync SFTP server instead of the regular FTP server. You will also need to register the server as trusted (a "known host").
SFTP support is a standard Unix/Linux/OS X feature, but it is not part of Windows.
The ContentConnector will first look for the known_hosts file in the ContentConnector's main folder. If it can't be found, it will be searched for under the usual key register (~/.ssh/known_hosts).
Adding the 1worldSync server to your list of trusted SSH endpoints is easy: SFTP to the server and confirm that you trust it:
$sftp USERID@sftp.cs.1worldsync.com
The authenticity of host 'sftp.cs.1worldsync.com (40.79.74.37)' can't be established.
RSA key fingerprint is 73:8c:c8:f8:52:73:7e:c9:ec:6d:a6:14:63:62:50:75.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'sftp.cs.1worldsync.com,40.79.74.37' (RSA) to the list of known hosts.
On Windows, you first need to install an SSH client; many exist and some, such as SSHWindows, are free (http://sshwindows.sourceforge.net).
Once the SSH client is installed, create a known_hosts file under the ContentConnector main directory:
ssh-keyscan -t rsa sftp.cs.1worldsync.com > known_hosts
The connector log files will show that SFTP is activated:
2013-11-18 17:51:26,542 INFO {unknown} Connected to sftp.cs.1worldsync.com using secured FTP [FileTransfer] (FileTransferSFTP.java:144)
This section gives the syntax of the calls which can be issued to the ContentConnector. These commands are performed at the command prompt.
The basic command structure for calling the ContentConnector is as follows:
java -jar contentConnector-2.2.3.x.jar <service> <configuration> <action>
Example:
java -jar contentConnector-2.2.3.4.jar datasource configuration\ContentConnector.xml create
import | New command for ContentConnector 2.2.3.4 and later. Downloads the download.zip file from FTP and processes it; for a full file, truncates all existing tables and loads new data into the DataSource tables. |
importnodownload | New command for ContentConnector 2.2.3.4 and later. Same as "import" but uses local data instead of retrieving new data from FTP. A download.zip file may also be specified to extract data from a location other than the default download location. |
create | Builds the table structure and relationships for DataSource. Note: The DataSource database must be manually created prior to running the create action. |
full | Legacy command. Use it with ContentConnector 2.2.3.3 and older; use 'import' for ContentConnector 2.2.3.4 and later. Downloads the download.zip file from FTP. Truncates all existing tables and loads new data into the DataSource tables. The upload action is also performed, if applicable. |
fullnodownload | Legacy command. Use it with ContentConnector 2.2.3.3 and older; use 'importnodownload' for ContentConnector 2.2.3.4 and later. Same as "full" but uses local data instead of retrieving new data from 1WS. A download.zip file may also be specified to extract data from a location other than the default download location. |
increment | Legacy command. Use it with ContentConnector 2.2.3.3 and older; use 'import' for ContentConnector 2.2.3.4 and later. Downloads the download.zip file from FTP. Performs a series of deletes and inserts to update existing rows and insert new records. (Note: This is the command used on a daily basis.) The upload action is also performed, if applicable. |
incrementnodownload | Legacy command. Use it with ContentConnector 2.2.3.3 and older; use 'importnodownload' for ContentConnector 2.2.3.4 and later. Same as "increment" but uses local data instead of retrieving new data from 1WS. A download.zip file may also be specified to extract data from a location other than the default download location. |
digitalcontentonly | Does not update the database. Only downloads the digital content files and creates thumbnails according to what is currently in the database. |
rebuildthumbnails | Does not download any files or update the database. Only recreates thumbnails for existing digital content files according to what is in the database. |
upload | Uploads MFC/PRC files to FTP. No other actions are performed. |
NOTE: If you are running ContentConnector 2.2.3.3 or older, full mode should be used for the initial load only; after that, the ContentConnector should be scheduled to run in increment mode. If a full refresh is required, a full file delivery should be requested. The ContentConnector may still run in increment mode on a full file as long as the CheckIsFullDownload option is enabled in the ContentConnector.xml file.
There are several options available to help troubleshoot ContentConnector. In the ContentConnector.xml file, uncomment the JVMProps option to print out a listing of the current Java environment in which the ContentConnector is executing.
In the Monitoring section of the ContentConnector.xml file, add the attribute traceSQL="true". This setting will print out all SQL that is generated by the ContentConnector. The SQL output will be placed in the scripts directory and will be organized by database and action.
In the configuration\log4j.xml file, replace <priority value="info"/> with <priority value="debug"/>. With this option enabled, the ContentConnector will log additional debug information to the logs\log.txt file.
The log files contain detailed and useful information. In order to ensure that the database connection is working properly and that the data is correct, ContentConnector performs several checks. If any of these checks fail, the ContentConnector will log the results and exit before the main tables are altered.
ContentConnector will make sure that the JDBC driver can be loaded. It will also make sure that the login information can be found. If a connection cannot be made to the database, it will log the error message and exit.
ContentConnector will print out the database product name, database product version, driver name, driver version, and catalog. It will make sure that the database type in the ContentConnector.xml file (e.g. "SQLServer2000") matches the database that ContentConnector has actually connected to. It will also make sure that the versions of the database and driver are compatible. In the case of MySQL, if the JDBC MySQL driver is prior to version 3.1, then it also makes sure that ISO-8859-1 is not being used.
Before loading any data, ContentConnector will look at DataSource-xxx.xml and make sure that the files specified in the schema configuration match the files contained in the current download. If they do not match, ContentConnector will exit.
In the ContentConnector.xml file, the CheckMissedIncrement option is turned on by default. This tells ContentConnector to load the data into staging and to then check the catalog to see if an increment has been missed. If an increment has been missed, ContentConnector will exit before updating the main tables.
In the ContentConnector.xml file, the CheckIsFullDownload option is turned on by default. This tells ContentConnector to load the data into staging and then check whether the data is a full dump of the catalog. The ContentConnector will automatically switch to full mode when full dumps are detected and perform a complete refresh of the main tables. Although full dumps can be processed in increment mode, full mode offers the best performance.
Thank you for the helpful guide on ContentConnector! We have a suggestion for improvement and can’t find a way to open a ticket to have it considered for future releases.
We are trying to import 1WS data into a hosted GCP Cloud SQL for PostgreSQL database using ContentConnector 2.2.28. During the synchronization, the ContentConnector issues the following SQL commands:
- 'ALTER TABLE ... DISABLE TRIGGER ALL' (in PostGreSqlPrimitives.DBLoader::disableForeignKey)
- 'COPY from <local file path>'
It is generally not possible in a hosted database to assign a user the permissions these commands require. As a solution, the command 'ALTER TABLE ... DISABLE TRIGGER USER' would work fine with the permissions available, and instead of COPY it is possible to use psql's \copy with a local file stream, as sketched below.
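For illustration only (the table name and file path are placeholders; this sketches our suggested workaround, not current ContentConnector behavior):
-- Disables only user-defined triggers, which the table owner may do on hosted PostgreSQL.
ALTER TABLE cds_catalog DISABLE TRIGGER USER;
-- psql meta-command: the file is read by the client, so no server-side file access is needed.
\copy cds_catalog FROM '/local/path/cds_catalog.txt'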
We are writing to inform you of the issue in the hope that a future version of the ContentConnector will contain a solution.
Hello @MarvinS and thank you for utilizing our customer community. Welcome. We took a look at your note and always appreciate customer feedback. Your suggestion was shared with our engineering team. If needed you can also create a ticket with our support team HERE
We offer some additional information which you can find here: https://community.1worldsync.com/t5/Extranet-Training-Videos/Video-DataSource-Extranet-Support/ta-p/...