
Oracle GoldenGate Series, Part 1: GG Architecture Overview

 

1. GoldenGate download location

GoldenGate can be downloaded directly from the official site:

http://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html

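The documentation can be downloaded from the same page.

If you want to study GoldenGate in depth, these documents are worth reading carefully from start to finish.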

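2. GoldenGate architecture overview

The fastest way to learn a tool is to first understand how it works, although that of course depends on the system. GG's internals are much simpler than Oracle's, so the architecture section below is excerpted directly from the Administrator's Guide.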

2.1 Oracle GoldenGate supported processing methods and databases

Oracle GoldenGate enables the exchange and manipulation of data at the transaction level among multiple, heterogeneous platforms across the enterprise. Its modular architecture gives you the flexibility to extract and replicate selected data records, transactional changes, and changes to DDL (data definition language) across a variety of topologies.

With this flexibility, and the filtering, transformation, and custom processing features of Oracle GoldenGate, you can support numerous business requirements:

1) Business continuance and high availability.

2) Initial load and database migration.

3) Data integration.

4) Decision support and data warehousing.

Figure 1 Oracle GoldenGate supported topologies


Table 1 Supported processing methods

* Supported only as a target database. Cannot be a source database for Oracle GoldenGate extraction.

** Uses a capture module that communicates with the Oracle GoldenGate API to send change data to Oracle GoldenGate.

*** Only like-to-like configuration is supported. Data manipulation, filtering, and column mapping are not supported.

2.2 Overview of the Oracle GoldenGate architecture

Oracle GoldenGate is composed of the following components:

(1) Extract

(2) Data pump

(3) Replicat

(4) Trails or extract files

(5) Checkpoints

(6) Manager

(7) Collector

Figure 2 illustrates the logical architecture of Oracle GoldenGate for initial data loads and for the replication of ongoing database changes. This is the basic configuration. Variations of this model are recommended depending on business needs.

Figure 2 Oracle GoldenGate logical architecture

2.2.1 Overview of Extract

The Extract process runs on the source system and is the extraction (capture) mechanism of Oracle GoldenGate. You can configure Extract in one of the following ways:

(1) Initial loads: For initial data loads, Extract extracts a current set of data directly from their source objects.

(2) Change synchronization: To keep source data synchronized with another set of data, Extract captures changes made to data (typically transactional inserts, updates, and deletes) after the initial synchronization has taken place. DDL changes and sequences are also extracted, if supported for the type of database that is being used.

When processing data changes, Extract obtains the data from a data source that can be one of the following:

(1) The database recovery logs or transaction logs (such as the Oracle redo logs or SQL/MX audit trails). The actual method of obtaining the data from the logs varies depending on the database type.

(2) A third-party capture module. This method provides a communication layer that passes data changes and metadata from an external API to the Extract API. The database vendor or a third-party vendor provides the components that extract the data changes and pass them to Extract.

Extract captures all of the changes that are made to objects that you configure for synchronization. Extract stores the changes until it receives commit records or rollbacks for the transactions that contain them. When a rollback is received, Extract discards the data for that transaction. When a commit is received, Extract sends the data for that transaction to the trail for propagation to the target system. All of the log records for a transaction are written to the trail as a sequentially organized transaction unit. This design ensures both speed and data integrity.

NOTE:

Extract ignores operations on objects that are not in the Extract configuration, even though the same transaction may also include operations on objects that are in the Extract configuration.
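
The commit/rollback buffering described above can be illustrated with a small Python model. This is only a conceptual sketch, not GoldenGate code: the record layout (`txn`, `op`, `table`, `row`), the `capture` function, and the in-memory `trail` list are all assumptions made for illustration.

```python
from collections import defaultdict


def capture(log_records, configured_tables):
    """Buffer changes per transaction and emit them only on commit."""
    pending = defaultdict(list)  # transaction id -> buffered changes
    trail = []                   # stands in for the trail on disk

    for rec in log_records:
        txn, op = rec["txn"], rec["op"]
        if op == "commit":
            changes = pending.pop(txn, [])
            if changes:
                # The whole transaction is written as one sequential unit.
                trail.append((txn, changes))
        elif op == "rollback":
            # Discard everything buffered for this transaction.
            pending.pop(txn, None)
        elif rec["table"] in configured_tables:
            pending[txn].append((op, rec["table"], rec["row"]))
        # Operations on tables outside the configuration are ignored.
    return trail


# Hypothetical log: transaction 1 commits, transaction 2 rolls back.
log = [
    {"txn": 1, "op": "insert", "table": "orders", "row": {"id": 10}},
    {"txn": 2, "op": "insert", "table": "orders", "row": {"id": 11}},
    {"txn": 1, "op": "update", "table": "audit", "row": {"id": 7}},
    {"txn": 2, "op": "rollback"},
    {"txn": 1, "op": "commit"},
]
trail = capture(log, configured_tables={"orders"})
```

In this model only the `orders` insert from transaction 1 reaches the trail: the rolled-back transaction is dropped, and the change to the unconfigured `audit` table is skipped even though it belongs to a committed transaction.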

Multiple Extract processes can operate on different objects at the same time. For example, one process could continuously extract transactional data changes and stream them to a decision-support database, while another process performs batch extracts for periodic reporting. Or, two Extract processes could extract and transmit in parallel to two Replicat processes (with two trails) to minimize target latency when the databases are large. To differentiate among different processes, you assign each one a group name (see section 2.4, "Overview of groups").

2.2.2 Overview of data pumps

A data pump is a secondary Extract group within the source Oracle GoldenGate configuration. If a data pump is not used, Extract must send data to a remote trail on the target. In a typical configuration that includes a data pump, however, the primary Extract group writes to a trail on the source system. The data pump reads this trail and sends the data over the network to a remote trail on the target. The data pump adds storage flexibility and also serves to isolate the primary Extract process from TCP/IP activity.

Like a primary Extract group, a data pump can be configured for either online or batch processing. It can perform data filtering, mapping, and conversion, or it can be configured in pass-through mode, where data is passively transferred as-is, without manipulation. Pass-through mode increases the throughput of the data pump, because all of the functionality that looks up object definitions is bypassed.

In most business cases, you should use a data pump. Some reasons for using a data pump include the following:

(1) Protection against network and target failures: In a basic Oracle GoldenGate configuration, with only a trail on the target system, there is nowhere on the source system to store data that Extract continuously extracts into memory. If the network or the target system becomes unavailable, that Extract could run out of memory and abend. However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend of the primary Extract. When connectivity is restored, the data pump captures the data from the source trail and sends it to the target system(s).

(2) You are implementing several phases of data filtering or transformation. When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system, or even on an intermediary system, and then use another data pump or the Replicat group to perform the second transformation.

(3) Consolidating data from many sources to a central target. When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.

(4) Synchronizing one source with multiple targets. When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets.

If your requirements preclude the use of a data pump, you can still configure Oracle GoldenGate without one. Oracle GoldenGate supports many different configurations. See the configuration chapters in this guide to find the one that is best suited to your environment.

2.2.3 Overview of Replicat

The Replicat process runs on the target system. Replicat reads extracted data changes and DDL changes (if supported) that are specified in the Replicat configuration, and then it replicates them to the target database. You can configure Replicat in one of the following ways:

(1) Initial loads: For initial data loads, Replicat can apply data to target objects or route them to a high-speed bulk-load utility.

(2) Change synchronization: To maintain synchronization, Replicat applies extracted data changes to target objects using a native database interface or ODBC, depending on the database type. Replicated DDL and sequences are also applied, if supported for the type of database that is being used. To preserve data integrity, Replicat applies the replicated changes in the same order as they were committed to the source database.

You can use multiple Replicat processes with multiple Extract processes in parallel to increase throughput. Each set of processes handles different objects. To differentiate among processes, you assign each one a group name.

You can delay Replicat so that it waits a specific amount of time before applying data to the target database. A delay may be desirable, for example, to prevent the propagation of errant SQL, to control data arrival across different time zones, or to allow time for other planned events to occur. The length of the delay is controlled by the DEFERAPPLYINTERVAL parameter.

2.2.4 Overview of trails

To support the continuous extraction and replication of database changes, Oracle GoldenGate stores the captured changes temporarily on disk in a series of files called a trail. A trail can exist on the source or target system, or on an intermediary system, depending on how you configure Oracle GoldenGate. On the local system it is known as an extract trail (or local trail). On a remote system it is known as a remote trail.

By using a trail for storage, Oracle GoldenGate supports data accuracy and fault tolerance. The use of a trail also allows extraction and replication activities to occur independently of each other. With these processes separated, you have more choices for how data is delivered. For example, instead of extracting and replicating changes continuously, you could extract changes continuously but store them in the trail for replication to the target later, whenever the target application needs them.

2.2.4.1 Processes that write to, and read, a trail

The primary Extract process writes to a trail. Only one Extract process can write to a trail.

Processes that read the trail are:

(1) Data-pump Extract: Extracts data from a local trail for further processing, if needed, and transfers it to the target system or to the next Oracle GoldenGate process downstream in the Oracle GoldenGate configuration.

(2) Replicat: Reads a trail to apply change data to the target database.

2.2.4.2 Trail maintenance

Trail files are created as needed during processing, and they are aged automatically to allow processing to continue without interruption for file maintenance. By default, trails are stored in the dirdat sub-directory of the Oracle GoldenGate directory.

By default, each file in a trail is 10 MB in size. All file names in a trail begin with the same two characters, which you assign when you create the trail. As files are created, each name is appended with a unique, six-digit serial (sequence) number from 000000 through 999999, for example c:\ggs\dirdat\tr000001. When the trail sequence number reaches 999999, the numbering starts over at 000000.
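
The naming rule above (a two-character prefix plus a six-digit sequence number that wraps around) can be sketched in a few lines of Python. The helper names `trail_file_name` and `next_seqno` are hypothetical and exist only to illustrate the rule; they are not part of GoldenGate.

```python
def trail_file_name(prefix: str, seqno: int) -> str:
    """Build a trail file name: two-character prefix + six-digit sequence."""
    if len(prefix) != 2:
        raise ValueError("a trail prefix is exactly two characters")
    if not 0 <= seqno <= 999_999:
        raise ValueError("sequence number must be between 000000 and 999999")
    return f"{prefix}{seqno:06d}"


def next_seqno(seqno: int) -> int:
    """Advance the sequence number, starting over at 000000 after 999999."""
    return (seqno + 1) % 1_000_000


first = trail_file_name("tr", 1)               # same name as the guide's example
wrapped = trail_file_name("tr", next_seqno(999_999))
```

Here `first` is `tr000001`, matching the example path in the text, and `wrapped` shows the rollover to `tr000000`.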

You can create more than one trail to separate the data from different objects or applications. You link the objects that are specified in a TABLE or SEQUENCE parameter to a trail that is specified with an EXTTRAIL or RMTTRAIL parameter in the Extract parameter file. Aged trail files can be purged by using the Manager parameter PURGEOLDEXTRACTS.

2.2.4.3 How processes write to a trail

To maximize throughput, and to minimize I/O load on the system, extracted data is sent into and out of a trail in large blocks. Transactional order is preserved. By default, Oracle GoldenGate writes data to the trail in canonical format, a proprietary format which allows it to be exchanged rapidly and accurately among heterogeneous databases. However, data can be written in other formats that are compatible with different applications.

By default, Extract operates in append mode, where if there is a process failure, a recovery marker is written to the trail and Extract appends recovery data to the file so that a history of all prior data is retained for recovery purposes.

In append mode, the Extract initialization determines the identity of the last complete transaction that was written to the trail at startup time. With that information, Extract ends recovery when the commit record for that transaction is encountered in the data source; then it begins new data capture with the next committed transaction that qualifies for extraction and begins appending the new data to the trail. A data pump or Replicat starts reading again from that recovery point.

Overwrite mode is another version of Extract recovery that was used in versions of Oracle GoldenGate prior to version 10.0. In these versions, Extract overwrites the existing transaction data in the trail after the last write-checkpoint position, instead of appending the new data. The first transaction that is written is the first one that qualifies for extraction after the last read checkpoint position in the data source.

If the version of Oracle GoldenGate on the target is older than version 10, Extract will automatically revert to overwrite mode to support backward compatibility. This behavior can be controlled manually with the RECOVERYOPTIONS parameter.

2.2.4.4 Trail format

As of Oracle GoldenGate version 10.0, each file of a trail contains a file header record that is stored at the beginning of the file. The file header contains information about the trail file itself. Previous versions of Oracle GoldenGate do not contain this header.

Each data record in a trail file also contains a header area, as well as a data area. The record header contains information about the transaction environment, and the data area contains the actual data values that were extracted. For more information about the trail record format, see Appendix 1.

2.2.4.5 File versioning

Because all of the Oracle GoldenGate processes are decoupled and thus can be of different Oracle GoldenGate versions, each trail file or extract file has a version that is stored in the file header. By default, the version of a trail is the current version of the process that created the file. To set the version of a trail, use the FORMAT option of the EXTTRAIL, EXTFILE, RMTTRAIL, or RMTFILE parameter.

To ensure forward and backward compatibility of files among different Oracle GoldenGate process versions, the file header fields are written in a standardized token format. New tokens that are created by new versions of a process can be ignored by older versions, so that backward compatibility is maintained. Likewise, newer Oracle GoldenGate versions support older tokens. Additionally, if a token is deprecated by a new process version, a default value is assigned to the token so that older versions can still function properly. The token that specifies the file version is COMPATIBILITY and can be viewed in the Logdump utility and also by retrieving it with the GGFILEHEADER option of the @GETENV function.

A trail or extract file must have a version that is equal to, or lower than, that of the process that reads it. Otherwise the process will abend. Additionally, Oracle GoldenGate forces the output trail or file of a data pump to be the same version as that of its input trail or file.

Upon restart, Extract rolls a trail to a new file to ensure that each file is of only one version (unless the file is empty).
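
The two version rules above can be written down as a tiny Python model. Everything here is an assumption for illustration: versions are represented as `(major, minor)` tuples, and `TrailVersionError`, `check_readable`, and `pump_output_version` are hypothetical names, not GoldenGate interfaces.

```python
class TrailVersionError(Exception):
    """Raised in this model where a real process would abend."""


def check_readable(file_version, reader_version):
    """A file must be of a version equal to or lower than its reader's."""
    if file_version > reader_version:
        raise TrailVersionError(
            f"file version {file_version} is newer than reader {reader_version}"
        )


def pump_output_version(input_version):
    """A data pump's output file takes the same version as its input file."""
    return input_version


check_readable((10, 0), (11, 1))           # older file, newer reader: accepted
out_version = pump_output_version((10, 0))

try:
    check_readable((11, 1), (10, 0))       # newer file, older reader: rejected
    rejected = False
except TrailVersionError:
    rejected = True
```

Tuples compare element by element in Python, so `(10, 4) < (11, 1)` behaves the way a major/minor version comparison is expected to.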

2.2.5 Overview of extract files

When processing a one-time run, such as an initial load or a batch run that synchronizes transactional changes, Oracle GoldenGate stores the extracted changes in an extract file instead of a trail. The extract file typically is a single file but can be configured to roll over into multiple files in anticipation of limitations on file size that are imposed by the operating system. In this sense, it is similar to a trail, except that checkpoints are not recorded. The file or files are created automatically during the run. The same versioning features that apply to trails also apply to extract files.

2.2.6 Overview of checkpoints

Checkpoints store the current read and write positions of a process to disk for recovery purposes. These checkpoints ensure that data changes that are marked for synchronization actually are extracted by Extract and replicated by Replicat, and they prevent redundant processing. They provide fault tolerance by preventing the loss of data should the system, the network, or an Oracle GoldenGate process need to be restarted. For complex synchronization configurations, checkpoints enable multiple Extract or Replicat processes to read from the same set of trails.

Checkpoints work with inter-process acknowledgments to prevent messages from being lost in the network. Oracle GoldenGate has a proprietary guaranteed-message delivery technology.

Extract creates checkpoints for its positions in the data source and in the trail. Replicat creates checkpoints for its position in the trail.

A checkpoint system is used by Extract and Replicat processes that operate continuously, but it is not required by Extract and Replicat processes that run in batch mode. A batch process can be re-run from its start point, whereas continuous processing requires the support for planned or unplanned interruptions that is provided by checkpoints.

Replicat stores its checkpoints in a checkpoint table in the target database to couple the commit of its transaction with its position in the trail file. This ensures that a transaction will only be applied once, even if there is a failure of the Replicat process or the database process. For reporting purposes, Replicat also has a checkpoint file on disk in the dirchk subdirectory of the Oracle GoldenGate directory. You can optionally configure Replicat to use this file as its sole checkpoint store, and not use a checkpoint table at all. In this mode, however, there can be cases where the checkpoint in the file is not consistent with what was applied after a database recovery, if the failure either rolled back or rolled forward a transaction that was considered applied by Replicat. The checkpoint table guarantees consistency after recovery.
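
Why a checkpoint table gives apply-once behavior can be seen in a small Python sketch using the standard `sqlite3` module. It is a conceptual model only: the `orders` and `checkpoint` tables, the integer trail position, and the `apply_once` function are assumptions for illustration and do not reflect Replicat's actual schema or code.

```python
import sqlite3


def apply_once(conn, trail_pos, statements):
    """Apply one trail transaction and advance the checkpoint atomically."""
    (last,) = conn.execute("SELECT pos FROM checkpoint").fetchone()
    if trail_pos <= last:
        return False  # already applied before a restart, so skip it
    with conn:  # commits both changes together, or rolls both back on error
        for sql, params in statements:
            conn.execute(sql, params)
        conn.execute("UPDATE checkpoint SET pos = ?", (trail_pos,))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE checkpoint (pos INTEGER NOT NULL)")
conn.execute("INSERT INTO checkpoint VALUES (0)")
conn.commit()

txn = [("INSERT INTO orders VALUES (?)", (10,))]
first = apply_once(conn, 1, txn)
second = apply_once(conn, 1, txn)  # simulated replay after a restart
rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because the data change and the checkpoint update share one database transaction, a failure cannot leave the row applied with a stale position, or the position advanced without the row. A checkpoint kept only in a separate file has no such guarantee, which is the inconsistency the paragraph above warns about.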

2.2.7 Overview of Manager

Manager is the control process of Oracle GoldenGate. Manager must be running on each system in the Oracle GoldenGate configuration before Extract or Replicat can be started, and Manager must remain running while those processes are running so that resource management functions are performed. Manager performs the following functions:

(1) Monitor and restart Oracle GoldenGate processes.

(2) Issue threshold reports, for example when throughput slows down or when synchronization latency increases.

(3) Maintain trail files and logs.

(4) Allocate data storage space.

(5) Report errors and events.

(6) Receive and route requests from the user interface.

One Manager process can control many Extract or Replicat processes. On Windows systems, Manager can run as a service. For more information about the Manager process, see Chapter 2.

2.2.8 Overview of Collector

Collector is a process that runs in the background on the target system. Collector receives extracted database changes that are sent across the TCP/IP network, and it writes them to a trail or extract file. Typically, Manager starts Collector automatically when a network connection is required. When Manager starts Collector, the process is known as a dynamic Collector, and Oracle GoldenGate users generally do not interact with it. However, you can run Collector manually. This is known as a static Collector. Not all Oracle GoldenGate configurations use a Collector process.

When a dynamic Collector is used, it can receive information from only one Extract process, so there must be a dynamic Collector for each Extract that you use. When a static Collector is used, several Extract processes can share one Collector. However, a one-to-one ratio is optimal. The Collector process terminates when the associated Extract process terminates.

By default, Extract initiates TCP/IP connections from the source system to Collector on the target, but Oracle GoldenGate can be configured so that Collector initiates connections from the target. Initiating connections from the target might be required if, for example, the target is in a trusted network zone, but the source is in a less trusted zone.

2.3 Overview of processing methods

Oracle GoldenGate can be configured for the following purposes:

(1) A static extraction of selected data records from one database and the loading of those records to another database.

(2) Online or batch extraction and replication of selected transactional data changes and DDL changes (for supported databases) to keep source and target data consistent.

(3) Extraction from a database and replication to a file outside the database.

For these purposes, Oracle GoldenGate supports the following processing modes:

(1) An online process runs until stopped by a user. Online processes maintain recovery checkpoints in the trail so that processing can resume after interruptions. You can use online processes to continuously extract and replicate transactional changes and DDL changes (where supported).

(2) A batch run, or special run, process extracts or replicates database changes that were generated within known begin and end points. For special runs, Oracle GoldenGate does not maintain checkpoints. Should a process fail, the job can be started over, using the same begin and end points. You can use a special run to process a batch of database changes (such as to synchronize source and target objects once a day rather than continuously) or for an initial data load.

(3) A task is a special type of batch run process and is used for certain initial load methods. A task is a configuration in which Extract communicates directly with Replicat over TCP/IP. Neither a Collector process nor temporary disk storage in a trail or file is used.

2.4 Overview of groups

To differentiate among multiple Extract or Replicat processes on a system, you define processing groups. For example, to replicate different sets of data in parallel, you would create two Replicat groups.

A processing group consists of a process (either Extract or Replicat), its parameter file, its checkpoint file, and any other files associated with the process. For Replicat, a group also includes the associated checkpoint table.

You define groups by using the ADD EXTRACT and ADD REPLICAT commands in the Oracle GoldenGate command interface, GGSCI. A group name can be as follows:

All files and checkpoints relating to a group share the name that is assigned to the group itself. Any time that you issue a command to control or view processing, you supply a group name or multiple group names by means of a wildcard.

2.5 Overview of the Commit Sequence Number (CSN)

When working with Oracle GoldenGate, you might need to refer to a Commit Sequence Number, or CSN. The CSN can be required to position Extract in the transaction log, to reposition Replicat in the trail, or for other purposes. It is returned by some conversion functions and is included in reports and certain GGSCI output.

A CSN is an identifier that Oracle GoldenGate constructs to identify a transaction for the purpose of maintaining transactional consistency and data integrity. It uniquely identifies a particular point in time at which a transaction commits to the database.

Each kind of database management system generates some kind of unique serial number of its own at the completion of each transaction, which uniquely identifies that transaction.

A CSN captures this same identifying information and represents it internally as a series of bytes, but the CSN is processed in a platform-independent manner. A comparison of any two CSN numbers, each of which is bound to a transaction-commit record in the same log stream, reliably indicates the order in which the two transactions completed.

The CSN value is stored as a token in any trail record that identifies the beginning of a transaction. This value can be retrieved with the @GETENV column conversion function and viewed with the Logdump utility.

Extract writes a normalized form of the CSN to external storage such as the trail files and the checkpoint file. There, the CSN is represented as a hex string of bytes. In normalized form, the first two bytes represent the database platform, and the remainder of the string represents the actual unique identifier.

The CSN is also included in report output, error messages, and command input and output (as appropriate) in human-readable, display form that uses native character encoding. In this form, the database type is not included, but it can be supplied separately from the identifier.

Table 3 Oracle GoldenGate CSN values per database

All database platforms except Oracle, DB2 LUW, and DB2 z/OS have fixed-length CSNs, which are padded with leading zeroes as required to fill the fixed length. CSNs that contain multiple fields can be padded within each field, such as the Sybase CSN.
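
As a rough illustration of what per-field zero padding looks like, here is a short Python sketch. The field widths `(5, 10)`, the dot separator, and the `pad_csn` helper are hypothetical values chosen for the example; they are not the real layout of any database's CSN.

```python
def pad_csn(fields, widths):
    """Zero-pad each numeric field to its fixed width and join with dots."""
    if len(fields) != len(widths):
        raise ValueError("one width is required per field")
    parts = []
    for value, width in zip(fields, widths):
        text = str(value)
        if len(text) > width:
            raise ValueError(f"{value} does not fit in {width} digits")
        parts.append(text.zfill(width))
    return ".".join(parts)


WIDTHS = (5, 10)  # hypothetical field widths, for illustration only

earlier = pad_csn((12, 99), WIDTHS)
later = pad_csn((12, 100), WIDTHS)

padded_order_ok = earlier < later      # padded strings sort in commit order
unpadded_order_ok = "12.99" < "12.100"  # plain text comparison gets it wrong
```

One general benefit of fixed-length padding, shown by the last two lines, is that padded identifiers compare correctly as plain text, whereas the unpadded forms do not.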

-------------------------------------------------------------------------------------------------------

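All rights reserved. This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.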

Blog: http://blog.csdn.net/tianlesoftware

Weibo: http://weibo.com/tianlesoftware

Email: tianlesoftware@gmail.com

Skype: tianlesoftware

