This project gives an overview of how users can process information about
the transactions made by their company.
Remember that you must set the Java memory size to at least 512 MB in order to run these graphs (for example, with the JVM option -Xmx512m, which sets the maximum heap size).
All of the graphs in the project are numbered in their names, and they
should be run in ascending order of these numbers.
The project is based on three files: Transactions.dat,
Children.dat and Spouses.dat. These files are
delimited; in other words, their fields are separated from each other
by semicolons.
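As a minimal sketch of what "delimited by semicolons" means in practice, the following Python snippet parses semicolon-separated records with the standard csv module. The sample data and its three fields are hypothetical, since the actual field layout of the three basic files is not listed here, and the project itself reads these files with its own components rather than with Python.

```python
import csv
import io

# Hypothetical sample data; the real field layout of the .dat files
# is an assumption and is not taken from the project.
SAMPLE = "1;Smith;Anna\n2;Jones;Peter\n"


def read_delimited(stream, delimiter=";"):
    """Return every non-empty record as a list of field strings."""
    return [row for row in csv.reader(stream, delimiter=delimiter) if row]


records = read_delimited(io.StringIO(SAMPLE))
for fields in records:
    print(fields)
```

To read a real file instead of the in-memory sample, the same function could be given a file object opened with newline="".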
The Transactions.dat file contains information about the transactions that were
performed, the customers who placed the orders, and the employees of the
company who handled these orders.
A more detailed description of each of these three basic files:
Transactions.dat, Children.dat, Spouses.dat
The company carried out
100,000 transactions based on orders made by 19,955
customers, who were served by 200 employees.
In A01_SplittingTransactions.grf, data records from the Transactions.dat file are split and parsed by Reformat, and only some of the information is selected for each output port. Thus, each of the resulting output files contains only a part of the whole information:
IDs.dat contains the “AmountID”, “CustomerID” and “EmployeeID” fields.
Amounts.dat contains the “AmountID” and “Amount” (of money) fields.
Customers.dat contains the “CustomerID”, “CustomerState”, “CustomerSurname” and “CustomerFirstname” fields.
Employees.dat contains the “EmployeeID”, “EmployeeSurname” and “EmployeeFirstname” fields.
These files are delimited as well; their fields are also separated by semicolons. Note that, for each data flow, we removed all duplicate data records, since the same customer or employee can appear in many transactions. To deduplicate the data records, we first sorted them by key fields using ExtSort and then removed the duplicate records using the Dedup component. As a result, the output files contain 200 employees, 19,955 customers, 100,000 amounts and 100,000 IDs. These IDs link customers and employees to the amounts of money paid. We wrote these output files to the data-tmp directory, since we will use them as input data files in the next graphs.
In A02_CreatingXLSEmployeesWithFamily.grf, we take all employees from the Employees.dat file, their children from the Children.dat file and their spouses from the Spouses.dat file and write them all to a single XLS file (EmployeesWithFamily.xls). This file is also written to the data-tmp directory, as we will need it as an input file in the next graphs. Employees, their children and their spouses are written to the Employees, Children and Spouses sheets of the output EmployeesWithFamily.xls file, respectively. Note that we sorted all children by name.
Note also that “EmployeeID” is named “ParentID” in the Children.dat file. Each employee can have at most 3 children.
In A03_ConvertingCustomersFromDelimitedToFixed.grf, we demonstrate how a delimited data file (Customers.dat) can be transformed into a fixed-length data file (CustomersFixed.txt). To transform delimited data files into fixed-length data files and vice versa, you can use the SimpleCopy component. The output CustomersFixed.txt file is also written to the data-tmp directory, since we will need it in the next graphs.
In A04_SortingTransactions.grf, we calculate statistical information about the transactions (using the Transactions.dat file again). In this graph, we use Aggregate components to compute the statistics. These components can group records by a key, and each Aggregate requires its input to be sorted. For this reason, we sorted the data flows by the same keys and, where needed, removed duplicates before they entered the Aggregate components.
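The sort, deduplicate and aggregate sequence described above can be illustrated outside of the graph with a short Python sketch. It is only an analogy to the ExtSort, Dedup and Aggregate components, not their implementation; the record layout, the sample values and the helper names dedup_sorted and aggregate_sorted are assumptions made for illustration. Both helpers use itertools.groupby, which only groups adjacent records, and that is why the input has to be sorted by the key first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (AmountID, CustomerID, EmployeeID, Amount).
transactions = [
    (1, "C2", "E1", 30.0),
    (2, "C1", "E1", 10.0),
    (3, "C2", "E2", 5.0),
    (4, "C1", "E1", 20.0),
]


def dedup_sorted(records, key):
    """Keep the first record of each run of equal keys (input must be sorted)."""
    return [next(group) for _, group in groupby(records, key=key)]


def aggregate_sorted(records, key, value):
    """Return {key: (count, sum)}; like dedup, this relies on sorted input."""
    result = {}
    for k, group in groupby(records, key=key):
        values = [value(record) for record in group]
        result[k] = (len(values), sum(values))
    return result


customer_key = itemgetter(1)

# Step 1: sort by the key. Step 2: deduplicate or aggregate on the same key.
by_customer = sorted(transactions, key=customer_key)
unique_customers = [r[1] for r in dedup_sorted(by_customer, key=customer_key)]
per_customer = aggregate_sorted(by_customer, key=customer_key, value=itemgetter(3))

print(unique_customers)
print(per_customer)
```

Running the helpers on the unsorted list instead would treat the two separate runs of "C2" and "C1" as distinct groups, which is the failure that sorting in advance prevents.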
We calculated several statistics. Because we will not parse the results in the next graphs of this project, they are written to the data-out directory as the following files:
TransactionsForCustomers.txt
TransactionsForEmployees.txt
TransactionsForStatesWithinEmployees.txt
CustomersForStates.txt
EmployeesForCustomers.txt
CustomersForEmployees.txt
All of these files are delimited by semicolons.
In A05_CreatingXMLEmplFamCustAm.grf, we create XML structures based on one of several different mapping hierarchies. In this graph, the hierarchy is employee (the highest element), family, customer, amount. Each record is written to a separate output file in the data-tmp directory, since we will need to read these files again in the next graphs of the project. The files are named with the mask EmplFamCustAm$$$.xml, where the dollar signs are replaced by the record number. To avoid cluttering the directory, we limited the number of files to 15; without this limit, 200 top-level elements would be created. These XML files are created from delimited files (IDs.dat, Employees.dat), a fixed-length file (CustomersFixed.txt) and an XLS file (EmployeesWithFamily.xls).
In A06_CreatingXMLCustEmplFamAm.grf, we create XML structures based on one of several different mapping hierarchies. In this graph, the hierarchy is customer (the highest element), employee, family, amount.
Each record is written to a separate output file in the data-tmp directory, since we will need to read these files again in the next graphs of the project. The files are named with the mask CustEmplFamAm$$$.xml, where the dollar signs are replaced by the record number. To avoid cluttering the directory, we limited the number of files to 15; without this limit, 19,955 top-level elements would be created. These XML files are created from delimited files (IDs.dat, Employees.dat), a fixed-length file (CustomersFixed.txt) and an XLS file (EmployeesWithFamily.xls).
In A07_CreatingXMLAmCustEmplFam.grf, we create XML structures based on one of several different mapping hierarchies. In this graph, the hierarchy is amount (the highest element), customer, employee, family. Each record is written to a separate output file in the data-tmp directory, since we will need to read these files again in the next graphs of the project. The files are named with the mask AmCustEmplFam$$$.xml, where the dollar signs are replaced by the record number. To avoid cluttering the directory, we limited the number of files to 10; without this limit, 100,000 top-level elements would be created. These XML files are created from delimited files (IDs.dat, Employees.dat), a fixed-length file (CustomersFixed.txt) and an XLS file (EmployeesWithFamily.xls).
In A08_CreatingXMLTransactionsFamily.grf, we create XML structures based on one of several different mapping hierarchies. In this graph, the hierarchy is amount (the highest element), customer, employee, family. Each record is written to a separate output file in the data-tmp directory, since we will need to read these files again in the next graphs of the project. The files are named with the mask TransactionsFamily$$$.xml, where the dollar signs are replaced by the record number. To avoid cluttering the directory, we limited the number of files to 10; without this limit, 100,000 top-level elements would be created.
These XML files are created using the original delimited files (Transactions.dat, Children.dat and Spouses.dat).
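To make the idea of a mapping hierarchy and a numbered file mask more concrete, here is a small Python sketch that nests flat records as employee, customer, amount and assigns each top-level element its own file name. It is an illustration under stated assumptions, not the way the graphs are implemented: the element and attribute names, the sample rows, the helper names, the zero-padding and the numbering from 0 are all hypothetical, the family level is left out for brevity, and the result is kept in memory instead of being written to data-tmp.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat records: (EmployeeID, CustomerID, AmountID, Amount).
rows = [
    ("E1", "C1", "A1", "10.0"),
    ("E1", "C1", "A2", "20.0"),
    ("E1", "C2", "A3", "5.0"),
    ("E2", "C1", "A4", "7.5"),
]


def build_employee_trees(rows):
    """Nest rows as employee > customer > amount (names are assumptions)."""
    employees = {}
    customers = {}
    for emp_id, cust_id, amount_id, amount in rows:
        if emp_id not in employees:
            employees[emp_id] = ET.Element("employee", EmployeeID=emp_id)
        pair = (emp_id, cust_id)
        if pair not in customers:
            customers[pair] = ET.SubElement(
                employees[emp_id], "customer", CustomerID=cust_id
            )
        node = ET.SubElement(customers[pair], "amount", AmountID=amount_id)
        node.text = amount
    return list(employees.values())


def file_name(mask, number):
    """Replace the run of '$' in the mask with the zero-padded number."""
    width = mask.count("$")
    if width == 0:
        return mask
    return mask.replace("$" * width, str(number).zfill(width))


def serialize(trees, mask, limit):
    """Return {file name: XML text} for at most `limit` top-level elements."""
    return {
        file_name(mask, index): ET.tostring(tree, encoding="unicode")
        for index, tree in enumerate(trees[:limit])
    }


files = serialize(build_employee_trees(rows), "EmplFamCustAm$$$.xml", limit=15)
for name, text in files.items():
    print(name, text)
```

Choosing a different top-level key in build_employee_trees (customer or amount instead of employee) is what changes the hierarchy, and with it the number of top-level elements that the limit has to cap.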
The XML files created by graphs A05 to A08 can be read using XMLExtract or XMLXPathReader. Note how the mapping differs within the same reader when the data has a different hierarchy, and note also the difference between XMLExtract and XMLXPathReader. The following four graphs use XMLExtract. All of their output files are written to the data-out directory, since we will not parse them in the next graphs of this project.
In A09_XMLExtractEmplFamCustAm.grf, we read the previously created EmplFamCustAm???.xml files from the data-tmp directory. From these XML files, five output files are created. They are similar to the original delimited files except for the last one, in which “EmployeeID” and “CustomerID” are included together with “AmountID” and “Amount”.
In A10_XMLExtractCustEmplFamAm.grf, we read the previously created CustEmplFamAm???.xml files from the data-tmp directory. From these XML files, five output files are created. They are similar to the original delimited files except for the last one, in which “EmployeeID” and “CustomerID” are included together with “AmountID” and “Amount”.
In A11_XMLExtractAmCustEmplFam.grf, we read the previously created AmCustEmplFam???.xml files from the data-tmp directory. From these XML files, five output files are created. They are similar to the original delimited files except for two of them (for employees and customers), in which “AmountID” is included together with the other employee or customer fields.
In A12_XMLExtractTransactionsFamily.grf, we read the previously created TransactionsFamily???.xml files from the data-tmp directory. From these XML files, three output files are created. They are similar to the original delimited files: Transactions.dat, Children.dat and Spouses.dat.
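The following Python sketch shows one way to flatten a nested XML document back into records that carry “EmployeeID”, “CustomerID”, “AmountID” and “Amount” together, as in the last output file of the employee-first hierarchy. It reads the document as a stream of start and end events and remembers the IDs of the enclosing elements. This is an illustration only: the document structure, the tag and attribute names and the helper extract_amounts are assumptions, and nothing here is taken from the actual XMLExtract mapping.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document in an employee > customer > amount hierarchy.
XML = (
    '<employee EmployeeID="E1">'
    '<customer CustomerID="C1">'
    '<amount AmountID="A1">10.0</amount>'
    '<amount AmountID="A2">20.0</amount>'
    '</customer>'
    '<customer CustomerID="C2">'
    '<amount AmountID="A3">5.0</amount>'
    '</customer>'
    '</employee>'
)


def extract_amounts(stream):
    """Stream through the document and emit one flat record per <amount>."""
    records = []
    context = {}  # IDs of the enclosing employee and customer elements
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == "employee":
            context["EmployeeID"] = elem.get("EmployeeID")
        elif event == "start" and elem.tag == "customer":
            context["CustomerID"] = elem.get("CustomerID")
        elif event == "end" and elem.tag == "amount":
            # The element text is only complete once the end tag is seen.
            records.append((
                context["EmployeeID"],
                context["CustomerID"],
                elem.get("AmountID"),
                elem.text,
            ))
    return records


amount_records = extract_amounts(io.BytesIO(XML.encode("utf-8")))
for record in amount_records:
    print(record)
```

With a different hierarchy (for example amount as the highest element), the parent IDs would instead have to be collected from child elements, which is why the mapping changes from graph to graph.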
The following four graphs read the same XML files in a different way, using XMLXPathReader. Note how the mapping differs from the XMLExtract mapping, and note also that the mapping depends on the hierarchy of the XML structure. All output files are written to the data-out directory, since we will not parse them in the next graphs of this project.
In A13_XMLXPathEmplFamCustAm.grf, we read the previously created EmplFamCustAm???.xml files from the data-tmp directory. From these XML files, five output files are created. They are similar to the original delimited files except for the last one, in which “EmployeeID” and “CustomerID” are included together with “AmountID” and “Amount”.
In A14_XMLXPathCustEmplFamAm.grf, we read the previously created CustEmplFamAm???.xml files from the data-tmp directory. From these XML files, five output files are created. They are similar to the original delimited files except for the last one, in which “EmployeeID” and “CustomerID” are included together with “AmountID” and “Amount”.
In A15_XMLXPathAmCustEmplFam.grf, we read the previously created AmCustEmplFam???.xml files from the data-tmp directory. From these XML files, five output files are created. They are similar to the original delimited files except for two of them (for employees and customers), in which “AmountID” is included together with the other employee or customer fields.
In A16_XMLXPathTransactionsFamily.grf, we read the previously created TransactionsFamily???.xml files from the data-tmp directory. From these XML files, three output files are created. They are similar to the original delimited files: Transactions.dat, Children.dat and Spouses.dat.
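For comparison with event-based reading, the sketch below selects the same kind of records with path expressions on a fully parsed tree, using the limited XPath subset of Python's xml.etree.ElementTree. It is meant only to show how path-based mapping follows the hierarchy of the document; the document structure, the tag and attribute names and the helper xpath_amounts are assumptions, and the expressions are not taken from the XMLXPathReader mappings used in the graphs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in an employee > customer > amount hierarchy.
XML = (
    '<employee EmployeeID="E1">'
    '<customer CustomerID="C1">'
    '<amount AmountID="A1">10.0</amount>'
    '<amount AmountID="A2">20.0</amount>'
    '</customer>'
    '<customer CustomerID="C2">'
    '<amount AmountID="A3">5.0</amount>'
    '</customer>'
    '</employee>'
)


def xpath_amounts(root):
    """Walk employee > customer > amount with path expressions."""
    records = []
    for customer in root.findall("./customer"):
        for amount in customer.findall("./amount"):
            records.append((
                root.get("EmployeeID"),
                customer.get("CustomerID"),
                amount.get("AmountID"),
                amount.text,
            ))
    return records


root = ET.fromstring(XML)
xpath_records = xpath_amounts(root)

# A predicate selects only the amounts of one customer.
c1_amount_ids = [
    node.get("AmountID")
    for node in root.findall(".//customer[@CustomerID='C1']/amount")
]

print(xpath_records)
print(c1_amount_ids)
```

Because each expression is written relative to a parent element, reading a document whose highest element is customer or amount would need a different set of paths, even though the output fields stay the same.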