Friday, 19 May 2017

Technical Project Report


Classification And Regression via Integer Optimization 
Abstract
This report describes the project implementing Classification and Regression via Integer Optimization (CRIO), which was motivated by advances in integer optimization. The project performs statistical processing and regression on integer data and works as a functional CRIO-style application. The report covers the prerequisites and the working elements of the proposed software, and discusses in detail the data preparation required for the classification and regression phases. The technical design of the application is presented to give a better understanding of the system, together with the techniques used to evaluate the regression over the distributed data chunks. The structural diagram of the system shows the complete data flow and its processing, including how the available data chunks are divided into classes and how regression is then applied. The design shows how data is arranged and passed onward through the application, with the techniques applied and the outcomes at every phase of the integer optimization. The system and its efficacy are evaluated with several tests that use varying data categories and dissimilar data parameters. The results are analyzed against previous readings, mainly through comparative analysis of the same data under different working parameters. The report also discusses the output and the varying result formats under different data categories. All of the work done so far is discussed in this report. The outputs are tested by checking that repeated runs with similar data and the same techniques produce nearly the same regression, so as to reach optimal results.
The process starts with the collected data, which is distributed into classes according to its range and the given requirements. Regression is then applied within these classes to the data read from the file, and a grouping mechanism separates the outputs so that they are easy to recognize.
Introduction
CRIO offers a new approach to the problems of data classification and regression. The research paper studied to understand and apply the CRIO process is listed in the References section, and the process described there is followed here. The implementation runs from data classification through to data regression, at which point the optimal integer regression is obtained. The approach used for data transformation is non-linear transformation: x is transformed into x^2, log x, and 1/x, and the transformation with the least sum of squared errors is chosen, since it is the one least prone to error.
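The selection among the three transformations can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a simple least-squares line is fitted to each transformed x, and the names fit_line, TRANSFORMS, and best_transform, as well as the sample data, are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

# The three candidate transformations named in the text.
# log x and 1/x require x > 0, which matches the positive-integer input.
TRANSFORMS = {
    "x^2": lambda x: x * x,
    "log x": math.log,
    "1/x": lambda x: 1.0 / x,
}

def best_transform(xs, ys):
    """Fit a line to each transformed x and keep the one with the least SSE."""
    sse_by_name = {}
    for name, f in TRANSFORMS.items():
        transformed = [f(x) for x in xs]
        sse_by_name[name] = fit_line(transformed, ys)[2]
    best = min(sse_by_name, key=sse_by_name.get)
    return best, sse_by_name

# Hypothetical sample: y = 2*x^2 + 1, so the x^2 transformation should win.
xs = [1, 2, 3, 4, 5]
ys = [3, 9, 19, 33, 51]
best, sse_by_name = best_transform(xs, ys)
print(best, sse_by_name)
```

Because the sample y values are an exact linear function of x^2, that transformation gives a sum of squared errors of zero here; on real data the three values would simply be compared.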
The process continues with data clustering on the data set chosen in step 1, that is, the transformed X and the original Y. The data chunks are clustered so that one data set serves as the testing data and the other as the training data. The regression algorithm selected for optimal integer results is the “nearest neighbor algorithm”. It is used in a refined form, with code that produces the nearest neighbor of every selected point. It creates L clusters; in the assumed case the selected cluster count is 10. Integer optimization is then applied to convert the L clusters into K groups, which in the provided case gives 5.
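One way to arrive at L clusters is to start with every point in its own cluster and repeatedly merge the two closest clusters. The sketch below assumes this agglomerative scheme with centroid distance as the merge rule; the report does not state the exact rule, so both are assumptions, and the function names and sample points are hypothetical. The later integer optimization step that turns L clusters into K groups is not shown.

```python
def centroid(cluster):
    """Mean point of a cluster of (x, y) pairs."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def merge_to(points, target):
    """Merge the two nearest clusters until only `target` clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > target:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Hypothetical data with three visible groups; the report uses L = 10.
points = [(1, 2), (2, 1), (1, 1), (10, 10), (11, 10), (10, 11), (30, 5), (31, 6)]
clusters = merge_to(points, 3)
for c in clusters:
    print(sorted(c))
```

The pairwise search is quadratic in the number of clusters at every merge, which is acceptable for a small text file but is one source of the processing overhead on larger inputs.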
In the application scenario, the data is read from a text file as integers (positive integers only) and then clustered using the non-linear data transformation, which distinguishes the data chunks. The transformation arranges the data into classes, after which the data is divided into testing and training sets. The relative difference between every testing data point and each point of the training set is computed one by one, and the nearest neighbor algorithm picks out the training points at the shortest distance from the point under consideration. The algorithm thus evaluates every point according to its relative distance from the points under consideration. This is how the whole process is carried out.
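Reading the source file and dividing it into training and testing data might look like the sketch below. The file layout (two whitespace-separated integers per line) and the split rule (every fourth pair goes to the testing set) are assumptions made for illustration, and the function names are hypothetical.

```python
import os
import tempfile

def read_positive_pairs(path):
    """Read (x, y) integer pairs, skipping malformed or non-positive lines."""
    pairs = []
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if len(tokens) != 2:
                continue
            try:
                x, y = int(tokens[0]), int(tokens[1])
            except ValueError:
                continue  # not integers
            if x > 0 and y > 0:
                pairs.append((x, y))
    return pairs

def split_train_test(pairs, test_every=4):
    """Send every `test_every`-th pair to the testing set, the rest to training."""
    train, test = [], []
    for i, pair in enumerate(pairs, start=1):
        (test if i % test_every == 0 else train).append(pair)
    return train, test

# Hypothetical source file: marks in two subjects, with two bad lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("55 60\n70 72\n-3 40\n81 79\nabc 10\n90 95\n62 58\n")
    path = tmp.name

pairs = read_positive_pairs(path)
os.remove(path)
train, test = split_train_test(pairs)
print(train, test)
```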
The complete project objective is to find the optimal integer regression based on effective data classification.
The figure below shows the effective results from non-linear data transformation.
Fig 1: Non-Linear Data Transformation
This method of non-linear data transformation was selected for the following reasons:
1. In this form of regression the data is modeled by a function that is a non-linear combination of the integers in the source file.
2. The data is fitted by successive approximation.
3. The data is composed entirely of error-free, independent variables. The explanatory variable here is X, which is independent; this is shown by the X^2 and log X relations for varying values of X, and the dependent variable Y is represented in the 1/X form.
4. Systematic error is the only remaining risk in the regression analysis; this method does not handle it, and it is outside its scope.
A few other non-linear functions used in regression include:
Exponential functions, logarithmic functions, and Gaussian functions.
The details of these non-linear transformations and their equations are given below to give deeper insight into how they work.
Table 1: Non-linear Transformations
All of these methods, together with the formulas that linearize the data sets, are used for regression in typical data mining tasks. The table above shows them along with their independent and dependent variables.
The regression analysis applied to the data points in the source text file is shown below in SVG format:
 
 
Fig 2: Regressive Analysis Example
Data Preparation
The data required for this processing is an integer data set that depends on the selected source, according to the varying requirements of data mining. The usual practice is to fetch the data from a database (a relational structure of varying data sources and structures).
The source selected in this project is a text file containing integers, which are not fixed and can be altered. There are no fixed data pre-planning phases: random data chunks are taken, and the non-linear data transformation is then applied to obtain error-free transformation results.
The only constraint on this process is that it is time consuming and adds processing overhead. It provides no remedy or notification for systematic error, so constraints of this kind are not within its scope.
System Design
The transformation technique used for data regression in this application is non-linear transformation. Its working is shown in SVG format in the sections above. The clustering, the distribution of data into testing and training clusters, the applied data mining techniques, and the regression phases are discussed thoroughly.
The algorithm used in this application is the “nearest neighbor algorithm”. It takes every point in the testing clusters and compares it with every point of the training set to find the differences between them. In this way a distance-based calculation is done for every pair of points.
Fig 3: Nearest Neighbor Algorithm
This figure shows how the nearest neighbor algorithm works. Here the point X is the point being tested in a cluster, and the neighboring points are the training points compared against X. The algorithm measures the distance between them and applies regression. The outcomes indicate the nearest points and how the points are distributed according to the distances among them.
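The distance check described above can be sketched as a plain 1-nearest-neighbor lookup. The sketch assumes each point is an (x, y) pair, that distance is measured on x only, and that the prediction for a testing point is the y value of its nearest training point; the report does not fix these details, and the function names and marks data are hypothetical.

```python
def nearest_neighbor(train, x):
    """Return the training pair closest to x and its distance."""
    best_pair, best_dist = None, float("inf")
    for tx, ty in train:
        d = abs(tx - x)  # distance on x only (an assumption)
        if d < best_dist:
            best_pair, best_dist = (tx, ty), d
    return best_pair, best_dist

def predict_all(train, test):
    """For each testing pair, return (x, actual y, y of the nearest training point)."""
    results = []
    for x, y in test:
        (_, predicted_y), _ = nearest_neighbor(train, x)
        results.append((x, y, predicted_y))
    return results

# Hypothetical marks in two subjects.
train = [(55, 60), (70, 72), (81, 79), (62, 58)]
test = [(90, 95), (58, 61)]
predictions = predict_all(train, test)
print(predictions)  # [(90, 95, 79), (58, 61, 60)]
```

Every testing point is compared with every training point, so the cost grows with the product of the two set sizes.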
The problems encountered during this project and in carrying out its working scenarios are:
The process is time consuming and adds processing overhead. It provides no remedy or notification for systematic error, so constraints of this kind are not within its scope.
The core objective of this solution, compared with traditional integer optimization techniques in data mining, is as follows:
It is very hard to process a huge data set accurately, and integer optimization can make this easier. The research paper is about CRIO (classification and regression via integer optimization). The regression part is covered in this work.
The example scenario used in this project is the marks of students in two different subjects, given as integers. The source text file consists of these mark values.
System Evaluation
As stated above, the example used is students' marks in two different subjects. The data is given in a source text file that consists of these integers.
The data transformation and the nearest neighbor implementation can be tested by varying the input source values and observing how the regression estimate and the algorithm behave. Data mining is carried out through this approach. For example, the weather conditions in two different cities can be tested with the same kind of source file containing different integer values.
I analyzed the developed logic by altering the data values in the source file and examining the results and the output accuracy against the variations in the input.
Output:
The output is shown below in graphical form. The graph distinguishes the clustered data points, shows the findings of the applied clustering and the nearest neighbor algorithm, and helps the user identify them easily.
Conclusion
 
 
The clustering mechanism reduces the dimensionality of the provided data when the data are vectors. The non-linear transformation increases the predictive power of the variables used. The entire working of the CRIO mechanism relies on the clustering approach, which divides the data into classes from which all later steps proceed. There is still a need for further optimization techniques that combine integer techniques.
References
Arthanari, T. S., Y. Dodge. 1993. Mathematical Programming in Statistics. Wiley Classics Library Edition, Wiley–Interscience, New York.
Bennett, K. P., J. A. Blue. 1984. Optimal decision trees. R.P.I. Math Report 214, Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY.
Bertsimas, D., R. Shioda. 2004. An algorithm for cardinality constraint quadratic optimization. In review.
Bienstock, D. 1996. Computational study of families of mixed-integer quadratic programming problems. Math. Programming 74 121–140.
Bousquet, O., A. Elisseeff. 2002. Stability and generalization. J. Machine Learn. Res. 2 445–498.
Shioda, R. 2007. Classification and Regression via Integer Optimization. Sloan School of Management, Massachusetts, Cambridge.
