FMTRAIN.DAT: (5 Inputs, 1 Output, 1024 Training Patterns, 61 KB)
This training file is used to train a neural network to perform demodulation of an FM (frequency modulation) signal containing a sinusoidal message. The data are generated from the equation
r(n) = Camp * cos[2*PI*n*Cfreq + Mamp * sin(2*PI*n*Mfreq)]
where Camp = carrier amplitude, Mamp = message amplitude, Cfreq = normalized carrier frequency, and Mfreq = normalized message frequency. In this data set, Camp = 0.5, Cfreq = 0.1012878, Mfreq = 0.01106328, and Mamp = 5. The five inputs are r(n-2), r(n-1), r(n), r(n+1), and r(n+2). The output is cos(2*PI*n*Mfreq). In each consecutive pattern, n is incremented by 1.
For more details, see
K. Rohani and M. T. Manry, “The Design of Multi-Layer Perceptrons using Building Blocks,” Proc. of IJCNN 91, Seattle, WA, pp. II-497 to II-502.
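The FMTRAIN.DAT patterns can be regenerated approximately from the equation and constants given above. The following is a minimal sketch, not the original generation program: the function names are hypothetical, and the starting value of n (n_start = 0) is an assumption, since the description does not state where the sequence begins.

```python
import math

# Constants quoted in the FMTRAIN.DAT description.
CAMP = 0.5          # carrier amplitude
MAMP = 5.0          # message amplitude
CFREQ = 0.1012878   # normalized carrier frequency
MFREQ = 0.01106328  # normalized message frequency


def r(n):
    """Return the FM signal sample r(n)."""
    phase = 2 * math.pi * n * CFREQ + MAMP * math.sin(2 * math.pi * n * MFREQ)
    return CAMP * math.cos(phase)


def make_pattern(n):
    """Return (inputs, output) for time index n."""
    inputs = [r(n + k) for k in (-2, -1, 0, 1, 2)]
    output = math.cos(2 * math.pi * n * MFREQ)
    return inputs, output


def make_patterns(n_start=0, count=1024):
    """Build `count` consecutive patterns; n_start is an assumed value."""
    return [make_pattern(n) for n in range(n_start, n_start + count)]


patterns = make_patterns()
```

Because n advances by 1 per pattern, consecutive patterns share four of their five input samples.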
TWOD.TRA: (8 Inputs, 7 Outputs, 1768 Training Patterns, 244 KB)
This training file is used for inverting the surface scattering parameters of an inhomogeneous layer above a homogeneous half space, where both interfaces are randomly rough. The parameters to be retrieved from back scattering measurements are the effective permittivity of the surface, the normalized rms height, the normalized surface correlation length, the optical depth, and the single scattering albedo of the inhomogeneous irregular layer.
The training data file contains 1768 patterns. The inputs consist of eight theoretical values of back scattering coefficient parameters at V and H polarization and four incident angles. The outputs are the corresponding values of permittivity, upper surface height, lower surface height, normalized upper surface correlation length, normalized lower surface correlation length, optical depth, and single scattering albedo, which have a joint uniform pdf.
For more details, see
M. S. Dawson, A. K. Fung and M. T. Manry, “Surface parameter retrieval using fast learning neural networks,” Remote Sensing Reviews, 1993, Vol. 7(1), pp. 1-18.
M. S. Dawson, J. Olvera, A. K. Fung and M. T. Manry, “Inversion of surface parameters using fast learning neural networks,” Proc. of IGARSS’92, Houston, Texas, May 1992, Vol. II, pp. 910-912.
A testing version of the data file, TWOD.TST, is also available (138 KB).
This file was generated by Mike Dawson while he worked for Prof. Adrian Fung at the University of Texas at Arlington. Dr. Dawson currently works at Raytheon E-Systems in Garland, Texas.
SINGLE2.TRA: (16 Inputs, 3 Outputs, 10,000 Training Patterns, 1.6 MB)
This training data file represents the training set for inversion of surface permittivity, normalized surface rms roughness, and surface correlation length found in back scattering models from randomly rough dielectric surfaces. The first 8 inputs represent the simulated back scattering coefficient measured at 10, 30, 50 and 70 degrees at both vertical and horizontal polarization. The remaining 8 are various combinations of ratios of the original eight values. These ratios correspond to those used in several empirical retrieval algorithms.
For more details, see
A. K. Fung, Z. Li, and K. S. Chen, “Back scattering from a Randomly Rough Dielectric Surface,” IEEE Trans. Geo. and Remote Sensing, Vol. 30, No. 2, March 1992.
A. K. Fung, Microwave Scattering and Emission Models and Their Applications, Artech House, 1994.
This file was generated by Mike Dawson while he worked for Prof. Adrian Fung at the University of Texas at Arlington. Dr. Dawson currently works at Raytheon E-Systems in Garland, Texas.
OH7.TRA: (20 Inputs, 3 Outputs, 10,453 Training Patterns, 3.1 MB)
This data set is given in Y. Oh, K. Sarabandi, and F. T. Ulaby, “An Empirical Model and an Inversion Technique for Radar Scattering from Bare Soil Surfaces,” IEEE Trans. on Geoscience and Remote Sensing, pp. 370-381, 1992. The training set contains VV and HH polarization data at L band (30 and 40 deg), C band (10, 30, 40, 50, and 60 deg), and X band (30, 40, and 50 deg), along with the corresponding unknowns: rms surface height, surface correlation length, and volumetric soil moisture content in g per cubic cm.
POW12TRN: (12 Inputs, 1 Output, 1414 Training Patterns, 299 KB)
This training file was generated using data obtained from TU Electric Company in Texas. The first ten input features are the power load, in megawatts, over the last ten minutes for the entire TU Electric utility, which covers a large part of north Texas. The output is the power load fifteen minutes in the future from the current time. All powers were originally sampled every fraction of a second and averaged over 1 minute to reduce noise. The last two inputs are, respectively, the “True Area Control Error” (TACE) and the “Filtered Area Control Error” (FACE). The FACE is a combination of exponentially filtered TACE and moving average filtered TACE.
For more details, see
K. Liu, S. Subbarayan, R. R. Shoults, M. T. Manry, C. Kwan, F. L. Lewis, and J. Naccarino, “Comparison of Very Short-Term Load Forecasting Techniques,” IEEE Transactions on Power Systems, vol.11, no.2, May 1996, pp. 877-882.
M. T. Manry, R. Shoults, and J. Naccarino, “An Automated System for Developing Neural Network Short Term Load Forecasters,” Proceedings of the 58th American Power Conference, Chicago, Ill., April 9-11, 1996, vol. 1, pp. 237-241.
A testing version, POW12TST (299 KB), is also available for download.
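The 12-input, 1-output layout described for POW12TRN can be illustrated by windowing a 1-minute load series. This is a sketch under stated assumptions, not the procedure used to build the file: the function name is hypothetical, the ten load values are assumed to be ordered oldest first, and TACE and FACE are assumed to be taken at the current minute.

```python
def make_load_patterns(load, tace, face, history=10, horizon=15):
    """Build (inputs, output) pairs from equal-length 1-minute series.

    inputs: the last `history` load values (oldest first, an assumed order),
            followed by TACE and FACE at the current minute (also assumed).
    output: the load `horizon` minutes after the current minute.
    """
    patterns = []
    for t in range(history - 1, len(load) - horizon):
        inputs = list(load[t - history + 1 : t + 1]) + [tace[t], face[t]]
        output = load[t + horizon]
        patterns.append((inputs, output))
    return patterns


# Toy series for illustration only; these are not TU Electric data.
load = [float(i) for i in range(100)]
tace = [0.0] * 100
face = [0.0] * 100
patterns = make_load_patterns(load, tace, face)
```

With 100 minutes of toy data, each pattern has 12 inputs, and the first usable current minute is t = 9, because ten past values are required.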
MAT.TRN: (4 Inputs, 4 Outputs, 2000 Training Patterns, 644 KB)
This training file provides the data set for inversion of random two-by-two matrices. Each pattern consists of 4 input features and 4 output features. The input features, which are uniformly distributed between 0 and 1, represent a matrix, and the four output features are the elements of the corresponding inverse matrix. The determinants of the input matrices are constrained to be between 0.3 and 2.
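A pattern of the kind described for MAT.TRN can be produced as in the sketch below. It assumes rejection sampling to enforce the determinant constraint and a row-major ordering of the matrix elements; neither detail is stated in the description, and the function name is hypothetical.

```python
import random


def random_matrix_pattern(rng, det_min=0.3, det_max=2.0):
    """Return (inputs, outputs) for one random 2x2 matrix and its inverse.

    Elements are stored row-major as [a, b, c, d] (an assumed ordering).
    Matrices whose determinant falls outside [det_min, det_max] are
    rejected and redrawn (an assumed way to apply the constraint).
    """
    while True:
        a, b, c, d = (rng.random() for _ in range(4))
        det = a * d - b * c
        if det_min <= det <= det_max:
            break
    # Closed-form inverse of [[a, b], [c, d]].
    inverse = [d / det, -b / det, -c / det, a / det]
    return [a, b, c, d], inverse


rng = random.Random(0)
patterns = [random_matrix_pattern(rng) for _ in range(200)]
```

Since every element lies between 0 and 1, the determinant a*d - b*c cannot exceed 1, so in this sketch only the lower bound of 0.3 ever rejects a matrix.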
SPEECH_MAP.TRA: (39 Inputs, 117 Outputs, 4029 Training Patterns, 7.07 MB)
The speech samples are first preemphasized and then converted into the frequency domain by taking the DFT. The result is passed through Mel filter banks, and the inverse DFT is applied to the output to get Mel-Frequency Cepstrum Coefficients (MFCC). Each of MFCC(n), MFCC(n)-MFCC(n-1), and MFCC(n)-MFCC(n-2) has 13 features, which results in a total of 39 features. The desired outputs are likelihoods for the beginning, middle, and end of each of 39 phonemes.
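The way the 39 input features are assembled from 13 MFCCs per frame can be sketched as follows. The function name is hypothetical, the MFCC frames are taken as given rather than computed, and skipping the first two frames (which have no frame n-2) is an assumption about how the edges are handled.

```python
def stack_mfcc_features(mfcc):
    """Build 39-element feature vectors from a list of 13-element MFCC frames.

    For each frame n >= 2 the vector is the concatenation of
    MFCC(n), MFCC(n) - MFCC(n-1), and MFCC(n) - MFCC(n-2).
    """
    features = []
    for n in range(2, len(mfcc)):
        current = list(mfcc[n])
        diff1 = [c - p for c, p in zip(mfcc[n], mfcc[n - 1])]
        diff2 = [c - p for c, p in zip(mfcc[n], mfcc[n - 2])]
        features.append(current + diff1 + diff2)
    return features


# Toy frames for illustration only: frame n holds the values n, n+1, ..., n+12.
toy_mfcc = [[float(n + k) for k in range(13)] for n in range(5)]
features = stack_mfcc_features(toy_mfcc)
```

The 117 outputs follow the same kind of count: 39 phonemes times 3 positions (beginning, middle, end).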
MELTING POINT: (202 Inputs, 1 Output, 4401 Training Patterns, 2.8 MB)
[1] Karthikeyan et al. General melting point prediction based on a diverse compound data set and artificial neural networks. Journal of chemical information and modeling (2005) vol. 45 (3) pp. 581-90
Obtained from: Max Kuhn (2013). QSARdata: Quantitative Structure Activity Relationship (QSAR) Data Sets. R package version 1.3. https://CRAN.R-project.org/package=QSARdata
3×3 MATRIX INVERSION: (9 Inputs, 9 Outputs, 10000 Training Patterns, 1.3 MB)
Each sample was created by randomly generating a 3×3 matrix as the inputs and using its inverse as the outputs.
Generated by: Son Nguyen, PhD Student, IPNN Lab, UTA
WEATHER FORECASTING DATA: (71 Inputs, 3 Outputs, 16.3 MB)
This data set contains the following files:
1. Tall_TRAIN_IPNNL_Format.txt and Tall_Validation_IPNNL_Format.txt
* This tall file has Nv = 72050 patterns.
* The data, from the years 2010 to 2013, were divided randomly in a 3:1 ratio into training and validation data (Nv = 54050 patterns for training; Nv = 18000 patterns for validation).
* The first 4 inputs are time inputs (encoded in continuous form). Inputs 5 to 8 are spatial variables (latitude, longitude) that indicate the monitoring site/station and city the pattern comes from. Inputs 9 to 71 comprise time-delayed data, up to 3 days, of daily mean, daily minimum, and daily maximum values of meteorological variables (temperature, solar radiation, and wind speed and wind direction encoded together in continuous form) and pollutant variables (nitric oxide, nitrogen dioxide, and 8-hour average ozone concentration). See the table below.
* The outputs are the daily maximum 8-hour average ozone concentrations up to 3 days ahead.
2. Tall_Test_IPNNL_Format.txt
* This tall file has 18000 patterns from the year 2014 and is used for testing.
Note: Tall_Test_2014_New_Format.txt is actually Tall_Test_2014.txt with a format similar to twod.tst
Reference: Gautam R. Eapi (2015). Comprehensive neural network forecasting system for ground level ozone in multiple regions (Doctoral dissertation).
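The random division of the 72050 patterns from 2010 to 2013 into 54050 training and 18000 validation patterns can be sketched as below. This is an illustration only: the function name and the seed are hypothetical, and the actual randomization used to build the files is not described in the source.

```python
import random


def split_patterns(patterns, n_train, seed=0):
    """Randomly split `patterns` into training and validation lists.

    The seed is arbitrary and only makes this sketch repeatable.
    """
    rng = random.Random(seed)
    indices = list(range(len(patterns)))
    rng.shuffle(indices)
    train = [patterns[i] for i in indices[:n_train]]
    validation = [patterns[i] for i in indices[n_train:]]
    return train, validation


# Placeholder pattern IDs standing in for the 72050 real patterns.
all_patterns = list(range(72050))
train, validation = split_patterns(all_patterns, n_train=54050)
```

Every pattern ends up in exactly one of the two lists, which gives the roughly 3:1 ratio quoted above.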