CSV is a widely used, portable format for storing and transferring data. It is used across many domains and industries because it is easy to use, understand and interpret, and because any program can easily read and process data from a CSV file. A CSV file can be treated as tabular data whose values are divided by a separator, most commonly a comma. When the data itself may contain commas, people sometimes use another separator character instead, e.g. a semicolon; the choice depends on the data you want to store. As long as the separator is known, the program or CSV reader application that reads the file will be able to read and process it.
Most of the time, when a Java program reads a file, it first builds a list of strings, then splits each one, and only then populates its objects. Here we will build a Java CSV reader that gives us fully read and processed data as instances of our domain object or data transfer object.
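To make the separator point concrete, here is a minimal sketch of splitting one line on a known separator. The class `SeparatorSplitDemo` and the method `splitLine` are hypothetical names used only for illustration and are not part of the reader built in this post. One detail worth knowing is that `String.split` treats its argument as a regular expression, so the sketch wraps the separator in `Pattern.quote` to make characters such as `|` behave as plain text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the reader in this post.
class SeparatorSplitDemo {

    // Splits one line on a literal separator. Pattern.quote() stops characters
    // such as '|' or '.' from being interpreted as regular-expression syntax.
    static List<String> splitLine(String line, String separator) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        return Arrays.asList(line.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        System.out.println(splitLine("1,John Rambo,Rambo", ","));
        System.out.println(splitLine("2;Jack Baur;24", ";"));
        System.out.println(splitLine("3|Ethan Hunt|Mission Impossible", "|"));
    }
}
```

Note that this simple split does not handle quoted values that contain the separator; it only works when the separator never appears inside the data.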
Software used in this example
- Java 8
- Eclipse
Below is the data in our CSV file, which our Java CSV reader will read, process and print.
Id,Name,Series
1,John Rambo, Rambo
2,Jack Baur, 24
3,Ethan Hunt, Mission Impossible
First of all, we need to define the domain object that will be mapped to the above CSV data.
public class CsvData {
    private String id;
    private String name;
    private String series;

    // All getters and setters will be defined here.

    // Simple method to print our data
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(this.getId())
          .append(" | ")
          .append(this.getName())
          .append(" | ")
          .append(this.getSeries());
        return sb.toString();
    }
}
Let's define our CSV reader class with the class variables it needs:
public class CustomCsvReader<T> {
    private String seperator;
    private String file;
    private Map<String, Field> privateFields = new LinkedHashMap<String, Field>();
    private Class<T> genericType;
    private List<T> data;
    private List<String> order;
    private List<String> headers;
    private boolean initCompleted;
    private boolean hasHeader;
You can see that I have added the generic parameter <T> to the class declaration. This ties each CSV reader to its domain object, and we will use that domain class to map our data. The class also has 2 constructor variants that accept different parameters such as the separator, the file path and whether the file has a header. The most important parameter, though, is the class of the domain object.
public CustomCsvReader(final Class<T> type, String file, boolean hasHeader) {
    this.file = file;
    this.hasHeader = hasHeader;
    this.genericType = type;
    this.seperator = ",";
}

public CustomCsvReader(final Class<T> type, String file, boolean hasHeader, String separator) {
    this.file = file;
    this.hasHeader = hasHeader;
    this.genericType = type;
    this.seperator = separator;
}
We have to ask for this class explicitly: even though the domain type is specified as the generic parameter, that information is not available at runtime because of Java type erasure.
The class defines an initialize method in which we collect the field information of the mapped domain class and read the data from the CSV file.
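The effect of type erasure can be seen in a few lines. The sketch below is only an illustration: `TypeErasureDemo`, `Holder` and `sameRuntimeClass` are hypothetical names, and `Holder` merely mimics the idea of keeping a `Class<T>` token the way the reader's constructor does. It uses `getDeclaredConstructor().newInstance()`, which also works on Java 8; `Class.newInstance()` has been deprecated since Java 9.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names here are not part of the reader.
class TypeErasureDemo {

    // Keeps an explicit class token, because T itself is erased at runtime.
    static class Holder<T> {
        private final Class<T> type;

        Holder(Class<T> type) {
            this.type = type;
        }

        // Creates a new T through the token; "new T()" would not compile.
        T create() {
            try {
                return type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
            }
        }
    }

    // Returns true when both lists have the same runtime class.
    static boolean sameRuntimeClass(List<?> first, List<?> second) {
        return first.getClass() == second.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // The type arguments are erased, so both lists report java.util.ArrayList.
        System.out.println(sameRuntimeClass(strings, numbers));

        Holder<StringBuilder> holder = new Holder<>(StringBuilder.class);
        System.out.println(holder.create().getClass().getSimpleName());
    }
}
```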
private void initialize() {
    if (!this.initCompleted) {
        // Collect the private fields of the domain class, keyed by field name.
        Field[] allFields = genericType.getDeclaredFields();
        for (Field field : allFields) {
            if (Modifier.isPrivate(field.getModifiers())) {
                privateFields.put(field.getName(), field);
            }
        }
        try {
            readData();
            // Mark initialization as complete only when the read succeeded.
            this.initCompleted = true;
        } catch (InstantiationException | IllegalAccessException e) {
            this.initCompleted = false;
        }
    }
}
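The field collection step can be tried out on its own. The following is a small sketch under stated assumptions: `PrivateFieldScanDemo`, `Sample` and `privateFieldsOf` are hypothetical names, and `Sample` is only a stand-in for the domain object. One caveat to keep in mind is that the Javadoc of `getDeclaredFields` says the returned array is not sorted and is not in any particular order, so relying on declaration order is a convenience rather than a guarantee.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; it mirrors the loop in initialize() in isolation.
class PrivateFieldScanDemo {

    // Stand-in for a domain object such as CsvData.
    static class Sample {
        private String id;
        private String name;
        private String series;
        public String ignored; // public, so the scan below skips it
    }

    // Returns the private fields of the given class, keyed by field name.
    static Map<String, Field> privateFieldsOf(Class<?> type) {
        Map<String, Field> fields = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isPrivate(field.getModifiers())) {
                fields.put(field.getName(), field);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints the names of the private fields found on Sample.
        System.out.println(privateFieldsOf(Sample.class).keySet());
    }
}
```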
In the read method, the CSV file is read line by line and each line is split to build the list of our domain objects.
reader = new BufferedReader(new FileReader(file));
while ((line = reader.readLine()) != null) {
    List<String> row = Arrays.asList(line.split(seperator));
    // The first line holds the column headers, so store it and skip mapping.
    if (this.hasHeader) {
        setHeaders(row);
        this.hasHeader = false;
        continue;
    }
    T refObject = genericType.newInstance();
    int index = 0;
    // Use the explicit order if one was set, otherwise the collected field names.
    List<String> listOfFieldNames = (null != getOrder()) ? getOrder() : new ArrayList<String>(privateFields.keySet());
    for (String fieldName : listOfFieldNames) {
        // Stop when the row has fewer values than there are fields.
        if (index >= row.size()) {
            break;
        }
        assign(refObject, privateFields.get(fieldName), row.get(index++));
    }
    getData().add(refObject);
}
reader.close();
If an ordering of fields is provided, it is used to map each CSV value to the field named at that position. Otherwise, the CSV values are assigned in the order in which the fields were collected from the domain class.
This simply gives you the list of domain objects mapped from the CSV file. What if you want to process them before they are passed on to another flow or application? For that, we can write our own custom processor and attach it to the reader.
We will define a generic interface to keep the implementations consistent. This interface has only one method, which accepts the generic object as input and returns the processed object.
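The assign method itself is not shown in this post, so here is a rough sketch of how such a mapping step could look. Everything in it is an assumption made for illustration: the names `AssignDemo`, `Row` and `mapRow`, the restriction to String fields, and the decision to trim values (the sample data has a space after some commas).

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the real assign() of the reader may differ.
class AssignDemo {

    // Stand-in for the domain object, with String fields only.
    static class Row {
        private String id;
        private String name;
        private String series;

        @Override
        public String toString() {
            return id + " | " + name + " | " + series;
        }
    }

    // Sets one String field through reflection; trimming is an assumption.
    static void assign(Object target, Field field, String value) {
        try {
            field.setAccessible(true); // the fields are private
            field.set(target, value == null ? null : value.trim());
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot set field " + field.getName(), e);
        }
    }

    // Maps the values to the fields named in 'order', position by position.
    static Row mapRow(List<String> order, List<String> values) {
        Row row = new Row();
        int index = 0;
        for (String fieldName : order) {
            if (index >= values.size()) {
                break; // fewer values than fields: remaining fields stay null
            }
            try {
                assign(row, Row.class.getDeclaredField(fieldName), values.get(index++));
            } catch (NoSuchFieldException e) {
                throw new IllegalArgumentException("Unknown field in order: " + fieldName, e);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> order = Arrays.asList("id", "name", "series");
        System.out.println(mapRow(order, Arrays.asList("1", "John Rambo", " Rambo")));
    }
}
```

A real implementation would probably also convert values for non-String fields such as int or Date; that part is left out here.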
public interface CsvProcessor<T> {
    public T process(T inData);
}
A sample processor that will convert the names to upper case can be written as
public class CustomCsvProcessor implements CsvProcessor<CsvData> {
    @Override
    public CsvData process(CsvData inData) {
        inData.setName(inData.getName().toUpperCase());
        return inData;
    }
}
You can put all your transformation business logic here. This gives you a logical separation between reading and processing.
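How the reader applies a processor internally is not shown here, but the idea can be sketched as a loop over the data. In the sketch below, `ProcessorDemo` and `processAll` are hypothetical names, and the nested interface only repeats the shape of CsvProcessor so that the example compiles on its own. Since the interface has a single method, a Java 8 lambda or method reference can also serve as a processor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of applying a processor to every item that was read.
class ProcessorDemo {

    // Same shape as the CsvProcessor interface, repeated to stay self-contained.
    interface CsvProcessor<T> {
        T process(T inData);
    }

    // Runs the processor on each element and collects the returned objects.
    static <T> List<T> processAll(List<T> data, CsvProcessor<T> processor) {
        List<T> processed = new ArrayList<>(data.size());
        for (T item : data) {
            processed.add(processor.process(item));
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John Rambo", "Jack Baur", "Ethan Hunt");
        // A method reference stands in for a full processor class here.
        System.out.println(processAll(names, String::toUpperCase));
    }
}
```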
Now how do we call it? Check it out below
String file = "src/main/resources/sample-data.csv";
List<String> ord = new ArrayList<String>();
ord.add("id");
ord.add("name");
ord.add("series");
CustomCsvReader<CsvData> reader = new CustomCsvReader<CsvData>(CsvData.class, file, true)
        .setOrder(ord)
        .read()
        .process(new CustomCsvProcessor());
for (CsvData msg : reader.getData()) {
    System.out.println(msg);
}
As you can see above, I have specified the domain object, the ordering and the processor. It's that simple 🙂
You can download this code from Git. Please let me know if you find any issues in the code or have any suggestions.