
CSV File Uploading Practices between Frontend and Backend #152

Open
reboottime opened this issue Aug 4, 2023 · 3 comments
reboottime commented Aug 4, 2023

Introduction to CSV: its applications and challenges

What is CSV and what is it used for?

CSV is an acronym for Comma-Separated Values. CSV is commonly used for data exchange between different applications, importing and exporting data from spreadsheets, and so on.

Structure of CSV

  • Rows: Each line in a CSV file represents a row of data. Each row typically corresponds to a record or a data entry.
  • Columns: Within each row, the values are separated by a delimiter, often a comma (,). Alternative delimiters include the semicolon, the tab, or any other character, depending on the requirements.
  • Headers (optional): The first row of a CSV file is often used to store column names or headers, which provide a label for each column.

Considerations and Challenges

  • General Challenges

    • Data Types: CSV treats all data as strings. If your data includes numbers or other non-text types, you may need to convert them explicitly in your code.
    • Quoting: If your data contains the delimiter character itself (e.g., a comma) or line breaks, you might need to enclose the values in quotes.
    • Encoding: Pay attention to the character encoding of your CSV files, especially when dealing with international characters.
    • Parsing Errors: Be prepared to handle cases where the CSV data doesn't follow the expected structure.
  • Challenges of parsing large CSV files

  • Performance:

    • When parsing large CSV files, the browser's memory usage can increase significantly, potentially causing performance issues, blocking the UI, or even crashing the tab.
    • Progress indication: it is not obvious how to report parsing progress to the user while a large file is being processed.
    • Optimization and chunking: Efficiently parsing large CSV files requires techniques such as chunking, where the file is processed in smaller segments to reduce memory consumption and improve performance.
  • Solution directions

    • Web Workers: Use a Web Worker to run the parsing task in a separate thread (a sketch follows this list).
    • Chunking: Break the CSV file into smaller chunks and process them sequentially. This helps manage memory and prevents long blocking times.
    • Streaming: If possible, stream the CSV data and process it in chunks as it arrives, rather than loading the entire file into memory.
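
A minimal sketch of the Web Worker direction, assuming a worker file named `csv-worker.js` and a hypothetical `renderRows` UI helper; the naive `split(',')` ignores quoting and only illustrates moving the parse off the main thread:

```js
// main.js — hand the File to a worker so parsing never blocks the UI.
// File objects are structured-cloneable, so they can be posted directly.
const worker = new Worker('csv-worker.js');

function parseInWorker(file) {
  worker.postMessage(file);
  worker.onmessage = (event) => {
    renderRows(event.data.rows); // hypothetical UI helper
  };
}
```

```js
// csv-worker.js — runs in a separate thread.
self.onmessage = async (event) => {
  const file = event.data;
  const text = await file.text(); // Blob.text() is also available in workers
  // Naive parse: does not handle quoted fields containing commas or newlines.
  const rows = text.split('\n').map((line) => line.split(','));
  self.postMessage({ rows });
};
```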
reboottime commented Aug 4, 2023

Sample solutions for the above solution directions


1. Chunking a large CSV file

  • Chunking a large CSV file on the browser side involves breaking the file into smaller parts (chunks) and processing each chunk sequentially. This approach helps manage memory usage and prevents the UI from becoming unresponsive while parsing. Here's a general approach to implementing chunking in the browser:
```js
function processCSVChunk(textChunk) {
  // Process the CSV chunk here, e.g. split it into lines:
  // const lines = textChunk.split('\n');
}

async function processCSVFile(file) {
  const chunkSize = 1024 * 1024; // 1 MB per chunk
  let offset = 0;

  // Read the file slice by slice
  while (offset < file.size) {
    // file.slice() returns a Blob; Blob.text() reads it as a string
    const chunk = file.slice(offset, offset + chunkSize);
    const textChunk = await chunk.text();

    processCSVChunk(textChunk);

    offset += chunkSize;
  }

  console.log('CSV processing completed');
}
```
  • The downside of using chunks: a single row can be split across two consecutive chunks, resulting in parsing errors. A workaround is sketched below.
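
One way to handle that (a sketch, assuming a hypothetical per-row handler `processCSVLine`) is to carry the trailing partial line of each chunk over into the next one:

```js
async function processCSVFileSafely(file) {
  const chunkSize = 1024 * 1024;
  let offset = 0;
  let carry = ''; // holds the partial row left over from the previous chunk

  while (offset < file.size) {
    const textChunk = await file.slice(offset, offset + chunkSize).text();
    const lines = (carry + textChunk).split('\n');
    // The last element may be an incomplete row; keep it for the next chunk.
    carry = lines.pop();
    for (const line of lines) processCSVLine(line); // hypothetical row handler
    offset += chunkSize;
  }
  if (carry) processCSVLine(carry); // flush the final row
}
```

Note this still assumes no line breaks inside quoted fields, and byte-based slicing can also split multi-byte UTF-8 characters at chunk boundaries; the streaming approach below with TextDecoderStream avoids the latter.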

2. Using a streaming solution

Very similar to the chunking solution, and it potentially has the same issue of a row being split across two chunks, resulting in parsing errors.
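
A sketch of this direction using Blob.stream() with TextDecoderStream (the same carry-over trick avoids split rows, and the decoder stream handles multi-byte characters across chunk boundaries; `processCSVLine` is again a hypothetical row handler):

```js
async function streamCSVFile(file) {
  const reader = file
    .stream()                              // ReadableStream of bytes
    .pipeThrough(new TextDecoderStream())  // decode to text incrementally
    .getReader();

  let carry = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    const lines = (carry + value).split('\n');
    carry = lines.pop(); // possibly incomplete last row
    for (const line of lines) processCSVLine(line);
  }
  if (carry) processCSVLine(carry);
}
```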

reboottime commented Aug 4, 2023

Suggested NPM Packages for Addressing the above Challenges

The package I gravitate towards is PapaParse, owing to its adept handling of the challenges mentioned above. It also supports aborting CSV parsing after producing a set number of results.
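
For reference, a minimal PapaParse configuration covering those points — worker-based parsing, chunked callbacks, and stopping after a preview-sized number of rows (the `handleRows` handler is hypothetical):

```js
import Papa from 'papaparse';

Papa.parse(file, {
  header: true,   // treat the first row as column names
  worker: true,   // parse in a Web Worker, off the main thread
  preview: 100,   // stop after the first 100 rows, e.g. for a preview UI
  chunk: (results, parser) => {
    handleRows(results.data); // hypothetical handler; results.data holds this chunk's rows
    // parser.abort() can stop parsing early if needed
  },
  complete: () => console.log('CSV parsing completed'),
  error: (err) => console.error('CSV parsing failed', err),
});
```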

reboottime commented Aug 5, 2023

Crafting the Workflow

What is expected

When orchestrating the exchange of data between distinct systems using CSV as the medium, users expect:

  • Seamless migration: Smoothly transfer data from system A to system B.
  • Real-time updates: Keep the user informed about the file upload and processing status and progress.
  • Performance: The migration process takes a reasonable amount of time.

In response to these expectations, the design of a CSV uploading system demands consideration of these aspects:

  • Enhanced performance: Anticipate scenarios involving concurrent uploads of numerous files within defined timeframes.
  • Ensured compatibility: Manage situations requiring meticulous data-field mapping to ensure consistent interpretation across systems.
  • Security: Protect the system from malicious content.

The Workflow

[workflow diagram image]

Performance considerations

  • Consider uploading the file in chunks.
  • Consider previewing only a subset of the CSV records via PapaParse configuration parameters:
    • set worker: true
    • or set the preview field
  • Communicate asynchronously and process files using multiple threads on the backend side.
  • Use polling with a proper interval (a sketch follows this list):
    • poll the file security-check status
    • poll the file-processing job status
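
A sketch of the upload-and-poll side; the /upload and /jobs/:id endpoints, the response shapes, and the `updateProgressUI` helper are all assumptions for illustration. (For very large files, the upload itself can be split with file.slice, mirroring the parsing chunks.)

```js
// Upload the file, then poll the processing job until it finishes.
async function uploadAndTrack(file) {
  const form = new FormData();
  form.append('file', file);

  const uploadRes = await fetch('/upload', { method: 'POST', body: form });
  const { jobId } = await uploadRes.json();

  // Fixed-interval polling; a production version might use exponential backoff.
  while (true) {
    const statusRes = await fetch(`/jobs/${jobId}`);
    const { status, progress } = await statusRes.json();
    updateProgressUI(status, progress); // hypothetical UI helper
    if (status === 'completed' || status === 'failed') return status;
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}
```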

UI considerations

Provide user instructions and guidelines using a step-by-step workflow:

  • Upload section: shows the selected file name and a partial preview of the data.
  • Fields map section: maps the CSV file's fields to the digesting system's fields (a sketch follows this list).
  • Preview section: shows the field-mapping result, with a save button.
    • Some CSV uploading solutions provide UI support for correcting field values before uploading files.
    • Others prefer to notify the user of CSV records with bad data after the whole processing flow.
  • Inform the user of the file uploading/processing state and status.
    • One way is a notification badge showing the upload and processing status.
    • Another is a progress page that informs the user of the uploading/processing result.
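
For the fields-map step, the mapping can be represented as a simple header-to-field dictionary, built from the user's choices and applied to each parsed row (a sketch; the header and field names are made up):

```js
// A user-confirmed mapping from CSV headers to the digesting system's fields.
const fieldMap = {
  'First Name': 'firstName',
  'E-mail': 'email',
};

// Convert one parsed CSV row (keyed by header) into a system record.
function mapRow(csvRow) {
  const record = {};
  for (const [header, field] of Object.entries(fieldMap)) {
    record[field] = csvRow[header];
  }
  return record;
}
```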
