Skip to content

Python script read excel (xlsx) and match text from text file.

License

Notifications You must be signed in to change notification settings

austin-lai/python_script_read_excel_and_match_text_from_text_file

Repository files navigation

Python script read excel (xlsx) and match text from text file

> Austin.Lai |
> -----------| September 26th, 2023
> -----------| Updated on September 26th, 2023

Table of Contents


Disclaimer

DISCLAIMER:

This project/repository is provided "as is" and without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

This project/repository is for Educational purpose ONLY. Do not use it without permission. The usual disclaimer applies, especially the fact that me (Austin) is not liable for any damages caused by direct or indirect use of the information or functionality provided by these programs. The author or any Internet provider bears NO responsibility for content or misuse of these programs or any derivatives thereof. By using these programs you accept the fact that any damage (data loss, system crash, system compromise, etc.) caused by the use of these programs is not Austin responsibility.


Description

Python script read excel (xlsx) and match text from text file.

The python script require pandas package.

All sample will be provide:

  • read_excel_and_match_text_from_text_file.py
  • data.xlsx
  • sample.txt
  • sample_found_match.txt

Script

The read_excel_and_match_text_from_text_file.py file can be found here or below:

import pandas as pd

# Read the Excel file
xls = pd.ExcelFile('data.xlsx')

# Print the sheet names
print(xls.sheet_names)

# Read the 'Named_Sheet2' tab sheet into a DataFrame
df = pd.read_excel(xls, 'Named_Sheet2')

# Convert all text to lowercase
df = df.applymap(lambda s:s.lower() if type(s) == str else s)

# Read the text file line by line with encoding='utf-16'
with open('sample.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

# Convert all text to lowercase and store in a list
text_list = [line.lower().strip() for line in lines]

# Split each line by dot and store only the second part in the list
text_list = [line.split('.')[1] for line in text_list if '.' in line]

# Remove duplicates from the list
text_list = list(set(text_list))

print(text_list)
print('\n\n')

# Search in 'Name' or 'Link' columns for matches with the list
matching_rows = df[(df['Name'].isin(text_list)) | (df['Link'].isin(text_list))]

if not matching_rows.empty:
    matching_item = matching_rows[['Link']]
    print(matching_item)

    # Store matched 'Link' values into variables
    link = matching_rows['Link'].values

    print(link)

    # Save the values into a text file
    with open('found_match.txt', 'w') as f:
        f.write(str(matching_item))

About

Python script read excel (xlsx) and match text from text file.

Topics

Resources

License

Stars

Watchers

Forks

Languages