Tool to find how a given word is used in a corpus

One of my good friends is a professional translator. She often needs to look through several texts to find how a pair of words is used together, and I figured I could make her task easier with a simple Python script that searches through a corpus.

The app takes a directory as input and recursively opens all text and HTML files in the directory and its subdirectories to find a word or a pair of words. Once the whole directory is processed, it prints, for each match, an extract of the sentence where the match was found. The output is pretty similar to that of the online translator Linguee.com, except that here we can choose the texts to analyze to make sure the found occurrences are reliable.
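To give an idea of how such a search can work, here is a minimal sketch in Python. It is not the app's actual code: the names `read_text` and `find_extracts`, the `width` parameter, and the rule that a pair counts as "used together" when both words fall inside the same extract are all assumptions made for illustration. The HTML handling is also simplified and keeps every text node, including script and style content.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextOnly(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def read_text(path):
    """Return the content of a text or HTML file as one whitespace-normalized string."""
    raw = path.read_text(encoding="utf-8", errors="ignore")
    if path.suffix.lower() in {".html", ".htm"}:
        parser = _TextOnly()
        parser.feed(raw)
        parser.close()
        raw = " ".join(parser.parts)
    return " ".join(raw.split())  # collapse runs of whitespace and newlines


def find_extracts(root, words, width=40):
    """Yield (path, extract) for each occurrence of words[0] whose extract
    also contains every other word in `words` (hypothetical matching rule)."""
    first = re.compile(rf"\b{re.escape(words[0])}\b", re.IGNORECASE)
    others = [re.compile(rf"\b{re.escape(w)}\b", re.IGNORECASE) for w in words[1:]]
    # rglob("*") walks the directory and all of its subdirectories
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".txt", ".html", ".htm"}:
            continue
        text = read_text(path)
        for match in first.finditer(text):
            # Keep `width` characters of context on each side of the match
            window = text[max(0, match.start() - width): match.end() + width]
            if all(pattern.search(window) for pattern in others):
                yield path, window
```

Calling `find_extracts("corpus", ["take", "account"])` on a hypothetical `corpus` directory would yield one (file, extract) pair per occurrence of "take" that has "account" nearby. A fixed character window is used here instead of real sentence splitting to keep the sketch short.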

I also created a companion tool, Website downloader, to download a whole website into a directory which can then be fed to the text extractor.

It took me a whole day, from 8am to 10pm, to code both this text window extractor and Website downloader. For both tools, coding the GUI and linking its input/output to the backend script took half of the development time.

Although I feel like I was really slow on the GUI part, it's very gratifying to code a fully functional (albeit simple!) program that someone else will use. Usually I create non-graphical scripts for myself, so this time feels like a stepping stone towards creating apps rather than scripts.

Download & installation

You can download the app for Windows on the release page. Just look for the .exe file on the page and click on it to download.

Because it's coded in Python, the app also works on macOS and Linux, but I didn't create a binary for these platforms. You can check out the GitHub repo and run the Python code directly.

Screenshot

Interface for the tool