statesnap-miner

The State Snapshot Miner constructs Petri net models from role trace logs. This has been applied to data from the CGED-Q, a digital version of Qing civil service records. A paper on this research was presented at ICPM2023.

Running

Requires Python 3.11 or later.

Installing

Clone the repo and create a virtual environment of your choice.

Install dependencies with

pip install -r requirements.txt

Visualisation depends on Graphviz being installed and on the PATH, specifically the dot executable.

Running the example

The example script, snapeg.py, uses a toy CSV log from the unit tests, and outputs a model diagram.

$ python snapeg.py
Output model to test_ssnap1.png.

Implementations of the miner and code to work with state snapshot logs are in the ssnap module, and can be used as a library.

Testing

python -m unittest

Qing data from the CGED-Q

Scripts for working with CGED-Q data are in the cgedq package. There are two steps.

Convert - conv.py - create state snapshot logs for different types of officials as CSV files in the var/ directory
Mine - mine.py - run the state snapshot miner

These two scripts form a (simple) data pipeline and there is much in them specific to the input data and target extracts. It is best to think of them as project-specific worked examples. They definitely aren't a supported API.

Convert

conv.py converts CSV or Stata extracts into CSV files in state snapshot log form. Multiple files are produced according to different slices of the data. Role conflation of less frequent roles also happens here. Check help for extra options.

python -m cgedq.conv <datafile>

Some example extracts of the CGED-Q public data release 1850-1864 are included as CSV files. This example will create state snapshot logs from Jinshi officials in the 1850-1864 period.

python -m cgedq.conv cged-q-jinshi_1850-1864.csv --tmlfile tml_1850-1864.csv --rebuild

Mine

mine.py takes snapshot CSV files, performs further filtering, runs the state snapshot miner on them, producing PNG output. Output can be varied to eg PNML or PDF by changing the script.

python -m cgedq.mine

This produces several output models covering different subsets and durations.

Top candidate (状元) first three years - English

Top candidate (状元) first three years - Chinese

Appointment Database

To help explore the data with standard SQL tools, a SQLLite database called appoint.db is built in the home directory when conv.py runs.

Public Extracts Included

As the full set of CGED-Q data is not in public release, it is not included in this project. Do note that there are public releases of extracts covering 1850-1864 and 1900-1911.

Records for 1850-1864 are in the data directory. These extracts include only a subset of the attributes available in the CGED-Q, and include some basic data normalisation that suited this project, such as standardising hanzi character variants and including only officials with surnames (which excludes most Manchu and Mongolians). The exact code used for the extract is in process_public_extract() function in conv.py.

cged-q-allclean_1850-1864.csv - All records
cged-q-jinshi_1850-1864.csv - Jinshi officials
cged-q-t1jtall_1850-1864.csv - Exam Tier 1 placed officials
cged-q-t2jtall_1850-1864.csv - Exam Tier 2 placed officials
roletrans.csv - Small Chinese-English dictionary for role names
tml_1850-1864.csv - Timinglu exam records with CGED-Q person_id, primarily for exam tier

If using the data beyond preliminary investigatory work, please do reach out to the experts at the Lee-Campbell group, and of course cite their work.

References

Burke, A., Leemans, S.J.J, Wynn, M.T., and Campbell, C.D. (2023). State Snapshot Process Discovery on Career Paths of Qing Dynasty Civil Servants. ICPM2023.

Chen, B., Campbell, C., Ren, Y., & Lee, J. (2020). Big Data for the Study of Qing Officialdom: The China Government Employee Database-Qing (CGED-Q). Journal of Chinese History, 4(2), 431–460. https://doi.org/10.1017/jch.2020.15

Campbell, C. D., Chen, B., Ren, Y., & Lee, J. (2022). China Government Employee Database-Qing (CGED-Q) Jinshenlu Public Release [Data set]. DataSpace@HKUST. https://doi.org/10.14711/dataset/E9GKRS

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
cgedq		cgedq
data		data
images		images
pm		pm
tests		tests
var		var
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
snapeg.py		snapeg.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

statesnap-miner

Running

Installing

Running the example

Testing

Qing data from the CGED-Q

Convert

Mine

Appointment Database

Public Extracts Included

References

About

Releases 4

Packages

Languages

License

adamburkegh/statesnap-miner

Folders and files

Latest commit

History

Repository files navigation

statesnap-miner

Running

Installing

Running the example

Testing

Qing data from the CGED-Q

Convert

Mine

Appointment Database

Public Extracts Included

References

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 4

Packages 0

Languages

Packages