2DMD dataset

We provide the data in two formats: a processed version containing only the variables used in our paper for ML model evaluation, and the full raw VASP output.

Processed

Download the .zip archive (~5 MB) from the attachments, or view it online in the Data Catalog (scroll down to see the list).

Raw VASP

Use DVC; see the instructions.

File format

Example data reading notebook

Relative link, GitHub

defects.csv.gz

  1. _id unique structure identifier
  2. descriptor_id identifier of the defect type as specified in descriptors.csv
  3. defect_id unused
  4. energy total potential energy of the system as reported by VASP, eV
  5. energy_per_atom total potential energy of the system divided by the number of atoms, eV
  6. fermi_level Fermi level, eV
  7. homo is highest occupied molecular orbital (HOMO) energy, eV
  8. lumo is lowest unoccupied molecular orbital (LUMO) energy, eV
  9. normalized_homo is the HOMO value normalised relative to the host valence band maximum (VBM) (see section "DFT computations" in the paper), eV
  10. normalized_lumo is the LUMO value normalised relative to the host valence band maximum (VBM) (see section "DFT computations" in the paper), eV
  11. E_1 is the energy of the first Kohn–Sham orbital of the structure with defect (see section "DFT computations" in the paper), eV
  12. homo_lumo_gap is the HOMO-LUMO gap, LUMO - HOMO, eV
  13. total_mag is the total magnetisation
  14. *_{majority,minority} are the corresponding quantities computed for the majority and minority spin channels for materials computed with spin
  15. band_gap OBSOLETE
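
As a minimal sketch of reading this table without extra dependencies, the snippet below parses a gzip-compressed CSV with the Python standard library and recomputes the gap as LUMO - HOMO. The helper name read_gzipped_csv and the two-row sample are illustrative only and are not part of the dataset; the sample carries just a few of the columns listed above.

```python
import csv
import gzip
import io


def read_gzipped_csv(source):
    """Return the rows of a gzip-compressed CSV as a list of dicts.

    `source` may be a file path or a binary file-like object.
    """
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Synthetic stand-in for defects.csv.gz; the values are made up.
sample = (
    "_id,descriptor_id,energy,homo,lumo\n"
    "a1,d1,-100.0,-1.5,0.25\n"
    "a2,d1,-98.5,-1.0,0.5\n"
)
buffer = io.BytesIO(gzip.compress(sample.encode("utf-8")))

rows = read_gzipped_csv(buffer)

# All values arrive as strings, so convert before doing arithmetic.
gaps = {row["_id"]: float(row["lumo"]) - float(row["homo"]) for row in rows}
```

With the real data, pass the path of the unpacked file, for example read_gzipped_csv("defects.csv.gz"), adjusting the path to wherever the archive was extracted.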

targets.csv.gz

Same as defects.csv.gz plus additional derivative variables:

  1. formation_energy is the defect formation energy, computed according to equation 1 from the paper
  2. formation_energy_per_site is the defect formation energy divided by the number of defects according to equation 2 from the paper
  3. *_{min,max} are the minimum and maximum of the quantities with respect to the different spin channels
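
The relation between the spin-resolved columns and the *_{min,max} columns can be sketched as follows. This is an illustration under the assumption that a row carries both <quantity>_majority and <quantity>_minority as parseable numbers; the helper spin_extrema is hypothetical, and how rows without spin-resolved values are filled in is not covered here.

```python
def spin_extrema(row, quantity):
    """Return (min, max) of `quantity` over the two spin channels of one row."""
    values = [
        float(row[f"{quantity}_{channel}"])
        for channel in ("majority", "minority")
    ]
    return min(values), max(values)


# Made-up values for a single spin-polarised structure.
row = {"homo_majority": "-1.25", "homo_minority": "-1.5"}
homo_min, homo_max = spin_extrema(row, "homo")
```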

initial.tar.gz

The archive initial.tar.gz contains the unrelaxed structures in the CIF format. Names correspond to the unique identifiers _id in defects.csv.gz. Note that the structures were relaxed prior to computing the properties.
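
A possible way to pull one structure out of the archive by its _id, using only the standard library, is sketched below. The exact directory layout inside initial.tar.gz is not assumed: the hypothetical helper read_structure_text matches on the file name without its extension. The archive built in the example is a synthetic one-file stand-in with made-up content.

```python
import io
import posixpath
import tarfile


def read_structure_text(archive, structure_id):
    """Return the text of the member whose file name, minus extension,
    equals `structure_id`. `archive` is a binary file-like object."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            stem = posixpath.splitext(posixpath.basename(member.name))[0]
            if member.isfile() and stem == structure_id:
                return tar.extractfile(member).read().decode("utf-8")
    raise KeyError(structure_id)


# Build a tiny in-memory archive so the example runs without the dataset.
payload = b"data_example\n_cell_length_a 3.19\n"
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="initial/abc123.cif")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
raw.seek(0)

cif_text = read_structure_text(raw, "abc123")
```

For the real archive, open it in binary mode, for example open("initial.tar.gz", "rb"), and pass an _id taken from defects.csv.gz. The returned text can then be handed to the CIF parser of your choice.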

descriptors.csv

  1. _id unique identifier of the defect type, corresponds to the descriptor_id column in defects.csv.gz
  2. description is a short semantic abbreviation of the defect type
  3. base is the chemical formula of the pristine material
  4. cell is the supercell size
  5. defects is a dictionary describing each point defect
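
The link between the two tables can be sketched as a simple lookup on descriptor_id. The helper attach_descriptors and all sample values are illustrative; the rows are assumed to be dicts keyed by column name, as produced by csv.DictReader.

```python
def attach_descriptors(defect_rows, descriptor_rows):
    """Copy description, base and cell of the matching defect type
    onto each row of the defects table."""
    by_id = {descriptor["_id"]: descriptor for descriptor in descriptor_rows}
    merged = []
    for row in defect_rows:
        descriptor = by_id[row["descriptor_id"]]
        extra = {key: descriptor[key] for key in ("description", "base", "cell")}
        merged.append({**row, **extra})
    return merged


# Made-up rows; real values come from defects.csv.gz and descriptors.csv.
defect_rows = [{"_id": "s1", "descriptor_id": "d1", "energy": "-100.0"}]
descriptor_rows = [
    {"_id": "d1", "description": "example", "base": "MoS2", "cell": "8x8x1"}
]

merged = attach_descriptors(defect_rows, descriptor_rows)
```

Only the three named columns are copied, so the structure's own _id is not overwritten by the descriptor's _id.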

elements.csv

Contains chemical potentials (in eV) of the elements, to be used in formation energy computation.
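
How the chemical potentials enter the formation energy can be sketched as below. The code assumes that equation 1 of the paper has the common form E_f = E_defect - E_pristine + sum_i n_i * mu_i, where n_i counts atoms of element i removed from the supercell (negative when atoms are added), and that equation 2 divides E_f by the number of defect sites; check both against the paper before relying on them. The function names and all numbers are illustrative. The pristine energy would come from initial_structures.csv.

```python
def formation_energy(e_defect, e_pristine, removed, chemical_potentials):
    """Formation energy in eV under the assumed form of equation 1.

    removed: mapping element -> number of atoms removed from the pristine
             supercell (use a negative number for atoms that were added).
    chemical_potentials: mapping element -> chemical potential in eV.
    """
    exchange = sum(n * chemical_potentials[el] for el, n in removed.items())
    return e_defect - e_pristine + exchange


def formation_energy_per_site(e_formation, n_defects):
    """Formation energy divided by the number of defect sites."""
    return e_formation / n_defects


# Made-up numbers: two S vacancies in one supercell.
mu = {"S": -4.0}
e_f = formation_energy(
    e_defect=-90.0, e_pristine=-100.0, removed={"S": 2}, chemical_potentials=mu
)
e_f_per_site = formation_energy_per_site(e_f, n_defects=2)
```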

initial_structures.csv

Contains the properties of the pristine materials.

  1. base is the chemical formula of the pristine material
  2. cell_size is the supercell size
  3. energy total potential energy of the system, eV
  4. fermi is the Fermi level, eV
  5. E_1 is the energy of the first Kohn–Sham orbital of the pristine structure (see section "DFT computations" in the paper), eV
  6. E_VBM is the energy of the valence band maximum of the pristine structure, eV
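
One plausible reading of how these columns feed the normalized_homo and normalized_lumo values is sketched below: each level is first measured from E_1 of its own structure, and the pristine VBM, measured from the pristine E_1, is then subtracted. This is an assumption rather than the confirmed procedure; the section "DFT computations" in the paper is the authority. The function name and the numbers are illustrative.

```python
def normalize_level(level, e1_defect, e_vbm_pristine, e1_pristine):
    """Shift `level` (eV) so that the pristine VBM sits at zero,
    using E_1 of each structure as the common reference (assumed)."""
    return (level - e1_defect) - (e_vbm_pristine - e1_pristine)


# Made-up energies in eV.
normalized_homo = normalize_level(
    level=-1.5, e1_defect=-20.0, e_vbm_pristine=-2.0, e1_pristine=-20.5
)
normalized_lumo = normalize_level(
    level=0.25, e1_defect=-20.0, e_vbm_pristine=-2.0, e1_pristine=-20.5
)
```

In this toy case the defect HOMO coincides with the host VBM after alignment, so it normalises to zero, and the LUMO lies 1.75 eV above it.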

unit_cells/*.cif

Unit cells of the pristine materials used to produce the structures in the folder.

Citation

If you use the dataset, please cite the following paper:

Huang, P., Lukin, R., Faleev, M. et al. Unveiling the complex structure-property correlation of defects in 2D materials based on high throughput datasets. npj 2D Mater Appl 7, 6 (2023). https://doi.org/10.1038/s41699-023-00369-1

Attachments

2d-materials-point-defects-all.zip (5.14 MB)