We provide the data in two formats: a processed version containing only the variables used in our paper for ML model evaluation, and the full raw VASP output.
Download the .zip archive (~5 MB) from the attachments, or view it online in the Data Catalog.
Use DVC; see the instructions.
defects.csv.gz
_id
unique structure identifier
descriptor_id
identifier of the defect type as specified in descriptors.csv
defect_id
unused
energy
total potential energy of the system as reported by VASP, eV
energy_per_atom
total potential energy of the system divided by the number of atoms, eV
fermi_level
Fermi level, eV
homo
is the highest occupied molecular orbital (HOMO) energy, eV
lumo
is the lowest unoccupied molecular orbital (LUMO) energy, eV
normalized_homo
is the HOMO value normalised with respect to the host valence band maximum (VBM) (see section "DFT computations" in the paper), eV
normalized_lumo
is the LUMO value normalised with respect to the host valence band maximum (VBM) (see section "DFT computations" in the paper), eV
E_1
is the energy of the first Kohn–Sham orbital of the structure with the defect (see section "DFT computations" in the paper), eV
homo_lumo_gap
is the HOMO-LUMO gap, LUMO - HOMO, eV
total_mag
is the total magnetisation
*_{majority,minority}
are the corresponding quantities computed for the majority and minority spin channels, for materials computed with spin polarisation
band_gap
OBSOLETE
targets.csv.gz
Same as defects.csv.gz, plus additional derived variables:
formation_energy
is the defect formation energy, computed according to equation 1 from the paper
formation_energy_per_site
is the defect formation energy divided by the number of defects, according to equation 2 from the paper
*_{min,max}
are the minimum and maximum of quantities with respect to the different spin channels
initial.tar.gz
The archive initial.tar.gz contains the unrelaxed structures in CIF format. File names correspond to the unique identifiers _id in defects.csv.gz. Note that the structures were relaxed prior to computing the properties.
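Because the file names in initial.tar.gz correspond to _id, you can look up the unrelaxed starting structure for any row of defects.csv.gz directly. Below is a minimal standard-library sketch. It assumes each member is named "<_id>.cif", possibly inside a subdirectory. read_initial_cifs is a hypothetical helper, and it returns the CIF text unparsed so you can pass it to any structure library.

```python
import tarfile
from pathlib import PurePosixPath


def read_initial_cifs(archive_path: str) -> dict[str, str]:
    """Map each structure _id to the text of its unrelaxed CIF file.

    Hypothetical helper; assumes every member is named "<_id>.cif",
    possibly inside a subdirectory of the archive.
    """
    structures: dict[str, str] = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories and anything that is not a CIF file.
            if not member.isfile() or not member.name.endswith(".cif"):
                continue
            structure_id = PurePosixPath(member.name).stem  # name without ".cif"
            fileobj = tar.extractfile(member)
            if fileobj is not None:
                structures[structure_id] = fileobj.read().decode("utf-8")
    return structures
```

For example, `read_initial_cifs("initial.tar.gz")[some_id]` returns the CIF text for the structure whose `_id` is `some_id`. Remember that these are the structures before relaxation, not the geometries at which the properties were computed.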
descriptors.csv
_id
unique identifier of the defect type; corresponds to the descriptor_id column in defects.csv.gz
description
is a short semantic abbreviation of the defect type
base
is the chemical formula of the pristine material
cell
is the supercell size
defects
is a dictionary describing each point defect
elements.csv
Contains chemical potentials (in eV) of the elements, to be used in formation energy computation.
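The chemical potentials in elements.csv enter the formation energy. The sketch below assumes that equation 1 of the paper takes the standard textbook form for neutral defects, E_f = E_defect - E_pristine - sum_i n_i * mu_i. Here n_i is the number of atoms of element i added to the pristine supercell (negative for removed atoms) and mu_i is its chemical potential. Check this against the paper before relying on it. The per-site value follows equation 2 as described above: the formation energy divided by the number of defects. Both functions are hypothetical helpers. The layouts of elements.csv and of the defects dictionary in descriptors.csv are not parsed here; you supply the counts and potentials yourself.

```python
def formation_energy(
    e_defect: float,
    e_pristine: float,
    atoms_added: dict[str, int],
    chemical_potentials: dict[str, float],
) -> float:
    """Neutral-defect formation energy in eV (textbook form, assumed to match eq. 1).

    atoms_added maps an element symbol to the number of atoms added to the
    pristine supercell; use negative numbers for atoms removed.
    """
    # Energy of the atoms exchanged with their reservoirs.
    exchange = sum(n * chemical_potentials[el] for el, n in atoms_added.items())
    return e_defect - e_pristine - exchange


def formation_energy_per_site(e_formation: float, n_defects: int) -> float:
    """Formation energy divided by the number of point defects (eq. 2)."""
    if n_defects <= 0:
        raise ValueError("n_defects must be positive")
    return e_formation / n_defects
```

With purely illustrative numbers (not taken from the dataset), a single vacancy where one S atom is removed gives `formation_energy(-100.0, -105.0, {"S": -1}, {"S": -4.0})`, which evaluates to 5.0 - 4.0 = 1.0 eV.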
initial_structures.csv
Contains the properties of the pristine materials.
base
is the chemical formula of the pristine material
cell_size
is the supercell size
energy
total potential energy of the system, eV
fermi
is the Fermi level, eV
E_1
is the energy of the first Kohn–Sham orbital of the pristine structure (see section "DFT computations" in the paper), eV
E_VBM
is the energy of the valence band maximum of the pristine structure, eV
unit_cells/*.cif
Unit cells of the pristine materials used to produce the structures in the folder.
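To combine the tables, link each defect to its descriptor and to its pristine host. The descriptor_id column in defects.csv.gz points to _id in descriptors.csv, and the pair (base, cell) in descriptors.csv appears to match (base, cell_size) in initial_structures.csv. The sketch below uses only the Python standard library. That matching rule is our reading of the column descriptions, and so is the assumption that cell and cell_size are written in the same text form in both files; neither is guaranteed by the dataset documentation. read_csv and attach_pristine are hypothetical helpers.

```python
import csv
import gzip


def read_csv(path: str) -> list[dict[str, str]]:
    """Read a plain or gzip-compressed CSV file into a list of row dicts."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


def attach_pristine(
    defects_path: str, descriptors_path: str, pristine_path: str
) -> list[dict[str, dict[str, str]]]:
    """Pair every defect row with its descriptor row and pristine-host row.

    Joins defects.descriptor_id -> descriptors._id, then
    (descriptors.base, descriptors.cell) -> (initial_structures.base, cell_size).
    Assumes cell and cell_size are written identically in both files.
    """
    descriptors = {row["_id"]: row for row in read_csv(descriptors_path)}
    pristine = {(row["base"], row["cell_size"]): row for row in read_csv(pristine_path)}
    joined = []
    for defect in read_csv(defects_path):
        descriptor = descriptors[defect["descriptor_id"]]
        host = pristine[(descriptor["base"], descriptor["cell"])]
        joined.append({"defect": defect, "descriptor": descriptor, "pristine": host})
    return joined
```

For example, `attach_pristine("defects.csv.gz", "descriptors.csv", "initial_structures.csv")` returns one dict per defect row. All values remain strings, so convert energies with `float()` before doing arithmetic. A missing key raises `KeyError`, which makes any mismatch between the files visible immediately.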
If you use the dataset, please cite the following paper:
Huang, P., Lukin, R., Faleev, M. et al. Unveiling the complex structure-property correlation of defects in 2D materials based on high throughput datasets. npj 2D Mater Appl 7, 6 (2023). https://doi.org/10.1038/s41699-023-00369-1