Folder Structure
The dataset generated by HEDGeOPF consists mainly of CSV files organized in nested subfolders with a well-defined hierarchy, which is explained below. A complete example dataset with 100 AC-OPF samples is available in this folder. It was generated with two remote workers for the pglib_opf_case5_pjm test case.
Main Level
At the top level, there is one folder for each power system configuration for which AC-OPF instances are generated.
dataset/
├── C0/
│   ├── bus/
│   ├── .../
│   └── ...
├── C1/
│   └── .../
├── ...
├── map.csv
├── polytope.csv
└── rng_state.bin
Currently, only the original power system topology C0 specified by the .m file is considered, and there is no routine to generate modified topologies.
At this level, three files are available:
- The rng_state.bin file contains the final RNG state of the simulation. The user can employ it to generate additional AC-OPF instances for the same dataset.
- The polytope.csv file contains the matrix A and the vector b, stacked horizontally, that define the convex polytope Ax <= b used for load sampling.
- The map.csv file maps the unique identifier (UID) of each instance, along with other global OPF information, to its location within the nested folder structure. It is organized into labelled columns, as shown below.
| Key | Type | Unit | Description | 
|---|---|---|---|
| uid | Int64 | – | Unique identifier of the AC-OPF instance | 
| config | Int64 | – | Index of the power system configuration used to generate the instance | 
| worker | Int64 | – | Index of distributed worker that generated the instance | 
| case | Int64 | – | Index of the case among those generated by the specified worker | 
| termination_status | Int64 | – | Binary termination status | 
| pd_tot | Float64 | [p.u.] | Total load active power of the instance | 
| objective | Float64 | [€] | Objective value of the AC-OPF solution | 
| solve_time | Float64 | [s] | Time taken by the solver to solve the instance | 
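Returning to polytope.csv, the horizontally stacked layout can be illustrated with a minimal sketch that splits the rows into A and b and tests whether a load vector x satisfies Ax <= b. It assumes that b is the last column and that the file has no header row; neither detail is confirmed here, and the sample rows are hypothetical (a unit box), not taken from a real dataset.

```python
import csv
import io


def split_polytope(rows):
    """Split rows of the stacked matrix [A | b]; b is ASSUMED to be the last column."""
    A = [[float(v) for v in row[:-1]] for row in rows]
    b = [float(row[-1]) for row in rows]
    return A, b


def in_polytope(A, b, x, tol=1e-9):
    """Return True if every inequality a_i . x <= b_i holds within tol."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(a_i, x)) <= b_i + tol
        for a_i, b_i in zip(A, b)
    )


# Hypothetical file content describing the unit box 0 <= x1, x2 <= 1.
sample = "1,0,1\n-1,0,0\n0,1,1\n0,-1,0\n"
rows = list(csv.reader(io.StringIO(sample)))
A, b = split_polytope(rows)

print(in_polytope(A, b, [0.5, 0.5]))  # True
print(in_polytope(A, b, [1.5, 0.5]))  # False
```

For a real dataset, the in-memory sample would be replaced by the opened polytope.csv file, after checking whether a header row is present.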
The UID can also be used to check the reproducibility of results. Dataset generation is deterministic up to the ordering of instances in the dataset, since samples are assigned to workers dynamically under distributed computing. UIDs are assigned by sorting AC-OPF instances by total load active power pd_tot and objective value objective. Therefore, by ordering instances by UID, it is possible to check whether different runs with the same configuration YAML file are identical.
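The check described above can be sketched as follows. This is an illustration, not part of HEDGeOPF: it assumes that map.csv has a header row with the column names from the table, and the two sample runs are hypothetical. The worker and case columns are deliberately ignored, since they may differ between runs.

```python
import csv
import io
import math


def load_sorted(map_csv_text):
    """Parse map.csv text (header row ASSUMED) and order the rows by UID."""
    rows = list(csv.DictReader(io.StringIO(map_csv_text)))
    rows.sort(key=lambda r: int(r["uid"]))
    return rows


def same_run(map_a, map_b, rel_tol=1e-9):
    """Compare two runs UID by UID on pd_tot and objective only."""
    a, b = load_sorted(map_a), load_sorted(map_b)
    if len(a) != len(b):
        return False
    for ra, rb in zip(a, b):
        if int(ra["uid"]) != int(rb["uid"]):
            return False
        for key in ("pd_tot", "objective"):
            if not math.isclose(float(ra[key]), float(rb[key]), rel_tol=rel_tol):
                return False
    return True


header = "uid,config,worker,case,termination_status,pd_tot,objective,solve_time\n"
# Hypothetical runs: same instances, different worker assignment and row order.
run_1 = header + "1,0,1,1,1,8.5,15000.0,0.02\n2,0,2,1,1,9.1,16500.0,0.03\n"
run_2 = header + "2,0,1,1,1,9.1,16500.0,0.05\n1,0,2,1,1,8.5,15000.0,0.04\n"
run_3 = header + "1,0,1,1,1,8.5,15000.0,0.02\n2,0,2,1,1,9.1,17000.0,0.03\n"

print(same_run(run_1, run_2))  # True
print(same_run(run_1, run_3))  # False
```

A relative tolerance is used rather than exact equality so that the comparison does not depend on how floating-point values were written to the CSV; a stricter comparison could also be applied to the per-variable files.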
Topology Level
Each topology folder has the same internal structure and consists of a set of subfolders that mimic the organization of PowerModels' Results Data Dictionary, specifically the "solution" one. This means that for each existing PowerModels component (e.g., bus, branch, etc.), there is a corresponding folder containing CSV files for the different variable types. Since AC-OPF instances are generated with distributed computing, each remote worker dynamically processes a fraction of the total number of cases. These are saved in dedicated CSV files (one per variable) labelled as variable-worker.csv. Consequently, a single AC-OPF instance is split across multiple CSV files, all generated by the same worker, and is saved in the same row of each.
C0/
├── bus/
│   ├── va-1.csv
│   ├── va-2.csv
│   ├── ...
│   ├── vm-1.csv
│   ├── vm-2.csv
│   └── ...
├── branch/
│   ├── pf-1.csv
│   ├── pf-2.csv
│   └── ...
├── gen/
│   └── ...
├── load/
│   └── ...
├── .../
├── graph.xlsx
├── info-1.csv
├── info-2.csv
└── ...
Each CSV file has:
- as many columns as there are components for the given variable
- as many rows as there are feasible AC-OPF instances generated by the given worker, plus one row for the component indices
This is exemplified in the table below for the voltage magnitude variable vm at the buses of the pglib_opf_case5_pjm.m test case.
| 1 | 2 | 3 | 4 | 5 | 
|---|---|---|---|---|
| 1.060 | 1.031 | 1.033 | 0.997 | 1.001 | 
| 1.044 | 1.018 | 1.025 | 0.972 | 0.975 | 
| 1.058 | 1.029 | 1.048 | 0.994 | 0.999 | 
| ... | ... | ... | ... | ... | 
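A minimal sketch of reading such a file is given below, using the rows of the table above as in-memory sample data. The helper names are hypothetical, and the sketch assumes that the case column of map.csv is a 1-based row index into the data rows of the worker's files, which is not stated explicitly here. Component indices are kept as strings to avoid assuming a numeric type.

```python
import csv
import io


def variable_path(config, component, variable, worker):
    """Build the relative path of a variable file, e.g. C0/bus/vm-1.csv."""
    return f"C{config}/{component}/{variable}-{worker}.csv"


def read_variable(csv_text):
    """Return (component indices, data rows); the first row holds the indices."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    indices = rows[0]
    values = [[float(v) for v in row] for row in rows[1:]]
    return indices, values


# Rows copied from the vm table above for pglib_opf_case5_pjm.
sample = (
    "1,2,3,4,5\n"
    "1.060,1.031,1.033,0.997,1.001\n"
    "1.044,1.018,1.025,0.972,0.975\n"
    "1.058,1.029,1.048,0.994,0.999\n"
)
indices, values = read_variable(sample)

case = 2  # ASSUMPTION: 1-based position among the worker's data rows
vm = dict(zip(indices, values[case - 1]))

print(variable_path(0, "bus", "vm", 1))  # C0/bus/vm-1.csv
print(vm["4"])                           # 0.972
```

Because an instance occupies the same row in every file written by a given worker, the same case value can be reused to collect va, pf, and the other variables for that instance.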
The graph.xlsx file contains, in separate sheets for the different power system component types, all the static input features defined in PowerModels' Network Data Dictionary. Unlike PowerModels' dictionary, however, the XLSX file reports information only for those components (e.g., generators) that are active under the given topology.