Question about the problem encountered in "Cubic Crystal Test.ipynb" under the "examples" folder in the original M3GNet #99
-
Could I ask a question that is not about the Matbench framework? It concerns the original M3GNet model at https://github.com/materialsvirtuallab/m3gnet. In M3GNet, the notebook "Cubic Crystal Test.ipynb" in the "examples" folder contains this code:

data = pd.read_html("http://en.wikipedia.org/wiki/Lattice_constant")[0]
data = data[
    ~data["Crystal structure"].isin(
        ["Hexagonal", "Wurtzite", "Wurtzite (HCP)", "Orthorombic", "Tetragonal perovskite", "Orthorhombic perovskite"]
    )
]
data.rename(columns={"Lattice constant (Å)": "a (Å)"}, inplace=True)
data.drop(columns=["Ref."], inplace=True)
data["a (Å)"] = data["a (Å)"].map(float)
data = data[["Material", "Crystal structure", "a (Å)"]]
data = data[data["Material"] != "NC0.99"]

In the code above, the line data["a (Å)"] = data["a (Å)"].map(float) converts the column from string to float. However, the Lattice constant (Å) column (renamed to a (Å)) in the first table at https://en.wikipedia.org/wiki/Lattice_constant contains several string formats, such as 3.567, a = 3.533 c = 5.693, and a = 5.27 b = 5.275 c = 7.464. How should cells that contain more than one value be handled? Mapping them to float directly raises an error. Thank you!
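The failure can be reproduced without downloading the table. The following is a minimal sketch that uses only the three sample strings quoted in the question; `try_float` is a hypothetical helper introduced here for illustration, not part of M3GNet or pandas.

```python
# Sample cell values quoted in the question; the real table has many more rows.
samples = ["3.567", "a = 3.533 c = 5.693", "a = 5.27 b = 5.275 c = 7.464"]


def try_float(value):
    """Return float(value), or None when the string is not a single number."""
    try:
        return float(value)
    except ValueError:
        return None


# Attempt the same conversion that .map(float) performs, cell by cell.
converted = [try_float(s) for s in samples]

# Collect the cells that float() rejects.
failing = [s for s, c in zip(samples, converted) if c is None]

print(converted)
print(failing)
```

Only the first sample converts cleanly; the two cells that hold several labelled values are the ones that make a plain `.map(float)` raise `ValueError`.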
Replies: 2 comments 2 replies
-
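In the future, such questions are better asked on Stack Overflow. You can pass any custom function to pd.Series.map(). Here's an example that converts the first value in each string. You could change the function to return tuples with all floats instead.

import re
import pandas as pd

data = pd.read_html("http://en.wikipedia.org/wiki/Lattice_constant")[0]

def extract_first_float(value: str) -> float:
    try:
        return float(value)
    except ValueError:
        # not a plain number: fall back to the first number found in the string
        match = re.search(r"[-+]?\d*\.\d+|\d+", value)
        if match:
            return float(match.group())
        else:
            return float("nan")

# assumes the column has already been renamed to "a (Å)", as in the notebook
data["a (Å)"] = data["a (Å)"].map(extract_first_float)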
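To keep every value in a cell rather than only the first, the helper can return a tuple of all numbers it finds. This is a minimal sketch, not code from the M3GNet notebook: the name `extract_all_floats` and the grouped regular expression are assumptions made for illustration.

```python
import re

# Matches decimals such as "3.533" or ".5", or plain integers such as "5".
# The optional sign applies to both alternatives because they are grouped.
NUMBER = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")


def extract_all_floats(value) -> tuple:
    """Return every number found in the cell as a tuple of floats.

    Hypothetical helper: str() is applied first so that cells pandas has
    already parsed as numbers are handled the same way as strings.
    """
    return tuple(float(m) for m in NUMBER.findall(str(value)))


# Assumed usage on the renamed column from the question:
# data["a (Å)"] = data["a (Å)"].map(extract_all_floats)
```

With this variant, a cubic entry yields a one-element tuple and an entry like a = 3.533 c = 5.693 yields two elements, so the tuple length can be used afterwards to filter or split rows.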
-
Thanks for the reply, I will ask this kind of question on Stack Overflow in the future.