As discussed on Zulip, we would like to reduce the size of the mf_hecke_cc table.
- The table is currently one of the largest in the LMFDB, with 270GB of data.
- If we want to start using Zenodo to host backups of LMFDB tables, mf_hecke_cc exceeds Zenodo's size limit (50GB by default, increased to 200GB upon request).
- While there isn't a hard limit on the total amount of data in the LMFDB, there are soft limits (operational costs for disks on Google Cloud, size of the fast disks on grace). Reducing the amount of storage spent on mf_hecke_cc would allow us to dedicate more to other priorities.
Here is the current status of the data:
- mf_hecke_cc contains floating point approximations for Fourier coefficients of embedded classical modular forms. This data enables embedded newform pages like 983.2.c.a.2.1.
- There are 1,141,510 newforms in the LMFDB; of these, 859,545 have weight 2, trivial character and level larger than 10,000, and for these we have no embedded newform data.
- For the remaining 281,965 newforms, we store all embeddings, amounting to 14,417,694 embedded newforms. For each embedded newform, we store either 2000, 4000 or 6000 Fourier coefficients (I think the amount depends on the Sturm bound).
- Among these embedded newforms, 13,313,314 arise from newforms with dimension larger than 20 (the cutoff where we stop storing exact Fourier coefficients).
- The lfunc_search and lfunc_lfunctions tables contain 24,123,388 and 24,201,376 L-functions respectively. CMF L-functions make up 14,575,113 of these (of which 14,416,283 are primitive). FWIW, most of the remaining L-functions are from Dirichlet characters.
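To put these counts in perspective, here is a rough back-of-the-envelope calculation in Python using only the figures quoted above. It assumes 1GB = 10^9 bytes and attributes the whole table size to the embedded newform rows (ignoring indexes and other overhead), so the results are only indicative: roughly 19KB per embedded newform and about 51 embeddings per newform on average.

```python
# Figures quoted in this issue.
TABLE_SIZE_GB = 270
TOTAL_NEWFORMS = 1_141_510
NEWFORMS_WITHOUT_EMBEDDINGS = 859_545
EMBEDDED_NEWFORMS = 14_417_694

# Newforms for which embedded data is stored.
newforms_with_embeddings = TOTAL_NEWFORMS - NEWFORMS_WITHOUT_EMBEDDINGS

# Assumption: 1 GB = 10**9 bytes, and the full table size is spread
# evenly over the embedded newforms (indexes and overhead are ignored).
avg_bytes_per_embedded = TABLE_SIZE_GB * 10**9 / EMBEDDED_NEWFORMS

# Average number of embeddings stored per newform.
avg_embeddings_per_newform = EMBEDDED_NEWFORMS / newforms_with_embeddings

print(f"newforms with embeddings: {newforms_with_embeddings:,}")
print(f"average bytes per embedded newform: {avg_bytes_per_embedded:,.0f}")
print(f"average embeddings per newform: {avg_embeddings_per_newform:.1f}")
```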
Here's the plan:
- Make an auxiliary dataset containing all of the current data. It should be possible to access smaller parts of the data rather than having to download a single giant file. I think the ability to specify a weight and level is probably sufficient.
- Update the CMF front end so that it only shows the first 100 a_n, and includes a link to the auxiliary dataset for users who want more coefficients.
- Change mf_hecke_cc to only store a_n up to max(100, trace_bound). If we want to save even more space, we could only save a_p.
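The truncation rule in the last step could look something like the Python sketch below. It is only an illustration: the function names, the list-based storage convention (`an[i]` holds the coefficient of index `i + 1`) and the `primes_only` flag are hypothetical and do not reflect the actual layout of mf_hecke_cc.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small indices."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def truncate_coefficients(an, trace_bound, primes_only=False, min_keep=100):
    """Keep the coefficients with index n <= max(min_keep, trace_bound).

    `an` is a list where an[i] holds the coefficient of index i + 1
    (a hypothetical convention for this sketch). Returns a dict mapping
    the index n to its coefficient. With primes_only=True, only prime
    indices are kept.
    """
    cutoff = max(min_keep, trace_bound)
    last = min(cutoff, len(an))  # never read past the stored data
    kept = {n: an[n - 1] for n in range(1, last + 1)}
    if primes_only:
        kept = {n: a for n, a in kept.items() if is_prime(n)}
    return kept


# Example with dummy data standing in for 2000 stored coefficients.
dummy = [complex(n, 0) for n in range(1, 2001)]
print(len(truncate_coefficients(dummy, trace_bound=30)))
print(len(truncate_coefficients(dummy, trace_bound=250)))
print(len(truncate_coefficients(dummy, trace_bound=30, primes_only=True)))
```

Storing only prime indices would need the front end (or the user) to rebuild the remaining a_n from the a_p via multiplicativity, so it trades storage for extra computation.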
This could also be a good time to think about whether to add embedded newforms (and corresponding L-functions) for the modular forms of level larger than 10,000 which have been added more recently. I don't think it's worth doing this in cases where the dimension is large, but it may be worth doing for small dimension.